---
title: Bioinformatic analysis of complex, high-throughput genomic and epigenomic data in the context of $\mathsf{CD4}^{+}$ T-cell differentiation and diagnosis and treatment of transplant rejection
author: |
    Ryan C. Thompson \
    Su Lab \
    The Scripps Research Institute
date: October 24, 2019
theme: Boadilla
aspectratio: 169
fontsize: 14pt
---

## Organ transplants are a life-saving treatment

::: incremental

* 36,528 transplants performed in the USA in 2018[^organdonor]
* 100 transplants every day!
* Over 113,000 people on the national transplant waiting list as of July 2019

:::

[^organdonor]: [organdonor.gov](https://www.organdonor.gov/statistics-stories/statistics.html)

## Organ donation statistics for the USA in 2018[^organdonor]

\centering

![](graphics/presentation/transplants-organ-CROP.pdf)

## Graft rejection is an adaptive immune response

::: incremental

* The host's adaptive immune system identifies and attacks cells bearing non-self antigens
* An allograft contains different genetic variants from the host, resulting in protein-coding differences
* Left unchecked, the host immune system eventually notices these alloantigens and begins attacking (rejecting) the graft
* Rejection is the major long-term threat to organ allografts

:::

## Allograft rejection remains a major long-term problem

![Kidney allograft survival rates in children by transplant year[^kim-marks]](graphics/presentation/kidney-graft-survival.png){ height=65% }

[^kim-marks]: [Kim & Marks. "Long-term outcomes of children after solid organ transplantation".
In: Clinics (2014)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3884158/?report=classic)

## Rejection is treated with immune suppressive drugs

::: incremental

* To prevent rejection, a graft recipient must take immune suppressive drugs for the rest of their life
* The graft is periodically checked for signs of rejection, and immune suppression dosage is adjusted accordingly
* Immune suppression is a delicate balance: too much leads to immune compromise; too little leads to rejection.

:::

## My thesis topics

### Topic 1: Immune memory

Genome-wide epigenetic analysis of H3K4 and H3K27 methylation in naïve and memory $\mathsf{CD4}^{+}$ T-cell activation

### Topic 2: Diagnostics for rejection

Improving array-based diagnostics for transplant rejection by optimizing data preprocessing

### Topic 3: Blood profiling during treatment

Globin-blocking for more effective blood RNA-seq analysis in primate animal model for experimental graft rejection treatment

## Today's focus

### \Large Topic 1: Immune memory

\Large Genome-wide epigenetic analysis of H3K4 and H3K27 methylation in naïve and memory $\mathsf{CD4}^{+}$ T-cell activation

## Memory cells: faster, stronger, and more independent

![Naïve and memory T-cell responses to activation](graphics/presentation/T-cells-A-SVG.png)

## Memory cells: faster, stronger, and more independent

![Naïve and memory T-cell responses to activation](graphics/presentation/T-cells-B-SVG.png)

## Memory cells: faster, stronger, and more independent

![Naïve and memory T-cell responses to activation](graphics/presentation/T-cells-C-SVG.png)

## Memory cells: faster, stronger, and more independent

![Naïve and memory T-cell responses to activation](graphics/presentation/T-cells-D-SVG.png)

## Memory cells are a problem for immune suppression

\large
Compared to naïve cells, memory cells:
\normalsize

* respond to a lower antigen concentration
* respond more strongly at any given antigen concentration
* require less co-stimulation
* are somewhat
  independent of some types of co-stimulation required by naïve cells
* evolve over time to respond even more strongly to their antigen

## Memory cells are a problem for immune suppression

\large
Result:
\normalsize

* Memory cells require progressively higher doses of immune suppressive drugs
* Dosage cannot be increased indefinitely without compromising the immune system's ability to fight infection

## We need a better understanding of immune memory

* Cell surface markers of naïve and memory $\mathsf{CD4}^{+}$ T-cells are fairly well-characterized
* But internal mechanisms that allow memory cells to respond differently to the same stimulus (antigen presentation) are not well understood

. . .

* A reasonable hypothesis is that some of these mechanisms are epigenetic: using histone marks or DNA methylation to regulate the expression of certain genes
* We can test this hypothesis by measuring gene expression (using RNA-seq) and histone methylation (using ChIP-seq) in naïve and memory T-cells before and after activation

## Experimental design

* Separately isolate naïve and memory $\mathsf{CD4}^{+}$ T-cells from 4 donors
* Activate with CD3/CD28 beads
* Take samples at 4 time points: Day 0 (pre-activation), Day 1 (early activation), Day 5 (peak activation), and Day 14 (post-activation)
* Do RNA-seq + ChIP-seq for 3 histone marks (H3K4me2, H3K4me3, & H3K27me3) for each sample.

Data generated by Sarah LaMere, published in GEO as [GSE73214](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE73214)

## A few intermediate analysis steps are required

![Flowchart of workflow for data analysis](graphics/CD4-csaw/rulegraphs/rulegraph-all-RASTER100.png)

## Histone modifications occur on consecutive histones

![ChIP-seq coverage in IL2 gene[^lamerethesis]](graphics/presentation/LaMere-thesis-fig3.9-SVG-CROP.png){ height=65% }

[^lamerethesis]: Sarah LaMere. "Dynamic epigenetic regulation of CD4 T cell activation and memory formation". PhD thesis. TSRI, 2015.

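
## Sketch: strand cross-correlation

\scriptsize

Strand cross-correlation, the quantity plotted on the next slides, measures how well forward-strand read starts line up with reverse-strand read starts after shifting by $d$ bp. The following is a minimal sketch in plain Python, not the code used for this analysis; the function names and the toy read positions are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5 if var_x and var_y else 0.0

def strand_ccf(fwd, rev, max_shift):
    """Correlate forward-strand counts with reverse-strand counts shifted by d bp."""
    return {d: pearson(fwd[:len(fwd) - d], rev[d:]) for d in range(max_shift + 1)}

# Toy data (hypothetical): 5' read ends in a 100 bp region, fragment length 5 bp.
fwd = [0] * 100
rev = [0] * 100
for pos in (10, 37, 81):
    fwd[pos] = 1      # forward read starts at the fragment's left end
    rev[pos + 5] = 1  # reverse read starts at the fragment's right end

ccf = strand_ccf(fwd, rev, max_shift=20)
best_shift = max(ccf, key=ccf.get)  # shift with the highest correlation
```

In this toy example the correlation is highest at the simulated fragment length, which is the kind of peak one looks for in a cross-correlation plot.
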

## Histone modifications occur on consecutive histones

![Strand cross-correlation plots](graphics/presentation/CCF-plots-A-SVG.png)

## Histone modifications occur on consecutive histones

![Strand cross-correlation plots](graphics/presentation/CCF-plots-B-SVG.png)

## Histone modifications occur on consecutive histones

![Strand cross-correlation plots](graphics/presentation/CCF-plots-C-SVG.png)

## SICER identifies enriched regions across the genome

![Finding "islands" of coverage with SICER[^sicer]](graphics/presentation/SICER-fig1-SVG.png)

[^sicer]: [Zang et al. “A clustering approach for identification of enriched domains from histone modification ChIP-Seq data”. In: Bioinformatics 25.15 (2009)](https://doi.org/10.1093/bioinformatics/btp340)

## IDR identifies *reproducible* enriched regions

![Example irreproducible discovery rate[^idr] score consistency plot](graphics/presentation/IDR-example-CROP-RASTER.png){ height=65% }

[^idr]: [Li et al. “Measuring reproducibility of high-throughput experiments”.
In: AOAS (2011)](https://doi.org/10.1214/11-AOAS466)

## Finding enriched regions across the genome

![Peak-calling summary statistics](graphics/presentation/RCT-thesis-table2.2-SVG-CROP.png)

## Each histone mark has an "effective promoter radius"

![Enrichment of peaks near promoters](graphics/CD4-csaw/Promoter-Peak-Distance-Profile-PAGE1-CROP.pdf)

## Peaks in promoters correlate with gene expression

![Expression distributions of genes with and without promoter peaks](graphics/presentation/FPKM-by-Peak-Violin-Plots-A-SVG.png)

## Peaks in promoters correlate with gene expression

![Expression distributions of genes with and without promoter peaks](graphics/presentation/FPKM-by-Peak-Violin-Plots-B-SVG.png)

## Peaks in promoters correlate with gene expression

![Expression distributions of genes with and without promoter peaks](graphics/presentation/FPKM-by-Peak-Violin-Plots-C-SVG.png)

## Peaks in promoters correlate with gene expression

![Expression distributions of genes with and without promoter peaks](graphics/presentation/FPKM-by-Peak-Violin-Plots-D-SVG.png)

## Peaks in promoters correlate with gene expression

![Expression distributions of genes with and without promoter peaks](graphics/presentation/FPKM-by-Peak-Violin-Plots-Z-SVG.png)

## The story so far

* H3K4me2, H3K4me3, and H3K27me3 occur on many consecutive histones in broad regions across the genome
* These enriched regions occur more commonly within a certain radius of gene promoters
* This "effective promoter radius" is consistent across all samples for a given histone mark, but differs between histone marks
* Presence or absence of a peak within this radius is correlated with gene expression

. . .

Next: Does the position of a histone modification within a gene promoter matter to that gene's expression, or is it merely the presence or absence anywhere within the promoter?

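
## Sketch: classifying genes by promoter peak

\scriptsize

The "peak within the effective promoter radius" classification summarized so far can be sketched as a distance test between each transcription start site (TSS) and the called peaks. This is an illustrative sketch only, assuming a single chromosome; the function names, coordinates, and the 1000 bp radius are hypothetical and not taken from the analysis.

```python
def distance_to_nearest_peak(tss, peaks):
    """Distance in bp from a TSS to the closest peak; 0 if the TSS lies inside a peak."""
    return min(max(start - tss, tss - end, 0) for start, end in peaks)

def genes_with_promoter_peak(tss_by_gene, peaks, radius):
    """Genes whose TSS is within `radius` bp of any peak on the same chromosome."""
    return {gene for gene, tss in tss_by_gene.items()
            if distance_to_nearest_peak(tss, peaks) <= radius}

# Hypothetical coordinates on one chromosome; the radius is illustrative only.
peaks = [(1000, 2000), (10000, 10500)]
tss_by_gene = {"geneA": 1500, "geneB": 3000, "geneC": 8000}
marked = genes_with_promoter_peak(tss_by_gene, peaks, radius=1000)
```

Genes in `marked` would form the "with promoter peak" group when comparing expression distributions; a real implementation would use an interval index instead of a linear scan.
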

## H3K4me2 promoter neighborhood K-means clusters

![Cluster means for H3K4me2](graphics/CD4-csaw/ChIP-seq/H3K4me2-neighborhood-clusters-CROP.png)

## H3K4me2 promoter neighborhood cluster PCA

::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K4me2](graphics/CD4-csaw/ChIP-seq/H3K4me2-neighborhood-clusters-CROP.png)
:::
::: {.column width="50%"}
![PCA plot of promoters](graphics/CD4-csaw/ChIP-seq/H3K4me2-neighborhood-PCA-CROP.png)
:::
:::::

## H3K4me2 promoter neighborhood cluster expression

::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K4me2](graphics/CD4-csaw/ChIP-seq/H3K4me2-neighborhood-clusters-CROP.png)
:::
::: {.column width="50%"}
![Cluster expression distributions](graphics/CD4-csaw/ChIP-seq/H3K4me2-neighborhood-expression-CROP-ROT90.png)
:::
:::::

## H3K4me3 promoter neighborhood K-means clusters

![Cluster means for H3K4me3](graphics/CD4-csaw/ChIP-seq/H3K4me3-neighborhood-clusters-CROP.png)

## H3K4me3 promoter neighborhood K-means clusters

::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K4me3](graphics/CD4-csaw/ChIP-seq/H3K4me3-neighborhood-clusters-CROP.png)
:::
::: {.column width="50%"}
![PCA plot of promoters](graphics/CD4-csaw/ChIP-seq/H3K4me3-neighborhood-PCA-CROP.png)
:::
:::::

## H3K4me3 promoter neighborhood cluster expression

::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K4me3](graphics/CD4-csaw/ChIP-seq/H3K4me3-neighborhood-clusters-CROP.png)
:::
::: {.column width="50%"}
![Cluster expression distributions](graphics/CD4-csaw/ChIP-seq/H3K4me3-neighborhood-expression-CROP-ROT90.png)
:::
:::::

## H3K27me3 promoter neighborhood K-means clusters

![Cluster means for H3K27me3](graphics/CD4-csaw/ChIP-seq/H3K27me3-neighborhood-clusters-CROP.png)

## H3K27me3 promoter neighborhood K-means clusters

::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K27me3](graphics/CD4-csaw/ChIP-seq/H3K27me3-neighborhood-clusters-CROP.png)
:::
::: {.column width="50%"}
![PCA plot of
promoters](graphics/CD4-csaw/ChIP-seq/H3K27me3-neighborhood-PCA-CROP.png)
:::
:::::

## H3K27me3 promoter neighborhood cluster expression

::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K27me3](graphics/CD4-csaw/ChIP-seq/H3K27me3-neighborhood-clusters-CROP.png)
:::
::: {.column width="50%"}
![Cluster expression distributions](graphics/CD4-csaw/ChIP-seq/H3K27me3-neighborhood-expression-CROP-ROT90.png)
:::
:::::

## What have we learned?

### H3K4me2 & H3K4me3

* Peak closer to promoter $\Rightarrow$ more likely gene is highly expressed
* Slightly asymmetric in favor of peaks downstream of TSS

. . .

### H3K27me3

* Depletion of H3K27me3 at TSS associated with elevated gene expression
* Enrichment of H3K27me3 upstream of TSS even more strongly associated with elevated expression
* Other coverage profiles not associated with elevated expression

## Differential modification disappears by Day 14

![Differential modification between naïve and memory samples at each time point](graphics/presentation/RCT-thesis-table2.4-A-SVG-CROP.png)

## Differential modification disappears by Day 14

![Differential modification between naïve and memory samples at each time point](graphics/presentation/RCT-thesis-table2.4-B-SVG-CROP.png)

## Convergence at Day 14 H3K4me2

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K4me2-promoter-PCA-group-CROP.png)

## Convergence at Day 14 H3K4me3

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K4me3-promoter-PCA-group-CROP.png)

## Convergence at Day 14 H3K27me3

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K27me3-promoter-PCA-group-CROP.png)

## Convergence at Day 14 RNA-seq (PC 2 & 3)

![(Insert figure legend)](graphics/CD4-csaw/RNA-seq/PCA-final-23-CROP.png)

## MOFA identifies shared variation across all 4 data sets

![(Insert figure legend)](graphics/presentation/MOFA-varExplained-matrix-A-CROP.png)

## MOFA identifies shared variation across all 4 data sets

![(Insert figure
legend)](graphics/presentation/MOFA-varExplained-matrix-B-CROP.png)

## MOFA shared variation captures convergence pattern

![(Insert figure legend)](graphics/CD4-csaw/MOFA-LF-scatter-small.png)

## What have we learned?

* Almost no differential modification observed between naïve and memory at Day 14, despite plenty of differential modification at earlier time points.
* RNA-seq data and ChIP-seq data for all 3 histone marks show "convergence" between naïve and memory by Day 14 in the first 2 or 3 principal coordinates.
* MOFA captures this convergence pattern in one of the latent factors, indicating that this is a shared pattern across all 4 data sets.

## Takeaway 1: Each histone mark has an "effective promoter radius"

* H3K4me2, H3K4me3, and H3K27me3 ChIP-seq reads are enriched in broad regions across the genome, representing areas where the histone modification is present
* These enriched regions occur more commonly within a certain radius of gene promoters
* This "effective promoter radius" is specific to each histone mark
* Presence or absence of a peak within this radius is correlated with gene expression

## Takeaway 2: Peak position within the promoter is important

* H3K4me2 and H3K4me3 peaks are more strongly associated with elevated gene expression the closer they are to the TSS, with a slight bias toward downstream peaks.
* H3K27me3 depletion at the TSS and enrichment upstream are both associated with elevated expression, while other patterns are not.
* For all histone marks, the position of the modification within the promoter appears to be an important factor in its association with gene expression

## Takeaway 3: Expression & epigenetic state both converge at Day 14

* At Day 14, almost no differential modification observed between naïve and memory cells
* Naïve and memory converge visually in PCoA plots
* Convergence is a shared pattern of variation across all 3 histone marks and gene expression
* This is consistent with the hypothesis that the naïve cells have differentiated into a more memory-like phenotype by Day 14.
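
## Sketch: quantifying convergence

\scriptsize

The visual convergence described above could be put into numbers by measuring, at each time point, the distance between the naïve and memory group centroids in the reduced-dimension space. The sketch below is one hypothetical way to do that and was not part of the analysis; the function names are assumptions and the coordinates are made up for illustration.

```python
from math import dist  # Python 3.8+

def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return [sum(axis) / len(points) for axis in zip(*points)]

def naive_memory_distance(samples):
    """Map each day to the distance between naive and memory group centroids."""
    result = {}
    for day in sorted({d for _, d, _ in samples}):
        naive = [xy for cell, d, xy in samples if cell == "naive" and d == day]
        memory = [xy for cell, d, xy in samples if cell == "memory" and d == day]
        result[day] = dist(centroid(naive), centroid(memory))
    return result

# Made-up 2-D coordinates, NOT values from the real data sets.
samples = [
    ("naive", 0, (0.0, 4.0)), ("naive", 0, (0.0, 6.0)),
    ("memory", 0, (6.0, 4.0)), ("memory", 0, (6.0, 6.0)),
    ("naive", 14, (3.0, 0.0)), ("naive", 14, (3.0, 1.0)),
    ("memory", 14, (3.5, 0.0)), ("memory", 14, (3.5, 1.0)),
]
distances = naive_memory_distance(samples)
```

A distance that shrinks from Day 0 to Day 14 would correspond to the convergence pattern; with real data, the coordinates would come from the PCoA of each data set.
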