% Bioinformatic analysis of complex, high-throughput genomic and epigenomic data in the context of $\mathsf{CD4}^{+}$ T-cell differentiation and diagnosis and treatment of transplant rejection
% Ryan C. Thompson \ Su Lab \ The Scripps Research Institute
% October 24, 2019

## Organ transplants are a life-saving treatment for many

\Large

* 36,528 transplants performed in the USA in 2018[^1]

. . .

* 100 transplants every day!

. . .

* Over 113,000 people on the national transplant waiting list as of July 2019

[^1]: Source: https://www.organdonor.gov/statistics-stories/statistics.html

## Organ transplants are a life-saving treatment for many

![Organ donation statistics for the USA in 2018[^2]](graphics/presentation/transplants-organ-CROP-RASTER.png){ height=70% }

[^2]: Source: https://www.organdonor.gov/statistics-stories/statistics.html

## Rejection is an adaptive immune response against a graft

* The host's adaptive immune system identifies and attacks cells bearing non-self antigens

. . .

* An allograft contains different genetic variants from the host, resulting in protein-coding differences

. . .

* Left unchecked, the host immune system eventually notices these alloantigens and begins attacking (rejecting) the graft

. . .

* Rejection is the major long-term threat to organ allografts

## Allograft rejection remains a major long-term problem

![Kidney allograft survival rates in children by transplant year[^3]](graphics/presentation/kidney-graft-survival.png){ height=65% }

[^3]: Kim & Marks. "Long-term outcomes of children after solid organ transplantation". In: Clinics (2014)

## Rejection is treated with immune suppressive drugs

* To prevent rejection, a graft recipient must take immune suppressive drugs for the rest of their life
* The graft is periodically checked for signs of rejection, and immune suppression dosage is adjusted accordingly
* Immune suppression is a delicate balance: too much leads to immune compromise; too little leads to rejection.

. . .

* Both diagnosis and treatment present significant challenges
* Immune memory is the major contributor to long-term rejection

## My thesis topics

### Chapter 2

Genome-wide epigenetic analysis of H3K4 and H3K27 methylation in naïve and memory $\mathsf{CD4}^{+}$ T-cell activation

### Chapter 3

Improving array-based diagnostics for transplant rejection by optimizing data preprocessing

### Chapter 4

Globin-blocking for more effective blood RNA-seq analysis in primate animal model for experimental graft rejection treatment

## Today's focus

### Chapter 2

\Large

Genome-wide epigenetic analysis of H3K4 and H3K27 methylation in naïve and memory $\mathsf{CD4}^{+}$ T-cell activation

## Memory cells: faster, stronger, and more independent

![Memory T-cells proliferate and respond more quickly](graphics/presentation/T-cells-SVG.png)

## Memory cells are a problem for immune suppression

\large Compared to naïve cells, memory cells: \normalsize

* respond to a lower antigen concentration
* respond more strongly at any given antigen concentration
* require less co-stimulation
* are somewhat independent of some types of co-stimulation required by naïve cells
* evolve over time to respond even more strongly to their antigen

. . .

\large Result: \normalsize

* Memory cells require progressively higher doses of immune suppressive drugs
* Dosage cannot be increased indefinitely without compromising the immune system's ability to fight infection

## We need a better understanding of immune memory

* Cell surface markers of naïve and memory $\mathsf{CD4}^{+}$ T-cells are fairly well-characterized
* But internal mechanisms that allow memory cells to respond differently to the same stimulus (antigen presentation) are not well-understood

. . .

* A reasonable hypothesis is that some of these mechanisms are epigenetic: using histone marks or DNA methylation to regulate the expression of certain genes
* We can test this hypothesis by measuring gene expression (using RNA-seq) and histone methylation (using ChIP-seq) in naïve and memory T-cells before and after activation

## Experimental design

* Separately isolate naïve and memory $\mathsf{CD4}^{+}$ T-cells from 4 donors
* Activate with CD3/CD28 beads
* Take samples at 4 time points: Day 0 (pre-activation), Day 1 (early activation), Day 5 (peak activation), and Day 14 (post-activation)
* Do RNA-seq + ChIP-seq for 3 histone marks (H3K4me2, H3K4me3, & H3K27me3) for each sample.

Data generated by Sarah LaMere, published in GEO as [GSE73214](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE73214)

## A few intermediate analysis steps are required

![Flowchart of workflow for data analysis](graphics/CD4-csaw/rulegraphs/rulegraph-all-RASTER300.png)

## Histone modifications are observed on consecutive histones

![Strand cross-correlation plots](graphics/CD4-csaw/csaw/CCF-plots-PAGE2-CROP-RASTER.png)

## Histone modifications are observed on consecutive histones

![ChIP-seq coverage in IL2 gene[^5]](graphics/presentation/LaMere-thesis-fig3.9-SVG-CROP.png){ height=70% }

[^5]: Sarah LaMere. "Dynamic epigenetic regulation of CD4 T cell activation and memory formation". PhD thesis. TSRI, 2015.

## Finding enriched regions across the genome

* Scan across the genome looking for regions with read coverage above background level in each donor using the SICER peak caller
* Use the Irreproducible Discovery Rate (IDR) framework to identify peaks that are called consistently across multiple donors

![Peak-calling summary statistics](graphics/presentation/RCT-thesis-table2.2-SVG-CROP.png)

## Each histone mark has an "effective promoter radius"

![Enrichment of peaks near promoters](graphics/CD4-csaw/Promoter-Peak-Distance-Profile-PAGE1-CROP-RASTER.png)

## Peaks in promoters are correlated with gene expression

![Expression distributions of genes with and without promoter peaks](graphics/CD4-csaw/FPKM-by-Peak-Violin-Plots-CROP-RASTER.png)

## The story so far

* H3K4me2, H3K4me3, and H3K27me3 occur on many consecutive histones in broad regions across the genome
* These enriched regions occur more commonly within a certain radius of gene promoters
* This "effective promoter radius" is consistent across all samples for a given histone mark, but differs between histone marks
* Presence or absence of a peak within this radius is correlated with gene expression

. . .

Next: Does the position of a histone modification within a gene promoter matter to that gene's expression, or is it merely the presence or absence anywhere within the promoter?

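
## Sketch: testing for a peak within the promoter radius

The "peak within the effective promoter radius" test summarized so far can be sketched in a few lines. This is a minimal illustration, not the analysis code behind these slides: the function names, the toy peak coordinates, and the 1 kb radius are all assumptions made for the example, and chromosome and strand handling are omitted.

```python
from bisect import bisect_left


def distance_to_nearest_peak(tss, peaks):
    """Distance in bp from a TSS to the nearest peak (0 if the TSS is inside one).

    `peaks` must be non-overlapping (start, end) intervals sorted by start,
    all on the same chromosome as the TSS.
    """
    starts = [start for start, _ in peaks]
    i = bisect_left(starts, tss)
    best = float("inf")
    # Only the peak starting at/after the TSS and the one just before it
    # can be the nearest, because the intervals do not overlap.
    for start, end in peaks[max(i - 1, 0): i + 1]:
        if start <= tss <= end:
            return 0
        best = min(best, abs(start - tss), abs(end - tss))
    return best


def has_promoter_peak(tss, peaks, radius):
    """True if any peak lies within `radius` bp of the TSS."""
    return distance_to_nearest_peak(tss, peaks) <= radius


# Hypothetical toy data: three peaks and an illustrative 1 kb radius.
toy_peaks = [(100, 200), (1000, 1500), (5000, 5200)]
toy_radius = 1000

near = has_promoter_peak(250, toy_peaks, toy_radius)   # 50 bp from a peak
far = has_promoter_peak(3000, toy_peaks, toy_radius)   # 1500 bp from a peak
```

In practice the radius would be chosen separately for each histone mark from the peak-distance profile, since the effective promoter radius differs between marks.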
## H3K4me2 promoter neighborhood K-means clusters

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K4me2-neighborhood-clusters-CROP.png)

## H3K4me2 promoter neighborhood cluster PCA

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K4me2-neighborhood-PCA-CROP.png)

## H3K4me2 promoter neighborhood cluster expression

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K4me2-neighborhood-expression-CROP.png)

## H3K4me3 promoter neighborhood K-means clusters

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K4me3-neighborhood-clusters-CROP.png)

## H3K4me3 promoter neighborhood cluster PCA

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K4me3-neighborhood-PCA-CROP.png)

## H3K4me3 promoter neighborhood cluster expression

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K4me3-neighborhood-expression-CROP.png)

## H3K27me3 promoter neighborhood K-means clusters

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K27me3-neighborhood-clusters-CROP.png)

## H3K27me3 promoter neighborhood cluster PCA

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K27me3-neighborhood-PCA-CROP.png)

## H3K27me3 promoter neighborhood cluster expression

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K27me3-neighborhood-expression-CROP.png)

## What have we learned?

### H3K4me2 & H3K4me3

* Peak closer to promoter $\Rightarrow$ more likely gene is highly expressed
* Slightly asymmetric in favor of peaks downstream of TSS

. . .

### H3K27me3

* Depletion of H3K27me3 at TSS associated with elevated gene expression
* Enrichment of H3K27me3 upstream of TSS even more strongly associated with elevated expression
* Other coverage profiles not associated with elevated expression

## Differential modification disappears by Day 14

![Differential modification between naïve and memory samples at each time point](graphics/presentation/RCT-thesis-table2.4-SVG-CROP.png)

## Convergence at Day 14 H3K4me2

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K4me2-promoter-PCA-group-CROP.png)

## Convergence at Day 14 H3K4me3

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K4me3-promoter-PCA-group-CROP.png)

## Convergence at Day 14 H3K27me3

![(Insert figure legend)](graphics/CD4-csaw/ChIP-seq/H3K27me3-promoter-PCA-group-CROP.png)

## Convergence at Day 14 RNA-seq (PC 2 & 3)

![(Insert figure legend)](graphics/CD4-csaw/RNA-seq/PCA-final-23-CROP.png)

## MOFA identifies shared variation across all 4 data sets

![(Insert figure legend)](graphics/CD4-csaw/MOFA-varExplaiend-matrix-CROP.png)

## MOFA shared variation captures convergence pattern

![(Insert figure legend)](graphics/CD4-csaw/MOFA-LF-scatter-small.png)

## What have we learned?

* Almost no differential modification observed between naïve and memory at Day 14, despite plenty of differential modification at earlier time points.
* RNA-seq data and all 3 histone marks' ChIP-seq data all show "convergence" between naïve and memory by Day 14 in the first 2 or 3 principal coordinates.
* MOFA captures this convergence pattern in one of the latent factors, indicating that this is a shared pattern across all 4 data sets.

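
## Sketch: putting a number on "convergence"

The visual convergence described above could be quantified as the distance between the naïve and memory group centroids at each time point. The sketch below assumes principal coordinates have already been computed for every sample; the function names and the coordinates are made-up toy values for illustration only, not results from this data set.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return [fmean(axis) for axis in zip(*points)]


def group_separation(coords, day):
    """Euclidean distance between naive and memory centroids on one day.

    `coords` maps (cell_type, day) to a list of per-sample coordinates.
    """
    return dist(centroid(coords[("naive", day)]),
                centroid(coords[("memory", day)]))


# Hypothetical toy coordinates (two samples per group, two axes).
toy_coords = {
    ("naive", 0): [(-4.0, 1.0), (-3.0, 0.0)],
    ("memory", 0): [(2.0, 1.0), (3.0, 0.0)],
    ("naive", 14): [(5.0, 4.0), (6.0, 5.0)],
    ("memory", 14): [(5.5, 4.0), (6.5, 5.0)],
}

sep_day0 = group_separation(toy_coords, 0)
sep_day14 = group_separation(toy_coords, 14)
```

A separation that shrinks toward zero by Day 14 would be the numeric counterpart of the two groups overlapping in the plots.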
## Takeaway 1: Each histone mark has an "effective promoter radius"

* H3K4me2, H3K4me3, and H3K27me3 ChIP-seq reads are enriched in broad regions across the genome, representing areas where the histone modification is present
* These enriched regions occur more commonly within a certain radius of gene promoters
* This "effective promoter radius" is specific to each histone mark
* Presence or absence of a peak within this radius is correlated with gene expression

## Takeaway 2: Peak position within the promoter is important

* H3K4me2 and H3K4me3 peaks are more strongly associated with elevated gene expression the closer they are to the TSS, with a slight bias toward downstream peaks.
* H3K27me3 depletion at the TSS and enrichment upstream are both associated with elevated expression, while other patterns are not.
* For all histone marks, the position of the modification within the promoter appears to be an important factor in its association with gene expression

## Takeaway 3: Expression & epigenetic state both converge at Day 14

* At Day 14, almost no differential modification is observed between naïve and memory cells
* Naïve and memory converge visually in PCoA plots
* Convergence is a shared pattern of variation across all 3 histone marks and gene expression
* This is consistent with the hypothesis that the naïve cells have differentiated into a more memory-like phenotype by Day 14.