---
title: Bioinformatic analysis of complex, high-throughput genomic and epigenomic data in the context of $\mathsf{CD4}^{+}$ T-cell differentiation and diagnosis and treatment of transplant rejection
author: |
  Ryan C. Thompson \
  Su Lab \
  The Scripps Research Institute
date: October 24, 2019
theme: Boadilla
aspectratio: 169
fontsize: 14pt
---

## Organ transplants are a life-saving treatment

::: incremental

* 36,528 transplants performed in the USA in 2018[^organdonor]
* 100 transplants every day!
* Over 113,000 people on the national transplant waiting list as of July 2019

:::

[^organdonor]: [organdonor.gov](https://www.organdonor.gov/statistics-stories/statistics.html)

## Organ donation statistics for the USA in 2018[^organdonor]

\centering

![](graphics/presentation/transplants-organ-CROP.pdf)

## Types of grafts

A graft is categorized based on the relationship between donor and recipient:

. . .

::: incremental

* **Autograft:** Donor and recipient are the *same individual*
* **Allograft:** Donor and recipient are *different individuals* of the *same species*
* **Xenograft:** Donor and recipient are *different species*

:::

## Recipient T-cells reject allogenic MHCs

:::::::::: {.columns}
::: {.column width="55%"}
:::: incremental

* TCR binds to both antigen *and* MHC surface
\vspace{10pt}
* HLA genes encoding MHC proteins are highly polymorphic
\vspace{10pt}
* Variants in donor MHC can trigger the same T-cell response as a foreign antigen

::::
:::
::: {.column width="40%"}
![TCR binding to self (right) and allogenic (left) MHC\footnotemark](graphics/presentation/tcr_mhc.jpg){ height=70% }
:::
::::::::::

\footnotetext[3]{\href{https://doi.org/10.1016/j.cell.2007.01.048}{Colf, Bankovich, et al. "How a Single T Cell Receptor Recognizes Both Self and Foreign MHC".
In: Cell (2007)}}

## Allograft rejection is a major long-term problem

![Kidney allograft survival rates in children by transplant year[^kim-marks]](graphics/presentation/kidney-graft-survival.png){ height=65% }

[^kim-marks]: [Kim & Marks (2014)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3884158/?report=classic)

## Rejection is treated with immune suppressive drugs

::: incremental

* Graft recipient must take immune suppressive drugs indefinitely
* Graft is monitored for rejection and dosage adjusted over time
* Immune suppression is a delicate balance: too much and too little are both problematic.

:::

## Memory cells: faster, stronger, and more independent

![Naïve T-cell activated by APC](graphics/presentation/T-cells-A-SVG.png)

## Memory cells: faster, stronger, and more independent

![Naïve T-cell differentiates and proliferates into effector T-cells](graphics/presentation/T-cells-B-SVG.png)

## Memory cells: faster, stronger, and more independent

![Post-infection, some effector cells remain as memory cells](graphics/presentation/T-cells-C-SVG.png)

## Memory cells: faster, stronger, and more independent

![Memory T-cells respond more strongly to activation](graphics/presentation/T-cells-D-SVG.png)

::: notes

Compared to naïve cells, memory cells:

* respond to a lower antigen concentration
* respond more strongly at any given antigen concentration
* require less co-stimulation
* are somewhat independent of some types of co-stimulation required by naïve cells
* evolve over time to respond even more strongly to their antigen

Result:

* Memory cells require progressively higher doses of immune suppressive drugs
* Dosage cannot be increased indefinitely without compromising the immune system's ability to fight infection

:::

## 3 problems relating to transplant rejection

### 1. How are memory cells different from naïve?

\onslide<2->{Genome-wide epigenetic analysis of H3K4 and H3K27 methylation in naïve and memory $\mathsf{CD4}^{+}$ T-cell activation}

### 2. How can we diagnose rejection noninvasively?

\onslide<3->{Improving array-based diagnostics for transplant rejection by optimizing data preprocessing}

### 3. How can we evaluate effects of a rejection treatment?

\onslide<4->{Globin-blocking for more effective blood RNA-seq analysis in primate animal model for experimental graft rejection treatment}

## Today's focus

### \Large 1. How are memory cells different from naïve?

\Large Genome-wide epigenetic analysis of H3K4 and H3K27 methylation in naïve and memory $\mathsf{CD4}^{+}$ T-cell activation

## We need a better understanding of immune memory

* Cell surface markers fairly well-characterized
* But internal mechanisms poorly understood

. . .

\vfill

\large **Hypothesis:** Epigenetic regulation of gene expression through histone modification is involved in $\mathsf{CD4}^{+}$ T-cell activation and memory.

## Which histone marks are we looking at?

. . .

::: incremental

* **H3K4me3:** "activating" mark associated with active transcription
* **H3K4me2:** Correlated with H3K4me3, hypothesized "poised" state
* **H3K27me3:** "repressive" mark associated with inactive genes

:::

. . .

\vfill

All involved in T-cell differentiation, but activation dynamics unexplored

## ChIP-seq measures DNA bound to marked histones[^chipseq]

\centering

![](graphics/presentation/NRG-chipseq.png){ height=70% }

[^chipseq]: [Furey (2012)](http://www.nature.com/articles/nrg3306)

## Experimental design

::: incremental

* Separately isolate naïve and memory $\mathsf{CD4}^{+}$ T-cells from 4 donors
* Activate with CD3/CD28 beads
* Sample at 4 time points: Day 0 (pre-activation), Day 1 (early activation), Day 5 (peak activation), and Day 14 (post-activation)
* RNA-seq + ChIP-seq of 3 histone marks (H3K4me2, H3K4me3, & H3K27me3) for each sample.
:::

Data generated by Sarah LaMere, published in GEO as [GSE73214](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE73214)

## Time points capture phases of immune response

\centering

![](graphics/presentation/immune-response.png)

## A few intermediate analysis steps are required

\centering

![](graphics/CD4-csaw/rulegraphs/rulegraph-all-RASTER100.png)

## Histone modifications occur on consecutive histones

![ChIP-seq coverage in IL2 gene[^lamerethesis]](graphics/presentation/LaMere-thesis-fig3.9-SVG-CROP.png){ height=65% }

[^lamerethesis]: Sarah LaMere. Ph.D. thesis (2015).

## Histone modifications occur on consecutive histones

\begin{figure}
\centering
\only<1>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/CCF-plots-A-SVG.png}}
\only<2>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/CCF-plots-B-SVG.png}}
\only<3>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/CCF-plots-C-SVG.png}}
\caption{Strand cross-correlation plots show histone-sized wave pattern}
\end{figure}

## SICER identifies enriched regions across the genome

![Finding "islands" of coverage with SICER[^sicer]](graphics/presentation/SICER-fig1-SVG.png)

[^sicer]: [Zang et al. (2009)](https://doi.org/10.1093/bioinformatics/btp340)

## IDR identifies *reproducible* enriched regions

![Example irreproducible discovery rate[^idr] score consistency plot](graphics/presentation/IDR-example-CROP-RASTER.png){ height=65% }

[^idr]: [Li et al. (2011)](https://doi.org/10.1214/11-AOAS466)

## Finding enriched regions across the genome

![Peak-calling summary statistics](graphics/presentation/RCT-thesis-table2.2-SVG-CROP.png)

## Each histone mark has an "effective promoter radius"

![Enrichment of peaks near promoters](graphics/CD4-csaw/Promoter-Peak-Distance-Profile-PAGE1-CROP.pdf)

## Peaks in promoters correlate with gene expression

\begin{figure}
\centering
\only<1>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-A-SVG.png}}
\only<2>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-B-SVG.png}}
\only<3>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-C-SVG.png}}
\only<4>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-D-SVG.png}}
\only<5>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-Z-SVG.png}}
\caption{Expression distributions of genes with and without promoter peaks}
\end{figure}

## The story so far

* H3K4me2, H3K4me3, and H3K27me3 occur on many consecutive histones in broad regions across the genome
* These enriched regions occur more commonly within a certain radius of gene promoters
* This "effective promoter radius" is consistent across all samples for a given histone mark, but differs between histone marks
* Presence or absence of a peak within this radius is correlated with gene expression

. . .

Next: Does the position of a histone modification within a gene promoter matter to that gene's expression, or is it merely the presence or absence anywhere within the promoter?
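
::: notes

The presence-or-absence test and the position question raised on this slide can be illustrated with a small Python sketch. This is not the analysis code behind these slides: the function names, the half-open `(start, end)` peak tuples, and any radius passed in are assumptions made purely for illustration.

```python
from typing import List, Optional, Tuple

Peak = Tuple[int, int]  # (start, end), half-open, on the same chromosome as the TSS


def signed_offset(tss: int, strand: str, peak: Peak) -> int:
    """Strand-aware distance from a TSS to the nearest edge of a peak.

    0 means the peak overlaps the TSS. Negative values are upstream and
    positive values are downstream, relative to the direction of transcription.
    """
    start, end = peak
    if start <= tss < end:
        return 0
    # Genomic offset to whichever peak edge is closer to the TSS
    offset = start - tss if tss < start else (end - 1) - tss
    # On the minus strand, higher genomic coordinates are upstream
    return offset if strand == "+" else -offset


def nearest_peak_offset(tss: int, strand: str, peaks: List[Peak]) -> Optional[int]:
    """Signed offset of the peak closest to the TSS, or None if there are no peaks."""
    if not peaks:
        return None
    return min((signed_offset(tss, strand, p) for p in peaks), key=abs)


def has_promoter_peak(tss: int, strand: str, peaks: List[Peak], radius: int) -> bool:
    """True if any peak lies within `radius` bp of the TSS (hypothetical radius)."""
    offset = nearest_peak_offset(tss, strand, peaks)
    return offset is not None and abs(offset) <= radius
```

The boolean answers the presence-or-absence question for a chosen radius, while the signed offset keeps the upstream/downstream position needed for the follow-up question.

:::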
## H3K4me2 promoter neighborhood K-means clusters

![Cluster means for H3K4me2](graphics/presentation/H3K4me2-neighborhood-clusters-CROP.png){ height=70% }

## H3K4me2 promoter neighborhood K-means clusters

:::::::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K4me2](graphics/presentation/H3K4me2-neighborhood-clusters-CROP.png){ height=70% }
:::
::: {.column width="50%"}
:::
::::::::::

## H3K4me2 cluster PCA shows a semicircular "fan"

:::::::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K4me2](graphics/presentation/H3K4me2-neighborhood-clusters-CROP.png){ height=70% }
:::
::: {.column width="50%"}
![PCA plot of promoters](graphics/presentation/H3K4me2-neighborhood-PCA-CROP.png){ height=70% }
:::
::::::::::

## H3K4me2 near TSS correlates with expression

:::::::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K4me2](graphics/presentation/H3K4me2-neighborhood-clusters-CROP.png){ height=70% }
:::
::: {.column width="50%"}
![Cluster expression distributions](graphics/presentation/H3K4me2-neighborhood-expression-CROP-ROT90.png){ height=70% }
:::
::::::::::

## H3K4me3 pattern is similar to H3K4me2

:::::::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K4me3](graphics/presentation/H3K4me3-neighborhood-clusters-CROP.png){ height=70% }
:::
::: {.column width="50%"}
![PCA plot of promoters](graphics/presentation/H3K4me3-neighborhood-PCA-CROP.png){ height=70% }
:::
::::::::::

## H3K4me3 pattern is similar to H3K4me2

:::::::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K4me3](graphics/presentation/H3K4me3-neighborhood-clusters-CROP.png){ height=70% }
:::
::: {.column width="50%"}
![Cluster expression distributions](graphics/presentation/H3K4me3-neighborhood-expression-CROP-ROT90.png){ height=70% }
:::
::::::::::

## H3K27me3 clusters organize into 3 opposed pairs

:::::::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K27me3](graphics/presentation/H3K27me3-neighborhood-clusters-CROP.png){ height=70% }
:::
::: {.column width="50%"}
![PCA plot of promoters](graphics/presentation/H3K27me3-neighborhood-PCA-CROP.png){ height=70% }
:::
::::::::::

## Specific H3K27me3 profiles show elevated expression

:::::::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K27me3](graphics/presentation/H3K27me3-neighborhood-clusters-CROP.png){ height=70% }
:::
::: {.column width="50%"}
![Cluster expression distributions](graphics/presentation/H3K27me3-neighborhood-expression-CROP-ROT90.png){ height=70% }
:::
::::::::::

## Summary of promoter relative coverage findings

### H3K4me2 & H3K4me3

* Peak closer to promoter $\Rightarrow$ higher gene expression
* Slightly asymmetric in favor of peaks downstream of TSS

. . .

### H3K27me3

* Depletion of H3K27me3 at TSS associated with elevated gene expression
* Enrichment of H3K27me3 upstream of TSS even more strongly associated with elevated expression
* Other coverage profiles not associated with elevated expression

## Differential modification disappears by Day 14

![Differential modification between naïve and memory samples at each time point](graphics/presentation/RCT-thesis-table2.4-A-SVG-CROP.png)

## Differential modification disappears by Day 14

![Differential modification between naïve and memory samples at each time point](graphics/presentation/RCT-thesis-table2.4-B-SVG-CROP.png)

## Promoter H3K4me2 levels converge at Day 14

\centering

![](graphics/CD4-csaw/ChIP-seq/H3K4me2-promoter-PCA-group-CROP.png)

## Promoter H3K4me3 levels converge at Day 14

\centering

![](graphics/CD4-csaw/ChIP-seq/H3K4me3-promoter-PCA-group-CROP.png)

## Promoter H3K27me3 levels converge at Day 14?

\centering

![](graphics/CD4-csaw/ChIP-seq/H3K27me3-promoter-PCA-group-CROP.png)

## Expression converges at Day 14 (in PC 2 & 3)

\centering

![](graphics/CD4-csaw/RNA-seq/PCA-final-23-CROP.png)

## But the data isn't really that clean...
:::::::::: {.columns}
::: {.column width="50%"}
![H3K4me2](graphics/CD4-csaw/ChIP-seq/H3K4me2-PCA-raw-CROP.png)
:::
::: {.column width="50%"}
![H3K4me3](graphics/CD4-csaw/ChIP-seq/H3K4me3-PCA-raw-CROP.png)
:::
::::::::::

## But the data isn't really that clean...

:::::::::: {.columns}
::: {.column width="50%"}
![H3K27me3](graphics/CD4-csaw/ChIP-seq/H3K27me3-PCA-raw-CROP.png)
:::
::: {.column width="50%"}
![RNA-seq](graphics/CD4-csaw/RNA-seq/PCA-no-batchsub-CROP.png)
:::
::::::::::

## MOFA identifies cross-dataset patterns of variation

![MOFA factor analysis schematic[^mofa]](graphics/presentation/MOFA-fig1A-SVG.png)

[^mofa]: [Argelaguet, Velten, et al. (2018)](https://onlinelibrary.wiley.com/doi/abs/10.15252/msb.20178124)

## MOFA LFs explain variation in all 4 data sets

\centering

![Variance explained in each data set by each LF](graphics/presentation/MOFA-varExplained-matrix-A-CROP.png){ height=70% }

## 3 LFs are shared across all 4 data sets

\centering

![LFs 1, 4, and 5 explain variation in all 4 data sets](graphics/presentation/MOFA-varExplained-matrix-B-CROP.png){ height=70% }

## MOFA LF5 captures convergence pattern

![LF1 & LF4: time point effect; LF5: convergence](graphics/CD4-csaw/MOFA-LF-scatter-small.png){ height=70% }

## What have we learned?

* Almost no differential modification observed between naïve and memory at Day 14, despite plenty of differential modification at earlier time points.
* RNA-seq data and all 3 histone marks' ChIP-seq data all show "convergence" between naïve and memory by Day 14 in the first 2 or 3 principal coordinates.
* MOFA captures this convergence pattern in one of the latent factors, indicating that this is a shared pattern across all 4 data sets.
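
::: notes

One way to put a number on the visual "convergence" described above is the distance between the naïve and memory group centroids at each time point. The Python sketch below is only an illustration under stated assumptions: the function names and the `(cell type, day, coordinates)` sample layout are hypothetical, and it is not the method used to produce the plots or the MOFA factors.

```python
from collections import defaultdict
from math import dist
from typing import Dict, Iterable, List, Tuple

Coords = Tuple[float, ...]
# (cell type, day, position in principal-coordinate space); ASCII labels
# "naive" and "memory" are an assumption of this sketch
Sample = Tuple[str, int, Coords]


def centroid(points: List[Coords]) -> Coords:
    """Mean position of a group of samples, one value per axis."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def naive_memory_distance(samples: Iterable[Sample]) -> Dict[int, float]:
    """Euclidean distance between naive and memory centroids for each day."""
    groups: Dict[Tuple[int, str], List[Coords]] = defaultdict(list)
    for cell_type, day, coords in samples:
        groups[(day, cell_type)].append(coords)

    distances: Dict[int, float] = {}
    for day in sorted({d for d, _ in groups}):
        naive, memory = (day, "naive"), (day, "memory")
        # Skip days where only one of the two cell types was sampled
        if naive in groups and memory in groups:
            distances[day] = dist(centroid(groups[naive]), centroid(groups[memory]))
    return distances
```

A distance that shrinks toward zero by Day 14 would correspond to the convergence seen in the plots.

:::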
## Takeaway 1: Each histone mark has an "effective promoter radius"

* H3K4me2, H3K4me3, and H3K27me3 ChIP-seq reads are enriched in broad regions across the genome, representing areas where the histone modification is present
* These enriched regions occur more commonly within a certain radius of gene promoters
* This "effective promoter radius" is specific to each histone mark
* Presence or absence of a peak within this radius is correlated with gene expression

## Takeaway 2: Peak position within the promoter is important

* H3K4me2 and H3K4me3 peaks are more strongly associated with elevated gene expression the closer they are to the TSS, with a slight bias toward downstream peaks.
* H3K27me3 depletion at the TSS and enrichment upstream are both associated with elevated expression, while other patterns are not.
* In all histone marks, position of modification within promoter appears to be an important factor in association with gene expression

## Takeaway 3: Expression & epigenetic state both converge at Day 14

* At Day 14, almost no differential modification observed between naïve and memory cells
* Naïve and memory converge visually in PCoA plots
* Convergence is a shared pattern of variation across all 3 histone marks and gene expression
* This is consistent with the hypothesis that the naïve cells have differentiated into a more memory-like phenotype by Day 14.