--- title: Bioinformatic analysis of complex, high-throughput genomic and epigenomic data in the context of $\mathsf{CD4}^{+}$ T-cell differentiation and diagnosis and treatment of transplant rejection author: | Ryan C. Thompson \ Su Lab \ The Scripps Research Institute date: October 24, 2019 theme: Boadilla aspectratio: 169 fontsize: 14pt --- ## Organ transplants are a life-saving treatment ::: incremental * 36,528 transplants performed in the USA in 2018[^organdonor] * 100 transplants every day! * Over 113,000 people on the national transplant waiting list as of July 2019 ::: [^organdonor]: [organdonor.gov](https://www.organdonor.gov/statistics-stories/statistics.html) ## Organ donation statistics for the USA in 2018[^organdonor] \centering ![](graphics/presentation/transplants-organ-CROP.pdf) ## Types of grafts A graft is categorized based on the relationship between donor and recipient: . . . ::: incremental * **Autograft:** Donor and recipient are the *same individual* * **Allograft:** Donor and recipient are *different individuals* of the *same species* * **Xenograft:** Donor and recipient are *different species* ::: ## Recipient T-cells reject allogenic MHCs :::::::::: {.columns} ::: {.column width="55%"} :::: incremental * TCR binds to both antigen *and* MHC surface \vspace{10pt} * HLA genes encoding MHC proteins are highly polymorphic \vspace{10pt} * Variants in donor MHC can trigger the same T-cell response as a foreign antigen :::: ::: ::: {.column width="40%"} ![TCR binding to self (right) and allogenic (left) MHC\footnotemark](graphics/presentation/tcr_mhc.jpg){ height=70% } ::: :::::::::: \footnotetext[3]{\href{https://doi.org/10.1016/j.cell.2007.01.048}{Colf, Bankovich, et al. "How a Single T Cell Receptor Recognizes Both Self and Foreign MHC". In: Cell (2007)}} ## Allograft rejection is a major long-term problem ![Kidney allograft survival rates in children by transplant year[^kim-marks]](graphics/presentation/kidney-graft-survival.png){ height=65% } [^kim-marks]: [Kim & Marks (2014)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3884158/?report=classic) ## Rejection is treated with immune suppressive drugs ::: incremental * Graft recipient must take immune suppressive drugs indefinitely * Graft is monitored for rejection and dosage adjusted over time * Immune suppression is a delicate balance: too much and too little are both problematic. ::: ## Memory cells: faster, stronger, and more independent ![Naïve T-cell activated by APC](graphics/presentation/T-cells-A-SVG.png) ## Memory cells: faster, stronger, and more independent ![Naïve T-cell differentiates and proliferates into effector T-cells](graphics/presentation/T-cells-B-SVG.png) ## Memory cells: faster, stronger, and more independent ![Post-infection, some effectors cells remain as memory cells](graphics/presentation/T-cells-C-SVG.png) ## Memory cells: faster, stronger, and more independent ![Memory T-cells respond more strongly to activation](graphics/presentation/T-cells-D-SVG.png) ::: notes Compared to naïve cells, memory cells: * respond to a lower antigen concentration * respond more strongly at any given antigen concentration * require less co-stimulation * are somewhat independent of some types of co-stimulation required by naïve cells * evolve over time to respond even more strongly to their antigen Result: * Memory cells require progressively higher doses of immune suppresive drugs * Dosage cannot be increased indefinitely without compromising the immune system's ability to fight infection ::: ## 3 problems relating to transplant rejection ### 1. How are memory cells different from naïve? \onslide<2->{Genome-wide epigenetic analysis of H3K4 and H3K27 methylation in naïve and memory $\mathsf{CD4}^{+}$ T-cell activation} ### 2. How can we diagnose rejection noninvasively? \onslide<3->{Improving array-based diagnostics for transplant rejection by optimizing data preprocessing} ### 3. How can we evaluate effects of a rejection treatment? \onslide<4->{Globin-blocking for more effective blood RNA-seq analysis in primate animal model for experimental graft rejection treatment} ## Today's focus ### \Large 1. How are memory cells different from naïve? \Large Genome-wide epigenetic analysis of H3K4 and H3K27 methylation in naïve and memory $\mathsf{CD4}^{+}$ T-cell activation ## We need a better understanding of immune memory * Cell surface markers fairly well-characterized * But internal mechanisms poorly understood . . . \vfill \large **Hypothesis:** Epigenetic regulation of gene expression through histone modification is involved in $\mathsf{CD4}^{+}$ T-cell activation and memory. ## Which histone marks are we looking at? . . . ::: incremental * **H3K4me3:** "activating" mark associated with active transcription * **H3K4me2:** Correlated with H3K4me3, hypothesized "poised" state * **H3K27me3:** "repressive" mark associated with inactive genes ::: . . . \vfill All involved in T-cell differentiation, but activation dynamics unexplored ## ChIP-seq measures DNA bound to marked histones[^chipseq] \centering ![](graphics/presentation/NRG-chipseq.png){ height=70% } [^chipseq]: [Furey (2012)](http://www.nature.com/articles/nrg3306) ## Experimental design \centering ![](graphics/presentation/expdesign-CROP.pdf){ height=70% } \footnotesize Data generated by Sarah Lamere, published in GEO as [GSE73214](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE73214) ## Time points capture phases of immune response \centering ![](graphics/presentation/immune-response.png) ## A few intermediate analysis steps are required \centering ![](graphics/CD4-csaw/rulegraphs/rulegraph-all-RASTER100.png) ## Questions to focus on ::: incremental 1. How do we define the "promoter region" for each gene? \vspace{10pt} 2. How do these histone marks behave in promoter regions? \vspace{10pt} 3. What can these histone marks tell us about T-cell activation and differentiation? ::: ## First question \centering \LARGE How do we define the "promoter region" for each gene? ## Histone modifications occur on consecutive histones ![ChIP-seq coverage in IL2 gene[^lamerethesis]](graphics/presentation/LaMere-thesis-fig3.9-SVG-CROP.png){ height=65% } [^lamerethesis]: Sarah LaMere. Ph.D. thesis (2015). ## Histone modifications occur on consecutive histones \begin{figure} \centering \only<1>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/CCF-plots-A-SVG.png}} \only<2>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/CCF-plots-B-SVG.png}} \only<3>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/CCF-plots-C-SVG.png}} \caption{Strand cross-correlation plots show histone-sized wave pattern} \end{figure} ## SICER identifies enriched regions across the genome ![Finding "islands" of coverage with SICER[^sicer]](graphics/presentation/SICER-fig1-SVG.png) [^sicer]: [Zang et al. (2009)](https://doi.org/10.1093/bioinformatics/btp340) ## IDR identifies *reproducible* enriched regions ![Example irreproducible discovery rate[^idr] score consistency plot](graphics/presentation/IDR-example-CROP-RASTER.png){ height=65% } [^idr]: [Li et al. (2011)](https://doi.org/10.1214/11-AOAS466) ## Finding enriched regions across the genome ![Peak-calling summary statistics](graphics/presentation/RCT-thesis-table2.2-SVG-CROP.png) ## Each histone mark has an "effective promoter radius" ![Enrichment of peaks near promoters](graphics/presentation/Promoter-Peak-Distance-Profile-SVG.pdf) ## Peaks in promoters correlate with gene expression \begin{figure} \centering \only<1>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-A-SVG.png}} \only<2>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-B-SVG.png}} \only<3>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-C-SVG.png}} \only<4>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-D-SVG.png}} \only<5>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-Z-SVG.png}} \caption{Expression distributions of genes with and without promoter peaks} \end{figure} ## First question \centering \LARGE How do we define the "promoter region" for each gene? ## Answer: Define the promoter region empirically! :::::::::: {.columns} ::: {.column width="50%"} * H3K4me2, H3K4me3, and H3K27me3 occur in broad regions across the genome * Enriched regions occur more commonly near promoters * Each histone mark has its own "effective promoter radius" * Presence or absence of a peak within this radius is correlated with gene expression ::: ::: {.column width="50%"} \centering \only<1>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/CCF-plots-A-SVG.png}} \only<2>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/Promoter-Peak-Distance-Profile-SVG.pdf}} \only<3>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-A-SVG.png}} ::: :::::::::: ## Next question \centering \LARGE How do these histone marks behave in promoter regions? ::: notes Does the position of a histone modification within a gene promoter matter to that gene's expression, or is it merely the presence or absence anywhere within the promoter? ::: ## H3K4me2 promoter neighborhood K-means clusters ![Cluster means for H3K4me2](graphics/presentation/H3K4me2-neighborhood-clusters-CROP.png){ height=70% } ## H3K4me2 promoter neighborhood K-means clusters :::::::::: {.columns} ::: {.column width="50%"} ![Cluster means for H3K4me2](graphics/presentation/H3K4me2-neighborhood-clusters-CROP.png){ height=70% } ::: ::: {.column width="50%"} ::: :::::::::: ## H3K4me2 cluster PCA shows a semicircular "fan" :::::::::: {.columns} ::: {.column width="50%"} ![Cluster means for H3K4me2](graphics/presentation/H3K4me2-neighborhood-clusters-CROP.png){ height=70% } ::: ::: {.column width="50%"} ![PCA plot of promoters](graphics/presentation/H3K4me2-neighborhood-PCA-CROP.png){ height=70% } ::: :::::::::: ## H3K4me2 near TSS correlates with expression :::::::::: {.columns} ::: {.column width="50%"} ![Cluster means for H3K4me2](graphics/presentation/H3K4me2-neighborhood-clusters-CROP.png){ height=70% } ::: ::: {.column width="50%"} ![Cluster expression distributions](graphics/presentation/H3K4me2-neighborhood-expression-CROP-ROT90.png){ height=70% } ::: :::::::::: ## H3K4me3 pattern is similar to H3K4me2 :::::::::: {.columns} ::: {.column width="50%"} ![Cluster means for H3K4me3](graphics/presentation/H3K4me3-neighborhood-clusters-CROP.png){ height=70% } ::: ::: {.column width="50%"} ![PCA plot of promoters](graphics/presentation/H3K4me3-neighborhood-PCA-CROP.png){ height=70% } ::: :::::::::: ## H3K4me3 pattern is similar to H3K4me2 :::::::::: {.columns} ::: {.column width="50%"} ![Cluster means for H3K4me3](graphics/presentation/H3K4me3-neighborhood-clusters-CROP.png){ height=70% } ::: ::: {.column width="50%"} ![Cluster expression distributions](graphics/presentation/H3K4me3-neighborhood-expression-CROP-ROT90.png){ height=70% } ::: :::::::::: ## H3K27me3 clusters organize into 3 opposed pairs :::::::::: {.columns} ::: {.column width="50%"} ![Cluster means for H3K27me3](graphics/presentation/H3K27me3-neighborhood-clusters-CROP.png){ height=70% } ::: ::: {.column width="50%"} ![PCA plot of promoters](graphics/presentation/H3K27me3-neighborhood-PCA-CROP.png){ height=70% } ::: :::::::::: ## Specific H3K27me3 profiles show elevated expression :::::::::: {.columns} ::: {.column width="50%"} ![Cluster means for H3K27me3](graphics/presentation/H3K27me3-neighborhood-clusters-CROP.png){ height=70% } ::: ::: {.column width="50%"} ![Cluster expression distributions](graphics/presentation/H3K27me3-neighborhood-expression-CROP-ROT90.png){ height=70% } ::: :::::::::: ## Current question \centering \LARGE How do these histone marks behave in promoter regions? ## Answer: Presence and position both matter! ### H3K4me2 & H3K4me3 * Peak closer to promoter $\Rightarrow$ higher gene expression * Slightly asymmetric in favor of peaks downstream of TSS . . . ### H3K27me3 * Depletion of H3K27me3 at TSS $\Rightarrow$ elevated gene expression * Enrichment of H3K27me3 upstream of TSS $\Rightarrow$ *more* elevated expression * Other coverage profiles: no association ## Last question \centering \LARGE What can these histone marks tell us about T-cell activation and differentiation? ## Differential modification disappears by Day 14 ![Differential modification between naïve and memory samples at each time point](graphics/presentation/RCT-thesis-table2.4-A-SVG-CROP.pdf) ## Differential modification disappears by Day 14 ![Differential modification between naïve and memory samples at each time point](graphics/presentation/RCT-thesis-table2.4-B-SVG-CROP.pdf) ## Promoter H3K4me2 levels converge at Day 14 \centering ![](graphics/CD4-csaw/ChIP-seq/H3K4me2-promoter-PCA-group-CROP.png) ## Promoter H3K4me3 levels converge at Day 14 \centering ![](graphics/CD4-csaw/ChIP-seq/H3K4me3-promoter-PCA-group-CROP.png) ## Promoter H3K27me3 levels converge at Day 14? \centering ![](graphics/CD4-csaw/ChIP-seq/H3K27me3-promoter-PCA-group-CROP.png) ## Expression converges at Day 14 (in PC 2 & 3) \centering ![](graphics/CD4-csaw/RNA-seq/PCA-final-23-CROP.png) ## But the data isn't really that clean... :::::::::: {.columns} ::: {.column width="50%"} ![H3K4me2](graphics/CD4-csaw/ChIP-seq/H3K4me2-PCA-raw-CROP.png) ::: ::: {.column width="50%"} ![H3K4me3](graphics/CD4-csaw/ChIP-seq/H3K4me3-PCA-raw-CROP.png) ::: :::::::::: ## But the data isn't really that clean... :::::::::: {.columns} ::: {.column width="50%"} ![H3K27me3](graphics/CD4-csaw/ChIP-seq/H3K27me3-PCA-raw-CROP.png) ::: ::: {.column width="50%"} ![RNA-seq](graphics/CD4-csaw/RNA-seq/PCA-no-batchsub-CROP.png) ::: :::::::::: ## MOFA: cross-dataset factor analysis ![MOFA factor analysis schematic[^mofa]](graphics/presentation/MOFA-fig1A-SVG.png){ height=70% } [^mofa]: [Argelaguet, Velten, et. al. (2018)](https://onlinelibrary.wiley.com/doi/abs/10.15252/msb.20178124) ## Some factors are shared while others are not \centering ![Variance explained in each data set by each LF](graphics/presentation/MOFA-varExplained-matrix-A-CROP.png){ height=70% } ## 3 factors are shared across all 4 data sets \centering ![LFs 1, 4, and 5 explain variation in all 4 data sets](graphics/presentation/MOFA-varExplained-matrix-B-CROP.png){ height=70% } ## MOFA LF5 captures convergence pattern ![LF1 & LF4: time point effect; LF5: convergence](graphics/presentation/MOFA-LF-scatter-wide.png) ## Last question \centering \LARGE What can these histone marks tell us about T-cell activation and differentiation? ## Answer: Epigenetic convergence between naïve and memory! * Almost no differential histone modification observed between naïve and memory at Day 14, despite plenty of differential modification at earlier time points. * Expression and 3 histone marks all show "convergence" between naïve and memory by Day 14 in the first 2 or 3 principal coordinates. * MOFA captures this convergence pattern in a single latent factor, indicating that this is a shared pattern across all 4 data sets. ## Questions to focus on ### How do we define the "promoter region" for each gene? Define empirically using peak-to-promoter distances; validate by correlation with expression. . . . ### How do these histone marks behave in promoter regions? Location matters! Specific coverage patterns correlated with elevated expression. . . . ### What can we learn about T-cell activation and differentiation? Epigenetic & expression state of naïve and memory converges late after activation, consistent with naïve differentiation into memory. ## Further conclusions & future directions * "Effective promoter region" is a valid concept but "radius" oversimplifies: seek a better definition * Coverage profiles were only examined in naïve day 0 samples: further analysis could incorporate time and cell type * Coverage profile normalization induces degeneracy: adapt a better normalization from peak callers like SICER * Unimodal distribution of promoter coverage profiles is unexpected ## Further conclusions & future directions * Experiment was not designed to directly test the epigenetic convergence hypothesis: future experiments could include cultured but un-activated controls * High correlation between H3K4me3 and H3K4me2 is curious given they are mutually exclusive: design experiments to determine the degree of actual co-occurrence ## Implications for transplant biology ::: incremental * Epigenetic regulation through histone methylation is surely involved in immune memory * Can we stop memory cells from forming by perturbing histone methylation? * Can we disrupt memory cell function during rejection by perturbing histone methylation? * Can we suggest druggable targets for better immune suppression by looking at epigenetically upregulated genes in memory cells? ::: ## Acknowledgements * My mentors, past and present: Drs. Terry Gaasterland, Daniel Salomon, and Andrew Su * My committee: Drs. Nicholas Schork, Ali Torkamani, Michael Petrascheck, and Luc Teyton. * My many collaborators in the Salomon Lab * The Scripps Genomics Core * My parents, John & Chris Thompson ## {.plain} \centering \huge Questions?