---
title: Bioinformatic analysis of complex, high-throughput genomic and epigenomic data in the context of $\mathsf{CD4}^{+}$ T-cell differentiation and diagnosis and treatment of transplant rejection
author: |
    Ryan C. Thompson \
    Su Lab \
    The Scripps Research Institute
date: October 24, 2019
theme: Boadilla
aspectratio: 169
fontsize: 14pt
---

## Organ transplants are a life-saving treatment

::: incremental

* 36,528 transplants performed in the USA in 2018[^organdonor]

* 100 transplants every day!

* Over 113,000 people on the national transplant waiting list as of
  July 2019

:::

[^organdonor]: [organdonor.gov](https://www.organdonor.gov/statistics-stories/statistics.html)

## Organ donation statistics for the USA in 2018[^organdonor]

\centering

![](graphics/presentation/transplants-organ-CROP.pdf)

## Types of grafts

A graft is categorized based on the relationship between donor and recipient:

. . .

::: incremental

* **Autograft:** Donor and recipient are the *same individual*

* **Allograft:** Donor and recipient are *different individuals* of
  the *same species*

* **Xenograft:** Donor and recipient are *different species*

:::

## Recipient T-cells reject allogenic MHCs

:::::::::: {.columns}

::: {.column width="55%"}

:::: incremental

* TCR binds to both antigen *and* MHC surface \vspace{10pt}

* HLA genes encoding MHC proteins are highly polymorphic \vspace{10pt}

* Variants in donor MHC can trigger the same T-cell response as a
  foreign antigen

::::

:::

::: {.column width="40%"}
<!-- ![\footnotesize Janeway's Immunobio- logy (2012), Fig. 9.19](graphics/presentation/janeway-fig9.19-TCR.png){ height=70% } -->
![TCR binding to self (right) and allogenic (left) MHC\footnotemark](graphics/presentation/tcr_mhc.jpg){ height=70% }
:::

::::::::::

\footnotetext[3]{\href{https://doi.org/10.1016/j.cell.2007.01.048}{Colf, Bankovich, et al. "How a Single T Cell Receptor Recognizes Both Self and Foreign MHC". In: Cell (2007)}}

## Allograft rejection is a major long-term problem

![Kidney allograft survival rates in children by transplant year[^kim-marks]](graphics/presentation/kidney-graft-survival.png){ height=65% }

[^kim-marks]: [Kim & Marks (2014)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3884158/?report=classic)

## Rejection is treated with immune suppressive drugs

<!-- TODO: Need a graphic, or maybe a table of common drugs +
mechanisms, or a diagram for periodic checking. -->

::: incremental

* Graft recipient must take immune suppressive drugs indefinitely

* Graft is monitored for rejection and dosage adjusted over time

* Immune suppression is a delicate balance: too much and too little
  are both problematic.

:::

## Memory cells: faster, stronger, and more independent

![Naïve T-cell activated by APC](graphics/presentation/T-cells-A-SVG.png)

## Memory cells: faster, stronger, and more independent

![Naïve T-cell differentiates and proliferates into effector T-cells](graphics/presentation/T-cells-B-SVG.png)

## Memory cells: faster, stronger, and more independent

![Post-infection, some effectors cells remain as memory cells](graphics/presentation/T-cells-C-SVG.png)

## Memory cells: faster, stronger, and more independent

![Memory T-cells respond more strongly to activation](graphics/presentation/T-cells-D-SVG.png)

::: notes

Compared to naïve cells, memory cells:

* respond to a lower antigen concentration
* respond more strongly at any given antigen concentration
* require less co-stimulation
* are somewhat independent of some types of co-stimulation required by
  naïve cells
* evolve over time to respond even more strongly to their antigen

Result:

* Memory cells require progressively higher doses of immune suppresive
  drugs
* Dosage cannot be increased indefinitely without compromising the
  immune system's ability to fight infection

:::

## 3 problems relating to transplant rejection

### 1. How are memory cells different from naïve?

\onslide<2->{Genome-wide epigenetic analysis of H3K4 and H3K27 methylation in naïve
and memory $\mathsf{CD4}^{+}$ T-cell activation}

### 2. How can we diagnose rejection noninvasively?

\onslide<3->{Improving array-based diagnostics for transplant rejection by
optimizing data preprocessing}

### 3. How can we evaluate effects of a rejection treatment?

\onslide<4->{Globin-blocking for more effective blood RNA-seq analysis in primate
animal model for experimental graft rejection treatment}

## Today's focus

### \Large 1. How are memory cells different from naïve?

\Large

Genome-wide epigenetic analysis of H3K4 and H3K27 methylation in naïve
and memory $\mathsf{CD4}^{+}$ T-cell activation

## We need a better understanding of immune memory

* Cell surface markers fairly well-characterized

* But internal mechanisms poorly understood

. . .

\vfill

\large

**Hypothesis:** Epigenetic regulation of gene expression through
histone modification is involved in $\mathsf{CD4}^{+}$ T-cell
activation and memory.
  
## Which histone marks are we looking at?

. . .

::: incremental

* **H3K4me3:** "activating" mark associated with active transcription

* **H3K4me2:** Correlated with H3K4me3, hypothesized "poised" state

* **H3K27me3:** "repressive" mark associated with inactive genes

:::

. . .

\vfill

All involved in T-cell differentiation, but activation dynamics
unexplored

## ChIP-seq measures DNA bound to marked histones[^chipseq]

\centering

![](graphics/presentation/NRG-chipseq.png){ height=70% }

[^chipseq]: [Furey (2012)](http://www.nature.com/articles/nrg3306)

## Experimental design

\centering

![](graphics/presentation/expdesign-CROP.pdf){ height=70% }

\footnotesize

Data generated by Sarah Lamere, published in GEO as
[GSE73214](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE73214)

## Time points capture phases of immune response

\centering

![](graphics/presentation/immune-response.png)

## A few intermediate analysis steps are required

\centering

![](graphics/CD4-csaw/rulegraphs/rulegraph-all-RASTER100.png)

## Questions to focus on

::: incremental

1. How do we define the "promoter region" for each gene? \vspace{10pt}
2. How do these histone marks behave in promoter regions? \vspace{10pt}
3. What can these histone marks tell us about T-cell activation and
   differentiation?

:::

## First question

\centering \LARGE

How do we define the "promoter region" for each gene?

## Histone modifications occur on consecutive histones

![ChIP-seq coverage in IL2 gene[^lamerethesis]](graphics/presentation/LaMere-thesis-fig3.9-SVG-CROP.png){ height=65% }

[^lamerethesis]: Sarah LaMere. Ph.D. thesis (2015).

## Histone modifications occur on consecutive histones

\begin{figure}
\centering
\only<1>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/CCF-plots-A-SVG.png}}
\only<2>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/CCF-plots-B-SVG.png}}
\only<3>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/CCF-plots-C-SVG.png}}
\caption{Strand cross-correlation plots show histone-sized wave pattern}
\end{figure}

## SICER identifies enriched regions across the genome

![Finding "islands" of coverage with SICER[^sicer]](graphics/presentation/SICER-fig1-SVG.png)

[^sicer]: [Zang et al. (2009)](https://doi.org/10.1093/bioinformatics/btp340)

## IDR identifies *reproducible* enriched regions

![Example irreproducible discovery rate[^idr] score consistency plot](graphics/presentation/IDR-example-CROP-RASTER.png){ height=65% }

[^idr]: [Li et al. (2011)](https://doi.org/10.1214/11-AOAS466)

## Finding enriched regions across the genome

![Peak-calling summary statistics](graphics/presentation/RCT-thesis-table2.2-SVG-CROP.png)

## Each histone mark has an "effective promoter radius"

![Enrichment of peaks near promoters](graphics/presentation/Promoter-Peak-Distance-Profile-SVG.pdf)

## Peaks in promoters correlate with gene expression

\begin{figure}
\centering
\only<1>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-A-SVG.png}}
\only<2>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-B-SVG.png}}
\only<3>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-C-SVG.png}}
\only<4>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-D-SVG.png}}
\only<5>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-Z-SVG.png}}
\caption{Expression distributions of genes with and without promoter peaks}
\end{figure}

## First question

\centering \LARGE

How do we define the "promoter region" for each gene?

## Answer: Define the promoter region empirically!

<!-- TODO: Left column: text; right column: flip through relevant image -->

:::::::::: {.columns}
::: {.column width="50%"}

* H3K4me2, H3K4me3, and H3K27me3 occur in broad regions across the
  genome
* Enriched regions occur more commonly near promoters
* Each histone mark has its own "effective promoter radius"
* Presence or absence of a peak within this radius is correlated with
  gene expression

:::

::: {.column width="50%"}
\centering
\only<1>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/CCF-plots-A-SVG.png}}
\only<2>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/Promoter-Peak-Distance-Profile-SVG.pdf}}
\only<3>{\includegraphics[width=\textwidth,height=0.7\textheight]{graphics/presentation/FPKM-by-Peak-Violin-Plots-A-SVG.png}}
:::
::::::::::

## Next question

\centering \LARGE

How do these histone marks behave in promoter regions?

::: notes

Does the position of a histone modification within a gene promoter
matter to that gene's expression, or is it merely the presence or
absence anywhere within the promoter?

:::

## H3K4me2 promoter neighborhood K-means clusters

![Cluster means for H3K4me2](graphics/presentation/H3K4me2-neighborhood-clusters-CROP.png){ height=70% }

## H3K4me2 promoter neighborhood K-means clusters

:::::::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K4me2](graphics/presentation/H3K4me2-neighborhood-clusters-CROP.png){ height=70% }
:::
::: {.column width="50%"}
<!-- This space intentionally left blank -->
:::
::::::::::

## H3K4me2 cluster PCA shows a semicircular "fan"

:::::::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K4me2](graphics/presentation/H3K4me2-neighborhood-clusters-CROP.png){ height=70% }
:::
::: {.column width="50%"}
![PCA plot of promoters](graphics/presentation/H3K4me2-neighborhood-PCA-CROP.png){ height=70% }
:::
::::::::::

## H3K4me2 near TSS correlates with expression

:::::::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K4me2](graphics/presentation/H3K4me2-neighborhood-clusters-CROP.png){ height=70% }
:::
::: {.column width="50%"}
![Cluster expression distributions](graphics/presentation/H3K4me2-neighborhood-expression-CROP-ROT90.png){ height=70% }
:::
::::::::::

## H3K4me3 pattern is similar to H3K4me2

:::::::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K4me3](graphics/presentation/H3K4me3-neighborhood-clusters-CROP.png){ height=70% }
:::
::: {.column width="50%"}
![PCA plot of promoters](graphics/presentation/H3K4me3-neighborhood-PCA-CROP.png){ height=70% }
:::
::::::::::

## H3K4me3 pattern is similar to H3K4me2

:::::::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K4me3](graphics/presentation/H3K4me3-neighborhood-clusters-CROP.png){ height=70% }
:::
::: {.column width="50%"}
![Cluster expression distributions](graphics/presentation/H3K4me3-neighborhood-expression-CROP-ROT90.png){ height=70% }
:::
::::::::::

## H3K27me3 clusters organize into 3 opposed pairs

:::::::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K27me3](graphics/presentation/H3K27me3-neighborhood-clusters-CROP.png){ height=70% }
:::
::: {.column width="50%"}
![PCA plot of promoters](graphics/presentation/H3K27me3-neighborhood-PCA-CROP.png){ height=70% }
:::
::::::::::

## Specific H3K27me3 profiles show elevated expression

:::::::::: {.columns}
::: {.column width="50%"}
![Cluster means for H3K27me3](graphics/presentation/H3K27me3-neighborhood-clusters-CROP.png){ height=70% }
:::
::: {.column width="50%"}
![Cluster expression distributions](graphics/presentation/H3K27me3-neighborhood-expression-CROP-ROT90.png){ height=70% }
:::
::::::::::

## Current question

\centering \LARGE

How do these histone marks behave in promoter regions?

## Answer: Presence and position both matter!

### H3K4me2 & H3K4me3

* Peak closer to promoter $\Rightarrow$ higher gene expression
* Slightly asymmetric in favor of peaks downstream of TSS

. . .

### H3K27me3

* Depletion of H3K27me3 at TSS $\Rightarrow$ elevated gene expression
* Enrichment of H3K27me3 upstream of TSS $\Rightarrow$ *more* elevated
  expression
* Other coverage profiles: no association

## Last question

\centering \LARGE

What can these histone marks tell us about T-cell activation and
differentiation?

## Differential modification disappears by Day 14

![Differential modification between naïve and memory samples at each time point](graphics/presentation/RCT-thesis-table2.4-A-SVG-CROP.pdf)

## Differential modification disappears by Day 14

![Differential modification between naïve and memory samples at each time point](graphics/presentation/RCT-thesis-table2.4-B-SVG-CROP.pdf)

## Promoter H3K4me2 levels converge at Day 14

\centering

![](graphics/CD4-csaw/ChIP-seq/H3K4me2-promoter-PCA-group-CROP.png)

## Promoter H3K4me3 levels converge at Day 14

\centering

![](graphics/CD4-csaw/ChIP-seq/H3K4me3-promoter-PCA-group-CROP.png)

## Promoter H3K27me3 levels converge at Day 14?

\centering

![](graphics/CD4-csaw/ChIP-seq/H3K27me3-promoter-PCA-group-CROP.png)

## Expression converges at Day 14 (in PC 2 & 3)

\centering

![](graphics/CD4-csaw/RNA-seq/PCA-final-23-CROP.png)

## But the data isn't really that clean...

:::::::::: {.columns}
::: {.column width="50%"}
![H3K4me2](graphics/CD4-csaw/ChIP-seq/H3K4me2-PCA-raw-CROP.png)
:::
::: {.column width="50%"}
![H3K4me3](graphics/CD4-csaw/ChIP-seq/H3K4me3-PCA-raw-CROP.png)
:::
::::::::::

## But the data isn't really that clean...

:::::::::: {.columns}
::: {.column width="50%"}
![H3K27me3](graphics/CD4-csaw/ChIP-seq/H3K27me3-PCA-raw-CROP.png)
:::
::: {.column width="50%"}
![RNA-seq](graphics/CD4-csaw/RNA-seq/PCA-no-batchsub-CROP.png)
:::
::::::::::

## MOFA: cross-dataset factor analysis

![MOFA factor analysis schematic[^mofa]](graphics/presentation/MOFA-fig1A-SVG.png){ height=70% }

[^mofa]: [Argelaguet, Velten, et. al. (2018)](https://onlinelibrary.wiley.com/doi/abs/10.15252/msb.20178124)

## Some factors are shared while others are not

\centering

![Variance explained in each data set by each LF](graphics/presentation/MOFA-varExplained-matrix-A-CROP.png){ height=70% }

## 3 factors are shared across all 4 data sets

\centering

![LFs 1, 4, and 5 explain variation in all 4 data sets](graphics/presentation/MOFA-varExplained-matrix-B-CROP.png){ height=70% }

## MOFA LF5 captures convergence pattern

![LF1 & LF4: time point effect; LF5: convergence](graphics/presentation/MOFA-LF-scatter-wide.png)

<!-- { height=70% } -->

## Last question

\centering \LARGE

What can these histone marks tell us about T-cell activation and
differentiation?

## Answer: Epigenetic convergence between naïve and memory!

* Almost no differential histone modification observed between naïve and
  memory at Day 14, despite plenty of differential modification at
  earlier time points.
* Expression and 3 histone marks all show "convergence" between naïve
  and memory by Day 14 in the first 2 or 3 principal coordinates.
* MOFA captures this convergence pattern in a single latent factor,
  indicating that this is a shared pattern across all 4 data sets.

<!-- ## Slide -->

<!-- ![(Insert figure legend)](graphics/CD4-csaw/LaMere2016_fig8.pdf) -->


## Questions to focus on

### How do we define the "promoter region" for each gene?

Define empirically using peak-to-promoter distances; validate by
correlation with expression.

. . .

### How do these histone marks behave in promoter regions?

Location matters! Specific coverage patterns correlated with elevated
expression.

. . .

### What can we learn about T-cell activation and differentiation?

Epigenetic & expression state of naïve and memory converges late after
activation, consistent with naïve differentiation into memory.

## Further conclusions & future directions

* "Effective promoter region" is a valid concept but "radius"
  oversimplifies: seek a better definition

* Coverage profiles were only examined in naïve day 0 samples: further
  analysis could incorporate time and cell type

* Coverage profile normalization induces degeneracy: adapt a better
  normalization from peak callers like SICER

* Unimodal distribution of promoter coverage profiles is unexpected

## Further conclusions & future directions

* Experiment was not designed to directly test the epigenetic
  convergence hypothesis: future experiments could include cultured
  but un-activated controls

* High correlation between H3K4me3 and H3K4me2 is curious given they
  are mutually exclusive: design experiments to determine the degree
  of actual co-occurrence
  
## Implications for transplant biology

::: incremental

* Epigenetic regulation through histone methylation is surely involved
  in immune memory

* Can we stop memory cells from forming by perturbing histone
  methylation?

* Can we disrupt memory cell function during rejection by perturbing
  histone methylation?

* Can we suggest druggable targets for better immune suppression by
  looking at epigenetically upregulated genes in memory cells?

:::

## Acknowledgements

* My mentors, past and present: Drs. Terry Gaasterland, Daniel
  Salomon, and Andrew Su

* My committee: Drs. Nicholas Schork, Ali Torkamani, Michael
  Petrascheck, and Luc Teyton.

* My many collaborators in the Salomon Lab

* The Scripps Genomics Core

* My parents, John & Chris Thompson

## {.plain}

\centering

\huge

Questions?