
Add some more future directions

Ryan C. Thompson, 5 years ago
Parent
Current commit
786a1cc6ef
1 file changed, with 738 insertions and 665 deletions

+ 738 - 665
thesis.lyx

@@ -6542,6 +6542,413 @@ Is this needed?
 \end_inset
 
 
+\end_layout
+
+\begin_layout Section
+Future Directions
+\end_layout
+
+\begin_layout Standard
+The analysis of RNA-seq and ChIP-seq in CD4 T-cells in Chapter 2 is in many
+ ways a preliminary study that suggests a multitude of new avenues of investigat
+ion.
+ Here we consider a selection of such avenues.
+\end_layout
+
+\begin_layout Subsection
+Improve on the idea of an effective promoter radius
+\end_layout
+
+\begin_layout Standard
+This study introduced the concept of an 
+\begin_inset Quotes eld
+\end_inset
+
+effective promoter radius
+\begin_inset Quotes erd
+\end_inset
+
+ specific to each histone mark, based on the distance from the TSS within which
+ an excess of peaks was called for that mark.
+ This concept was then used to guide further analyses throughout the study.
+ However, while the effective promoter radius was useful in those analyses,
+ it is limited in theory and may in practice be an oversimplification.
+ First, the effective promoter radii used in this study were chosen based
+ on manual inspection of the TSS-to-peak distance distributions in Figure
+ 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:near-promoter-peak-enrich"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+, selecting round numbers for the analyst's convenience (Table 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "tab:effective-promoter-radius"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+).
+ It would be better to define an algorithm that selects a more precise radius
+ based on the features of the graph.
+ One possible way to do this would be to randomly rearrange the called peaks
+ throughout the genome many times (while preserving the distribution of peak
+ widths) and re-generate the same plot as in Figure 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:near-promoter-peak-enrich"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+.
+ This would yield a better 
+\begin_inset Quotes eld
+\end_inset
+
+background
+\begin_inset Quotes erd
+\end_inset
+
+ distribution that demonstrates the degree of near-TSS enrichment that would
+ be expected by random chance.
+ The effective promoter radius could be defined as the point where the true
+ distribution diverges from the randomized background distribution.
+ 
+\end_layout
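The permutation idea in the paragraph above can be sketched roughly as follows. This is a minimal illustration in Python, assuming NumPy, a single chromosome, and peaks given as integer start and end coordinates. The function names `nearest_tss_distance` and `estimate_radius`, the 500 bp bin width, the 95% null quantile, and the rule "stop at the first distance bin that is not enriched" are all hypothetical choices made for the example, not part of the thesis.

```python
import numpy as np


def nearest_tss_distance(peak_mid, tss):
    """Absolute distance from each peak midpoint to its nearest TSS."""
    tss = np.sort(np.asarray(tss))
    peak_mid = np.asarray(peak_mid)
    idx = np.searchsorted(tss, peak_mid)
    # Candidate neighbours on either side, clipped at the array ends.
    left = tss[np.clip(idx - 1, 0, len(tss) - 1)]
    right = tss[np.clip(idx, 0, len(tss) - 1)]
    return np.minimum(np.abs(peak_mid - left), np.abs(peak_mid - right))


def estimate_radius(peak_start, peak_end, tss, chrom_len,
                    bin_width=500, max_dist=20000, n_perm=200, seed=0):
    """Distance at which observed peak counts stop exceeding a shuffled null."""
    rng = np.random.default_rng(seed)
    peak_start = np.asarray(peak_start)
    peak_end = np.asarray(peak_end)
    widths = peak_end - peak_start  # preserved during shuffling
    edges = np.arange(0, max_dist + bin_width, bin_width)

    obs_dist = nearest_tss_distance((peak_start + peak_end) / 2, tss)
    obs, _ = np.histogram(obs_dist, bins=edges)

    null = np.empty((n_perm, len(edges) - 1))
    for i in range(n_perm):
        # Draw a new random start for every peak, keeping its width.
        new_start = rng.integers(0, chrom_len - widths)
        null_dist = nearest_tss_distance(new_start + widths / 2, tss)
        null[i], _ = np.histogram(null_dist, bins=edges)

    # A bin counts as enriched if it beats the 95th percentile of the null.
    enriched = obs > np.quantile(null, 0.95, axis=0)
    if not enriched[0]:
        return 0
    not_enriched = np.flatnonzero(~enriched)
    n_bins = not_enriched[0] if not_enriched.size else len(enriched)
    return int(edges[n_bins])
```

A real implementation would need to shuffle within each chromosome, avoid unmappable regions, and probably smooth the distributions before comparing them; the sketch only shows the shape of the computation.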
+
+\begin_layout Standard
+Furthermore, the above definition of effective promoter radius has the significa
+nt limitation of being based on the peak calling method.
+ It is thus very sensitive to the choice of peak caller and significance
+ threshold for calling peaks, as well as the degree of saturation in the
+ sequencing.
+ Calling peaks from ChIP-seq samples with insufficient coverage depth, with
+ the wrong peak caller, or with a different significance threshold could
+ give a drastically different number of called peaks, and hence a drastically
+ different distribution of peak-to-TSS distances.
+ To address this, it is desirable to develop a better method of determining
+ the effective promoter radius that relies only on the distribution of read
+ coverage around the TSS, independent of the peak calling.
+ Furthermore, as demonstrated by the upstream-downstream asymmetries observed
+ in Figures 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:H3K4me2-neighborhood"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+, 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:H3K4me3-neighborhood"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+, and 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:H3K27me3-neighborhood"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+, this definition should determine a different radius for the upstream and
+ downstream directions.
+ At this point, it may be better to rename this concept 
+\begin_inset Quotes eld
+\end_inset
+
+effective promoter extent
+\begin_inset Quotes erd
+\end_inset
+
+ and avoid the word 
+\begin_inset Quotes eld
+\end_inset
+
+radius
+\begin_inset Quotes erd
+\end_inset
+
+, since a radius implies a symmetry about the TSS that is not supported
+ by the data.
+\end_layout
+
+\begin_layout Standard
+Beyond improving the definition of effective promoter extent, functional
+ validation is necessary to show that this measure of near-TSS enrichment
+ has biological meaning.
+ Figures 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:H3K4me2-neighborhood"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+ and 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:H3K4me3-neighborhood"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+ already provide a very limited functional validation of the chosen promoter
+ extents for H3K4me2 and H3K4me3 by showing that spikes in coverage within
+ this region are most strongly correlated with elevated gene expression.
+ However, there are other ways to show functional relevance of the promoter
+ extent.
+ For example, correlations could be computed between read counts in peaks
+ near gene promoters and the expression level of those genes, and these
+ correlations could be plotted against the distance of the peak upstream
+ or downstream of the gene's TSS.
+ If the promoter extent truly defines a 
+\begin_inset Quotes eld
+\end_inset
+
+sphere of influence
+\begin_inset Quotes erd
+\end_inset
+
+ within which a histone mark is involved with the regulation of a gene,
+ then the correlations for peaks within this extent should be significantly
+ higher than those further upstream or downstream.
+ Peaks within these extents may also be more likely to show differential
+ modification than those outside genic regions of the genome.
+\end_layout
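The correlation-versus-distance check proposed above could be prototyped along these lines. This is a hedged sketch in Python with NumPy; the input layout (one row per peak, one column per sample, with each peak already paired to the expression of its nearby gene) and the names `rowwise_pearson` and `correlation_by_distance` are assumptions made for illustration only.

```python
import numpy as np


def rowwise_pearson(x, y):
    """Pearson correlation between matching rows of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    denom = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    # Rows with zero variance give NaN, which is filtered out downstream.
    with np.errstate(invalid="ignore", divide="ignore"):
        return (xc * yc).sum(axis=1) / denom


def correlation_by_distance(peak_counts, gene_expr, signed_dist, edges):
    """Mean peak/expression correlation within each signed-distance bin.

    signed_dist is negative upstream of the TSS and positive downstream.
    Bins with no usable peaks are reported as NaN.
    """
    r = rowwise_pearson(peak_counts, gene_expr)
    which = np.digitize(signed_dist, edges) - 1
    out = np.full(len(edges) - 1, np.nan)
    for b in range(len(edges) - 1):
        vals = r[(which == b) & np.isfinite(r)]
        if vals.size:
            out[b] = vals.mean()
    return out
```

Plotting the returned vector against the bin centres would then show whether correlations drop off outside the proposed promoter extent, separately for the upstream and downstream sides.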
+
+\begin_layout Subsection
+Design experiments to focus on post-activation convergence of naive & memory
+ cells
+\end_layout
+
+\begin_layout Standard
+In this study, a convergence between naive and memory cells was observed
+ in both the pattern of gene expression and the epigenetic state of the three
+ histone marks studied, consistent with the hypothesis that any naive cells
+ remaining 14 days after activation have differentiated into memory cells,
+ and that both gene expression and these histone marks are involved in this
+ differentiation.
+ However, the current study was not designed with this specific hypothesis
+ in mind, and it therefore has some deficiencies with regard to testing
+ it.
+ The memory CD4 samples at day 14 do not resemble the memory samples at
+ day 0, indicating that in the specific model of activation used for this
+ experiment, the cells are not guaranteed to return to their original pre-activa
+tion state, or perhaps this process takes substantially longer than 14 days.
+ This is a challenge for the convergence hypothesis because the ideal comparison
+ to prove that naive cells are converging to a resting memory state would
+ be to compare the final naive time point to the Day 0 memory samples, but
+ this comparison is only meaningful if memory cells generally return to
+ the same 
+\begin_inset Quotes eld
+\end_inset
+
+resting
+\begin_inset Quotes erd
+\end_inset
+
+ state that they started at.
+\end_layout
+
+\begin_layout Standard
+To better study the convergence hypothesis, a new experiment should be designed
+ using a model system for T-cell activation that is known to allow cells
+ to return as closely as possible to their pre-activation state.
+ Alternatively, if it is not possible to find or design such a model system,
+ the same cell cultures could be activated serially multiple times, and
+ sequenced after each activation cycle right before the next activation.
+ It is likely that several activations in the same model system will settle
+ into a cyclical pattern, converging to a consistent 
+\begin_inset Quotes eld
+\end_inset
+
+resting
+\begin_inset Quotes erd
+\end_inset
+
+ state after each activation, even if this state is different from the initial
+ resting state at Day 0.
+ If so, it will be possible to compare the final states of both naive and
+ memory cells to show that they converge despite different initial conditions.
+\end_layout
+
+\begin_layout Standard
+In addition, if naive-to-memory convergence is a general pattern, it should
+ also be detectable in other epigenetic marks, including other histone marks
+ and DNA methylation.
+ An experiment should be designed studying a large number of epigenetic
+ marks known or suspected to be involved in regulation of gene expression,
+ assaying all of these at the same pre- and post-activation time points.
+ Multi-dataset factor analysis methods like MOFA can then be used to identify
+ coordinated patterns of regulation shared across many epigenetic marks.
+ If possible, some 
+\begin_inset Quotes eld
+\end_inset
+
+negative control
+\begin_inset Quotes erd
+\end_inset
+
+ marks should be included that are known 
+\emph on
+not
+\emph default
+ to be involved in T-cell activation or memory formation.
+ Of course, CD4 T-cells are not the only adaptive immune cells with memory.
+ A similar study could be designed for CD8 T-cells, B-cells, and even specific
+ subsets of CD4 T-cells.
+\end_layout
+
+\begin_layout Subsection
+Follow up on hints of interesting patterns in promoter relative coverage
+ profiles
+\end_layout
+
+\begin_layout Standard
+\begin_inset Flex TODO Note (inline)
+status open
+
+\begin_layout Plain Layout
+I think I might need to write up the negative results for the Promoter CpG
+ and defined pattern analysis before writing this section.
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Itemize
+Also find better normalizations: maybe borrow from MACS/SICER background
+ correction methods?
+\end_layout
+
+\begin_layout Itemize
+For H3K4, define polar coordinates based on PC1 & 2: R = peak size, Theta
+ = peak position.
+ Then correlate with expression.
+\end_layout
+
+\begin_layout Itemize
+Current analysis only at Day 0.
+ Need to study across time points.
+\end_layout
+
+\begin_layout Itemize
+Integrating data across so many dimensions is a significant analysis challenge.
+\end_layout
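The polar-coordinate idea in the list above amounts to a standard Cartesian-to-polar conversion of the first two principal component scores. A minimal sketch in Python with NumPy follows; the function name `pc_polar` is hypothetical, and reading R as peak size and Theta as peak position is the interpretation suggested in the note, not something the code verifies.

```python
import numpy as np


def pc_polar(pc1, pc2):
    """Convert PC1/PC2 scores to polar coordinates (R, Theta in radians)."""
    pc1 = np.asarray(pc1, dtype=float)
    pc2 = np.asarray(pc2, dtype=float)
    r = np.hypot(pc1, pc2)        # distance from the origin of the PC plane
    theta = np.arctan2(pc2, pc1)  # angle in (-pi, pi], measured from the PC1 axis
    return r, theta
```

Because Theta is circular, correlating it with expression would need a circular statistic or a binned comparison rather than an ordinary linear correlation.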
+
+\begin_layout Subsection
+Investigate causes of high correlation between mutually exclusive histone
+ marks
+\end_layout
+
+\begin_layout Standard
+The high correlation in coverage depth observed between H3K4me2 and
+ H3K4me3 is both expected and unexpected.
+ Since both marks are associated with elevated gene transcription, a positive
+ correlation between them is not surprising.
+ However, these two marks represent different post-translational modifications
+ of the 
+\emph on
+same
+\emph default
+ lysine residue on the histone H3 polypeptide, which means that they cannot
+ both be present on the same H3 subunit.
+ Thus, the high correlation between them has several potential explanations.
+ One possible reason is cell population heterogeneity: perhaps some genomic
+ loci are frequently marked with H3K4me2 in some cells, while in other cells
+ the same loci are marked with H3K4me3.
+ Another possibility is allele-specific modifications: the loci are marked
+ in each diploid cell with H3K4me2 on one allele and H3K4me3 on the other
+ allele.
+ Lastly, since each histone octamer contains 2 H3 subunits, it is possible
+ that having one H3K4me2 mark and one H3K4me3 mark on a given histone octamer
+ represents a distinct epigenetic state with a different function than either
+ double H3K4me2 or double H3K4me3.
+ 
+\end_layout
+
+\begin_layout Standard
+These three hypotheses could be disentangled by single-cell ChIP-seq.
+ If the correlation between these two histone marks persists even within
+ the reads for each individual cell, then cell population heterogeneity
+ cannot explain the correlation.
+ Allele-specific modification can be tested for by looking at the correlation
+ between read coverage of the two histone marks at heterozygous loci.
+ If the correlation between read counts for opposite loci is low, then this
+ is consistent with allele-specific modification.
+ Finally, if the modifications do not separate by either cell or allele,
+ the colocation of these two marks is most likely occurring at the level
+ of individual histones, with the heterogeneously modified histone representing
+ a distinct state.
+ 
+\end_layout
+
+\begin_layout Standard
+However, another experiment would be required to show direct evidence of
+ such a heterogeneously modified state.
+ Specifically, a 
+\begin_inset Quotes eld
+\end_inset
+
+double ChIP
+\begin_inset Quotes erd
+\end_inset
+
+ experiment would need to be performed, where the input DNA is first subjected
+ to an immunoprecipitation pulldown with the anti-H3K4me2 antibody, and
+ then the enriched material is collected, with proteins still bound, and
+ immunoprecipitated 
+\emph on
+again
+\emph default
+ using the anti-H3K4me3 antibody.
+ If this yields significant numbers of non-artifactual reads in the same
+ regions as the individual pulldowns of the two marks, this is strong evidence
+ that the two marks are occurring on opposite H3 subunits of the same histones.
+\end_layout
+
+\begin_layout Standard
+\begin_inset Flex TODO Note (inline)
+status open
+
+\begin_layout Plain Layout
+Try to see if double ChIP-seq is actually feasible, and if not, come up
+ with some other idea for directly detecting the mixed mod state.
+ Oh! Actually ChIP-seq isn't required, only double ChIP followed by quantificati
+on.
+ That's one possible angle.
+\end_layout
+
+\end_inset
+
+
 \end_layout
 
 \begin_layout Chapter
@@ -11223,7 +11630,7 @@ researcher degree of freedom
  on the choice of batch size based on vague selection criteria and instinct,
  which can unintentionally introduce bias if the researcher chooses a batch
  size based on what seems to yield the most favorable downstream results
-  
+ 
 \begin_inset CommandInset citation
 LatexCommand cite
 key "Simmons2011"
@@ -11278,6 +11685,26 @@ noprefix "false"
  parameter's estimation.
 \end_layout
 
+\begin_layout Subsection
+Methylation array analysis
+\end_layout
+
+\begin_layout Standard
+The current study has shown that DNA methylation, as assayed by Illumina
+ 450k methylation arrays, has some potential for diagnosing transplant dysfuncti
+ons, including rejection.
+\end_layout
+
+\begin_layout Itemize
+Eliminate the need for SVA, since it can't be applied in an ML context.
+ 
+\end_layout
+
+\begin_layout Itemize
+Alternatively, use SVA to identify and discard probes with strong SV association
+s prior to training.
+\end_layout
+
 \begin_layout Chapter
 Globin-blocking for more effective blood RNA-seq analysis in primate animal
  model
@@ -13229,188 +13656,12 @@ Globin-Blocking
 \begin_layout Plain Layout
 
 \series bold
-Up
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\family roman
-\series medium
-\shape up
-\size normal
-\emph off
-\bar no
-\strikeout off
-\xout off
-\uuline off
-\uwave off
-\noun off
-\color none
-231
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\family roman
-\series medium
-\shape up
-\size normal
-\emph off
-\bar no
-\strikeout off
-\xout off
-\uuline off
-\uwave off
-\noun off
-\color none
-515
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\family roman
-\series medium
-\shape up
-\size normal
-\emph off
-\bar no
-\strikeout off
-\xout off
-\uuline off
-\uwave off
-\noun off
-\color none
-2
-\end_layout
-
-\end_inset
-</cell>
-</row>
-<row>
-<cell multirow="4" alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\series bold
-NS
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\family roman
-\series medium
-\shape up
-\size normal
-\emph off
-\bar no
-\strikeout off
-\xout off
-\uuline off
-\uwave off
-\noun off
-\color none
-160
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\family roman
-\series medium
-\shape up
-\size normal
-\emph off
-\bar no
-\strikeout off
-\xout off
-\uuline off
-\uwave off
-\noun off
-\color none
-11235
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\family roman
-\series medium
-\shape up
-\size normal
-\emph off
-\bar no
-\strikeout off
-\xout off
-\uuline off
-\uwave off
-\noun off
-\color none
-136
-\end_layout
-
-\end_inset
-</cell>
-</row>
-<row>
-<cell multirow="4" alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\series bold
-Down
+Up
 \end_layout
 
 \end_inset
 </cell>
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
 \begin_inset Text
 
 \begin_layout Plain Layout
@@ -13427,12 +13678,12 @@ Down
 \uwave off
 \noun off
 \color none
-0
+231
 \end_layout
 
 \end_inset
 </cell>
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
 \begin_inset Text
 
 \begin_layout Plain Layout
@@ -13449,12 +13700,12 @@ Down
 \uwave off
 \noun off
 \color none
-548
+515
 \end_layout
 
 \end_inset
 </cell>
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
 \begin_inset Text
 
 \begin_layout Plain Layout
@@ -13471,575 +13722,411 @@ Down
 \uwave off
 \noun off
 \color none
-127
+2
 \end_layout
 
 \end_inset
 </cell>
 </row>
-</lyxtabular>
-
-\end_inset
-
-
-\end_layout
-
-\begin_layout Plain Layout
-\begin_inset Caption Standard
-
-\begin_layout Plain Layout
-
-\series bold
-\begin_inset Argument 1
-status open
+<row>
+<cell multirow="4" alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
 
 
 \begin_layout Plain Layout
 \begin_layout Plain Layout
-Comparison of significantly differentially expressed genes with and without
- globin blocking.
-\end_layout
-
-\end_inset
-
-
-\begin_inset CommandInset label
-LatexCommand label
-name "tab:Comparison-of-significant"
-
-\end_inset
-
-Comparison of significantly differentially expressed genes with and without
- globin blocking.
 
 
-\series default
- Up, Down: Genes significantly up/down-regulated in post-transplant samples
- relative to pre-transplant samples, with a false discovery rate of 10%
- or less.
- NS: Non-significant genes (false discovery rate greater than 10%).
 \end_layout
 \end_layout
 
 
 \end_inset
 \end_inset
-
-
-\end_layout
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
 
 
 \begin_layout Plain Layout
 \begin_layout Plain Layout
 
 
+\series bold
+NS
 \end_layout
 \end_layout
 
 
 \end_inset
 \end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
 
 
+\begin_layout Plain Layout
 
 
+\family roman
+\series medium
+\shape up
+\size normal
+\emph off
+\bar no
+\strikeout off
+\xout off
+\uuline off
+\uwave off
+\noun off
+\color none
+160
 \end_layout
 \end_layout
 
 
-\begin_layout Standard
-To compare performance on differential gene expression tests, we took subsets
- of both the GB and non-GB libraries with exactly one pre-transplant and
- one post-transplant sample for each animal that had paired samples available
- for analysis (N=7 animals, N=14 samples in each subset).
- The same test for pre- vs.
- post-transplant differential gene expression was performed on the same
- 7 pairs of samples from GB libraries and non-GB libraries, in each case
- using an FDR of 10% as the threshold of significance.
- Out of 12954 genes that passed the detection threshold in both subsets,
- 358 were called significantly differentially expressed in the same direction
- in both sets; 1063 were differentially expressed in the GB set only; 296
- were differentially expressed in the non-GB set only; 2 genes were called
- significantly up in the GB set but significantly down in the non-GB set;
- and the remaining 11235 were not called differentially expressed in either
- set.
- These data are summarized in Table 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "tab:Comparison-of-significant"
-plural "false"
-caps "false"
-noprefix "false"
-
-\end_inset
-
-.
- The differences in BCV calculated by EdgeR for these subsets of samples
- were negligible (BCV = 0.302 for GB and 0.297 for non-GB).
-\end_layout
-
-\begin_layout Standard
-The key point is that the GB data results in substantially more differentially
- expressed calls than the non-GB data.
- Since there is no gold standard for this dataset, it is impossible to be
- certain whether this is due to under-calling of differential expression
- in the non-GB samples or over-calling in the GB samples.
- However, given that both datasets are derived from the same biological
- samples and have nearly equal BCVs, it is more likely that the larger number
- of DE calls in the GB samples are genuine detections that were enabled
- by the higher sequencing depth and measurement precision of the GB samples.
- Note that the same set of genes was considered in both subsets, so the
- larger number of differentially expressed gene calls in the GB data set
- reflects a greater sensitivity to detect significant differential gene
- expression and not simply the larger total number of detected genes in
- GB samples described earlier.
-\end_layout
-
-\begin_layout Section
-Discussion
-\end_layout
-
-\begin_layout Standard
-The original experience with whole blood gene expression profiling on DNA
- microarrays demonstrated that the high concentration of globin transcripts
- reduced the sensitivity to detect genes with relatively low expression
- levels, in effect, significantly reducing the sensitivity.
- To address this limitation, commercial protocols for globin reduction were
- developed based on strategies to block globin transcript amplification
- during labeling or physically removing globin transcripts by affinity bead
- methods 
-\begin_inset CommandInset citation
-LatexCommand cite
-key "Winn2010"
-literal "false"
-
-\end_inset
-
-.
- More recently, using the latest generation of labeling protocols and arrays,
- it was determined that globin reduction was no longer necessary to obtain
- sufficient sensitivity to detect differential transcript expression 
-\begin_inset CommandInset citation
-LatexCommand cite
-key "NuGEN2010"
-literal "false"
-
-\end_inset
-
-.
- However, we are not aware of any publications using these currently available
- protocols the with latest generation of microarrays that actually compare
- the detection sensitivity with and without globin reduction.
- However, in practice this has now been adopted generally primarily driven
- by concerns for cost control.
- The main objective of our work was to directly test the impact of globin
- gene transcripts and a new globin blocking protocol for application to
- the newest generation of differential gene expression profiling determined
- using next generation sequencing.
- 
-\end_layout
-
-\begin_layout Standard
-The challenge of doing global gene expression profiling in cynomolgus monkeys
- is that the current available arrays were never designed to comprehensively
- cover this genome and have not been updated since the first assemblies
- of the cynomolgus genome were published.
- Therefore, we determined that the best strategy for peripheral blood profiling
- was to do deep RNA-seq and inform the workflow using the latest available
- genome assembly and annotation 
-\begin_inset CommandInset citation
-LatexCommand cite
-key "Wilson2013"
-literal "false"
-
-\end_inset
-
-.
- However, it was not immediately clear whether globin reduction was necessary
- for RNA-seq or how much improvement in efficiency or sensitivity to detect
- differential gene expression would be achieved for the added cost and work.
- 
-\end_layout
-
-\begin_layout Standard
-We only found one report that demonstrated that globin reduction significantly
- improved the effective read yields for sequencing of human peripheral blood
- cell RNA using a DeepSAGE protocol 
-\begin_inset CommandInset citation
-LatexCommand cite
-key "Mastrokolias2012"
-literal "false"
-
 \end_inset
 \end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
 
 
-.
- The approach to DeepSAGE involves two different restriction enzymes that
- purify and then tag small fragments of transcripts at specific locations
- and thus, significantly reduces the complexity of the transcriptome.
- Therefore, we could not determine how DeepSAGE results would translate
- to the common strategy in the field for assaying the entire transcript
- population by whole-transcriptome 3’-end RNA-seq.
- Furthermore, if globin reduction is necessary, we also needed a globin
- reduction method specific to cynomolgus globin sequences that would work
- an organism for which no kit is available off the shelf.
-\end_layout
-
-\begin_layout Standard
-As mentioned above, the addition of globin blocking oligos has a very small
- impact on measured expression levels of gene expression.
- However, this is a non-issue for the purposes of differential expression
- testing, since a systematic change in a gene in all samples does not affect
- relative expression levels between samples.
- However, we must acknowledge that simple comparisons of gene expression
- data obtained by GB and non-GB protocols are not possible without additional
- normalization.
- 
-\end_layout
-
-\begin_layout Standard
-More importantly, globin blocking not only nearly doubles the yield of usable
- reads, it also increases inter-sample correlation and sensitivity to detect
- differential gene expression relative to the same set of samples profiled
- without blocking.
- In addition, globin blocking does not add a significant amount of random
- noise to the data.
- Globin blocking thus represents a cost-effective way to squeeze more data
- and statistical power out of the same blood samples and the same amount
- of sequencing.
- In conclusion, globin reduction greatly increases the yield of useful RNA-seq
- reads mapping to the rest of the genome, with minimal perturbations in
- the relative levels of non-globin genes.
- Based on these results, globin transcript reduction using sequence-specific,
- complementary blocking oligonucleotides is recommended for all deep RNA-seq
- of cynomolgus and other nonhuman primate blood samples.
-\end_layout
+\begin_layout Plain Layout
 
 
-\begin_layout Chapter
-Future Directions
+\family roman
+\series medium
+\shape up
+\size normal
+\emph off
+\bar no
+\strikeout off
+\xout off
+\uuline off
+\uwave off
+\noun off
+\color none
+11235
 \end_layout
 \end_layout
 
 
-\begin_layout Standard
-\begin_inset Flex TODO Note (inline)
-status open
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
 
 
 \begin_layout Plain Layout
 \begin_layout Plain Layout
-Consider putting each chapter's future directions with that chapter instead
- of in a separate one.
- Check instructions to see if this is allowed/appropriate.
+
+\family roman
+\series medium
+\shape up
+\size normal
+\emph off
+\bar no
+\strikeout off
+\xout off
+\uuline off
+\uwave off
+\noun off
+\color none
+136
 \end_layout
 \end_layout
 
 
 \end_inset
 \end_inset
+</cell>
+</row>
+<row>
+<cell multirow="4" alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
 
 
+\begin_layout Plain Layout
 
 
 \end_layout
 \end_layout
 
 
-\begin_layout Section*
-Ch2
-\end_layout
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
 
 
-\begin_layout Standard
-The analysis of RNA-seq and ChIP-seq in CD4 T-cells in Chapter 2 is in many
- ways a preliminary study that suggests a multitude of new avenues of investigat
-ion.
- Here we consider a selection of such avenues.
-\end_layout
+\begin_layout Plain Layout
 
 
-\begin_layout Subsection*
-Improving on the effective promoter radius
+\series bold
+Down
 \end_layout
 \end_layout
 
 
-\begin_layout Standard
-This study introduced the concept of an 
-\begin_inset Quotes eld
 \end_inset
 \end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
 
 
-effective promoter radius
-\begin_inset Quotes erd
-\end_inset
+\begin_layout Plain Layout
 
 
- specific to each histone mark based on distince from the TSS within which
- an excess of peaks was called for that mark.
- This concept was then used to guide further analyses throughout the study.
- However, while the effective promoter radius was useful in those analyses,
- it is both limited in theory and shown in practice to be a possible oversimplif
-ication.
- First, the effective promoter radii used in this study were chosen based
- on manual inspection of the TSS-to-peak distance distributions in Figure
- 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "fig:near-promoter-peak-enrich"
-plural "false"
-caps "false"
-noprefix "false"
+\family roman
+\series medium
+\shape up
+\size normal
+\emph off
+\bar no
+\strikeout off
+\xout off
+\uuline off
+\uwave off
+\noun off
+\color none
+0
+\end_layout
 
 
 \end_inset
 \end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
 
 
-, selecting round numbers of analyst convenience (Table 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "tab:effective-promoter-radius"
-plural "false"
-caps "false"
-noprefix "false"
+\begin_layout Plain Layout
+
+\family roman
+\series medium
+\shape up
+\size normal
+\emph off
+\bar no
+\strikeout off
+\xout off
+\uuline off
+\uwave off
+\noun off
+\color none
+548
+\end_layout
 
 
 \end_inset
 \end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
 
 
-).
- It would be better to define an algorithm that selects a more precise radius
- based on the features of the graph.
- One possible way to do this would be to randomly rearrange the called peaks
- throughout the genome many times (while preserving the distribution of peak widths)
- and re-generate the same plot as in Figure 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "fig:near-promoter-peak-enrich"
-plural "false"
-caps "false"
-noprefix "false"
+\begin_layout Plain Layout
 
 
-\end_inset
+\family roman
+\series medium
+\shape up
+\size normal
+\emph off
+\bar no
+\strikeout off
+\xout off
+\uuline off
+\uwave off
+\noun off
+\color none
+127
+\end_layout
 
 
-.
- This would yield a better 
-\begin_inset Quotes eld
 \end_inset
 \end_inset
+</cell>
+</row>
+</lyxtabular>
 
 
-background
-\begin_inset Quotes erd
 \end_inset
 \end_inset
 
 
- distribution that demonstrates the degree of near-TSS enrichment that would
- be expected by random chance.
- The effective promoter radius could be defined as the point where the true
- distribution diverges from the randomized background distribution.
- 
+
 \end_layout
 \end_layout
 
 
-\begin_layout Standard
-Furthermore, the above definition of effective promoter radius has the significa
-nt limitation of being based on the peak calling method.
- It is thus very sensitive to the choice of peak caller and significance
- threshold for calling peaks, as well as the degree of saturation in the
- sequencing.
- Calling peaks from ChIP-seq samples with insufficient coverage depth, with
- the wrong peak caller, or with a different significance threshold could
- give a drastically different number of called peaks, and hence a drastically
- different distribution of peak-to-TSS distances.
- To address this, it is desirable to develop a better method of determining
- the effective promoter radius that relies only on the distribution of read
- coverage around the TSS, independent of the peak calling.
- Furthermore, as demonstrated by the upstream-downstream asymmetries observed
- in Figures 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "fig:H3K4me2-neighborhood"
-plural "false"
-caps "false"
-noprefix "false"
+\begin_layout Plain Layout
+\begin_inset Caption Standard
 
 
-\end_inset
+\begin_layout Plain Layout
 
 
-, 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "fig:H3K4me3-neighborhood"
-plural "false"
-caps "false"
-noprefix "false"
+\series bold
+\begin_inset Argument 1
+status open
+
+\begin_layout Plain Layout
+Comparison of significantly differentially expressed genes with and without
+ globin blocking.
+\end_layout
 
 
 \end_inset
 \end_inset
 
 
-, and 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "fig:H3K27me3-neighborhood"
-plural "false"
-caps "false"
-noprefix "false"
 
 
-\end_inset
+\begin_inset CommandInset label
+LatexCommand label
+name "tab:Comparison-of-significant"
 
 
-, this definition should determine a different radius for the upstream and
- downstream directions.
- At this point, it may be better to rename this concept 
-\begin_inset Quotes eld
 \end_inset
 \end_inset
 
 
-effective promoter extent
-\begin_inset Quotes erd
-\end_inset
+Comparison of significantly differentially expressed genes with and without
+ globin blocking.
 
 
- and avoid the word 
-\begin_inset Quotes eld
-\end_inset
+\series default
+ Up, Down: Genes significantly up/down-regulated in post-transplant samples
+ relative to pre-transplant samples, with a false discovery rate of 10%
+ or less.
+ NS: Non-significant genes (false discovery rate greater than 10%).
+\end_layout
 
 
-radius
-\begin_inset Quotes erd
 \end_inset
 \end_inset
 
 
-, since a radius implies a symmetry about the TSS that is not supported
- by the data.
+
 \end_layout
 \end_layout
 
 
-\begin_layout Standard
-Beyond improving the definition of effective promoter extent, functional
- validation is necessary to show that this measure of near-TSS enrichment
- has biological meaning.
- Figures 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "fig:H3K4me2-neighborhood"
-plural "false"
-caps "false"
-noprefix "false"
+\begin_layout Plain Layout
+
+\end_layout
 
 
 \end_inset
 \end_inset
 
 
- and 
+
+\end_layout
+
+\begin_layout Standard
+To compare performance on differential gene expression tests, we took subsets
+ of both the GB and non-GB libraries with exactly one pre-transplant and
+ one post-transplant sample for each animal that had paired samples available
+ for analysis (N=7 animals, N=14 samples in each subset).
+ The same test for pre- vs.
+ post-transplant differential gene expression was performed on the same
+ 7 pairs of samples from GB libraries and non-GB libraries, in each case
+ using an FDR of 10% as the threshold of significance.
+ Out of 12954 genes that passed the detection threshold in both subsets,
+ 358 were called significantly differentially expressed in the same direction
+ in both sets; 1063 were differentially expressed in the GB set only; 296
+ were differentially expressed in the non-GB set only; 2 genes were called
+ significantly up in the GB set but significantly down in the non-GB set;
+ and the remaining 11235 were not called differentially expressed in either
+ set.
+ These data are summarized in Table 
 \begin_inset CommandInset ref
 \begin_inset CommandInset ref
 LatexCommand ref
 LatexCommand ref
-reference "fig:H3K4me3-neighborhood"
+reference "tab:Comparison-of-significant"
 plural "false"
 plural "false"
 caps "false"
 caps "false"
 noprefix "false"
 noprefix "false"
 
 
 \end_inset
 \end_inset
 
 
- already provide a very limited functional validation of the chosen promoter
- extents for H3K4me2 and H3K4me3 by showing that spikes in coverage within
- this region are most strongly correlated with elevated gene expression.
- However, there are other ways to show functional relevance of the promoter
- extent.
- For example, correlations could be computed between read counts in peaks
- nearby gene promoters and the expression level of those genes, and these
- correlations could be plotted against the distance of the peak upstream
- or downstream of the gene's TSS.
- If the promoter extent truly defines a 
-\begin_inset Quotes eld
-\end_inset
-
-sphere of influence
-\begin_inset Quotes erd
-\end_inset
+.
+ The difference in BCV calculated by edgeR for these subsets of samples
+ was negligible (BCV = 0.302 for GB and 0.297 for non-GB).
+\end_layout
 
 
- within which a histone mark is involved with the regulation of a gene,
- then the correlations for peaks within this extent should be significantly
- higher than those further upstream or downstream.
- Peaks within these extents may also be more likely to show differential
- modification than those outside genic regions of the genome.
+\begin_layout Standard
+The key point is that the GB data yield substantially more differential
+ expression calls than the non-GB data.
+ Since there is no gold standard for this dataset, it is impossible to be
+ certain whether this is due to under-calling of differential expression
+ in the non-GB samples or over-calling in the GB samples.
+ However, given that both datasets are derived from the same biological
+ samples and have nearly equal BCVs, it is more likely that the additional
+ DE calls in the GB samples are genuine detections that were enabled
+ by the higher sequencing depth and measurement precision of the GB samples.
+ Note that the same set of genes was considered in both subsets, so the
+ larger number of differentially expressed gene calls in the GB data set
+ reflects a greater sensitivity to detect significant differential gene
+ expression and not simply the larger total number of detected genes in
+ GB samples described earlier.
 \end_layout
 \end_layout
 
 
-\begin_layout Subsection*
-Post-activation convergence of naive & memory cells
+\begin_layout Section
+Discussion
 \end_layout
 \end_layout
 
 
 \begin_layout Standard
 \begin_layout Standard
-In this study, a convergence between naive and memory cells was observed
- in both the pattern of gene expression and in epigenetic state of the 3
- histone marks studied.
-\end_layout
+The original experience with whole blood gene expression profiling on DNA
+ microarrays demonstrated that the high concentration of globin transcripts
+ significantly reduced the sensitivity to detect genes with relatively low
+ expression levels.
+ To address this limitation, commercial protocols for globin reduction were
+ developed based on strategies to block globin transcript amplification
+ during labeling or to physically remove globin transcripts by affinity bead
+ methods 
+\begin_inset CommandInset citation
+LatexCommand cite
+key "Winn2010"
+literal "false"
 
 
-\begin_layout Itemize
-N-to-M convergence deserves further study of some kind
-\end_layout
+\end_inset
 
 
-\begin_deeper
-\begin_layout Itemize
-maybe serial activation & rest cycles for naive and memory, showing a cyclical
- pattern returning to the same state again and again after the first activation
-\end_layout
+.
+ More recently, using the latest generation of labeling protocols and arrays,
+ it was determined that globin reduction was no longer necessary to obtain
+ sufficient sensitivity to detect differential transcript expression 
+\begin_inset CommandInset citation
+LatexCommand cite
+key "NuGEN2010"
+literal "false"
 
 
-\end_deeper
-\begin_layout Itemize
-Study other epigenetic marks in more contexts, including looking for similar
- convergence patterns.
- Use MOFA to identify coordinated patterns.
-\end_layout
+\end_inset
 
 
-\begin_deeper
-\begin_layout Itemize
-DNA methylation, histone marks, chromatin accessibility & conformation in
- CD4 T-cells
+.
+ However, we are not aware of any publications using these currently available
+ protocols with the latest generation of microarrays that actually compare
+ the detection sensitivity with and without globin reduction.
+ Nevertheless, omitting globin reduction has now been generally adopted in
+ practice, driven primarily by concerns for cost control.
+ The main objective of our work was to directly test the impact of globin
+ gene transcripts and a new globin blocking protocol on the newest generation
+ of differential gene expression profiling, which uses next-generation
+ sequencing.
+ 
 \end_layout
 \end_layout
 
 
-\begin_layout Itemize
-Also look at other types of lymphocytes: CD8 T-cells, B-cells, NK cells
-\end_layout
+\begin_layout Standard
+The challenge of doing global gene expression profiling in cynomolgus monkeys
+ is that the currently available arrays were never designed to comprehensively
+ cover this genome and have not been updated since the first assemblies
+ of the cynomolgus genome were published.
+ Therefore, we determined that the best strategy for peripheral blood profiling
+ was to do deep RNA-seq and inform the workflow using the latest available
+ genome assembly and annotation 
+\begin_inset CommandInset citation
+LatexCommand cite
+key "Wilson2013"
+literal "false"
 
 
-\end_deeper
-\begin_layout Subsection*
-Promoter positional coverage: follow up on hints of interesting patterns
-\end_layout
+\end_inset
 
 
-\begin_layout Itemize
-Also find better normalizations: maybe borrow from MACS/SICER background
- correction methods?
+.
+ However, it was not immediately clear whether globin reduction was necessary
+ for RNA-seq or how much improvement in efficiency or sensitivity to detect
+ differential gene expression would be achieved for the added cost and work.
+ 
 \end_layout
 \end_layout
 
 
-\begin_layout Itemize
-For H3K4, define polar coordinates based on PC1 & 2: R = peak size, Theta
- = peak position.
- Then correlate with expression.
-\end_layout
+\begin_layout Standard
+We found only one report demonstrating that globin reduction significantly
+ improved the effective read yields for sequencing of human peripheral blood
+ cell RNA using a DeepSAGE protocol 
+\begin_inset CommandInset citation
+LatexCommand cite
+key "Mastrokolias2012"
+literal "false"
 
 
-\begin_layout Itemize
-Current analysis only at Day 0.
- Need to study across time points.
-\end_layout
+\end_inset
 
 
-\begin_layout Subsection*
-H3K4me correlation
+.
+ The DeepSAGE approach uses two different restriction enzymes to purify
+ and then tag small fragments of transcripts at specific locations, and
+ thus significantly reduces the complexity of the transcriptome.
+ Therefore, we could not determine how DeepSAGE results would translate
+ to the common strategy in the field for assaying the entire transcript
+ population by whole-transcriptome 3’-end RNA-seq.
+ Furthermore, if globin reduction were necessary, we would also need a globin
+ reduction method specific to cynomolgus globin sequences that would work
+ in an organism for which no kit is available off the shelf.
 \end_layout
 \end_layout
 
 
 \begin_layout Standard
 \begin_layout Standard
-The high correlation between coverage depth observed between H3K4me2 and
- H3K4me3 is both expected and unexpected.
- Since both marks are associated with elevated gene transcription, a positive
- correlation between them is not surprising.
- However, these two marks represent different post-translational modifications
- of the 
-\emph on
-same
-\emph default
- lysine residue on the histone H3 polypeptide, which means that they cannot
- both be present on the same H3 subunit.
- Thus, the high correlation between them has several potential explanations.
- One possible reason is cell population heterogeneity: perhaps some genomic
- loci are frequently marked with H3K4me2 in some cells, while in other cells
- the same loci are marked with H3K4me3.
- Another possibility is allele-specific modifications: the loci are marked
- in each diploid cell with H3K4me2 on one allele and H3K4me3 on the other
- allele.
- Lastly, since each histone octamer contains 2 H3 subunits, it is possible
- that having one H3K4me2 mark and one H3K4me3 mark on a given histone octamer
- represents a distinct epigenetic state with a different function than either
- double H3K4me2 or double H3K4me3.
+As mentioned above, the addition of globin blocking oligos has a very small
+ impact on measured gene expression levels.
+ However, this is a non-issue for the purposes of differential expression
+ testing, since a systematic change in a gene's measured expression across
+ all samples does not affect relative expression levels between samples.
+ Nevertheless, we must acknowledge that simple comparisons of gene expression
+ data obtained by GB and non-GB protocols are not possible without additional
+ normalization.
  
  
 \end_layout
 \end_layout
 
 
 \begin_layout Standard
 \begin_layout Standard
-These three hypotheses could be disentangled by single-cell ChIP-seq.
- If the correlation between these two histone marks persists even within
- the reads for each individual cell, then cell population heterogeneity
- cannot explain the correlation.
- Allele-specific modification can be tested for by looking at the correlation
- between read coverage of the two histone marks at heterozygous loci.
- If the correlation between read counts for opposite loci is low, then this
- is consistent with allele-specific modification.
- Finally if the modifications do not separate by either cell or allele,
- the colocation of these two marks is most likely occurring at the level
- of individual histones, with the heterogenously modified histone representing
- a distinct state.
- 
+More importantly, globin blocking not only nearly doubles the yield of usable
+ reads, it also increases inter-sample correlation and sensitivity to detect
+ differential gene expression relative to the same set of samples profiled
+ without blocking.
+ In addition, globin blocking does not add a significant amount of random
+ noise to the data.
+ Globin blocking thus represents a cost-effective way to squeeze more data
+ and statistical power out of the same blood samples and the same amount
+ of sequencing.
+ In conclusion, globin reduction greatly increases the yield of useful RNA-seq
+ reads mapping to the rest of the genome, with minimal perturbations in
+ the relative levels of non-globin genes.
+ Based on these results, globin transcript reduction using sequence-specific,
+ complementary blocking oligonucleotides is recommended for all deep RNA-seq
+ of cynomolgus and other nonhuman primate blood samples.
 \end_layout
 \end_layout
 
 
-\begin_layout Standard
-However, another experiment would be required to show direct evidence of
- such a heterogeneously modified state.
- Specifically a 
-\begin_inset Quotes eld
-\end_inset
-
-double ChIP
-\begin_inset Quotes erd
-\end_inset
-
- experiment would need to be performed, where the input DNA is first subjected
- to an immunoprecipitation pulldown from the anti-H3K4me2 antibody, and
- then the enriched material is collected, with proteins still bound, and
- immunoprecipitated 
-\emph on
-again
-\emph default
- using the anti-H3K4me3 antibody.
- If this yields significant numbers of non-artifactual reads in the same
- regions as the individual pulldowns of the two marks, this is strong evidence
- that the two marks are occurring on opposite H3 subunits of the same histones.
+\begin_layout Section
+Future Directions
 \end_layout
 \end_layout
 
 
 \begin_layout Standard
 \begin_layout Standard
@@ -14047,11 +14134,9 @@ again
 status open
 status open
 
 
 \begin_layout Plain Layout
 \begin_layout Plain Layout
-Try to see if double ChIP-seq is actually feasible, and if not, come up
- with some other idea for directly detecting the mixed mod state.
- Oh! Actually ChIP-seq isn't required, only double ChIP followed by quantificati
-on.
- That's one possible angle.
+I've already done a good bit of work outside just this globin blocking thing,
+ so I'm not sure what to put for future directions.
+ Does it include the other stuff I've done but not published?
 \end_layout
 \end_layout
 
 
 \end_inset
 \end_inset
@@ -14059,20 +14144,8 @@ on.
 
 
 \end_layout
 \end_layout
 
 
-\begin_layout Section*
-Ch3
-\end_layout
-
-\begin_layout Itemize
-Use CV or bootstrap to better evaluate classifiers
-\end_layout
-
-\begin_layout Itemize
-fRMAtools could be adapted to not require equal-sized groups
-\end_layout
-
-\begin_layout Section*
-Ch4
+\begin_layout Chapter
+Future Directions
 \end_layout
 \end_layout
 
 
 \begin_layout Standard
 \begin_layout Standard
@@ -14080,9 +14153,9 @@ Ch4
 status open
 status open
 
 
 \begin_layout Plain Layout
 \begin_layout Plain Layout
-I've already done a good bit of work outside just this globin blocking thing,
- so I'm not sure what to put for future directions.
- Does it include the other stuff I've done but not published?
+If there are any chapter-independent future directions, put them here.
+ Otherwise, delete this section.
+ Check in the directions if this is OK.
 \end_layout
 \end_layout
 
 
 \end_inset
 \end_inset