Browse code

Add some more future directions

Ryan C. Thompson 5 years ago
parent
commit
786a1cc6ef
1 changed file with 738 additions and 665 deletions
+ 738 - 665
thesis.lyx

@@ -6542,6 +6542,413 @@ Is this needed?
 \end_inset
 
 
+\end_layout
+
+\begin_layout Section
+Future Directions
+\end_layout
+
+\begin_layout Standard
+The analysis of RNA-seq and ChIP-seq in CD4 T-cells in Chapter 2 is in many
+ ways a preliminary study that suggests a multitude of new avenues of investigat
+ion.
+ Here we consider a selection of such avenues.
+\end_layout
+
+\begin_layout Subsection
+Improve on the idea of an effective promoter radius
+\end_layout
+
+\begin_layout Standard
+This study introduced the concept of an 
+\begin_inset Quotes eld
+\end_inset
+
+effective promoter radius
+\begin_inset Quotes erd
+\end_inset
+
+ specific to each histone mark based on distance from the TSS within which
+ an excess of peaks was called for that mark.
+ This concept was then used to guide further analyses throughout the study.
+ However, while the effective promoter radius was useful in those analyses,
+ it is both limited in theory and shown in practice to be a possible oversimplif
+ication.
+ First, the effective promoter radii used in this study were chosen based
+ on manual inspection of the TSS-to-peak distance distributions in Figure
+ 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:near-promoter-peak-enrich"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+, selecting round numbers for the analyst's convenience (Table 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "tab:effective-promoter-radius"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+).
+ It would be better to define an algorithm that selects a more precise radius
+ based on the features of the graph.
+ One possible way to do this would be to randomly rearrange the called peaks
+ throughout the genome many times (while preserving the distribution of peak widths)
+ and re-generate the same plot as in Figure 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:near-promoter-peak-enrich"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+.
+ This would yield a better 
+\begin_inset Quotes eld
+\end_inset
+
+background
+\begin_inset Quotes erd
+\end_inset
+
+ distribution that demonstrates the degree of near-TSS enrichment that would
+ be expected by random chance.
+ The effective promoter radius could be defined as the point where the true
+ distribution diverges from the randomized background distribution.
+ 
+\end_layout
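The randomization proposed in the paragraph above can be sketched in Python. This is only an illustration under simplifying assumptions, not the analysis code used in the thesis: the genome is treated as a single linear sequence, peaks are placed uniformly at random, and the radius is taken to be the upper edge of the run of distance bins (starting at the TSS) in which the observed peak count exceeds a chosen quantile of the permuted counts. The function names, the quantile cutoff, and the binning are all hypothetical choices.

```python
import numpy as np


def nearest_tss_distance(midpoints, tss_sorted):
    """Absolute distance from each peak midpoint to the nearest TSS."""
    idx = np.searchsorted(tss_sorted, midpoints)
    last = len(tss_sorted) - 1
    left = tss_sorted[np.clip(idx - 1, 0, last)]
    right = tss_sorted[np.clip(idx, 0, last)]
    return np.minimum(np.abs(midpoints - left), np.abs(right - midpoints))


def estimate_radius(starts, widths, tss, genome_len, bin_edges,
                    n_perm=200, quantile=0.95, seed=0):
    """Hypothetical permutation-based estimate of an effective promoter radius.

    Returns (radius, observed_counts, background_threshold).
    """
    rng = np.random.default_rng(seed)
    starts = np.asarray(starts)
    widths = np.asarray(widths)
    bin_edges = np.asarray(bin_edges)
    tss_sorted = np.sort(np.asarray(tss))

    observed, _ = np.histogram(
        nearest_tss_distance(starts + widths // 2, tss_sorted), bins=bin_edges)

    background = np.empty((n_perm, len(bin_edges) - 1))
    for i in range(n_perm):
        # Draw a new start for every peak; widths are left untouched, so the
        # peak width distribution is preserved exactly.
        new_starts = rng.integers(0, genome_len - widths + 1)
        background[i], _ = np.histogram(
            nearest_tss_distance(new_starts + widths // 2, tss_sorted),
            bins=bin_edges)

    # Per-bin cutoff: the count that random placement rarely exceeds.
    threshold = np.quantile(background, quantile, axis=0)
    enriched = observed > threshold

    # Radius = upper edge of the contiguous enriched run starting at distance 0.
    gaps = np.flatnonzero(~enriched)
    n_enriched = gaps[0] if gaps.size else enriched.size
    return bin_edges[n_enriched], observed, threshold
```

A real implementation would also need to respect chromosome boundaries and unmappable regions when shuffling, and the bin width limits the resolution of the estimated radius.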
+
+\begin_layout Standard
+Furthermore, the above definition of effective promoter radius has the significa
+nt limitation of being based on the peak calling method.
+ It is thus very sensitive to the choice of peak caller and significance
+ threshold for calling peaks, as well as the degree of saturation in the
+ sequencing.
+ Calling peaks from ChIP-seq samples with insufficient coverage depth, with
+ the wrong peak caller, or with a different significance threshold could
+ give a drastically different number of called peaks, and hence a drastically
+ different distribution of peak-to-TSS distances.
+ To address this, it is desirable to develop a better method of determining
+ the effective promoter radius that relies only on the distribution of read
+ coverage around the TSS, independent of the peak calling.
+ Furthermore, as demonstrated by the upstream-downstream asymmetries observed
+ in Figures 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:H3K4me2-neighborhood"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+, 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:H3K4me3-neighborhood"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+, and 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:H3K27me3-neighborhood"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+, this definition should determine a different radius for the upstream and
+ downstream directions.
+ At this point, it may be better to rename this concept 
+\begin_inset Quotes eld
+\end_inset
+
+effective promoter extent
+\begin_inset Quotes erd
+\end_inset
+
+ and avoid the word 
+\begin_inset Quotes eld
+\end_inset
+
+radius
+\begin_inset Quotes erd
+\end_inset
+
+, since a radius implies a symmetry about the TSS that is not supported
+ by the data.
+\end_layout
+
+\begin_layout Standard
+Beyond improving the definition of effective promoter extent, functional
+ validation is necessary to show that this measure of near-TSS enrichment
+ has biological meaning.
+ Figures 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:H3K4me2-neighborhood"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+ and 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:H3K4me3-neighborhood"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+ already provide a very limited functional validation of the chosen promoter
+ extents for H3K4me2 and H3K4me3 by showing that spikes in coverage within
+ this region are most strongly correlated with elevated gene expression.
+ However, there are other ways to show functional relevance of the promoter
+ extent.
+ For example, correlations could be computed between read counts in peaks
+ nearby gene promoters and the expression level of those genes, and these
+ correlations could be plotted against the distance of the peak upstream
+ or downstream of the gene's TSS.
+ If the promoter extent truly defines a 
+\begin_inset Quotes eld
+\end_inset
+
+sphere of influence
+\begin_inset Quotes erd
+\end_inset
+
+ within which a histone mark is involved with the regulation of a gene,
+ then the correlations for peaks within this extent should be significantly
+ higher than those further upstream or downstream.
+ Peaks within these extents may also be more likely to show differential
+ modification than those outside genic regions of the genome.
+\end_layout
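The correlation-versus-distance check described in the paragraph above could be prototyped along the following lines. This is a minimal sketch under stated assumptions, not an existing pipeline: each peak is assumed to be already assigned to one gene, `peak_signal` and `gene_expr` are assumed to be normalized matrices with samples in matching column order, and the names `rowwise_pearson` and `correlation_by_distance` are hypothetical.

```python
import numpy as np


def rowwise_pearson(x, y):
    """Pearson correlation between matching rows of x and y (rows x samples)."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    denom = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    with np.errstate(invalid="ignore", divide="ignore"):
        return (xc * yc).sum(axis=1) / denom


def correlation_by_distance(peak_signal, gene_expr, peak_gene, signed_dist,
                            bin_edges):
    """Median peak-vs-gene correlation in bins of signed distance from the TSS.

    peak_signal: peaks x samples matrix of normalized read counts
    gene_expr:   genes x samples matrix of expression values
    peak_gene:   index of the gene assigned to each peak
    signed_dist: peak position relative to that gene's TSS (negative = upstream)
    """
    r = rowwise_pearson(np.asarray(peak_signal, dtype=float),
                        np.asarray(gene_expr, dtype=float)[peak_gene])
    bins = np.digitize(signed_dist, bin_edges) - 1
    medians = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        in_bin = (bins == b) & np.isfinite(r)
        if in_bin.any():
            medians[b] = np.median(r[in_bin])
    return medians
```

Plotting the returned medians against the bin centers would show whether correlation drops off outside the proposed promoter extent. Bins with no peaks are reported as NaN rather than zero so that they are not mistaken for an absence of correlation.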
+
+\begin_layout Subsection
+Design experiments to focus on post-activation convergence of naive & memory
+ cells
+\end_layout
+
+\begin_layout Standard
+In this study, a convergence between naive and memory cells was observed
+ in both the pattern of gene expression and the epigenetic state of the 3
+ histone marks studied, consistent with the hypothesis that any naive cells
+ remaining 14 days after activation have differentiated into memory cells,
+ and that both gene expression and these histone marks are involved in this
+ differentiation.
+ However, the current study was not designed with this specific hypothesis
+ in mind, and it therefore has some deficiencies with regard to testing
+ it.
+ The memory CD4 samples at day 14 do not resemble the memory samples at
+ day 0, indicating that in the specific model of activation used for this
+ experiment, the cells are not guaranteed to return to their original pre-activa
+tion state, or perhaps this process takes substantially longer than 14 days.
+ This is a challenge for the convergence hypothesis because the ideal comparison
+ to prove that naive cells are converging to a resting memory state would
+ be to compare the final naive time point to the Day 0 memory samples, but
+ this comparison is only meaningful if memory cells generally return to
+ the same 
+\begin_inset Quotes eld
+\end_inset
+
+resting
+\begin_inset Quotes erd
+\end_inset
+
+ state that they started at.
+\end_layout
+
+\begin_layout Standard
+To better study the convergence hypothesis, a new experiment should be designed
+ using a model system for T-cell activation that is known to allow cells
+ to return as closely as possible to their pre-activation state.
+ Alternatively, if it is not possible to find or design such a model system,
+ the same cell cultures could be activated serially multiple times, and
+ sequenced after each activation cycle right before the next activation.
+ It is likely that several activations in the same model system will settle
+ into a cyclical pattern, converging to a consistent 
+\begin_inset Quotes eld
+\end_inset
+
+resting
+\begin_inset Quotes erd
+\end_inset
+
+ state after each activation, even if this state is different from the initial
+ resting state at Day 0.
+ If so, it will be possible to compare the final states of both naive and
+ memory cells to show that they converge despite different initial conditions.
+\end_layout
+
+\begin_layout Standard
+In addition, if naive-to-memory convergence is a general pattern, it should
+ also be detectable in other epigenetic marks, including other histone marks
+ and DNA methylation.
+ An experiment should be designed studying a large number of epigenetic
+ marks known or suspected to be involved in regulation of gene expression,
+ assaying all of these at the same pre- and post-activation time points.
+ Multi-dataset factor analysis methods like MOFA can then be used to identify
+ coordinated patterns of regulation shared across many epigenetic marks.
+ If possible, some 
+\begin_inset Quotes eld
+\end_inset
+
+negative control
+\begin_inset Quotes erd
+\end_inset
+
+ marks should be included that are known 
+\emph on
+not
+\emph default
+ to be involved in T-cell activation or memory formation.
+ Of course, CD4 T-cells are not the only adaptive immune cells with memory.
+ A similar study could be designed for CD8 T-cells, B-cells, and even specific
+ subsets of CD4 T-cells.
+\end_layout
+
+\begin_layout Subsection
+Follow up on hints of interesting patterns in promoter relative coverage
+ profiles
+\end_layout
+
+\begin_layout Standard
+\begin_inset Flex TODO Note (inline)
+status open
+
+\begin_layout Plain Layout
+I think I might need to write up the negative results for the Promoter CpG
+ and defined pattern analysis before writing this section.
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Itemize
+Also find better normalizations: maybe borrow from MACS/SICER background
+ correction methods?
+\end_layout
+
+\begin_layout Itemize
+For H3K4, define polar coordinates based on PC1 & 2: R = peak size, Theta
+ = peak position.
+ Then correlate with expression.
+\end_layout
+
+\begin_layout Itemize
+Current analysis only at Day 0.
+ Need to study across time points.
+\end_layout
+
+\begin_layout Itemize
+Integrating data across so many dimensions is a significant analysis challenge
+\end_layout
+
+\begin_layout Subsection
+Investigate causes of high correlation between mutually exclusive histone
+ marks
+\end_layout
+
+\begin_layout Standard
+The high correlation between coverage depth observed between H3K4me2 and
+ H3K4me3 is both expected and unexpected.
+ Since both marks are associated with elevated gene transcription, a positive
+ correlation between them is not surprising.
+ However, these two marks represent different post-translational modifications
+ of the 
+\emph on
+same
+\emph default
+ lysine residue on the histone H3 polypeptide, which means that they cannot
+ both be present on the same H3 subunit.
+ Thus, the high correlation between them has several potential explanations.
+ One possible reason is cell population heterogeneity: perhaps some genomic
+ loci are frequently marked with H3K4me2 in some cells, while in other cells
+ the same loci are marked with H3K4me3.
+ Another possibility is allele-specific modifications: the loci are marked
+ in each diploid cell with H3K4me2 on one allele and H3K4me3 on the other
+ allele.
+ Lastly, since each histone octamer contains 2 H3 subunits, it is possible
+ that having one H3K4me2 mark and one H3K4me3 mark on a given histone octamer
+ represents a distinct epigenetic state with a different function than either
+ double H3K4me2 or double H3K4me3.
+ 
+\end_layout
+
+\begin_layout Standard
+These three hypotheses could be disentangled by single-cell ChIP-seq.
+ If the correlation between these two histone marks persists even within
+ the reads for each individual cell, then cell population heterogeneity
+ cannot explain the correlation.
+ Allele-specific modification can be tested for by looking at the correlation
+ between read coverage of the two histone marks at heterozygous loci.
+ If the correlation between read counts for opposite loci is low, then this
+ is consistent with allele-specific modification.
+ Finally, if the modifications do not separate by either cell or allele,
+ the colocation of these two marks is most likely occurring at the level
+ of individual histones, with the heterogeneously modified histone representing
+ a distinct state.
+ 
+\end_layout
+
+\begin_layout Standard
+However, another experiment would be required to show direct evidence of
+ such a heterogeneously modified state.
+ Specifically, a 
+\begin_inset Quotes eld
+\end_inset
+
+double ChIP
+\begin_inset Quotes erd
+\end_inset
+
+ experiment would need to be performed, where the input DNA is first subjected
+ to an immunoprecipitation pulldown with the anti-H3K4me2 antibody, and
+ then the enriched material is collected, with proteins still bound, and
+ immunoprecipitated 
+\emph on
+again
+\emph default
+ using the anti-H3K4me3 antibody.
+ If this yields significant numbers of non-artifactual reads in the same
+ regions as the individual pulldowns of the two marks, this is strong evidence
+ that the two marks are occurring on opposite H3 subunits of the same histones.
+\end_layout
+
+\begin_layout Standard
+\begin_inset Flex TODO Note (inline)
+status open
+
+\begin_layout Plain Layout
+Try to see if double ChIP-seq is actually feasible, and if not, come up
+ with some other idea for directly detecting the mixed mod state.
+ Oh! Actually ChIP-seq isn't required, only double ChIP followed by quantificati
+on.
+ That's one possible angle.
+\end_layout
+
+\end_inset
+
+
 \end_layout
 
 \begin_layout Chapter
@@ -11223,7 +11630,7 @@ researcher degree of freedom
  on the choice of batch size based on vague selection criteria and instinct,
  which can unintentionally introduce bias if the researcher chooses a batch
  size based on what seems to yield the most favorable downstream results
-  
+ 
 \begin_inset CommandInset citation
 LatexCommand cite
 key "Simmons2011"
@@ -11278,6 +11685,26 @@ noprefix "false"
  parameter's estimation.
 \end_layout
 
+\begin_layout Subsection
+methyl array stuff
+\end_layout
+
+\begin_layout Standard
+The current study has shown that DNA methylation, as assayed by Illumina
+ 450k methylation arrays, has some potential for diagnosing transplant dysfuncti
+ons, including rejection.
+\end_layout
+
+\begin_layout Itemize
+Eliminate the need for SVA, since it can't be applied in an ML context.
+ 
+\end_layout
+
+\begin_layout Itemize
+Alternatively, use SVA to identify and discard probes with strong SV association
+s prior to training.
+\end_layout
+
 \begin_layout Chapter
 Globin-blocking for more effective blood RNA-seq analysis in primate animal
  model
@@ -13229,188 +13656,12 @@ Globin-Blocking
 \begin_layout Plain Layout
 
 \series bold
-Up
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\family roman
-\series medium
-\shape up
-\size normal
-\emph off
-\bar no
-\strikeout off
-\xout off
-\uuline off
-\uwave off
-\noun off
-\color none
-231
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\family roman
-\series medium
-\shape up
-\size normal
-\emph off
-\bar no
-\strikeout off
-\xout off
-\uuline off
-\uwave off
-\noun off
-\color none
-515
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\family roman
-\series medium
-\shape up
-\size normal
-\emph off
-\bar no
-\strikeout off
-\xout off
-\uuline off
-\uwave off
-\noun off
-\color none
-2
-\end_layout
-
-\end_inset
-</cell>
-</row>
-<row>
-<cell multirow="4" alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\series bold
-NS
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\family roman
-\series medium
-\shape up
-\size normal
-\emph off
-\bar no
-\strikeout off
-\xout off
-\uuline off
-\uwave off
-\noun off
-\color none
-160
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\family roman
-\series medium
-\shape up
-\size normal
-\emph off
-\bar no
-\strikeout off
-\xout off
-\uuline off
-\uwave off
-\noun off
-\color none
-11235
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\family roman
-\series medium
-\shape up
-\size normal
-\emph off
-\bar no
-\strikeout off
-\xout off
-\uuline off
-\uwave off
-\noun off
-\color none
-136
-\end_layout
-
-\end_inset
-</cell>
-</row>
-<row>
-<cell multirow="4" alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\end_layout
-
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\begin_layout Plain Layout
-
-\series bold
-Down
+Up
 \end_layout
 
 \end_inset
 </cell>
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
 \begin_inset Text
 
 \begin_layout Plain Layout
@@ -13427,12 +13678,12 @@ Down
 \uwave off
 \noun off
 \color none
-0
+231
 \end_layout
 
 \end_inset
 </cell>
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
 \begin_inset Text
 
 \begin_layout Plain Layout
@@ -13449,12 +13700,12 @@ Down
 \uwave off
 \noun off
 \color none
-548
+515
 \end_layout
 
 \end_inset
 </cell>
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
 \begin_inset Text
 
 \begin_layout Plain Layout
@@ -13471,575 +13722,411 @@ Down
 \uwave off
 \noun off
 \color none
-127
+2
 \end_layout
 
 \end_inset
 </cell>
 </row>
-</lyxtabular>
-
-\end_inset
-
-
-\end_layout
-
-\begin_layout Plain Layout
-\begin_inset Caption Standard
-
-\begin_layout Plain Layout
-
-\series bold
-\begin_inset Argument 1
-status open
+<row>
+<cell multirow="4" alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
 
 \begin_layout Plain Layout
-Comparison of significantly differentially expressed genes with and without
- globin blocking.
-\end_layout
-
-\end_inset
-
-
-\begin_inset CommandInset label
-LatexCommand label
-name "tab:Comparison-of-significant"
-
-\end_inset
-
-Comparison of significantly differentially expressed genes with and without
- globin blocking.
 
-\series default
- Up, Down: Genes significantly up/down-regulated in post-transplant samples
- relative to pre-transplant samples, with a false discovery rate of 10%
- or less.
- NS: Non-significant genes (false discovery rate greater than 10%).
 \end_layout
 
 \end_inset
-
-
-\end_layout
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
 
 \begin_layout Plain Layout
 
+\series bold
+NS
 \end_layout
 
 \end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
 
+\begin_layout Plain Layout
 
+\family roman
+\series medium
+\shape up
+\size normal
+\emph off
+\bar no
+\strikeout off
+\xout off
+\uuline off
+\uwave off
+\noun off
+\color none
+160
 \end_layout
 
-\begin_layout Standard
-To compare performance on differential gene expression tests, we took subsets
- of both the GB and non-GB libraries with exactly one pre-transplant and
- one post-transplant sample for each animal that had paired samples available
- for analysis (N=7 animals, N=14 samples in each subset).
- The same test for pre- vs.
- post-transplant differential gene expression was performed on the same
- 7 pairs of samples from GB libraries and non-GB libraries, in each case
- using an FDR of 10% as the threshold of significance.
- Out of 12954 genes that passed the detection threshold in both subsets,
- 358 were called significantly differentially expressed in the same direction
- in both sets; 1063 were differentially expressed in the GB set only; 296
- were differentially expressed in the non-GB set only; 2 genes were called
- significantly up in the GB set but significantly down in the non-GB set;
- and the remaining 11235 were not called differentially expressed in either
- set.
- These data are summarized in Table 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "tab:Comparison-of-significant"
-plural "false"
-caps "false"
-noprefix "false"
-
-\end_inset
-
-.
- The differences in BCV calculated by EdgeR for these subsets of samples
- were negligible (BCV = 0.302 for GB and 0.297 for non-GB).
-\end_layout
-
-\begin_layout Standard
-The key point is that the GB data results in substantially more differentially
- expressed calls than the non-GB data.
- Since there is no gold standard for this dataset, it is impossible to be
- certain whether this is due to under-calling of differential expression
- in the non-GB samples or over-calling in the GB samples.
- However, given that both datasets are derived from the same biological
- samples and have nearly equal BCVs, it is more likely that the larger number
- of DE calls in the GB samples are genuine detections that were enabled
- by the higher sequencing depth and measurement precision of the GB samples.
- Note that the same set of genes was considered in both subsets, so the
- larger number of differentially expressed gene calls in the GB data set
- reflects a greater sensitivity to detect significant differential gene
- expression and not simply the larger total number of detected genes in
- GB samples described earlier.
-\end_layout
-
-\begin_layout Section
-Discussion
-\end_layout
-
-\begin_layout Standard
-The original experience with whole blood gene expression profiling on DNA
- microarrays demonstrated that the high concentration of globin transcripts
- reduced the sensitivity to detect genes with relatively low expression
- levels, in effect, significantly reducing the sensitivity.
- To address this limitation, commercial protocols for globin reduction were
- developed based on strategies to block globin transcript amplification
- during labeling or physically removing globin transcripts by affinity bead
- methods 
-\begin_inset CommandInset citation
-LatexCommand cite
-key "Winn2010"
-literal "false"
-
-\end_inset
-
-.
- More recently, using the latest generation of labeling protocols and arrays,
- it was determined that globin reduction was no longer necessary to obtain
- sufficient sensitivity to detect differential transcript expression 
-\begin_inset CommandInset citation
-LatexCommand cite
-key "NuGEN2010"
-literal "false"
-
-\end_inset
-
-.
- However, we are not aware of any publications using these currently available
- protocols with the latest generation of microarrays that actually compare
- the detection sensitivity with and without globin reduction.
- However, in practice this has now been adopted generally primarily driven
- by concerns for cost control.
- The main objective of our work was to directly test the impact of globin
- gene transcripts and a new globin blocking protocol for application to
- the newest generation of differential gene expression profiling determined
- using next generation sequencing.
- 
-\end_layout
-
-\begin_layout Standard
-The challenge of doing global gene expression profiling in cynomolgus monkeys
- is that the current available arrays were never designed to comprehensively
- cover this genome and have not been updated since the first assemblies
- of the cynomolgus genome were published.
- Therefore, we determined that the best strategy for peripheral blood profiling
- was to do deep RNA-seq and inform the workflow using the latest available
- genome assembly and annotation 
-\begin_inset CommandInset citation
-LatexCommand cite
-key "Wilson2013"
-literal "false"
-
-\end_inset
-
-.
- However, it was not immediately clear whether globin reduction was necessary
- for RNA-seq or how much improvement in efficiency or sensitivity to detect
- differential gene expression would be achieved for the added cost and work.
- 
-\end_layout
-
-\begin_layout Standard
-We only found one report that demonstrated that globin reduction significantly
- improved the effective read yields for sequencing of human peripheral blood
- cell RNA using a DeepSAGE protocol 
-\begin_inset CommandInset citation
-LatexCommand cite
-key "Mastrokolias2012"
-literal "false"
-
 \end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
 
-.
- The approach to DeepSAGE involves two different restriction enzymes that
- purify and then tag small fragments of transcripts at specific locations
- and thus, significantly reduces the complexity of the transcriptome.
- Therefore, we could not determine how DeepSAGE results would translate
- to the common strategy in the field for assaying the entire transcript
- population by whole-transcriptome 3’-end RNA-seq.
- Furthermore, if globin reduction is necessary, we also needed a globin
- reduction method specific to cynomolgus globin sequences that would work
- for an organism for which no kit is available off the shelf.
-\end_layout
-
-\begin_layout Standard
-As mentioned above, the addition of globin blocking oligos has a very small
- impact on measured expression levels of gene expression.
- However, this is a non-issue for the purposes of differential expression
- testing, since a systematic change in a gene in all samples does not affect
- relative expression levels between samples.
- However, we must acknowledge that simple comparisons of gene expression
- data obtained by GB and non-GB protocols are not possible without additional
- normalization.
- 
-\end_layout
-
-\begin_layout Standard
-More importantly, globin blocking not only nearly doubles the yield of usable
- reads, it also increases inter-sample correlation and sensitivity to detect
- differential gene expression relative to the same set of samples profiled
- without blocking.
- In addition, globin blocking does not add a significant amount of random
- noise to the data.
- Globin blocking thus represents a cost-effective way to squeeze more data
- and statistical power out of the same blood samples and the same amount
- of sequencing.
- In conclusion, globin reduction greatly increases the yield of useful RNA-seq
- reads mapping to the rest of the genome, with minimal perturbations in
- the relative levels of non-globin genes.
- Based on these results, globin transcript reduction using sequence-specific,
- complementary blocking oligonucleotides is recommended for all deep RNA-seq
- of cynomolgus and other nonhuman primate blood samples.
-\end_layout
+\begin_layout Plain Layout
 
-\begin_layout Chapter
-Future Directions
+\family roman
+\series medium
+\shape up
+\size normal
+\emph off
+\bar no
+\strikeout off
+\xout off
+\uuline off
+\uwave off
+\noun off
+\color none
+11235
 \end_layout
 
-\begin_layout Standard
-\begin_inset Flex TODO Note (inline)
-status open
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
 
 \begin_layout Plain Layout
-Consider putting each chapter's future directions with that chapter instead
- of in a separate one.
- Check instructions to see if this is allowed/appropriate.
+
+\family roman
+\series medium
+\shape up
+\size normal
+\emph off
+\bar no
+\strikeout off
+\xout off
+\uuline off
+\uwave off
+\noun off
+\color none
+136
 \end_layout
 
 \end_inset
+</cell>
+</row>
+<row>
+<cell multirow="4" alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
 
+\begin_layout Plain Layout
 
 \end_layout
 
-\begin_layout Section*
-Ch2
-\end_layout
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
 
-\begin_layout Standard
-The analysis of RNA-seq and ChIP-seq in CD4 T-cells in Chapter 2 is in many
- ways a preliminary study that suggests a multitude of new avenues of investigat
-ion.
- Here we consider a selection of such avenues.
-\end_layout
+\begin_layout Plain Layout
 
-\begin_layout Subsection*
-Improving on the effective promoter radius
+\series bold
+Down
 \end_layout
 
-\begin_layout Standard
-This study introduced the concept of an 
-\begin_inset Quotes eld
 \end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
 
-effective promoter radius
-\begin_inset Quotes erd
-\end_inset
+\begin_layout Plain Layout
 
- specific to each histone mark based on distance from the TSS within which
- an excess of peaks was called for that mark.
- This concept was then used to guide further analyses throughout the study.
- However, while the effective promoter radius was useful in those analyses,
- it is both limited in theory and shown in practice to be a possible oversimplif
-ication.
- First, the effective promoter radii used in this study were chosen based
- on manual inspection of the TSS-to-peak distance distributions in Figure
- 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "fig:near-promoter-peak-enrich"
-plural "false"
-caps "false"
-noprefix "false"
+\family roman
+\series medium
+\shape up
+\size normal
+\emph off
+\bar no
+\strikeout off
+\xout off
+\uuline off
+\uwave off
+\noun off
+\color none
+0
+\end_layout
 
 \end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
 
-, selecting round numbers of analyst convenience (Table 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "tab:effective-promoter-radius"
-plural "false"
-caps "false"
-noprefix "false"
+\begin_layout Plain Layout
+
+\family roman
+\series medium
+\shape up
+\size normal
+\emph off
+\bar no
+\strikeout off
+\xout off
+\uuline off
+\uwave off
+\noun off
+\color none
+548
+\end_layout
 
 \end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
 
-).
- It would be better to define an algorithm that selects a more precise radius
- based on the features of the graph.
- One possible way to do this would be to randomly rearrange the called peaks
- throughout the genome many times (while preserving the distribution of peak widths)
- and re-generate the same plot as in Figure 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "fig:near-promoter-peak-enrich"
-plural "false"
-caps "false"
-noprefix "false"
+\begin_layout Plain Layout
 
-\end_inset
+\family roman
+\series medium
+\shape up
+\size normal
+\emph off
+\bar no
+\strikeout off
+\xout off
+\uuline off
+\uwave off
+\noun off
+\color none
+127
+\end_layout
 
-.
- This would yield a better 
-\begin_inset Quotes eld
 \end_inset
+</cell>
+</row>
+</lyxtabular>
 
-background
-\begin_inset Quotes erd
 \end_inset
 
- distribution that demonstrates the degree of near-TSS enrichment that would
- be expected by random chance.
- The effective promoter radius could be defined as the point where the true
- distribution diverges from the randomized background distribution.
- 
+
 \end_layout
 
-\begin_layout Standard
-Furthermore, the above definition of effective promoter radius has the significa
-nt limitation of being based on the peak calling method.
- It is thus very sensitive to the choice of peak caller and significance
- threshold for calling peaks, as well as the degree of saturation in the
- sequencing.
- Calling peaks from ChIP-seq samples with insufficient coverage depth, with
- the wrong peak caller, or with a different significance threshold could
- give a drastically different number of called peaks, and hence a drastically
- different distribution of peak-to-TSS distances.
- To address this, it is desirable to develop a better method of determining
- the effective promoter radius that relies only on the distribution of read
- coverage around the TSS, independent of the peak calling.
- Furthermore, as demonstrated by the upstream-downstream asymmetries observed
- in Figures 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "fig:H3K4me2-neighborhood"
-plural "false"
-caps "false"
-noprefix "false"
+\begin_layout Plain Layout
+\begin_inset Caption Standard
 
-\end_inset
+\begin_layout Plain Layout
 
-, 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "fig:H3K4me3-neighborhood"
-plural "false"
-caps "false"
-noprefix "false"
+\series bold
+\begin_inset Argument 1
+status open
+
+\begin_layout Plain Layout
+Comparison of significantly differentially expressed genes with and without
+ globin blocking.
+\end_layout
 
 \end_inset
 
-, and 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "fig:H3K27me3-neighborhood"
-plural "false"
-caps "false"
-noprefix "false"
 
-\end_inset
+\begin_inset CommandInset label
+LatexCommand label
+name "tab:Comparison-of-significant"
 
-, this definition should determine a different radius for the upstream and
- downstream directions.
- At this point, it may be better to rename this concept 
-\begin_inset Quotes eld
 \end_inset
 
-effective promoter extent
-\begin_inset Quotes erd
-\end_inset
+Comparison of significantly differentially expressed genes with and without
+ globin blocking.
 
- and avoid the word 
-\begin_inset Quotes eld
-\end_inset
+\series default
+ Up, Down: Genes significantly up/down-regulated in post-transplant samples
+ relative to pre-transplant samples, with a false discovery rate of 10%
+ or less.
+ NS: Non-significant genes (false discovery rate greater than 10%).
+\end_layout
 
-radius
-\begin_inset Quotes erd
 \end_inset
 
-, since a radius implies a symmetry about the TSS that is not supported
- by the data.
+
 \end_layout
 
-\begin_layout Standard
-Beyond improving the definition of effective promoter extent, functional
- validation is necessary to show that this measure of near-TSS enrichment
- has biological meaning.
- Figures 
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "fig:H3K4me2-neighborhood"
-plural "false"
-caps "false"
-noprefix "false"
+\begin_layout Plain Layout
+
+\end_layout
 
 \end_inset
 
- and 
+
+\end_layout
+
+\begin_layout Standard
+To compare performance on differential gene expression tests, we took subsets
+ of both the GB and non-GB libraries with exactly one pre-transplant and
+ one post-transplant sample for each animal that had paired samples available
+ for analysis (N=7 animals, N=14 samples in each subset).
+ The same test for pre- vs.
+ post-transplant differential gene expression was performed on the same
+ 7 pairs of samples from GB libraries and non-GB libraries, in each case
+ using an FDR of 10% as the threshold of significance.
+ Out of 12954 genes that passed the detection threshold in both subsets,
+ 358 were called significantly differentially expressed in the same direction
+ in both sets; 1063 were differentially expressed in the GB set only; 296
+ were differentially expressed in the non-GB set only; 2 genes were called
+ significantly up in the GB set but significantly down in the non-GB set;
+ and the remaining 11235 were not called differentially expressed in either
+ set.
+ These data are summarized in Table 
 \begin_inset CommandInset ref
 LatexCommand ref
-reference "fig:H3K4me3-neighborhood"
+reference "tab:Comparison-of-significant"
 plural "false"
 caps "false"
 noprefix "false"
 
 \end_inset
 
- already provide a very limited functional validation of the chosen promoter
- extents for H3K4me2 and H3K4me3 by showing that spikes in coverage within
- this region are most strongly correlated with elevated gene expression.
- However, there are other ways to show functional relevance of the promoter
- extent.
- For example, correlations could be computed between read counts in peaks
- nearby gene promoters and the expression level of those genes, and these
- correlations could be plotted against the distance of the peak upstream
- or downstream of the gene's TSS.
- If the promoter extent truly defines a 
-\begin_inset Quotes eld
-\end_inset
-
-sphere of influence
-\begin_inset Quotes erd
-\end_inset
+.
+ The differences in BCV calculated by edgeR for these subsets of samples
+ were negligible (BCV = 0.302 for GB and 0.297 for non-GB).
+\end_layout
 
- within which a histone mark is involved with the regulation of a gene,
- then the correlations for peaks within this extent should be significantly
- higher than those further upstream or downstream.
- Peaks within these extents may also be more likely to show differential
- modification than those outside genic regions of the genome.
+\begin_layout Standard
+The key point is that the GB data yield substantially more differential
+ expression calls than the non-GB data.
+ Since there is no gold standard for this dataset, it is impossible to be
+ certain whether this is due to under-calling of differential expression
+ in the non-GB samples or over-calling in the GB samples.
+ However, given that both datasets are derived from the same biological
+ samples and have nearly equal BCVs, it is more likely that the larger number
+ of DE calls in the GB samples reflects genuine detections that were enabled
+ by the higher sequencing depth and measurement precision of the GB samples.
+ Note that the same set of genes was considered in both subsets, so the
+ larger number of differentially expressed gene calls in the GB data set
+ reflects a greater sensitivity to detect significant differential gene
+ expression and not simply the larger total number of detected genes in
+ GB samples described earlier.
 \end_layout
 
-\begin_layout Subsection*
-Post-activation convergence of naive & memory cells
+\begin_layout Section
+Discussion
 \end_layout
 
 \begin_layout Standard
-In this study, a convergence between naive and memory cells was observed
- in both the pattern of gene expression and in epigenetic state of the 3
- histone marks studied.
-\end_layout
+The original experience with whole blood gene expression profiling on DNA
+ microarrays demonstrated that the high concentration of globin transcripts
+ significantly reduced the sensitivity to detect genes with relatively low
+ expression levels.
+ To address this limitation, commercial protocols for globin reduction were
+ developed based on strategies to block globin transcript amplification
+ during labeling or to physically remove globin transcripts by affinity bead
+ methods 
+\begin_inset CommandInset citation
+LatexCommand cite
+key "Winn2010"
+literal "false"
 
-\begin_layout Itemize
-N-to-M convergence deserves further study of some kind
-\end_layout
+\end_inset
 
-\begin_deeper
-\begin_layout Itemize
-maybe serial activation & rest cycles for naive and memory, showing a cyclical
- pattern returning to the same state again and again after the first activation
-\end_layout
+.
+ More recently, using the latest generation of labeling protocols and arrays,
+ it was determined that globin reduction was no longer necessary to obtain
+ sufficient sensitivity to detect differential transcript expression 
+\begin_inset CommandInset citation
+LatexCommand cite
+key "NuGEN2010"
+literal "false"
 
-\end_deeper
-\begin_layout Itemize
-Study other epigenetic marks in more contexts, including looking for similar
- convergence patterns.
- Use MOFA to identify coordinated patterns.
-\end_layout
+\end_inset
 
-\begin_deeper
-\begin_layout Itemize
-DNA methylation, histone marks, chromatin accessibility & conformation in
- CD4 T-cells
+.
+ However, we are not aware of any publications using these currently available
+ protocols with the latest generation of microarrays that actually compare
+ the detection sensitivity with and without globin reduction.
+ Nevertheless, in practice omitting globin reduction has now been generally
+ adopted, driven primarily by concerns for cost control.
+ The main objective of our work was to directly test the impact of globin
+ gene transcripts and of a new globin blocking protocol on the newest
+ generation of differential gene expression profiling, which is performed
+ using next generation sequencing.
+ 
 \end_layout
 
-\begin_layout Itemize
-Also look at other types of lymphocytes: CD8 T-cells, B-cells, NK cells
-\end_layout
+\begin_layout Standard
+The challenge of doing global gene expression profiling in cynomolgus monkeys
+ is that the currently available arrays were never designed to comprehensively
+ cover this genome and have not been updated since the first assemblies
+ of the cynomolgus genome were published.
+ Therefore, we determined that the best strategy for peripheral blood profiling
+ was to do deep RNA-seq and inform the workflow using the latest available
+ genome assembly and annotation 
+\begin_inset CommandInset citation
+LatexCommand cite
+key "Wilson2013"
+literal "false"
 
-\end_deeper
-\begin_layout Subsection*
-Promoter positional coverage: follow up on hints of interesting patterns
-\end_layout
+\end_inset
 
-\begin_layout Itemize
-Also find better normalizations: maybe borrow from MACS/SICER background
- correction methods?
+.
+ However, it was not immediately clear whether globin reduction was necessary
+ for RNA-seq or how much improvement in efficiency or sensitivity to detect
+ differential gene expression would be achieved for the added cost and work.
+ 
 \end_layout
 
-\begin_layout Itemize
-For H3K4, define polar coordinates based on PC1 & 2: R = peak size, Theta
- = peak position.
- Then correlate with expression.
-\end_layout
+\begin_layout Standard
+We found only one report that demonstrated that globin reduction significantly
+ improved the effective read yields for sequencing of human peripheral blood
+ cell RNA using a DeepSAGE protocol 
+\begin_inset CommandInset citation
+LatexCommand cite
+key "Mastrokolias2012"
+literal "false"
 
-\begin_layout Itemize
-Current analysis only at Day 0.
- Need to study across time points.
-\end_layout
+\end_inset
 
-\begin_layout Subsection*
-H3K4me correlation
+.
+ The approach to DeepSAGE involves two different restriction enzymes that
+ purify and then tag small fragments of transcripts at specific locations
+ and thus significantly reduces the complexity of the transcriptome.
+ Therefore, we could not determine how DeepSAGE results would translate
+ to the common strategy in the field for assaying the entire transcript
+ population by whole-transcriptome 3’-end RNA-seq.
+ Furthermore, if globin reduction proved necessary, we would also need a globin
+ reduction method specific to cynomolgus globin sequences that would work
+ for an organism for which no kit is available off the shelf.
 \end_layout
 
 \begin_layout Standard
-The high correlation between coverage depth observed between H3K4me2 and
- H3K4me3 is both expected and unexpected.
- Since both marks are associated with elevated gene transcription, a positive
- correlation between them is not surprising.
- However, these two marks represent different post-translational modifications
- of the 
-\emph on
-same
-\emph default
- lysine residue on the histone H3 polypeptide, which means that they cannot
- both be present on the same H3 subunit.
- Thus, the high correlation between them has several potential explanations.
- One possible reason is cell population heterogeneity: perhaps some genomic
- loci are frequently marked with H3K4me2 in some cells, while in other cells
- the same loci are marked with H3K4me3.
- Another possibility is allele-specific modifications: the loci are marked
- in each diploid cell with H3K4me2 on one allele and H3K4me3 on the other
- allele.
- Lastly, since each histone octamer contains 2 H3 subunits, it is possible
- that having one H3K4me2 mark and one H3K4me3 mark on a given histone octamer
- represents a distinct epigenetic state with a different function than either
- double H3K4me2 or double H3K4me3.
+As mentioned above, the addition of globin blocking oligos has a very small
+ impact on measured gene expression levels.
+ However, this is a non-issue for the purposes of differential expression
+ testing, since a systematic change in a gene in all samples does not affect
+ relative expression levels between samples.
+ Nevertheless, we must acknowledge that simple comparisons of gene expression
+ data obtained by GB and non-GB protocols are not possible without additional
+ normalization.
  
 \end_layout
 
 \begin_layout Standard
-These three hypotheses could be disentangled by single-cell ChIP-seq.
- If the correlation between these two histone marks persists even within
- the reads for each individual cell, then cell population heterogeneity
- cannot explain the correlation.
- Allele-specific modification can be tested for by looking at the correlation
- between read coverage of the two histone marks at heterozygous loci.
- If the correlation between read counts for opposite loci is low, then this
- is consistent with allele-specific modification.
- Finally if the modifications do not separate by either cell or allele,
- the colocation of these two marks is most likely occurring at the level
- of individual histones, with the heterogenously modified histone representing
- a distinct state.
- 
+More importantly, globin blocking not only nearly doubles the yield of usable
+ reads, it also increases inter-sample correlation and sensitivity to detect
+ differential gene expression relative to the same set of samples profiled
+ without blocking.
+ In addition, globin blocking does not add a significant amount of random
+ noise to the data.
+ Globin blocking thus represents a cost-effective way to squeeze more data
+ and statistical power out of the same blood samples and the same amount
+ of sequencing.
+ In conclusion, globin reduction greatly increases the yield of useful RNA-seq
+ reads mapping to the rest of the genome, with minimal perturbations in
+ the relative levels of non-globin genes.
+ Based on these results, globin transcript reduction using sequence-specific,
+ complementary blocking oligonucleotides is recommended for all deep RNA-seq
+ of cynomolgus and other nonhuman primate blood samples.
 \end_layout
 
-\begin_layout Standard
-However, another experiment would be required to show direct evidence of
- such a heterogeneously modified state.
- Specifically a 
-\begin_inset Quotes eld
-\end_inset
-
-double ChIP
-\begin_inset Quotes erd
-\end_inset
-
- experiment would need to be performed, where the input DNA is first subjected
- to an immunoprecipitation pulldown from the anti-H3K4me2 antibody, and
- then the enriched material is collected, with proteins still bound, and
- immunoprecipitated 
-\emph on
-again
-\emph default
- using the anti-H3K4me3 antibody.
- If this yields significant numbers of non-artifactual reads in the same
- regions as the individual pulldowns of the two marks, this is strong evidence
- that the two marks are occurring on opposite H3 subunits of the same histones.
+\begin_layout Section
+Future Directions
 \end_layout
 
 \begin_layout Standard
@@ -14047,11 +14134,9 @@ again
 status open
 
 \begin_layout Plain Layout
-Try to see if double ChIP-seq is actually feasible, and if not, come up
- with some other idea for directly detecting the mixed mod state.
- Oh! Actually ChIP-seq isn't required, only double ChIP followed by quantificati
-on.
- That's one possible angle.
+I've already done a good bit of work outside just this globin blocking thing,
+ so I'm not sure what to put for future directions.
+ Does it include the other stuff I've done but not published?
 \end_layout
 
 \end_inset
@@ -14059,20 +14144,8 @@ on.
 
 \end_layout
 
-\begin_layout Section*
-Ch3
-\end_layout
-
-\begin_layout Itemize
-Use CV or bootstrap to better evaluate classifiers
-\end_layout
-
-\begin_layout Itemize
-fRMAtools could be adapted to not require equal-sized groups
-\end_layout
-
-\begin_layout Section*
-Ch4
+\begin_layout Chapter
+Future Directions
 \end_layout
 
 \begin_layout Standard
@@ -14080,9 +14153,9 @@ Ch4
 status open
 
 \begin_layout Plain Layout
-I've already done a good bit of work outside just this globin blocking thing,
- so I'm not sure what to put for future directions.
- Does it inculde the other stuff I've done but not published?
+If there are any chapter-independent future directions, put them here.
+ Otherwise, delete this section.
+ Check in the directions if this is OK.
 \end_layout
 
 \end_inset