6 years ago · 786a1cc6ef
--- a/thesis.lyx
+++ b/thesis.lyx
@@ -6542,6 +6542,413 @@ Is this needed?
 
															 \end_inset
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Section
														
 
															+Future Directions
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+The analysis of RNA-seq and ChIP-seq in CD4 T-cells in Chapter 2 is in many
														
 
															+ ways a preliminary study that suggests a multitude of new avenues of investigat
														
 
															+ion.
														
 
															+ Here we consider a selection of such avenues.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Subsection
														
 
															+Improve on the idea of an effective promoter radius
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+This study introduced the concept of an 
														
 
															+\begin_inset Quotes eld
														
 
															+\end_inset
														
 
															+
														
 
															+effective promoter radius
														
 
															+\begin_inset Quotes erd
														
 
															+\end_inset
														
 
															+
														
 
															+ specific to each histone mark based on distance from the TSS within which
														
 
															+ an excess of peaks was called for that mark.
														
 
															+ This concept was then used to guide further analyses throughout the study.
														
 
															+ However, while the effective promoter radius was useful in those analyses,
														
 
															+ it is both limited in theory and shown in practice to be a possible oversimplif
														
 
															+ication.
														
 
															+ First, the effective promoter radii used in this study were chosen based
														
 
															+ on manual inspection of the TSS-to-peak distance distributions in Figure
														
 
															+ 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:near-promoter-peak-enrich"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+, selecting round numbers for the analyst's convenience (Table 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "tab:effective-promoter-radius"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+).
														
 
															+ It would be better to define an algorithm that selects a more precise radius
														
 
															+ based on the features of the graph.
														
 
															+ One possible way to do this would be to randomly rearrange the called peaks
														
 
															+ throughout the genome many times (while preserving the distribution of peak widths)
														
 
															+ and re-generate the same plot as in Figure 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:near-promoter-peak-enrich"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+.
														
 
															+ This would yield a better 
														
 
															+\begin_inset Quotes eld
														
 
															+\end_inset
														
 
															+
														
 
															+background
														
 
															+\begin_inset Quotes erd
														
 
															+\end_inset
														
 
															+
														
 
															+ distribution that demonstrates the degree of near-TSS enrichment that would
														
 
															+ be expected by random chance.
														
 
															+ The effective promoter radius could be defined as the point where the true
														
 
															+ distribution diverges from the randomized background distribution.
														
 
															+ 
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+Furthermore, the above definition of effective promoter radius has the significa
														
 
															+nt limitation of being based on the peak calling method.
														
 
															+ It is thus very sensitive to the choice of peak caller and significance
														
 
															+ threshold for calling peaks, as well as the degree of saturation in the
														
 
															+ sequencing.
														
 
															+ Calling peaks from ChIP-seq samples with insufficient coverage depth, with
														
 
															+ the wrong peak caller, or with a different significance threshold could
														
 
															+ give a drastically different number of called peaks, and hence a drastically
														
 
															+ different distribution of peak-to-TSS distances.
														
 
															+ To address this, it is desirable to develop a better method of determining
														
 
															+ the effective promoter radius that relies only on the distribution of read
														
 
															+ coverage around the TSS, independent of the peak calling.
														
 
															+ Furthermore, as demonstrated by the upstream-downstream asymmetries observed
														
 
															+ in Figures 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:H3K4me2-neighborhood"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+, 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:H3K4me3-neighborhood"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+, and 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:H3K27me3-neighborhood"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+, this definition should determine a different radius for the upstream and
														
 
															+ downstream directions.
														
 
															+ At this point, it may be better to rename this concept 
														
 
															+\begin_inset Quotes eld
														
 
															+\end_inset
														
 
															+
														
 
															+effective promoter extent
														
 
															+\begin_inset Quotes erd
														
 
															+\end_inset
														
 
															+
														
 
															+ and avoid the word 
														
 
															+\begin_inset Quotes eld
														
 
															+\end_inset
														
 
															+
														
 
															+radius
														
 
															+\begin_inset Quotes erd
														
 
															+\end_inset
														
 
															+
														
 
															+, since a radius implies a symmetry about the TSS that is not supported
														
 
															+ by the data.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+Beyond improving the definition of effective promoter extent, functional
														
 
															+ validation is necessary to show that this measure of near-TSS enrichment
														
 
															+ has biological meaning.
														
 
															+ Figures 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:H3K4me2-neighborhood"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+ and 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:H3K4me3-neighborhood"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+ already provide a very limited functional validation of the chosen promoter
														
 
															+ extents for H3K4me2 and H3K4me3 by showing that spikes in coverage within
														
 
															+ this region are most strongly correlated with elevated gene expression.
														
 
															+ However, there are other ways to show functional relevance of the promoter
														
 
															+ extent.
														
 
															+ For example, correlations could be computed between read counts in peaks
														
 
															+ near gene promoters and the expression level of those genes, and these
														
 
															+ correlations could be plotted against the distance of the peak upstream
														
 
															+ or downstream of the gene's TSS.
														
 
															+ If the promoter extent truly defines a 
														
 
															+\begin_inset Quotes eld
														
 
															+\end_inset
														
 
															+
														
 
															+sphere of influence
														
 
															+\begin_inset Quotes erd
														
 
															+\end_inset
														
 
															+
														
 
															+ within which a histone mark is involved with the regulation of a gene,
														
 
															+ then the correlations for peaks within this extent should be significantly
														
 
															+ higher than those further upstream or downstream.
														
 
															+ Peaks within these extents may also be more likely to show differential
														
 
															+ modification than those outside genic regions of the genome.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Subsection
														
 
															+Design experiments to focus on post-activation convergence of naive & memory
														
 
															+ cells
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+In this study, a convergence between naive and memory cells was observed
														
 
															+ in both the pattern of gene expression and the epigenetic state of the three
														
 
															+ histone marks studied, consistent with the hypothesis that any naive cells
														
 
															+ remaining 14 days after activation have differentiated into memory cells,
														
 
															+ and that both gene expression and these histone marks are involved in this
														
 
															+ differentiation.
														
 
															+ However, the current study was not designed with this specific hypothesis
														
 
															+ in mind, and it therefore has some deficiencies with regard to testing
														
 
															+ it.
														
 
															+ The memory CD4 samples at day 14 do not resemble the memory samples at
														
 
															+ day 0, indicating that in the specific model of activation used for this
														
 
															+ experiment, the cells are not guaranteed to return to their original pre-activa
														
 
															+tion state, or perhaps this process takes substantially longer than 14 days.
														
 
															+ This is a challenge for the convergence hypothesis because the ideal comparison
														
 
															+ to prove that naive cells are converging to a resting memory state would
														
 
															+ be to compare the final naive time point to the Day 0 memory samples, but
														
 
															+ this comparison is only meaningful if memory cells generally return to
														
 
															+ the same 
														
 
															+\begin_inset Quotes eld
														
 
															+\end_inset
														
 
															+
														
 
															+resting
														
 
															+\begin_inset Quotes erd
														
 
															+\end_inset
														
 
															+
														
 
															+ state that they started at.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+To better study the convergence hypothesis, a new experiment should be designed
														
 
															+ using a model system for T-cell activation that is known to allow cells
														
 
															+ to return as closely as possible to their pre-activation state.
														
 
															+ Alternatively, if it is not possible to find or design such a model system,
														
 
															+ the same cell cultures could be activated serially multiple times, and
														
 
															+ sequenced after each activation cycle right before the next activation.
														
 
															+ It is likely that several activations in the same model system will settle
														
 
															+ into a cyclical pattern, converging to a consistent 
														
 
															+\begin_inset Quotes eld
														
 
															+\end_inset
														
 
															+
														
 
															+resting
														
 
															+\begin_inset Quotes erd
														
 
															+\end_inset
														
 
															+
														
 
															+ state after each activation, even if this state is different from the initial
														
 
															+ resting state at Day 0.
														
 
															+ If so, it will be possible to compare the final states of both naive and
														
 
															+ memory cells to show that they converge despite different initial conditions.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+In addition, if naive-to-memory convergence is a general pattern, it should
														
 
															+ also be detectable in other epigenetic marks, including other histone marks
														
 
															+ and DNA methylation.
														
 
															+ An experiment should be designed studying a large number of epigenetic
														
 
															+ marks known or suspected to be involved in regulation of gene expression,
														
 
															+ assaying all of these at the same pre- and post-activation time points.
														
 
															+ Multi-dataset factor analysis methods like MOFA can then be used to identify
														
 
															+ coordinated patterns of regulation shared across many epigenetic marks.
														
 
															+ If possible, some 
														
 
															+\begin_inset Quotes eld
														
 
															+\end_inset
														
 
															+
														
 
															+negative control
														
 
															+\begin_inset Quotes erd
														
 
															+\end_inset
														
 
															+
														
 
															+ marks should be included that are known 
														
 
															+\emph on
														
 
															+not
														
 
															+\emph default
														
 
															+ to be involved in T-cell activation or memory formation.
														
 
															+ Of course, CD4 T-cells are not the only adaptive immune cells with memory.
														
 
															+ A similar study could be designed for CD8 T-cells, B-cells, and even specific
														
 
															+ subsets of CD4 T-cells.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Subsection
														
 
															+Follow up on hints of interesting patterns in promoter relative coverage
														
 
															+ profiles
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+\begin_inset Flex TODO Note (inline)
														
 
															+status open
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+I think I might need to write up the negative results for the Promoter CpG
														
 
															+ and defined pattern analysis before writing this section.
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Itemize
														
 
															+Also find better normalizations: maybe borrow from MACS/SICER background
														
 
															+ correction methods?
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Itemize
														
 
															+For H3K4, define polar coordinates based on PC1 & 2: R = peak size, Theta
														
 
															+ = peak position.
														
 
															+ Then correlate with expression.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Itemize
														
 
															+Current analysis only at Day 0.
														
 
															+ Need to study across time points.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Itemize
														
 
															+Integrating data across so many dimensions is a significant analysis challenge
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Subsection
														
 
															+Investigate causes of high correlation between mutually exclusive histone
														
 
															+ marks
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+The high correlation in coverage depth observed between H3K4me2 and
														
 
															+ H3K4me3 is both expected and unexpected.
														
 
															+ Since both marks are associated with elevated gene transcription, a positive
														
 
															+ correlation between them is not surprising.
														
 
															+ However, these two marks represent different post-translational modifications
														
 
															+ of the 
														
 
															+\emph on
														
 
															+same
														
 
															+\emph default
														
 
															+ lysine residue on the histone H3 polypeptide, which means that they cannot
														
 
															+ both be present on the same H3 subunit.
														
 
															+ Thus, the high correlation between them has several potential explanations.
														
 
															+ One possible reason is cell population heterogeneity: perhaps some genomic
														
 
															+ loci are frequently marked with H3K4me2 in some cells, while in other cells
														
 
															+ the same loci are marked with H3K4me3.
														
 
															+ Another possibility is allele-specific modifications: the loci are marked
														
 
															+ in each diploid cell with H3K4me2 on one allele and H3K4me3 on the other
														
 
															+ allele.
														
 
															+ Lastly, since each histone octamer contains 2 H3 subunits, it is possible
														
 
															+ that having one H3K4me2 mark and one H3K4me3 mark on a given histone octamer
														
 
															+ represents a distinct epigenetic state with a different function than either
														
 
															+ double H3K4me2 or double H3K4me3.
														
 
															+ 
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+These three hypotheses could be disentangled by single-cell ChIP-seq.
														
 
															+ If the correlation between these two histone marks persists even within
														
 
															+ the reads for each individual cell, then cell population heterogeneity
														
 
															+ cannot explain the correlation.
														
 
															+ Allele-specific modification can be tested for by looking at the correlation
														
 
															+ between read coverage of the two histone marks at heterozygous loci.
														
 
															+ If the correlation between read counts for opposite alleles is low, then this
														
 
															+ is consistent with allele-specific modification.
														
 
															+ Finally, if the modifications do not separate by either cell or allele,
														
 
															+ the colocation of these two marks is most likely occurring at the level
														
 
															+ of individual histones, with the heterogeneously modified histone representing
														
 
															+ a distinct state.
														
 
															+ 
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+However, another experiment would be required to show direct evidence of
														
 
															+ such a heterogeneously modified state.
														
 
															+ Specifically, a 
														
 
															+\begin_inset Quotes eld
														
 
															+\end_inset
														
 
															+
														
 
															+double ChIP
														
 
															+\begin_inset Quotes erd
														
 
															+\end_inset
														
 
															+
														
 
															+ experiment would need to be performed, where the input chromatin is first subjected
														
 
															+ to an immunoprecipitation pulldown with the anti-H3K4me2 antibody, and
														
 
															+ then the enriched material is collected, with proteins still bound, and
														
 
															+ immunoprecipitated 
														
 
															+\emph on
														
 
															+again
														
 
															+\emph default
														
 
															+ using the anti-H3K4me3 antibody.
														
 
															+ If this yields significant numbers of non-artifactual reads in the same
														
 
															+ regions as the individual pulldowns of the two marks, this is strong evidence
														
 
															+ that the two marks are occurring on opposite H3 subunits of the same histones.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+\begin_inset Flex TODO Note (inline)
														
 
															+status open
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+Try to see if double ChIP-seq is actually feasible, and if not, come up
														
 
															+ with some other idea for directly detecting the mixed mod state.
														
 
															+ Oh! Actually ChIP-seq isn't required, only double ChIP followed by quantificati
														
 
															+on.
														
 
															+ That's one possible angle.
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															 \end_layout
														
 
															 \begin_layout Chapter
														
@@ -11223,7 +11630,7 @@ researcher degree of freedom
 
															  on the choice of batch size based on vague selection criteria and instinct,
														
 
															  which can unintentionally inproduce bias if the researcher chooses a batch
														
 
															  size based on what seems to yield the most favorable downstream results
														
 
															-  
														
 
															+ 
														
 
															 \begin_inset CommandInset citation
														
 
															 LatexCommand cite
														
 
															 key "Simmons2011"
														
@@ -11278,6 +11685,26 @@ noprefix "false"
 
															  parameter's estimation.
														
 
															 \end_layout
														
 
															+\begin_layout Subsection
														
 
															+methyl array stuff
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+The current study has shown that DNA methylation, as assayed by Illumina
														
 
															+ 450k methylation arrays, has some potential for diagnosing transplant dysfuncti
														
 
															+ons, including rejection.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Itemize
														
 
															+Eliminate the need for SVA, since it can't be applied in an ML context.
														
 
															+ 
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Itemize
														
 
															+Alternatively, use SVA to identify and discard probes with strong SV association
														
 
															+s prior to training.
														
 
															+\end_layout
														
 
															+
														
 
															 \begin_layout Chapter
														
 
															 Globin-blocking for more effective blood RNA-seq analysis in primate animal
														
 
															  model
														
@@ -13229,188 +13656,12 @@ Globin-Blocking
 
															 \begin_layout Plain Layout
														
 
															 \series bold
														
 
															-Up
														
 
															-\end_layout
														
 
															-
														
 
															-\end_inset
														
 
															-</cell>
														
 
															-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															-\begin_inset Text
														
 
															-
														
 
															-\begin_layout Plain Layout
														
 
															-
														
 
															-\family roman
														
 
															-\series medium
														
 
															-\shape up
														
 
															-\size normal
														
 
															-\emph off
														
 
															-\bar no
														
 
															-\strikeout off
														
 
															-\xout off
														
 
															-\uuline off
														
 
															-\uwave off
														
 
															-\noun off
														
 
															-\color none
														
 
															-231
														
 
															-\end_layout
														
 
															-
														
 
															-\end_inset
														
 
															-</cell>
														
 
															-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															-\begin_inset Text
														
 
															-
														
 
															-\begin_layout Plain Layout
														
 
															-
														
 
															-\family roman
														
 
															-\series medium
														
 
															-\shape up
														
 
															-\size normal
														
 
															-\emph off
														
 
															-\bar no
														
 
															-\strikeout off
														
 
															-\xout off
														
 
															-\uuline off
														
 
															-\uwave off
														
 
															-\noun off
														
 
															-\color none
														
 
															-515
														
 
															-\end_layout
														
 
															-
														
 
															-\end_inset
														
 
															-</cell>
														
 
															-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
														
 
															-\begin_inset Text
														
 
															-
														
 
															-\begin_layout Plain Layout
														
 
															-
														
 
															-\family roman
														
 
															-\series medium
														
 
															-\shape up
														
 
															-\size normal
														
 
															-\emph off
														
 
															-\bar no
														
 
															-\strikeout off
														
 
															-\xout off
														
 
															-\uuline off
														
 
															-\uwave off
														
 
															-\noun off
														
 
															-\color none
														
 
															-2
														
 
															-\end_layout
														
 
															-
														
 
															-\end_inset
														
 
															-</cell>
														
 
															-</row>
														
 
															-<row>
														
 
															-<cell multirow="4" alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															-\begin_inset Text
														
 
															-
														
 
															-\begin_layout Plain Layout
														
 
															-
														
 
															-\end_layout
														
 
															-
														
 
															-\end_inset
														
 
															-</cell>
														
 
															-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															-\begin_inset Text
														
 
															-
														
 
															-\begin_layout Plain Layout
														
 
															-
														
 
															-\series bold
														
 
															-NS
														
 
															-\end_layout
														
 
															-
														
 
															-\end_inset
														
 
															-</cell>
														
 
															-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															-\begin_inset Text
														
 
															-
														
 
															-\begin_layout Plain Layout
														
 
															-
														
 
															-\family roman
														
 
															-\series medium
														
 
															-\shape up
														
 
															-\size normal
														
 
															-\emph off
														
 
															-\bar no
														
 
															-\strikeout off
														
 
															-\xout off
														
 
															-\uuline off
														
 
															-\uwave off
														
 
															-\noun off
														
 
															-\color none
														
 
															-160
														
 
															-\end_layout
														
 
															-
														
 
															-\end_inset
														
 
															-</cell>
														
 
															-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															-\begin_inset Text
														
 
															-
														
 
															-\begin_layout Plain Layout
														
 
															-
														
 
															-\family roman
														
 
															-\series medium
														
 
															-\shape up
														
 
															-\size normal
														
 
															-\emph off
														
 
															-\bar no
														
 
															-\strikeout off
														
 
															-\xout off
														
 
															-\uuline off
														
 
															-\uwave off
														
 
															-\noun off
														
 
															-\color none
														
 
															-11235
														
 
															-\end_layout
														
 
															-
														
 
															-\end_inset
														
 
															-</cell>
														
 
															-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
														
 
															-\begin_inset Text
														
 
															-
														
 
															-\begin_layout Plain Layout
														
 
															-
														
 
															-\family roman
														
 
															-\series medium
														
 
															-\shape up
														
 
															-\size normal
														
 
															-\emph off
														
 
															-\bar no
														
 
															-\strikeout off
														
 
															-\xout off
														
 
															-\uuline off
														
 
															-\uwave off
														
 
															-\noun off
														
 
															-\color none
														
 
															-136
														
 
															-\end_layout
														
 
															-
														
 
															-\end_inset
														
 
															-</cell>
														
 
															-</row>
														
 
															-<row>
														
 
															-<cell multirow="4" alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
														
 
															-\begin_inset Text
														
 
															-
														
 
															-\begin_layout Plain Layout
														
 
															-
														
 
															-\end_layout
														
 
															-
														
 
															-\end_inset
														
 
															-</cell>
														
 
															-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
														
 
															-\begin_inset Text
														
 
															-
														
 
															-\begin_layout Plain Layout
														
 
															-
														
 
															-\series bold
														
 
															-Down
														
 
															+Up
														
 
															 \end_layout
														
 
															 \end_inset
														
 
															 </cell>
														
 
															-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															 \begin_inset Text
														
 
															 \begin_layout Plain Layout
														
@@ -13427,12 +13678,12 @@ Down
 
															 \uwave off
														
 
															 \noun off
														
 
															 \color none
														
 
															-0
														
 
															+231
														
 
															 \end_layout
														
 
															 \end_inset
														
 
															 </cell>
														
 
															-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															 \begin_inset Text
														
 
															 \begin_layout Plain Layout
														
@@ -13449,12 +13700,12 @@ Down
 
															 \uwave off
														
 
															 \noun off
														
 
															 \color none
														
 
															-548
														
 
															+515
														
 
															 \end_layout
														
 
															 \end_inset
														
 
															 </cell>
														
 
															-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
														
 
															 \begin_inset Text
														
 
															 \begin_layout Plain Layout
														
@@ -13471,575 +13722,411 @@ Down
 
															 \uwave off
														
 
															 \noun off
														
 
															 \color none
														
 
															-127
														
 
															+2
														
 
															 \end_layout
														
 
															 \end_inset
														
 
															 </cell>
														
 
															 </row>
														
 
															-</lyxtabular>
														
 
															-
														
 
															-\end_inset
														
 
															-
														
 
															-
														
 
															-\end_layout
														
 
															-
														
 
															-\begin_layout Plain Layout
														
 
															-\begin_inset Caption Standard
														
 
															-
														
 
															-\begin_layout Plain Layout
														
 
															-
														
 
															-\series bold
														
 
															-\begin_inset Argument 1
														
 
															-status open
														
 
															+<row>
														
 
															+<cell multirow="4" alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															 \begin_layout Plain Layout
														
 
															-Comparison of significantly differentially expressed genes with and without
														
 
															- globin blocking.
														
 
															-\end_layout
														
 
															-
														
 
															-\end_inset
														
 
															-
														
 
															-
														
 
															-\begin_inset CommandInset label
														
 
															-LatexCommand label
														
 
															-name "tab:Comparison-of-significant"
														
 
															-
														
 
															-\end_inset
														
 
															-
														
 
															-Comparison of significantly differentially expressed genes with and without
														
 
															- globin blocking.
														
 
															-\series default
														
 
															- Up, Down: Genes significantly up/down-regulated in post-transplant samples
														
 
															- relative to pre-transplant samples, with a false discovery rate of 10%
														
 
															- or less.
														
 
															- NS: Non-significant genes (false discovery rate greater than 10%).
														
 
															 \end_layout
														
 
															 \end_inset
														
 
															-
														
 
															-
														
 
															-\end_layout
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															 \begin_layout Plain Layout
														
 
															+\series bold
														
 
															+NS
														
 
															 \end_layout
														
 
															 \end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+\begin_layout Plain Layout
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+160
														
 
															 \end_layout
														
 
															-\begin_layout Standard
														
 
															-To compare performance on differential gene expression tests, we took subsets
														
 
															- of both the GB and non-GB libraries with exactly one pre-transplant and
														
 
															- one post-transplant sample for each animal that had paired samples available
														
 
															- for analysis (N=7 animals, N=14 samples in each subset).
														
 
															- The same test for pre- vs.
														
 
															- post-transplant differential gene expression was performed on the same
														
 
															- 7 pairs of samples from GB libraries and non-GB libraries, in each case
														
 
															- using an FDR of 10% as the threshold of significance.
														
 
															- Out of 12954 genes that passed the detection threshold in both subsets,
														
 
															- 358 were called significantly differentially expressed in the same direction
														
 
															- in both sets; 1063 were differentially expressed in the GB set only; 296
														
 
															- were differentially expressed in the non-GB set only; 2 genes were called
														
 
															- significantly up in the GB set but significantly down in the non-GB set;
														
 
															- and the remaining 11235 were not called differentially expressed in either
														
 
															- set.
														
 
															- These data are summarized in Table 
														
 
															-\begin_inset CommandInset ref
														
 
															-LatexCommand ref
														
 
															-reference "tab:Comparison-of-significant"
														
 
															-plural "false"
														
 
															-caps "false"
														
 
															-noprefix "false"
														
 
															-
														
 
															-\end_inset
														
 
															-
														
 
															-.
														
 
															- The differences in BCV calculated by EdgeR for these subsets of samples
														
 
															- were negligible (BCV = 0.302 for GB and 0.297 for non-GB).
														
 
															-\end_layout
														
 
															-
														
 
															-\begin_layout Standard
														
 
															-The key point is that the GB data results in substantially more differentially
														
 
															- expressed calls than the non-GB data.
														
 
															- Since there is no gold standard for this dataset, it is impossible to be
														
 
															- certain whether this is due to under-calling of differential expression
														
 
															- in the non-GB samples or over-calling in the GB samples.
														
 
															- However, given that both datasets are derived from the same biological
														
 
															- samples and have nearly equal BCVs, it is more likely that the larger number
														
 
															- of DE calls in the GB samples are genuine detections that were enabled
														
 
															- by the higher sequencing depth and measurement precision of the GB samples.
														
 
															- Note that the same set of genes was considered in both subsets, so the
														
 
															- larger number of differentially expressed gene calls in the GB data set
														
 
															- reflects a greater sensitivity to detect significant differential gene
														
 
															- expression and not simply the larger total number of detected genes in
														
 
															- GB samples described earlier.
														
 
															-\end_layout
														
 
															-
														
 
															-\begin_layout Section
														
 
															-Discussion
														
 
															-\end_layout
														
 
															-
														
 
															-\begin_layout Standard
														
 
															-The original experience with whole blood gene expression profiling on DNA
														
 
															- microarrays demonstrated that the high concentration of globin transcripts
														
 
															- reduced the sensitivity to detect genes with relatively low expression
														
 
															- levels, in effect, significantly reducing the sensitivity.
														
 
															- To address this limitation, commercial protocols for globin reduction were
														
 
															- developed based on strategies to block globin transcript amplification
														
 
															- during labeling or physically removing globin transcripts by affinity bead
														
 
															- methods 
														
 
															-\begin_inset CommandInset citation
														
 
															-LatexCommand cite
														
 
															-key "Winn2010"
														
 
															-literal "false"
														
 
															-
														
 
															-\end_inset
														
 
															-
														
 
															-.
														
 
															- More recently, using the latest generation of labeling protocols and arrays,
														
 
															- it was determined that globin reduction was no longer necessary to obtain
														
 
															- sufficient sensitivity to detect differential transcript expression 
														
 
															-\begin_inset CommandInset citation
														
 
															-LatexCommand cite
														
 
															-key "NuGEN2010"
														
 
															-literal "false"
														
 
															-
														
 
															-\end_inset
														
 
															-
														
 
															-.
														
 
															- However, we are not aware of any publications using these currently available
														
 
															- protocols with the latest generation of microarrays that actually compare
														
 
															- the detection sensitivity with and without globin reduction.
														
 
															- However, in practice this has now been adopted generally primarily driven
														
 
															- by concerns for cost control.
														
 
															- The main objective of our work was to directly test the impact of globin
														
 
															- gene transcripts and a new globin blocking protocol for application to
														
 
															- the newest generation of differential gene expression profiling determined
														
 
															- using next generation sequencing.
														
 
															- 
														
 
															-\end_layout
														
 
															-
														
 
															-\begin_layout Standard
														
 
															-The challenge of doing global gene expression profiling in cynomolgus monkeys
														
 
															- is that the current available arrays were never designed to comprehensively
														
 
															- cover this genome and have not been updated since the first assemblies
														
 
															- of the cynomolgus genome were published.
														
 
															- Therefore, we determined that the best strategy for peripheral blood profiling
														
 
															- was to do deep RNA-seq and inform the workflow using the latest available
														
 
															- genome assembly and annotation 
														
 
															-\begin_inset CommandInset citation
														
 
															-LatexCommand cite
														
 
															-key "Wilson2013"
														
 
															-literal "false"
														
 
															-
														
 
															-\end_inset
														
 
															-
														
 
															-.
														
 
															- However, it was not immediately clear whether globin reduction was necessary
														
 
															- for RNA-seq or how much improvement in efficiency or sensitivity to detect
														
 
															- differential gene expression would be achieved for the added cost and work.
														
 
															- 
														
 
															-\end_layout
														
 
															-
														
 
															-\begin_layout Standard
														
 
															-We only found one report that demonstrated that globin reduction significantly
														
 
															- improved the effective read yields for sequencing of human peripheral blood
														
 
															- cell RNA using a DeepSAGE protocol 
														
 
															-\begin_inset CommandInset citation
														
 
															-LatexCommand cite
														
 
															-key "Mastrokolias2012"
														
 
															-literal "false"
														
 
															-
														
 
															 \end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															-.
														
 
															- The approach to DeepSAGE involves two different restriction enzymes that
														
 
															- purify and then tag small fragments of transcripts at specific locations
														
 
															- and thus, significantly reduces the complexity of the transcriptome.
														
 
															- Therefore, we could not determine how DeepSAGE results would translate
														
 
															- to the common strategy in the field for assaying the entire transcript
														
 
															- population by whole-transcriptome 3’-end RNA-seq.
														
 
															- Furthermore, if globin reduction is necessary, we also needed a globin
														
 
															- reduction method specific to cynomolgus globin sequences that would work
														
 
															- for an organism for which no kit is available off the shelf.
														
 
															-\end_layout
														
 
															-
														
 
															-\begin_layout Standard
														
 
															-As mentioned above, the addition of globin blocking oligos has a very small
														
 
															- impact on measured gene expression levels.
														
 
															- However, this is a non-issue for the purposes of differential expression
														
 
															- testing, since a systematic change in a gene in all samples does not affect
														
 
															- relative expression levels between samples.
														
 
															- However, we must acknowledge that simple comparisons of gene expression
														
 
															- data obtained by GB and non-GB protocols are not possible without additional
														
 
															- normalization.
														
 
															- 
														
 
															-\end_layout
														
 
															-
														
 
															-\begin_layout Standard
														
 
															-More importantly, globin blocking not only nearly doubles the yield of usable
														
 
															- reads, it also increases inter-sample correlation and sensitivity to detect
														
 
															- differential gene expression relative to the same set of samples profiled
														
 
															- without blocking.
														
 
															- In addition, globin blocking does not add a significant amount of random
														
 
															- noise to the data.
														
 
															- Globin blocking thus represents a cost-effective way to squeeze more data
														
 
															- and statistical power out of the same blood samples and the same amount
														
 
															- of sequencing.
														
 
															- In conclusion, globin reduction greatly increases the yield of useful RNA-seq
														
 
															- reads mapping to the rest of the genome, with minimal perturbations in
														
 
															- the relative levels of non-globin genes.
														
 
															- Based on these results, globin transcript reduction using sequence-specific,
														
 
															- complementary blocking oligonucleotides is recommended for all deep RNA-seq
														
 
															- of cynomolgus and other nonhuman primate blood samples.
														
 
															-\end_layout
														
 
															+\begin_layout Plain Layout
														
 
															-\begin_layout Chapter
														
 
															-Future Directions
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+11235
														
 
															 \end_layout
														
 
															-\begin_layout Standard
														
 
															-\begin_inset Flex TODO Note (inline)
														
 
															-status open
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															 \begin_layout Plain Layout
														
 
															-Consider putting each chapter's future directions with that chapter instead
														
 
															- of in a separate one.
														
 
															- Check instructions to see if this is allowed/appropriate.
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+136
														
 
															 \end_layout
														
 
															 \end_inset
														
 
															+</cell>
														
 
															+</row>
														
 
															+<row>
														
 
															+<cell multirow="4" alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+\begin_layout Plain Layout
														
 
															 \end_layout
														
 
															-\begin_layout Section*
														
 
															-Ch2
														
 
															-\end_layout
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															-\begin_layout Standard
														
 
															-The analysis of RNA-seq and ChIP-seq in CD4 T-cells in Chapter 2 is in many
														
 
															- ways a preliminary study that suggests a multitude of new avenues of investigat
														
 
															-ion.
														
 
															- Here we consider a selection of such avenues.
														
 
															-\end_layout
														
 
															+\begin_layout Plain Layout
														
 
															-\begin_layout Subsection*
														
 
															-Improving on the effective promoter radius
														
 
															+\series bold
														
 
															+Down
														
 
															 \end_layout
														
 
															-\begin_layout Standard
														
 
															-This study introduced the concept of an 
														
 
															-\begin_inset Quotes eld
														
 
															 \end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															-effective promoter radius
														
 
															-\begin_inset Quotes erd
														
 
															-\end_inset
														
 
															+\begin_layout Plain Layout
														
 
															- specific to each histone mark based on distance from the TSS within which
														
 
															- an excess of peaks was called for that mark.
														
 
															- This concept was then used to guide further analyses throughout the study.
														
 
															- However, while the effective promoter radius was useful in those analyses,
														
 
															- it is both limited in theory and shown in practice to be a possible oversimplif
														
 
															-ication.
														
 
															- First, the effective promoter radii used in this study were chosen based
														
 
															- on manual inspection of the TSS-to-peak distance distributions in Figure
														
 
															- 
														
 
															-\begin_inset CommandInset ref
														
 
															-LatexCommand ref
														
 
															-reference "fig:near-promoter-peak-enrich"
														
 
															-plural "false"
														
 
															-caps "false"
														
 
															-noprefix "false"
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+0
														
 
															+\end_layout
														
 
															 \end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															-, selecting round numbers for analyst convenience (Table 
														
 
															-\begin_inset CommandInset ref
														
 
															-LatexCommand ref
														
 
															-reference "tab:effective-promoter-radius"
														
 
															-plural "false"
														
 
															-caps "false"
														
 
															-noprefix "false"
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+548
														
 
															+\end_layout
														
 
															 \end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															-).
														
 
															- It would be better to define an algorithm that selects a more precise radius
														
 
															- based on the features of the graph.
														
 
															- One possible way to do this would be to randomly rearrange the called peaks
														
 
															- throughout the genome many times (while preserving the distribution of peak widths)
														
 
															- and re-generate the same plot as in Figure 
														
 
															-\begin_inset CommandInset ref
														
 
															-LatexCommand ref
														
 
															-reference "fig:near-promoter-peak-enrich"
														
 
															-plural "false"
														
 
															-caps "false"
														
 
															-noprefix "false"
														
 
															+\begin_layout Plain Layout
														
 
															-\end_inset
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+127
														
 
															+\end_layout
														
 
															-.
														
 
															- This would yield a better 
														
 
															-\begin_inset Quotes eld
														
 
															 \end_inset
														
 
															+</cell>
														
 
															+</row>
														
 
															+</lyxtabular>
														
 
															-background
														
 
															-\begin_inset Quotes erd
														
 
															 \end_inset
														
 
															- distribution that demonstrates the degree of near-TSS enrichment that would
														
 
															- be expected by random chance.
														
 
															- The effective promoter radius could be defined as the point where the true
														
 
															- distribution diverges from the randomized background distribution.
														
 
															- 
														
 
															+
														
 
															 \end_layout
														
 
															-\begin_layout Standard
														
 
															-Furthermore, the above definition of effective promoter radius has the significa
														
 
															-nt limitation of being based on the peak calling method.
														
 
															- It is thus very sensitive to the choice of peak caller and significance
														
 
															- threshold for calling peaks, as well as the degree of saturation in the
														
 
															- sequencing.
														
 
															- Calling peaks from ChIP-seq samples with insufficient coverage depth, with
														
 
															- the wrong peak caller, or with a different significance threshold could
														
 
															- give a drastically different number of called peaks, and hence a drastically
														
 
															- different distribution of peak-to-TSS distances.
														
 
															- To address this, it is desirable to develop a better method of determining
														
 
															- the effective promoter radius that relies only on the distribution of read
														
 
															- coverage around the TSS, independent of the peak calling.
														
 
															- Furthermore, as demonstrated by the upstream-downstream asymmetries observed
														
 
															- in Figures 
														
 
															-\begin_inset CommandInset ref
														
 
															-LatexCommand ref
														
 
															-reference "fig:H3K4me2-neighborhood"
														
 
															-plural "false"
														
 
															-caps "false"
														
 
															-noprefix "false"
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset Caption Standard
														
 
															-\end_inset
														
 
															+\begin_layout Plain Layout
														
 
															-, 
														
 
															-\begin_inset CommandInset ref
														
 
															-LatexCommand ref
														
 
															-reference "fig:H3K4me3-neighborhood"
														
 
															-plural "false"
														
 
															-caps "false"
														
 
															-noprefix "false"
														
 
															+\series bold
														
 
															+\begin_inset Argument 1
														
 
															+status open
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+Comparison of significantly differentially expressed genes with and without
														
 
															+ globin blocking.
														
 
															+\end_layout
														
 
															 \end_inset
														
 
															-, and 
														
 
															-\begin_inset CommandInset ref
														
 
															-LatexCommand ref
														
 
															-reference "fig:H3K27me3-neighborhood"
														
 
															-plural "false"
														
 
															-caps "false"
														
 
															-noprefix "false"
														
 
															-\end_inset
														
 
															+\begin_inset CommandInset label
														
 
															+LatexCommand label
														
 
															+name "tab:Comparison-of-significant"
														
 
															-, this definition should determine a different radius for the upstream and
														
 
															- downstream directions.
														
 
															- At this point, it may be better to rename this concept 
														
 
															-\begin_inset Quotes eld
														
 
															 \end_inset
														
 
															-effective promoter extent
														
 
															-\begin_inset Quotes erd
														
 
															-\end_inset
														
 
															+Comparison of significantly differentially expressed genes with and without
														
 
															+ globin blocking.
														
 
															- and avoid the word 
														
 
															-\begin_inset Quotes eld
														
 
															-\end_inset
														
 
															+\series default
														
 
															+ Up, Down: Genes significantly up/down-regulated in post-transplant samples
														
 
															+ relative to pre-transplant samples, with a false discovery rate of 10%
														
 
															+ or less.
														
 
															+ NS: Non-significant genes (false discovery rate greater than 10%).
														
 
															+\end_layout
														
 
															-radius
														
 
															-\begin_inset Quotes erd
														
 
															 \end_inset
														
 
															-, since a radius implies a symmetry about the TSS that is not supported
														
 
															- by the data.
														
 
															+
														
 
															 \end_layout
														
 
															-\begin_layout Standard
														
 
															-Beyond improving the definition of effective promoter extent, functional
														
 
															- validation is necessary to show that this measure of near-TSS enrichment
														
 
															- has biological meaning.
														
 
															- Figures 
														
 
															-\begin_inset CommandInset ref
														
 
															-LatexCommand ref
														
 
															-reference "fig:H3K4me2-neighborhood"
														
 
															-plural "false"
														
 
															-caps "false"
														
 
															-noprefix "false"
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\end_layout
														
 
															 \end_inset
														
 
															- and 
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+To compare performance on differential gene expression tests, we took subsets
														
 
															+ of both the GB and non-GB libraries with exactly one pre-transplant and
														
 
															+ one post-transplant sample for each animal that had paired samples available
														
 
															+ for analysis (N=7 animals, N=14 samples in each subset).
														
 
															+ The same test for pre- vs.
														
 
															+ post-transplant differential gene expression was performed on the same
														
 
															+ 7 pairs of samples from GB libraries and non-GB libraries, in each case
														
 
															+ using an FDR of 10% as the threshold of significance.
														
 
															+ Out of 12954 genes that passed the detection threshold in both subsets,
														
 
															+ 358 were called significantly differentially expressed in the same direction
														
 
															+ in both sets; 1063 were differentially expressed in the GB set only; 296
														
 
															+ were differentially expressed in the non-GB set only; 2 genes were called
														
 
															+ significantly up in the GB set but significantly down in the non-GB set;
														
 
															+ and the remaining 11235 were not called differentially expressed in either
														
 
															+ set.
														
 
															+ These data are summarized in Table 
														
 
															 \begin_inset CommandInset ref
														
 
															 LatexCommand ref
														
 
															-reference "fig:H3K4me3-neighborhood"
														
 
															+reference "tab:Comparison-of-significant"
														
 
															 plural "false"
														
 
															 caps "false"
														
 
															 noprefix "false"
														
 
															 \end_inset
														
 
															- already provide a very limited functional validation of the chosen promoter
														
 
															- extents for H3K4me2 and H3K4me3 by showing that spikes in coverage within
														
 
															- this region are most strongly correlated with elevated gene expression.
														
 
															- However, there are other ways to show functional relevance of the promoter
														
 
															- extent.
														
 
															- For example, correlations could be computed between read counts in peaks
														
 
															- nearby gene promoters and the expression level of those genes, and these
														
 
															- correlations could be plotted against the distance of the peak upstream
														
 
															- or downstream of the gene's TSS.
														
 
															- If the promoter extent truly defines a 
														
 
															-\begin_inset Quotes eld
														
 
															-\end_inset
														
 
															-
														
 
															-sphere of influence
														
 
															-\begin_inset Quotes erd
														
 
															-\end_inset
														
 
															+.
														
 
															+ The differences in BCV calculated by EdgeR for these subsets of samples
														
 
															+ were negligible (BCV = 0.302 for GB and 0.297 for non-GB).
														
 
															+\end_layout
														
 
															- within which a histone mark is involved with the regulation of a gene,
														
 
															- then the correlations for peaks within this extent should be significantly
														
 
															- higher than those further upstream or downstream.
														
 
															- Peaks within these extents may also be more likely to show differential
														
 
															- modification than those outside genic regions of the genome.
														
 
															+\begin_layout Standard
														
 
															+The key point is that the GB data results in substantially more differentially
														
 
															+ expressed calls than the non-GB data.
														
 
															+ Since there is no gold standard for this dataset, it is impossible to be
														
 
															+ certain whether this is due to under-calling of differential expression
														
 
															+ in the non-GB samples or over-calling in the GB samples.
														
 
															+ However, given that both datasets are derived from the same biological
														
 
															+ samples and have nearly equal BCVs, it is more likely that the larger number
														
 
															+ of DE calls in the GB samples are genuine detections that were enabled
														
 
															+ by the higher sequencing depth and measurement precision of the GB samples.
														
 
															+ Note that the same set of genes was considered in both subsets, so the
														
 
															+ larger number of differentially expressed gene calls in the GB data set
														
 
															+ reflects a greater sensitivity to detect significant differential gene
														
 
															+ expression and not simply the larger total number of detected genes in
														
 
															+ GB samples described earlier.
														
 
															 \end_layout
														
 
															-\begin_layout Subsection*
														
 
															-Post-activation convergence of naive & memory cells
														
 
															+\begin_layout Section
														
 
															+Discussion
														
 
															 \end_layout
														
 
															 \begin_layout Standard
														
 
															-In this study, a convergence between naive and memory cells was observed
														
 
															- in both the pattern of gene expression and in epigenetic state of the 3
														
 
															- histone marks studied.
														
 
															-\end_layout
														
 
															+The original experience with whole blood gene expression profiling on DNA
														
 
															+ microarrays demonstrated that the high concentration of globin transcripts
														
 
															+ reduced the sensitivity to detect genes with relatively low expression
														
 
															+ levels.
														
 
															+ To address this limitation, commercial protocols for globin reduction were
														
 
															+ developed based on strategies to block globin transcript amplification
														
 
															+ during labeling or physically removing globin transcripts by affinity bead
														
 
															+ methods 
														
 
															+\begin_inset CommandInset citation
														
 
															+LatexCommand cite
														
 
															+key "Winn2010"
														
 
															+literal "false"
														
 
															-\begin_layout Itemize
														
 
															-N-to-M convergence deserves further study of some kind
														
 
															-\end_layout
														
 
															+\end_inset
														
 
															-\begin_deeper
														
 
															-\begin_layout Itemize
														
 
															-maybe serial activation & rest cycles for naive and memory, showing a cyclical
														
 
															- pattern returning to the same state again and again after the first activation
														
 
															-\end_layout
														
 
															+.
														
 
															+ More recently, using the latest generation of labeling protocols and arrays,
														
 
															+ it was determined that globin reduction was no longer necessary to obtain
														
 
															+ sufficient sensitivity to detect differential transcript expression 
														
 
															+\begin_inset CommandInset citation
														
 
															+LatexCommand cite
														
 
															+key "NuGEN2010"
														
 
															+literal "false"
														
 
															-\end_deeper
														
 
															-\begin_layout Itemize
														
 
															-Study other epigenetic marks in more contexts, including looking for similar
														
 
															- convergence patterns.
														
 
															- Use MOFA to identify coordinated patterns.
														
 
															-\end_layout
														
 
															+\end_inset
														
 
															-\begin_deeper
														
 
															-\begin_layout Itemize
														
 
															-DNA methylation, histone marks, chromatin accessibility & conformation in
														
 
															- CD4 T-cells
														
 
															+.
														
 
															+ However, we are not aware of any publications using these currently available
														
 
															+ protocols with the latest generation of microarrays that actually compare
														
 
															+ the detection sensitivity with and without globin reduction.
														
 
															+ Nevertheless, in practice this has now been generally adopted, driven primarily
														
 
															+ by concerns for cost control.
														
 
															+ The main objective of our work was to directly test the impact of globin
														
 
															+ gene transcripts and a new globin blocking protocol for application to
														
 
															+ the newest generation of differential gene expression profiling determined
														
 
															+ using next generation sequencing.
														
 
															+ 
														
 
															 \end_layout
														
 
															-\begin_layout Itemize
														
 
															-Also look at other types of lymphocytes: CD8 T-cells, B-cells, NK cells
														
 
															-\end_layout
														
 
															+\begin_layout Standard
														
 
															+The challenge of doing global gene expression profiling in cynomolgus monkeys
														
 
															+ is that the currently available arrays were never designed to comprehensively
														
 
															+ cover this genome and have not been updated since the first assemblies
														
 
															+ of the cynomolgus genome were published.
														
 
															+ Therefore, we determined that the best strategy for peripheral blood profiling
														
 
															+ was to do deep RNA-seq and inform the workflow using the latest available
														
 
															+ genome assembly and annotation 
														
 
															+\begin_inset CommandInset citation
														
 
															+LatexCommand cite
														
 
															+key "Wilson2013"
														
 
															+literal "false"
														
 
															-\end_deeper
														
 
															-\begin_layout Subsection*
														
 
															-Promoter positional coverage: follow up on hints of interesting patterns
														
 
															-\end_layout
														
 
															+\end_inset
														
 
															-\begin_layout Itemize
														
 
															-Also find better normalizations: maybe borrow from MACS/SICER background
														
 
															- correction methods?
														
 
															+.
														
 
															+ However, it was not immediately clear whether globin reduction was necessary
														
 
															+ for RNA-seq or how much improvement in efficiency or sensitivity to detect
														
 
															+ differential gene expression would be achieved for the added cost and work.
														
 
															+ 
														
 
															 \end_layout
														
 
															-\begin_layout Itemize
														
 
															-For H3K4, define polar coordinates based on PC1 & 2: R = peak size, Theta
														
 
															- = peak position.
														
 
															- Then correlate with expression.
														
 
															-\end_layout
														
 
															+\begin_layout Standard
														
 
															+We found only one report that demonstrated that globin reduction significantly
														
 
															+ improved the effective read yields for sequencing of human peripheral blood
														
 
															+ cell RNA using a DeepSAGE protocol 
														
 
															+\begin_inset CommandInset citation
														
 
															+LatexCommand cite
														
 
															+key "Mastrokolias2012"
														
 
															+literal "false"
														
 
															-\begin_layout Itemize
														
 
															-Current analysis only at Day 0.
														
 
															- Need to study across time points.
														
 
															-\end_layout
														
 
															+\end_inset
														
 
															-\begin_layout Subsection*
														
 
															-H3K4me correlation
														
 
															+.
														
 
															+ The approach to DeepSAGE involves two different restriction enzymes that
														
 
															+ purify and then tag small fragments of transcripts at specific locations
														
 
															+ and thus significantly reduces the complexity of the transcriptome.
														
 
															+ Therefore, we could not determine how DeepSAGE results would translate
														
 
															+ to the common strategy in the field for assaying the entire transcript
														
 
															+ population by whole-transcriptome 3’-end RNA-seq.
														
 
															+ Furthermore, if globin reduction is necessary, we also needed a globin
														
 
															+ reduction method specific to cynomolgus globin sequences that would work
														
 
															+ in an organism for which no kit is available off the shelf.
														
 
															 \end_layout
														
 
															 \begin_layout Standard
														
 
															-The high correlation between coverage depth observed between H3K4me2 and
														
 
															- H3K4me3 is both expected and unexpected.
														
 
															- Since both marks are associated with elevated gene transcription, a positive
														
 
															- correlation between them is not surprising.
														
 
															- However, these two marks represent different post-translational modifications
														
 
															- of the 
														
 
															-\emph on
														
 
															-same
														
 
															-\emph default
														
 
															- lysine residue on the histone H3 polypeptide, which means that they cannot
														
 
															- both be present on the same H3 subunit.
														
 
															- Thus, the high correlation between them has several potential explanations.
														
 
															- One possible reason is cell population heterogeneity: perhaps some genomic
														
 
															- loci are frequently marked with H3K4me2 in some cells, while in other cells
														
 
															- the same loci are marked with H3K4me3.
														
 
															- Another possibility is allele-specific modifications: the loci are marked
														
 
															- in each diploid cell with H3K4me2 on one allele and H3K4me3 on the other
														
 
															- allele.
														
 
															- Lastly, since each histone octamer contains 2 H3 subunits, it is possible
														
 
															- that having one H3K4me2 mark and one H3K4me3 mark on a given histone octamer
														
 
															- represents a distinct epigenetic state with a different function than either
														
 
															- double H3K4me2 or double H3K4me3.
														
 
															+As mentioned above, the addition of globin blocking oligos has a very small
														
 
															+ impact on measured gene expression levels.
														
 
															+ However, this is a non-issue for the purposes of differential expression
														
 
															+ testing, since a systematic change in a gene in all samples does not affect
														
 
															+ relative expression levels between samples.
														
 
															+ However, we must acknowledge that simple comparisons of gene expression
														
 
															+ data obtained by GB and non-GB protocols are not possible without additional
														
 
															+ normalization.
														
 
															 \end_layout
														
 
															 \begin_layout Standard
														
 
															-These three hypotheses could be disentangled by single-cell ChIP-seq.
														
 
															- If the correlation between these two histone marks persists even within
														
 
															- the reads for each individual cell, then cell population heterogeneity
														
 
															- cannot explain the correlation.
														
 
															- Allele-specific modification can be tested for by looking at the correlation
														
 
															- between read coverage of the two histone marks at heterozygous loci.
														
 
															- If the correlation between read counts for opposite loci is low, then this
														
 
															- is consistent with allele-specific modification.
														
 
															- Finally if the modifications do not separate by either cell or allele,
														
 
															- the colocation of these two marks is most likely occurring at the level
														
 
															- of individual histones, with the heterogeneously modified histone representing
														
 
															- a distinct state.
														
 
															- 
														
 
															+More importantly, globin blocking not only nearly doubles the yield of usable
														
 
															+ reads, it also increases inter-sample correlation and sensitivity to detect
														
 
															+ differential gene expression relative to the same set of samples profiled
														
 
															+ without blocking.
														
 
															+ In addition, globin blocking does not add a significant amount of random
														
 
															+ noise to the data.
														
 
															+ Globin blocking thus represents a cost-effective way to squeeze more data
														
 
															+ and statistical power out of the same blood samples and the same amount
														
 
															+ of sequencing.
														
 
															+ In conclusion, globin reduction greatly increases the yield of useful RNA-seq
														
 
															+ reads mapping to the rest of the genome, with minimal perturbations in
														
 
															+ the relative levels of non-globin genes.
														
 
															+ Based on these results, globin transcript reduction using sequence-specific,
														
 
															+ complementary blocking oligonucleotides is recommended for all deep RNA-seq
														
 
															+ of cynomolgus and other nonhuman primate blood samples.
														
 
															 \end_layout
														
 
															-\begin_layout Standard
														
 
															-However, another experiment would be required to show direct evidence of
														
 
															- such a heterogeneously modified state.
														
 
															- Specifically a 
														
 
															-\begin_inset Quotes eld
														
 
															-\end_inset
														
 
															-
														
 
															-double ChIP
														
 
															-\begin_inset Quotes erd
														
 
															-\end_inset
														
 
															-
														
 
															- experiment would need to be performed, where the input DNA is first subjected
														
 
															- to an immunoprecipitation pulldown from the anti-H3K4me2 antibody, and
														
 
															- then the enriched material is collected, with proteins still bound, and
														
 
															- immunoprecipitated 
														
 
															-\emph on
														
 
															-again
														
 
															-\emph default
														
 
															- using the anti-H3K4me3 antibody.
														
 
															- If this yields significant numbers of non-artifactual reads in the same
														
 
															- regions as the individual pulldowns of the two marks, this is strong evidence
														
 
															- that the two marks are occurring on opposite H3 subunits of the same histones.
														
 
															+\begin_layout Section
														
 
															+Future Directions
														
 
															 \end_layout
														
 
															 \begin_layout Standard
														
@@ -14047,11 +14134,9 @@ again
 
															 status open
														
 
															 \begin_layout Plain Layout
														
 
															-Try to see if double ChIP-seq is actually feasible, and if not, come up
														
 
															- with some other idea for directly detecting the mixed mod state.
														
 
															- Oh! Actually ChIP-seq isn't required, only double ChIP followed by quantificati
														
 
															-on.
														
 
															- That's one possible angle.
														
 
															+I've already done a good bit of work outside just this globin blocking thing,
														
 
															+ so I'm not sure what to put for future directions.
														
 
															+ Does it include the other stuff I've done but not published?
														
 
															 \end_layout
														
 
															 \end_inset
														
@@ -14059,20 +14144,8 @@ on.
 
															 \end_layout
														
 
															-\begin_layout Section*
														
 
															-Ch3
														
 
															-\end_layout
														
 
															-
														
 
															-\begin_layout Itemize
														
 
															-Use CV or bootstrap to better evaluate classifiers
														
 
															-\end_layout
														
 
															-
														
 
															-\begin_layout Itemize
														
 
															-fRMAtools could be adapted to not require equal-sized groups
														
 
															-\end_layout
														
 
															-
														
 
															-\begin_layout Section*
														
 
															-Ch4
														
 
															+\begin_layout Chapter
														
 
															+Future Directions
														
 
															 \end_layout
														
 
															 \begin_layout Standard
														
@@ -14080,9 +14153,9 @@ Ch4
 
															 status open
														
 
															 \begin_layout Plain Layout
														
 
															-I've already done a good bit of work outside just this globin blocking thing,
														
 
															- so I'm not sure what to put for future directions.
														
 
															- Does it include the other stuff I've done but not published?
														
 
															+If there are any chapter-independent future directions, put them here.
														
 
															+ Otherwise, delete this section.
														
 
															+ Check in the directions if this is OK.
														
 
															 \end_layout
														
 
															 \end_inset