|
@@ -6542,6 +6542,413 @@ Is this needed?
|
|
|
\end_inset
|
|
|
|
|
|
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Section
|
|
|
+Future Directions
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Standard
|
|
|
+The analysis of RNA-seq and ChIP-seq in CD4 T-cells in Chapter 2 is in many
|
|
|
+ ways a preliminary study that suggests a multitude of new avenues of investigat
|
|
|
+ion.
|
|
|
+ Here we consider a selection of such avenues.
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Subsection
|
|
|
+Improve on the idea of an effective promoter radius
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Standard
|
|
|
+This study introduced the concept of an
|
|
|
+\begin_inset Quotes eld
|
|
|
+\end_inset
|
|
|
+
|
|
|
+effective promoter radius
|
|
|
+\begin_inset Quotes erd
|
|
|
+\end_inset
|
|
|
+
|
|
|
+ specific to each histone mark based on distance from the TSS within which
|
|
|
+ an excess of peaks was called for that mark.
|
|
|
+ This concept was then used to guide further analyses throughout the study.
|
|
|
+ However, while the effective promoter radius was useful in those analyses,
|
|
|
+ it is both limited in theory and shown in practice to be a possible oversimplif
|
|
|
+ication.
|
|
|
+ First, the effective promoter radii used in this study were chosen based
|
|
|
+ on manual inspection of the TSS-to-peak distance distributions in Figure
|
|
|
+
|
|
|
+\begin_inset CommandInset ref
|
|
|
+LatexCommand ref
|
|
|
+reference "fig:near-promoter-peak-enrich"
|
|
|
+plural "false"
|
|
|
+caps "false"
|
|
|
+noprefix "false"
|
|
|
+
|
|
|
+\end_inset
|
|
|
+
|
|
|
+, selecting round numbers for analyst convenience (Table 
|
|
|
+\begin_inset CommandInset ref
|
|
|
+LatexCommand ref
|
|
|
+reference "tab:effective-promoter-radius"
|
|
|
+plural "false"
|
|
|
+caps "false"
|
|
|
+noprefix "false"
|
|
|
+
|
|
|
+\end_inset
|
|
|
+
|
|
|
+).
|
|
|
+ It would be better to define an algorithm that selects a more precise radius
|
|
|
+ based on the features of the graph.
|
|
|
+ One possible way to do this would be to randomly rearrange the called peaks
|
|
|
+ throughout the genome many times (while preserving the distribution of peak widths)
|
|
|
+ and re-generate the same plot as in Figure
|
|
|
+\begin_inset CommandInset ref
|
|
|
+LatexCommand ref
|
|
|
+reference "fig:near-promoter-peak-enrich"
|
|
|
+plural "false"
|
|
|
+caps "false"
|
|
|
+noprefix "false"
|
|
|
+
|
|
|
+\end_inset
|
|
|
+
|
|
|
+.
|
|
|
+ This would yield a better
|
|
|
+\begin_inset Quotes eld
|
|
|
+\end_inset
|
|
|
+
|
|
|
+background
|
|
|
+\begin_inset Quotes erd
|
|
|
+\end_inset
|
|
|
+
|
|
|
+ distribution that demonstrates the degree of near-TSS enrichment that would
|
|
|
+ be expected by random chance.
|
|
|
+ The effective promoter radius could be defined as the point where the true
|
|
|
+ distribution diverges from the randomized background distribution.
|
|
|
+
|
|
|
+\end_layout
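The randomization described above could be sketched roughly as follows. This is a minimal illustration and not the analysis code used in this study: it assumes a single linear chromosome, uniform random placement of peaks, and a simple rule that takes the radius to be the first distance bin in which the observed peak count no longer exceeds a chosen quantile of the shuffled counts. The function names, the quantile, and the binning are all hypothetical choices.

```python
import numpy as np

def nearest_tss_distance(peak_mid, tss_sorted):
    """Absolute distance from each peak midpoint to the nearest TSS."""
    idx = np.searchsorted(tss_sorted, peak_mid)
    last = len(tss_sorted) - 1
    left = tss_sorted[np.clip(idx - 1, 0, last)]
    right = tss_sorted[np.clip(idx, 0, last)]
    return np.minimum(np.abs(peak_mid - left), np.abs(peak_mid - right))

def shuffled_background(widths, tss_sorted, genome_len, n_perm, rng):
    """Nearest-TSS distances for peaks re-placed uniformly at random.

    Peak widths are preserved; only the start positions are redrawn.
    Returns an array of shape (n_perm, n_peaks).
    """
    out = np.empty((n_perm, len(widths)))
    for i in range(n_perm):
        starts = rng.integers(0, genome_len - widths + 1)
        out[i] = nearest_tss_distance(starts + widths // 2, tss_sorted)
    return out

def estimate_radius(obs_dist, bg_dist, bin_edges, quantile=0.95):
    """Left edge of the first distance bin where the observed peak count
    does not exceed the given quantile of the shuffled counts."""
    obs_counts, _ = np.histogram(obs_dist, bins=bin_edges)
    bg_counts = np.stack([np.histogram(row, bins=bin_edges)[0] for row in bg_dist])
    threshold = np.quantile(bg_counts, quantile, axis=0)
    not_enriched = np.nonzero(obs_counts <= threshold)[0]
    return bin_edges[not_enriched[0]] if not_enriched.size else bin_edges[-1]
```

A real implementation would need to respect chromosome boundaries and unmappable regions when shuffling, and would probably treat upstream and downstream distances separately.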
|
|
|
+
|
|
|
+\begin_layout Standard
|
|
|
+Furthermore, the above definition of effective promoter radius has the significa
|
|
|
+nt limitation of being based on the peak calling method.
|
|
|
+ It is thus very sensitive to the choice of peak caller and significance
|
|
|
+ threshold for calling peaks, as well as the degree of saturation in the
|
|
|
+ sequencing.
|
|
|
+ Calling peaks from ChIP-seq samples with insufficient coverage depth, with
|
|
|
+ the wrong peak caller, or with a different significance threshold could
|
|
|
+ give a drastically different number of called peaks, and hence a drastically
|
|
|
+ different distribution of peak-to-TSS distances.
|
|
|
+ To address this, it is desirable to develop a better method of determining
|
|
|
+ the effective promoter radius that relies only on the distribution of read
|
|
|
+ coverage around the TSS, independent of the peak calling.
|
|
|
+ Furthermore, as demonstrated by the upstream-downstream asymmetries observed
|
|
|
+ in Figures
|
|
|
+\begin_inset CommandInset ref
|
|
|
+LatexCommand ref
|
|
|
+reference "fig:H3K4me2-neighborhood"
|
|
|
+plural "false"
|
|
|
+caps "false"
|
|
|
+noprefix "false"
|
|
|
+
|
|
|
+\end_inset
|
|
|
+
|
|
|
+,
|
|
|
+\begin_inset CommandInset ref
|
|
|
+LatexCommand ref
|
|
|
+reference "fig:H3K4me3-neighborhood"
|
|
|
+plural "false"
|
|
|
+caps "false"
|
|
|
+noprefix "false"
|
|
|
+
|
|
|
+\end_inset
|
|
|
+
|
|
|
+, and
|
|
|
+\begin_inset CommandInset ref
|
|
|
+LatexCommand ref
|
|
|
+reference "fig:H3K27me3-neighborhood"
|
|
|
+plural "false"
|
|
|
+caps "false"
|
|
|
+noprefix "false"
|
|
|
+
|
|
|
+\end_inset
|
|
|
+
|
|
|
+, this definition should determine a different radius for the upstream and
|
|
|
+ downstream directions.
|
|
|
+ At this point, it may be better to rename this concept
|
|
|
+\begin_inset Quotes eld
|
|
|
+\end_inset
|
|
|
+
|
|
|
+effective promoter extent
|
|
|
+\begin_inset Quotes erd
|
|
|
+\end_inset
|
|
|
+
|
|
|
+ and avoid the word
|
|
|
+\begin_inset Quotes eld
|
|
|
+\end_inset
|
|
|
+
|
|
|
+radius
|
|
|
+\begin_inset Quotes erd
|
|
|
+\end_inset
|
|
|
+
|
|
|
+, since a radius implies a symmetry about the TSS that is not supported
|
|
|
+ by the data.
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Standard
|
|
|
+Beyond improving the definition of effective promoter extent, functional
|
|
|
+ validation is necessary to show that this measure of near-TSS enrichment
|
|
|
+ has biological meaning.
|
|
|
+ Figures
|
|
|
+\begin_inset CommandInset ref
|
|
|
+LatexCommand ref
|
|
|
+reference "fig:H3K4me2-neighborhood"
|
|
|
+plural "false"
|
|
|
+caps "false"
|
|
|
+noprefix "false"
|
|
|
+
|
|
|
+\end_inset
|
|
|
+
|
|
|
+ and
|
|
|
+\begin_inset CommandInset ref
|
|
|
+LatexCommand ref
|
|
|
+reference "fig:H3K4me3-neighborhood"
|
|
|
+plural "false"
|
|
|
+caps "false"
|
|
|
+noprefix "false"
|
|
|
+
|
|
|
+\end_inset
|
|
|
+
|
|
|
+ already provide a very limited functional validation of the chosen promoter
|
|
|
+ extents for H3K4me2 and H3K4me3 by showing that spikes in coverage within
|
|
|
+ this region are most strongly correlated with elevated gene expression.
|
|
|
+ However, there are other ways to show functional relevance of the promoter
|
|
|
+ extent.
|
|
|
+ For example, correlations could be computed between read counts in peaks
|
|
|
+ near gene promoters and the expression level of those genes, and these
|
|
|
+ correlations could be plotted against the distance of the peak upstream
|
|
|
+ or downstream of the gene's TSS.
|
|
|
+ If the promoter extent truly defines a
|
|
|
+\begin_inset Quotes eld
|
|
|
+\end_inset
|
|
|
+
|
|
|
+sphere of influence
|
|
|
+\begin_inset Quotes erd
|
|
|
+\end_inset
|
|
|
+
|
|
|
+ within which a histone mark is involved with the regulation of a gene,
|
|
|
+ then the correlations for peaks within this extent should be significantly
|
|
|
+ higher than those further upstream or downstream.
|
|
|
+ Peaks within these extents may also be more likely to show differential
|
|
|
+ modification than those outside genic regions of the genome.
|
|
|
+\end_layout
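As a rough sketch of the correlation-versus-distance analysis proposed above: assuming each peak has already been paired with one gene, with one matrix of peak read counts and one matrix of gene expression values (rows are peak-gene pairs, columns are samples), the per-pair Pearson correlations can be averaged within bins of signed distance from the TSS. The function names and input layout are hypothetical, and a real analysis would first normalize and transform the counts.

```python
import numpy as np

def rowwise_pearson(x, y):
    """Pearson correlation between matching rows of x and y (pairs x samples)."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    denom = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    with np.errstate(invalid="ignore", divide="ignore"):
        return (xc * yc).sum(axis=1) / denom  # NaN for constant rows

def mean_correlation_by_distance(peak_counts, gene_expr, signed_dist, bin_edges):
    """Mean peak-vs-expression correlation within each signed-distance bin.

    signed_dist is negative upstream of the TSS and positive downstream.
    Bins that contain no usable pairs are reported as NaN.
    """
    r = rowwise_pearson(peak_counts, gene_expr)
    which = np.digitize(signed_dist, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    means = np.full(n_bins, np.nan)
    for b in range(n_bins):
        sel = (which == b) & np.isfinite(r)
        if sel.any():
            means[b] = r[sel].mean()
    return means
```

If the promoter extent is meaningful, the binned means should be visibly higher inside it than outside, and keeping the distance signed lets any upstream-downstream asymmetry show up directly.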
|
|
|
+
|
|
|
+\begin_layout Subsection
|
|
|
+Design experiments to focus on post-activation convergence of naive & memory
|
|
|
+ cells
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Standard
|
|
|
+In this study, a convergence between naive and memory cells was observed
|
|
|
+ in both the pattern of gene expression and the epigenetic state of the 3
|
|
|
+ histone marks studied, consistent with the hypothesis that any naive cells
|
|
|
+ remaining 14 days after activation have differentiated into memory cells,
|
|
|
+ and that both gene expression and these histone marks are involved in this
|
|
|
+ differentiation.
|
|
|
+ However, the current study was not designed with this specific hypothesis
|
|
|
+ in mind, and it therefore has some deficiencies with regard to testing
|
|
|
+ it.
|
|
|
+ The memory CD4 samples at day 14 do not resemble the memory samples at
|
|
|
+ day 0, indicating that in the specific model of activation used for this
|
|
|
+ experiment, the cells are not guaranteed to return to their original pre-activa
|
|
|
+tion state, or perhaps this process takes substantially longer than 14 days.
|
|
|
+ This is a challenge for the convergence hypothesis because the ideal comparison
|
|
|
+ to prove that naive cells are converging to a resting memory state would
|
|
|
+ be to compare the final naive time point to the Day 0 memory samples, but
|
|
|
+ this comparison is only meaningful if memory cells generally return to
|
|
|
+ the same
|
|
|
+\begin_inset Quotes eld
|
|
|
+\end_inset
|
|
|
+
|
|
|
+resting
|
|
|
+\begin_inset Quotes erd
|
|
|
+\end_inset
|
|
|
+
|
|
|
+ state that they started at.
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Standard
|
|
|
+To better study the convergence hypothesis, a new experiment should be designed
|
|
|
+ using a model system for T-cell activation that is known to allow cells
|
|
|
+ to return as closely as possible to their pre-activation state.
|
|
|
+ Alternatively, if it is not possible to find or design such a model system,
|
|
|
+ the same cell cultures could be activated serially multiple times, and
|
|
|
+ sequenced after each activation cycle right before the next activation.
|
|
|
+ It is likely that several activations in the same model system will settle
|
|
|
+ into a cyclical pattern, converging to a consistent 
|
|
|
+\begin_inset Quotes eld
|
|
|
+\end_inset
|
|
|
+
|
|
|
+resting
|
|
|
+\begin_inset Quotes erd
|
|
|
+\end_inset
|
|
|
+
|
|
|
+ state after each activation, even if this state is different from the initial
|
|
|
+ resting state at Day 0.
|
|
|
+ If so, it will be possible to compare the final states of both naive and
|
|
|
+ memory cells to show that they converge despite different initial conditions.
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Standard
|
|
|
+In addition, if naive-to-memory convergence is a general pattern, it should
|
|
|
+ also be detectable in other epigenetic marks, including other histone marks
|
|
|
+ and DNA methylation.
|
|
|
+ An experiment should be designed studying a large number of epigenetic
|
|
|
+ marks known or suspected to be involved in regulation of gene expression,
|
|
|
+ assaying all of these at the same pre- and post-activation time points.
|
|
|
+ Multi-dataset factor analysis methods like MOFA can then be used to identify
|
|
|
+ coordinated patterns of regulation shared across many epigenetic marks.
|
|
|
+ If possible, some
|
|
|
+\begin_inset Quotes eld
|
|
|
+\end_inset
|
|
|
+
|
|
|
+negative control
|
|
|
+\begin_inset Quotes erd
|
|
|
+\end_inset
|
|
|
+
|
|
|
+ marks should be included that are known
|
|
|
+\emph on
|
|
|
+not
|
|
|
+\emph default
|
|
|
+ to be involved in T-cell activation or memory formation.
|
|
|
+ Of course, CD4 T-cells are not the only adaptive immune cells with memory.
|
|
|
+ A similar study could be designed for CD8 T-cells, B-cells, and even specific
|
|
|
+ subsets of CD4 T-cells.
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Subsection
|
|
|
+Follow up on hints of interesting patterns in promoter relative coverage
|
|
|
+ profiles
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Standard
|
|
|
+\begin_inset Flex TODO Note (inline)
|
|
|
+status open
|
|
|
+
|
|
|
+\begin_layout Plain Layout
|
|
|
+I think I might need to write up the negative results for the Promoter CpG
|
|
|
+ and defined pattern analysis before writing this section.
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\end_inset
|
|
|
+
|
|
|
+
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Itemize
|
|
|
+Also find better normalizations: maybe borrow from MACS/SICER background
|
|
|
+ correction methods?
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Itemize
|
|
|
+For H3K4, define polar coordinates based on PC1 & 2: R = peak size, Theta
|
|
|
+ = peak position.
|
|
|
+ Then correlate with expression.
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Itemize
|
|
|
+Current analysis only at Day 0.
|
|
|
+ Need to study across time points.
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Itemize
|
|
|
+Integrating data across so many dimensions is a significant analysis challenge
|
|
|
+\end_layout
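The polar-coordinate idea in the list above could be prototyped along these lines. This is only a sketch: it assumes a matrix of promoter coverage profiles (rows are promoters, columns are positions relative to the TSS), and whether the resulting radius and angle really track peak size and peak position is exactly the hypothesis that would need checking. The function name is hypothetical.

```python
import numpy as np

def pc_polar(profiles):
    """Project coverage profiles onto PC1 and PC2 and return (r, theta).

    r is the distance from the origin in the PC1/PC2 plane and theta is
    the angle in radians, in the range [-pi, pi].
    """
    centered = profiles - profiles.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes of the centered matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:2].T
    r = np.hypot(scores[:, 0], scores[:, 1])
    theta = np.arctan2(scores[:, 1], scores[:, 0])
    return r, theta
```

Because theta is an angle, correlating it with expression would call for a circular-linear method rather than an ordinary Pearson correlation.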
|
|
|
+
|
|
|
+\begin_layout Subsection
|
|
|
+Investigate causes of high correlation between mutually exclusive histone
|
|
|
+ marks
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Standard
|
|
|
+The high correlation in coverage depth observed between H3K4me2 and
|
|
|
+ H3K4me3 is both expected and unexpected.
|
|
|
+ Since both marks are associated with elevated gene transcription, a positive
|
|
|
+ correlation between them is not surprising.
|
|
|
+ However, these two marks represent different post-translational modifications
|
|
|
+ of the
|
|
|
+\emph on
|
|
|
+same
|
|
|
+\emph default
|
|
|
+ lysine residue on the histone H3 polypeptide, which means that they cannot
|
|
|
+ both be present on the same H3 subunit.
|
|
|
+ Thus, the high correlation between them has several potential explanations.
|
|
|
+ One possible reason is cell population heterogeneity: perhaps some genomic
|
|
|
+ loci are frequently marked with H3K4me2 in some cells, while in other cells
|
|
|
+ the same loci are marked with H3K4me3.
|
|
|
+ Another possibility is allele-specific modifications: the loci are marked
|
|
|
+ in each diploid cell with H3K4me2 on one allele and H3K4me3 on the other
|
|
|
+ allele.
|
|
|
+ Lastly, since each histone octamer contains 2 H3 subunits, it is possible
|
|
|
+ that having one H3K4me2 mark and one H3K4me3 mark on a given histone octamer
|
|
|
+ represents a distinct epigenetic state with a different function than either
|
|
|
+ double H3K4me2 or double H3K4me3.
|
|
|
+
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Standard
|
|
|
+These three hypotheses could be disentangled by single-cell ChIP-seq.
|
|
|
+ If the correlation between these two histone marks persists even within
|
|
|
+ the reads for each individual cell, then cell population heterogeneity
|
|
|
+ cannot explain the correlation.
|
|
|
+ Allele-specific modification can be tested for by looking at the correlation
|
|
|
+ between read coverage of the two histone marks at heterozygous loci.
|
|
|
+ If the correlation between read counts for opposite alleles is low, then this
|
|
|
+ is consistent with allele-specific modification.
|
|
|
+ Finally, if the modifications do not separate by either cell or allele,
|
|
|
+ the colocation of these two marks is most likely occurring at the level
|
|
|
+ of individual histones, with the heterogeneously modified histone representing
|
|
|
+ a distinct state.
|
|
|
+
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Standard
|
|
|
+However, another experiment would be required to show direct evidence of
|
|
|
+ such a heterogeneously modified state.
|
|
|
+ Specifically, a 
|
|
|
+\begin_inset Quotes eld
|
|
|
+\end_inset
|
|
|
+
|
|
|
+double ChIP
|
|
|
+\begin_inset Quotes erd
|
|
|
+\end_inset
|
|
|
+
|
|
|
+ experiment would need to be performed, where the input DNA is first subjected
|
|
|
+ to an immunoprecipitation pulldown with the anti-H3K4me2 antibody, and
|
|
|
+ then the enriched material is collected, with proteins still bound, and
|
|
|
+ immunoprecipitated
|
|
|
+\emph on
|
|
|
+again
|
|
|
+\emph default
|
|
|
+ using the anti-H3K4me3 antibody.
|
|
|
+ If this yields significant numbers of non-artifactual reads in the same
|
|
|
+ regions as the individual pulldowns of the two marks, this is strong evidence
|
|
|
+ that the two marks are occurring on opposite H3 subunits of the same histones.
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Standard
|
|
|
+\begin_inset Flex TODO Note (inline)
|
|
|
+status open
|
|
|
+
|
|
|
+\begin_layout Plain Layout
|
|
|
+Try to see if double ChIP-seq is actually feasible, and if not, come up
|
|
|
+ with some other idea for directly detecting the mixed mod state.
|
|
|
+ Oh! Actually ChIP-seq isn't required, only double ChIP followed by quantificati
|
|
|
+on.
|
|
|
+ That's one possible angle.
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\end_inset
|
|
|
+
|
|
|
+
|
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Chapter
|
|
@@ -11223,7 +11630,7 @@ researcher degree of freedom
|
|
|
on the choice of batch size based on vague selection criteria and instinct,
|
|
|
which can unintentionally inproduce bias if the researcher chooses a batch
|
|
|
size based on what seems to yield the most favorable downstream results
|
|
|
-
|
|
|
+
|
|
|
\begin_inset CommandInset citation
|
|
|
LatexCommand cite
|
|
|
key "Simmons2011"
|
|
@@ -11278,6 +11685,26 @@ noprefix "false"
|
|
|
parameter's estimation.
|
|
|
\end_layout
|
|
|
|
|
|
+\begin_layout Subsection
|
|
|
+methyl array stuff
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Standard
|
|
|
+The current study has shown that DNA methylation, as assayed by Illumina
|
|
|
+ 450k methylation arrays, has some potential for diagnosing transplant dysfuncti
|
|
|
+ons, including rejection.
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Itemize
|
|
|
+Eliminate the need for SVA, since it can't be applied in an ML context.
|
|
|
+
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Itemize
|
|
|
+Alternatively, use SVA to identify and discard probes with strong SV association
|
|
|
+s prior to training.
|
|
|
+\end_layout
|
|
|
+
|
|
|
\begin_layout Chapter
|
|
|
Globin-blocking for more effective blood RNA-seq analysis in primate animal
|
|
|
model
|
|
@@ -13229,188 +13656,12 @@ Globin-Blocking
|
|
|
\begin_layout Plain Layout
|
|
|
|
|
|
\series bold
|
|
|
-Up
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\end_inset
|
|
|
-</cell>
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
-\begin_inset Text
|
|
|
-
|
|
|
-\begin_layout Plain Layout
|
|
|
-
|
|
|
-\family roman
|
|
|
-\series medium
|
|
|
-\shape up
|
|
|
-\size normal
|
|
|
-\emph off
|
|
|
-\bar no
|
|
|
-\strikeout off
|
|
|
-\xout off
|
|
|
-\uuline off
|
|
|
-\uwave off
|
|
|
-\noun off
|
|
|
-\color none
|
|
|
-231
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\end_inset
|
|
|
-</cell>
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
-\begin_inset Text
|
|
|
-
|
|
|
-\begin_layout Plain Layout
|
|
|
-
|
|
|
-\family roman
|
|
|
-\series medium
|
|
|
-\shape up
|
|
|
-\size normal
|
|
|
-\emph off
|
|
|
-\bar no
|
|
|
-\strikeout off
|
|
|
-\xout off
|
|
|
-\uuline off
|
|
|
-\uwave off
|
|
|
-\noun off
|
|
|
-\color none
|
|
|
-515
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\end_inset
|
|
|
-</cell>
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
-\begin_inset Text
|
|
|
-
|
|
|
-\begin_layout Plain Layout
|
|
|
-
|
|
|
-\family roman
|
|
|
-\series medium
|
|
|
-\shape up
|
|
|
-\size normal
|
|
|
-\emph off
|
|
|
-\bar no
|
|
|
-\strikeout off
|
|
|
-\xout off
|
|
|
-\uuline off
|
|
|
-\uwave off
|
|
|
-\noun off
|
|
|
-\color none
|
|
|
-2
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\end_inset
|
|
|
-</cell>
|
|
|
-</row>
|
|
|
-<row>
|
|
|
-<cell multirow="4" alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
-\begin_inset Text
|
|
|
-
|
|
|
-\begin_layout Plain Layout
|
|
|
-
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\end_inset
|
|
|
-</cell>
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
-\begin_inset Text
|
|
|
-
|
|
|
-\begin_layout Plain Layout
|
|
|
-
|
|
|
-\series bold
|
|
|
-NS
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\end_inset
|
|
|
-</cell>
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
-\begin_inset Text
|
|
|
-
|
|
|
-\begin_layout Plain Layout
|
|
|
-
|
|
|
-\family roman
|
|
|
-\series medium
|
|
|
-\shape up
|
|
|
-\size normal
|
|
|
-\emph off
|
|
|
-\bar no
|
|
|
-\strikeout off
|
|
|
-\xout off
|
|
|
-\uuline off
|
|
|
-\uwave off
|
|
|
-\noun off
|
|
|
-\color none
|
|
|
-160
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\end_inset
|
|
|
-</cell>
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
-\begin_inset Text
|
|
|
-
|
|
|
-\begin_layout Plain Layout
|
|
|
-
|
|
|
-\family roman
|
|
|
-\series medium
|
|
|
-\shape up
|
|
|
-\size normal
|
|
|
-\emph off
|
|
|
-\bar no
|
|
|
-\strikeout off
|
|
|
-\xout off
|
|
|
-\uuline off
|
|
|
-\uwave off
|
|
|
-\noun off
|
|
|
-\color none
|
|
|
-11235
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\end_inset
|
|
|
-</cell>
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
-\begin_inset Text
|
|
|
-
|
|
|
-\begin_layout Plain Layout
|
|
|
-
|
|
|
-\family roman
|
|
|
-\series medium
|
|
|
-\shape up
|
|
|
-\size normal
|
|
|
-\emph off
|
|
|
-\bar no
|
|
|
-\strikeout off
|
|
|
-\xout off
|
|
|
-\uuline off
|
|
|
-\uwave off
|
|
|
-\noun off
|
|
|
-\color none
|
|
|
-136
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\end_inset
|
|
|
-</cell>
|
|
|
-</row>
|
|
|
-<row>
|
|
|
-<cell multirow="4" alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
-\begin_inset Text
|
|
|
-
|
|
|
-\begin_layout Plain Layout
|
|
|
-
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\end_inset
|
|
|
-</cell>
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
-\begin_inset Text
|
|
|
-
|
|
|
-\begin_layout Plain Layout
|
|
|
-
|
|
|
-\series bold
|
|
|
-Down
|
|
|
+Up
|
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
|
</cell>
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
@@ -13427,12 +13678,12 @@ Down
|
|
|
\uwave off
|
|
|
\noun off
|
|
|
\color none
|
|
|
-0
|
|
|
+231
|
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
|
</cell>
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
@@ -13449,12 +13700,12 @@ Down
|
|
|
\uwave off
|
|
|
\noun off
|
|
|
\color none
|
|
|
-548
|
|
|
+515
|
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
|
</cell>
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
@@ -13471,575 +13722,411 @@ Down
|
|
|
\uwave off
|
|
|
\noun off
|
|
|
\color none
|
|
|
-127
|
|
|
+2
|
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
|
</cell>
|
|
|
</row>
|
|
|
-</lyxtabular>
|
|
|
-
|
|
|
-\end_inset
|
|
|
-
|
|
|
-
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\begin_layout Plain Layout
|
|
|
-\begin_inset Caption Standard
|
|
|
-
|
|
|
-\begin_layout Plain Layout
|
|
|
-
|
|
|
-\series bold
|
|
|
-\begin_inset Argument 1
|
|
|
-status open
|
|
|
+<row>
|
|
|
+<cell multirow="4" alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
+\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
|
-Comparison of significantly differentially expressed genes with and without
|
|
|
- globin blocking.
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\end_inset
|
|
|
-
|
|
|
-
|
|
|
-\begin_inset CommandInset label
|
|
|
-LatexCommand label
|
|
|
-name "tab:Comparison-of-significant"
|
|
|
-
|
|
|
-\end_inset
|
|
|
-
|
|
|
-Comparison of significantly differentially expressed genes with and without
|
|
|
- globin blocking.
|
|
|
|
|
|
-\series default
|
|
|
- Up, Down: Genes significantly up/down-regulated in post-transplant samples
|
|
|
- relative to pre-transplant samples, with a false discovery rate of 10%
|
|
|
- or less.
|
|
|
- NS: Non-significant genes (false discovery rate greater than 10%).
|
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
|
-
|
|
|
-
|
|
|
-\end_layout
|
|
|
+</cell>
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
+\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
|
|
|
|
+\series bold
|
|
|
+NS
|
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
|
+</cell>
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
+\begin_inset Text
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
|
|
+\family roman
|
|
|
+\series medium
|
|
|
+\shape up
|
|
|
+\size normal
|
|
|
+\emph off
|
|
|
+\bar no
|
|
|
+\strikeout off
|
|
|
+\xout off
|
|
|
+\uuline off
|
|
|
+\uwave off
|
|
|
+\noun off
|
|
|
+\color none
|
|
|
+160
|
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
-To compare performance on differential gene expression tests, we took subsets
|
|
|
- of both the GB and non-GB libraries with exactly one pre-transplant and
|
|
|
- one post-transplant sample for each animal that had paired samples available
|
|
|
- for analysis (N=7 animals, N=14 samples in each subset).
|
|
|
- The same test for pre- vs.
|
|
|
- post-transplant differential gene expression was performed on the same
|
|
|
- 7 pairs of samples from GB libraries and non-GB libraries, in each case
|
|
|
- using an FDR of 10% as the threshold of significance.
|
|
|
- Out of 12954 genes that passed the detection threshold in both subsets,
|
|
|
- 358 were called significantly differentially expressed in the same direction
|
|
|
- in both sets; 1063 were differentially expressed in the GB set only; 296
|
|
|
- were differentially expressed in the non-GB set only; 2 genes were called
|
|
|
- significantly up in the GB set but significantly down in the non-GB set;
|
|
|
- and the remaining 11235 were not called differentially expressed in either
|
|
|
- set.
|
|
|
- These data are summarized in Table
|
|
|
-\begin_inset CommandInset ref
|
|
|
-LatexCommand ref
|
|
|
-reference "tab:Comparison-of-significant"
|
|
|
-plural "false"
|
|
|
-caps "false"
|
|
|
-noprefix "false"
|
|
|
-
|
|
|
-\end_inset
|
|
|
-
|
|
|
-.
|
|
|
- The differences in BCV calculated by EdgeR for these subsets of samples
|
|
|
- were negligible (BCV = 0.302 for GB and 0.297 for non-GB).
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\begin_layout Standard
|
|
|
-The key point is that the GB data results in substantially more differentially
|
|
|
- expressed calls than the non-GB data.
|
|
|
- Since there is no gold standard for this dataset, it is impossible to be
|
|
|
- certain whether this is due to under-calling of differential expression
|
|
|
- in the non-GB samples or over-calling in the GB samples.
|
|
|
- However, given that both datasets are derived from the same biological
|
|
|
- samples and have nearly equal BCVs, it is more likely that the larger number
|
|
|
- of DE calls in the GB samples are genuine detections that were enabled
|
|
|
- by the higher sequencing depth and measurement precision of the GB samples.
|
|
|
- Note that the same set of genes was considered in both subsets, so the
|
|
|
- larger number of differentially expressed gene calls in the GB data set
|
|
|
- reflects a greater sensitivity to detect significant differential gene
|
|
|
- expression and not simply the larger total number of detected genes in
|
|
|
- GB samples described earlier.
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\begin_layout Section
|
|
|
-Discussion
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\begin_layout Standard
|
|
|
-The original experience with whole blood gene expression profiling on DNA
|
|
|
- microarrays demonstrated that the high concentration of globin transcripts
|
|
|
- reduced the sensitivity to detect genes with relatively low expression
|
|
|
- levels, in effect, significantly reducing the sensitivity.
|
|
|
- To address this limitation, commercial protocols for globin reduction were
|
|
|
- developed based on strategies to block globin transcript amplification
|
|
|
- during labeling or physically removing globin transcripts by affinity bead
|
|
|
- methods
|
|
|
-\begin_inset CommandInset citation
|
|
|
-LatexCommand cite
|
|
|
-key "Winn2010"
|
|
|
-literal "false"
|
|
|
-
|
|
|
-\end_inset
|
|
|
-
|
|
|
-.
|
|
|
- More recently, using the latest generation of labeling protocols and arrays,
|
|
|
- it was determined that globin reduction was no longer necessary to obtain
|
|
|
- sufficient sensitivity to detect differential transcript expression
|
|
|
-\begin_inset CommandInset citation
|
|
|
-LatexCommand cite
|
|
|
-key "NuGEN2010"
|
|
|
-literal "false"
|
|
|
-
|
|
|
-\end_inset
|
|
|
-
|
|
|
-.
|
|
|
- However, we are not aware of any publications using these currently available
|
|
|
- protocols the with latest generation of microarrays that actually compare
|
|
|
- the detection sensitivity with and without globin reduction.
|
|
|
- However, in practice this has now been adopted generally primarily driven
|
|
|
- by concerns for cost control.
|
|
|
- The main objective of our work was to directly test the impact of globin
|
|
|
- gene transcripts and a new globin blocking protocol for application to
|
|
|
- the newest generation of differential gene expression profiling determined
|
|
|
- using next generation sequencing.
|
|
|
-
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\begin_layout Standard
|
|
|
-The challenge of doing global gene expression profiling in cynomolgus monkeys
|
|
|
- is that the current available arrays were never designed to comprehensively
|
|
|
- cover this genome and have not been updated since the first assemblies
|
|
|
- of the cynomolgus genome were published.
|
|
|
- Therefore, we determined that the best strategy for peripheral blood profiling
|
|
|
- was to do deep RNA-seq and inform the workflow using the latest available
|
|
|
- genome assembly and annotation
|
|
|
-\begin_inset CommandInset citation
|
|
|
-LatexCommand cite
|
|
|
-key "Wilson2013"
|
|
|
-literal "false"
|
|
|
-
|
|
|
-\end_inset
|
|
|
-
|
|
|
-.
|
|
|
- However, it was not immediately clear whether globin reduction was necessary
|
|
|
- for RNA-seq or how much improvement in efficiency or sensitivity to detect
|
|
|
- differential gene expression would be achieved for the added cost and work.
|
|
|
-
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\begin_layout Standard
|
|
|
-We only found one report that demonstrated that globin reduction significantly
|
|
|
- improved the effective read yields for sequencing of human peripheral blood
|
|
|
- cell RNA using a DeepSAGE protocol
|
|
|
-\begin_inset CommandInset citation
|
|
|
-LatexCommand cite
|
|
|
-key "Mastrokolias2012"
|
|
|
-literal "false"
|
|
|
-
|
|
|
\end_inset
|
|
|
+</cell>
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
+\begin_inset Text
|
|
|
|
|
|
-.
|
|
|
- The approach to DeepSAGE involves two different restriction enzymes that
|
|
|
- purify and then tag small fragments of transcripts at specific locations
|
|
|
- and thus, significantly reduces the complexity of the transcriptome.
|
|
|
- Therefore, we could not determine how DeepSAGE results would translate
|
|
|
- to the common strategy in the field for assaying the entire transcript
|
|
|
- population by whole-transcriptome 3’-end RNA-seq.
|
|
|
- Furthermore, if globin reduction is necessary, we also needed a globin
|
|
|
- reduction method specific to cynomolgus globin sequences that would work
|
|
|
- an organism for which no kit is available off the shelf.
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\begin_layout Standard
|
|
|
-As mentioned above, the addition of globin blocking oligos has a very small
|
|
|
- impact on measured expression levels of gene expression.
|
|
|
- However, this is a non-issue for the purposes of differential expression
|
|
|
- testing, since a systematic change in a gene in all samples does not affect
|
|
|
- relative expression levels between samples.
|
|
|
- However, we must acknowledge that simple comparisons of gene expression
|
|
|
- data obtained by GB and non-GB protocols are not possible without additional
|
|
|
- normalization.
|
|
|
-
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\begin_layout Standard
|
|
|
-More importantly, globin blocking not only nearly doubles the yield of usable
|
|
|
- reads, it also increases inter-sample correlation and sensitivity to detect
|
|
|
- differential gene expression relative to the same set of samples profiled
|
|
|
- without blocking.
|
|
|
- In addition, globin blocking does not add a significant amount of random
|
|
|
- noise to the data.
|
|
|
- Globin blocking thus represents a cost-effective way to squeeze more data
|
|
|
- and statistical power out of the same blood samples and the same amount
|
|
|
- of sequencing.
|
|
|
- In conclusion, globin reduction greatly increases the yield of useful RNA-seq
|
|
|
- reads mapping to the rest of the genome, with minimal perturbations in
|
|
|
- the relative levels of non-globin genes.
|
|
|
- Based on these results, globin transcript reduction using sequence-specific,
|
|
|
- complementary blocking oligonucleotides is recommended for all deep RNA-seq
|
|
|
- of cynomolgus and other nonhuman primate blood samples.
|
|
|
-\end_layout
|
|
|
+\begin_layout Plain Layout
|
|
|
|
|
|
-\begin_layout Chapter
|
|
|
-Future Directions
|
|
|
+\family roman
|
|
|
+\series medium
|
|
|
+\shape up
|
|
|
+\size normal
|
|
|
+\emph off
|
|
|
+\bar no
|
|
|
+\strikeout off
|
|
|
+\xout off
|
|
|
+\uuline off
|
|
|
+\uwave off
|
|
|
+\noun off
|
|
|
+\color none
|
|
|
+11235
|
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
-status open
|
|
|
+\end_inset
|
|
|
+</cell>
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
+\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
|
-Consider putting each chapter's future directions with that chapter instead
|
|
|
- of in a separate one.
|
|
|
- Check instructions to see if this is allowed/appropriate.
|
|
|
+
|
|
|
+\family roman
|
|
|
+\series medium
|
|
|
+\shape up
|
|
|
+\size normal
|
|
|
+\emph off
|
|
|
+\bar no
|
|
|
+\strikeout off
|
|
|
+\xout off
|
|
|
+\uuline off
|
|
|
+\uwave off
|
|
|
+\noun off
|
|
|
+\color none
|
|
|
+136
|
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
|
+</cell>
|
|
|
+</row>
|
|
|
+<row>
|
|
|
+<cell multirow="4" alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
+\begin_inset Text
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Section*
|
|
|
-Ch2
|
|
|
-\end_layout
|
|
|
+\end_inset
|
|
|
+</cell>
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
+\begin_inset Text
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
-The analysis of RNA-seq and ChIP-seq in CD4 T-cells in Chapter 2 is in many
|
|
|
- ways a preliminary study that suggests a multitude of new avenues of investigat
|
|
|
-ion.
|
|
|
- Here we consider a selection of such avenues.
|
|
|
-\end_layout
|
|
|
+\begin_layout Plain Layout
|
|
|
|
|
|
-\begin_layout Subsection*
|
|
|
-Improving on the effective promoter radius
|
|
|
+\series bold
|
|
|
+Down
|
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
-This study introduced the concept of an
|
|
|
-\begin_inset Quotes eld
|
|
|
\end_inset
|
|
|
+</cell>
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
+\begin_inset Text
|
|
|
|
|
|
-effective promoter radius
|
|
|
-\begin_inset Quotes erd
|
|
|
-\end_inset
|
|
|
+\begin_layout Plain Layout
|
|
|
|
|
|
- specific to each histone mark based on distince from the TSS within which
|
|
|
- an excess of peaks was called for that mark.
|
|
|
- This concept was then used to guide further analyses throughout the study.
|
|
|
- However, while the effective promoter radius was useful in those analyses,
|
|
|
- it is both limited in theory and shown in practice to be a possible oversimplif
|
|
|
-ication.
|
|
|
- First, the effective promoter radii used in this study were chosen based
|
|
|
- on manual inspection of the TSS-to-peak distance distributions in Figure
|
|
|
-
|
|
|
-\begin_inset CommandInset ref
|
|
|
-LatexCommand ref
|
|
|
-reference "fig:near-promoter-peak-enrich"
|
|
|
-plural "false"
|
|
|
-caps "false"
|
|
|
-noprefix "false"
|
|
|
+\family roman
|
|
|
+\series medium
|
|
|
+\shape up
|
|
|
+\size normal
|
|
|
+\emph off
|
|
|
+\bar no
|
|
|
+\strikeout off
|
|
|
+\xout off
|
|
|
+\uuline off
|
|
|
+\uwave off
|
|
|
+\noun off
|
|
|
+\color none
|
|
|
+0
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
|
+</cell>
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
+\begin_inset Text
|
|
|
|
|
|
-, selecting round numbers of analyst convenience (Table
|
|
|
-\begin_inset CommandInset ref
|
|
|
-LatexCommand ref
|
|
|
-reference "tab:effective-promoter-radius"
|
|
|
-plural "false"
|
|
|
-caps "false"
|
|
|
-noprefix "false"
|
|
|
+\begin_layout Plain Layout
|
|
|
+
|
|
|
+\family roman
|
|
|
+\series medium
|
|
|
+\shape up
|
|
|
+\size normal
|
|
|
+\emph off
|
|
|
+\bar no
|
|
|
+\strikeout off
|
|
|
+\xout off
|
|
|
+\uuline off
|
|
|
+\uwave off
|
|
|
+\noun off
|
|
|
+\color none
|
|
|
+548
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
|
+</cell>
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
|
+\begin_inset Text
|
|
|
|
|
|
-).
|
|
|
- It would be better to define an algorithm that selects a more precise radius
|
|
|
- based on the features of the graph.
|
|
|
- One possible way to do this would be to randomly rearrange the called peaks
|
|
|
- throughout the genome many (while preserving the distribution of peak widths)
|
|
|
- and re-generate the same plot as in Figure
|
|
|
-\begin_inset CommandInset ref
|
|
|
-LatexCommand ref
|
|
|
-reference "fig:near-promoter-peak-enrich"
|
|
|
-plural "false"
|
|
|
-caps "false"
|
|
|
-noprefix "false"
|
|
|
+\begin_layout Plain Layout
|
|
|
|
|
|
-\end_inset
|
|
|
+\family roman
|
|
|
+\series medium
|
|
|
+\shape up
|
|
|
+\size normal
|
|
|
+\emph off
|
|
|
+\bar no
|
|
|
+\strikeout off
|
|
|
+\xout off
|
|
|
+\uuline off
|
|
|
+\uwave off
|
|
|
+\noun off
|
|
|
+\color none
|
|
|
+127
|
|
|
+\end_layout
|
|
|
|
|
|
-.
|
|
|
- This would yield a better
|
|
|
-\begin_inset Quotes eld
|
|
|
\end_inset
|
|
|
+</cell>
|
|
|
+</row>
|
|
|
+</lyxtabular>
|
|
|
|
|
|
-background
|
|
|
-\begin_inset Quotes erd
|
|
|
\end_inset
|
|
|
|
|
|
- distribution that demonstrates the degree of near-TSS enrichment that would
|
|
|
- be expected by random chance.
|
|
|
- The effective promoter radius could be defined as the point where the true
|
|
|
- distribution diverges from the randomized background distribution.
|
|
|
-
|
|
|
+
|
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
-Furthermore, the above definition of effective promoter radius has the significa
|
|
|
-nt limitation of being based on the peak calling method.
|
|
|
- It is thus very sensitive to the choice of peak caller and significance
|
|
|
- threshold for calling peaks, as well as the degree of saturation in the
|
|
|
- sequencing.
|
|
|
- Calling peaks from ChIP-seq samples with insufficient coverage depth, with
|
|
|
- the wrong peak caller, or with a different significance threshold could
|
|
|
- give a drastically different number of called peaks, and hence a drastically
|
|
|
- different distribution of peak-to-TSS distances.
|
|
|
- To address this, it is desirable to develop a better method of determining
|
|
|
- the effective promoter radius that relies only on the distribution of read
|
|
|
- coverage around the TSS, independent of the peak calling.
|
|
|
- Furthermore, as demonstrated by the upstream-downstream asymmetries observed
|
|
|
- in Figures
|
|
|
-\begin_inset CommandInset ref
|
|
|
-LatexCommand ref
|
|
|
-reference "fig:H3K4me2-neighborhood"
|
|
|
-plural "false"
|
|
|
-caps "false"
|
|
|
-noprefix "false"
|
|
|
+\begin_layout Plain Layout
|
|
|
+\begin_inset Caption Standard
|
|
|
|
|
|
-\end_inset
|
|
|
+\begin_layout Plain Layout
|
|
|
|
|
|
-,
|
|
|
-\begin_inset CommandInset ref
|
|
|
-LatexCommand ref
|
|
|
-reference "fig:H3K4me3-neighborhood"
|
|
|
-plural "false"
|
|
|
-caps "false"
|
|
|
-noprefix "false"
|
|
|
+\series bold
|
|
|
+\begin_inset Argument 1
|
|
|
+status open
|
|
|
+
|
|
|
+\begin_layout Plain Layout
|
|
|
+Comparison of significantly differentially expressed genes with and without
|
|
|
+ globin blocking.
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
|
|
|
|
-, and
|
|
|
-\begin_inset CommandInset ref
|
|
|
-LatexCommand ref
|
|
|
-reference "fig:H3K27me3-neighborhood"
|
|
|
-plural "false"
|
|
|
-caps "false"
|
|
|
-noprefix "false"
|
|
|
|
|
|
-\end_inset
|
|
|
+\begin_inset CommandInset label
|
|
|
+LatexCommand label
|
|
|
+name "tab:Comparison-of-significant"
|
|
|
|
|
|
-, this definition should determine a different radius for the upstream and
|
|
|
- downstream directions.
|
|
|
- At this point, it may be better to rename this concept
|
|
|
-\begin_inset Quotes eld
|
|
|
\end_inset
|
|
|
|
|
|
-effective promoter extent
|
|
|
-\begin_inset Quotes erd
|
|
|
-\end_inset
|
|
|
+Comparison of significantly differentially expressed genes with and without
|
|
|
+ globin blocking.
|
|
|
|
|
|
- and avoid the word
|
|
|
-\begin_inset Quotes eld
|
|
|
-\end_inset
|
|
|
+\series default
|
|
|
+ Up, Down: Genes significantly up/down-regulated in post-transplant samples
|
|
|
+ relative to pre-transplant samples, with a false discovery rate of 10%
|
|
|
+ or less.
|
|
|
+ NS: Non-significant genes (false discovery rate greater than 10%).
|
|
|
+\end_layout
|
|
|
|
|
|
-radius
|
|
|
-\begin_inset Quotes erd
|
|
|
\end_inset
|
|
|
|
|
|
-, since a radius implies a symmetry about the TSS that is not supported
|
|
|
- by the data.
|
|
|
+
|
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
-Beyond improving the definition of effective promoter extent, functional
|
|
|
- validation is necessary to show that this measure of near-TSS enrichment
|
|
|
- has biological meaning.
|
|
|
- Figures
|
|
|
-\begin_inset CommandInset ref
|
|
|
-LatexCommand ref
|
|
|
-reference "fig:H3K4me2-neighborhood"
|
|
|
-plural "false"
|
|
|
-caps "false"
|
|
|
-noprefix "false"
|
|
|
+\begin_layout Plain Layout
|
|
|
+
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
|
|
|
|
- and
|
|
|
+
|
|
|
+\end_layout
|
|
|
+
|
|
|
+\begin_layout Standard
|
|
|
+To compare performance on differential gene expression tests, we took subsets
|
|
|
+ of both the GB and non-GB libraries with exactly one pre-transplant and
|
|
|
+ one post-transplant sample for each animal that had paired samples available
|
|
|
+ for analysis (N=7 animals, N=14 samples in each subset).
|
|
|
+ The same test for pre- vs.
|
|
|
+ post-transplant differential gene expression was performed on the same
|
|
|
+ 7 pairs of samples from GB libraries and non-GB libraries, in each case
|
|
|
+ using an FDR of 10% as the threshold of significance.
|
|
|
+ Out of 12954 genes that passed the detection threshold in both subsets,
|
|
|
+ 358 were called significantly differentially expressed in the same direction
|
|
|
+ in both sets; 1063 were differentially expressed in the GB set only; 296
|
|
|
+ were differentially expressed in the non-GB set only; 2 genes were called
|
|
|
+ significantly up in the GB set but significantly down in the non-GB set;
|
|
|
+ and the remaining 11235 were not called differentially expressed in either
|
|
|
+ set.
|
|
|
+ These data are summarized in Table
|
|
|
\begin_inset CommandInset ref
|
|
|
LatexCommand ref
|
|
|
-reference "fig:H3K4me3-neighborhood"
|
|
|
+reference "tab:Comparison-of-significant"
|
|
|
plural "false"
|
|
|
caps "false"
|
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
|
|
|
|
- already provide a very limited functional validation of the chosen promoter
|
|
|
- extents for H3K4me2 and H3K4me3 by showing that spikes in coverage within
|
|
|
- this region are most strongly correlated with elevated gene expression.
|
|
|
- However, there are other ways to show functional relevance of the promoter
|
|
|
- extent.
|
|
|
- For example, correlations could be computed between read counts in peaks
|
|
|
- nearby gene promoters and the expression level of those genes, and these
|
|
|
- correlations could be plotted against the distance of the peak upstream
|
|
|
- or downstream of the gene's TSS.
|
|
|
- If the promoter extent truly defines a
|
|
|
-\begin_inset Quotes eld
|
|
|
-\end_inset
|
|
|
-
|
|
|
-sphere of influence
|
|
|
-\begin_inset Quotes erd
|
|
|
-\end_inset
|
|
|
+.
|
|
|
+ The differences in BCV calculated by edgeR for these subsets of samples
|
|
|
+ were negligible (BCV = 0.302 for GB and 0.297 for non-GB).
|
|
|
+\end_layout
|
|
|
|
|
|
- within which a histone mark is involved with the regulation of a gene,
|
|
|
- then the correlations for peaks within this extent should be significantly
|
|
|
- higher than those further upstream or downstream.
|
|
|
- Peaks within these extents may also be more likely to show differential
|
|
|
- modification than those outside genic regions of the genome.
|
|
|
+\begin_layout Standard
|
|
|
+The key point is that the GB data yield substantially more differentially
|
|
|
+ expressed calls than the non-GB data.
|
|
|
+ Since there is no gold standard for this dataset, it is impossible to be
|
|
|
+ certain whether this is due to under-calling of differential expression
|
|
|
+ in the non-GB samples or over-calling in the GB samples.
|
|
|
+ However, given that both datasets are derived from the same biological
|
|
|
+ samples and have nearly equal BCVs, it is more likely that the larger number
|
|
|
+ of DE calls in the GB samples are genuine detections that were enabled
|
|
|
+ by the higher sequencing depth and measurement precision of the GB samples.
|
|
|
+ Note that the same set of genes was considered in both subsets, so the
|
|
|
+ larger number of differentially expressed gene calls in the GB data set
|
|
|
+ reflects a greater sensitivity to detect significant differential gene
|
|
|
+ expression and not simply the larger total number of detected genes in
|
|
|
+ GB samples described earlier.
|
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Subsection*
|
|
|
-Post-activation convergence of naive & memory cells
|
|
|
+\begin_layout Section
|
|
|
+Discussion
|
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
|
-In this study, a convergence between naive and memory cells was observed
|
|
|
- in both the pattern of gene expression and in epigenetic state of the 3
|
|
|
- histone marks studied.
|
|
|
-\end_layout
|
|
|
+The original experience with whole blood gene expression profiling on DNA
|
|
|
+ microarrays demonstrated that the high concentration of globin transcripts
|
|
|
+ reduced the sensitivity to detect genes with relatively low expression
|
|
|
+ levels.
|
|
|
+ To address this limitation, commercial protocols for globin reduction were
|
|
|
+ developed based on strategies to block globin transcript amplification
|
|
|
+ during labeling or physically removing globin transcripts by affinity bead
|
|
|
+ methods
|
|
|
+\begin_inset CommandInset citation
|
|
|
+LatexCommand cite
|
|
|
+key "Winn2010"
|
|
|
+literal "false"
|
|
|
|
|
|
-\begin_layout Itemize
|
|
|
-N-to-M convergence deserves further study of some kind
|
|
|
-\end_layout
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_deeper
|
|
|
-\begin_layout Itemize
|
|
|
-maybe serial activation & rest cycles for naive and memory, showing a cyclical
|
|
|
- pattern returning to the same state again and again after the first activation
|
|
|
-\end_layout
|
|
|
+.
|
|
|
+ More recently, using the latest generation of labeling protocols and arrays,
|
|
|
+ it was determined that globin reduction was no longer necessary to obtain
|
|
|
+ sufficient sensitivity to detect differential transcript expression
|
|
|
+\begin_inset CommandInset citation
|
|
|
+LatexCommand cite
|
|
|
+key "NuGEN2010"
|
|
|
+literal "false"
|
|
|
|
|
|
-\end_deeper
|
|
|
-\begin_layout Itemize
|
|
|
-Study other epigenetic marks in more contexts, including looking for similar
|
|
|
- convergence patterns.
|
|
|
- Use MOFA to identify coordinated patterns.
|
|
|
-\end_layout
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_deeper
|
|
|
-\begin_layout Itemize
|
|
|
-DNA methylation, histone marks, chromatin accessibility & conformation in
|
|
|
- CD4 T-cells
|
|
|
+.
|
|
|
+ However, we are not aware of any publications using these currently available
|
|
|
+ protocols with the latest generation of microarrays that actually compare
|
|
|
+ the detection sensitivity with and without globin reduction.
|
|
|
+ Nevertheless, in practice this has now been generally adopted, primarily driven
|
|
|
+ by concerns for cost control.
|
|
|
+ The main objective of our work was to directly test the impact of globin
|
|
|
+ gene transcripts and a new globin blocking protocol for application to
|
|
|
+ the newest generation of differential gene expression profiling determined
|
|
|
+ using next generation sequencing.
|
|
|
+
|
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Itemize
|
|
|
-Also look at other types of lymphocytes: CD8 T-cells, B-cells, NK cells
|
|
|
-\end_layout
|
|
|
+\begin_layout Standard
|
|
|
+The challenge of doing global gene expression profiling in cynomolgus monkeys
|
|
|
+ is that the currently available arrays were never designed to comprehensively
|
|
|
+ cover this genome and have not been updated since the first assemblies
|
|
|
+ of the cynomolgus genome were published.
|
|
|
+ Therefore, we determined that the best strategy for peripheral blood profiling
|
|
|
+ was to do deep RNA-seq and inform the workflow using the latest available
|
|
|
+ genome assembly and annotation
|
|
|
+\begin_inset CommandInset citation
|
|
|
+LatexCommand cite
|
|
|
+key "Wilson2013"
|
|
|
+literal "false"
|
|
|
|
|
|
-\end_deeper
|
|
|
-\begin_layout Subsection*
|
|
|
-Promoter positional coverage: follow up on hints of interesting patterns
|
|
|
-\end_layout
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Itemize
|
|
|
-Also find better normalizations: maybe borrow from MACS/SICER background
|
|
|
- correction methods?
|
|
|
+.
|
|
|
+ However, it was not immediately clear whether globin reduction was necessary
|
|
|
+ for RNA-seq or how much improvement in efficiency or sensitivity to detect
|
|
|
+ differential gene expression would be achieved for the added cost and work.
|
|
|
+
|
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Itemize
|
|
|
-For H3K4, define polar coordinates based on PC1 & 2: R = peak size, Theta
|
|
|
- = peak position.
|
|
|
- Then correlate with expression.
|
|
|
-\end_layout
|
|
|
+\begin_layout Standard
|
|
|
+We only found one report that demonstrated that globin reduction significantly
|
|
|
+ improved the effective read yields for sequencing of human peripheral blood
|
|
|
+ cell RNA using a DeepSAGE protocol
|
|
|
+\begin_inset CommandInset citation
|
|
|
+LatexCommand cite
|
|
|
+key "Mastrokolias2012"
|
|
|
+literal "false"
|
|
|
|
|
|
-\begin_layout Itemize
|
|
|
-Current analysis only at Day 0.
|
|
|
- Need to study across time points.
|
|
|
-\end_layout
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Subsection*
|
|
|
-H3K4me correlation
|
|
|
+.
|
|
|
+ The approach to DeepSAGE involves two different restriction enzymes that
|
|
|
+ purify and then tag small fragments of transcripts at specific locations
|
|
|
+ and thus significantly reduces the complexity of the transcriptome.
|
|
|
+ Therefore, we could not determine how DeepSAGE results would translate
|
|
|
+ to the common strategy in the field for assaying the entire transcript
|
|
|
+ population by whole-transcriptome 3’-end RNA-seq.
|
|
|
+ Furthermore, if globin reduction was necessary, we also needed a globin
|
|
|
+ reduction method specific to cynomolgus globin sequences that would work
|
|
|
+ in an organism for which no kit is available off the shelf.
|
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
|
-The high correlation between coverage depth observed between H3K4me2 and
|
|
|
- H3K4me3 is both expected and unexpected.
|
|
|
- Since both marks are associated with elevated gene transcription, a positive
|
|
|
- correlation between them is not surprising.
|
|
|
- However, these two marks represent different post-translational modifications
|
|
|
- of the
|
|
|
-\emph on
|
|
|
-same
|
|
|
-\emph default
|
|
|
- lysine residue on the histone H3 polypeptide, which means that they cannot
|
|
|
- both be present on the same H3 subunit.
|
|
|
- Thus, the high correlation between them has several potential explanations.
|
|
|
- One possible reason is cell population heterogeneity: perhaps some genomic
|
|
|
- loci are frequently marked with H3K4me2 in some cells, while in other cells
|
|
|
- the same loci are marked with H3K4me3.
|
|
|
- Another possibility is allele-specific modifications: the loci are marked
|
|
|
- in each diploid cell with H3K4me2 on one allele and H3K4me3 on the other
|
|
|
- allele.
|
|
|
- Lastly, since each histone octamer contains 2 H3 subunits, it is possible
|
|
|
- that having one H3K4me2 mark and one H3K4me3 mark on a given histone octamer
|
|
|
- represents a distinct epigenetic state with a different function than either
|
|
|
- double H3K4me2 or double H3K4me3.
|
|
|
+As mentioned above, the addition of globin blocking oligos has a very small
|
|
|
+ impact on measured gene expression levels.
|
|
|
+ However, this is a non-issue for the purposes of differential expression
|
|
|
+ testing, since a systematic change in a gene in all samples does not affect
|
|
|
+ relative expression levels between samples.
|
|
|
+ However, we must acknowledge that simple comparisons of gene expression
|
|
|
+ data obtained by GB and non-GB protocols are not possible without additional
|
|
|
+ normalization.
|
|
|
|
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
|
-These three hypotheses could be disentangled by single-cell ChIP-seq.
|
|
|
- If the correlation between these two histone marks persists even within
|
|
|
- the reads for each individual cell, then cell population heterogeneity
|
|
|
- cannot explain the correlation.
|
|
|
- Allele-specific modification can be tested for by looking at the correlation
|
|
|
- between read coverage of the two histone marks at heterozygous loci.
|
|
|
- If the correlation between read counts for opposite loci is low, then this
|
|
|
- is consistent with allele-specific modification.
|
|
|
- Finally if the modifications do not separate by either cell or allele,
|
|
|
- the colocation of these two marks is most likely occurring at the level
|
|
|
- of individual histones, with the heterogenously modified histone representing
|
|
|
- a distinct state.
|
|
|
-
|
|
|
+More importantly, globin blocking not only nearly doubles the yield of usable
|
|
|
+ reads, it also increases inter-sample correlation and sensitivity to detect
|
|
|
+ differential gene expression relative to the same set of samples profiled
|
|
|
+ without blocking.
|
|
|
+ In addition, globin blocking does not add a significant amount of random
|
|
|
+ noise to the data.
|
|
|
+ Globin blocking thus represents a cost-effective way to squeeze more data
|
|
|
+ and statistical power out of the same blood samples and the same amount
|
|
|
+ of sequencing.
|
|
|
+ In conclusion, globin reduction greatly increases the yield of useful RNA-seq
|
|
|
+ reads mapping to the rest of the genome, with minimal perturbations in
|
|
|
+ the relative levels of non-globin genes.
|
|
|
+ Based on these results, globin transcript reduction using sequence-specific,
|
|
|
+ complementary blocking oligonucleotides is recommended for all deep RNA-seq
|
|
|
+ of cynomolgus and other nonhuman primate blood samples.
|
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
-However, another experiment would be required to show direct evidence of
|
|
|
- such a heterogeneously modified state.
|
|
|
- Specifically a
|
|
|
-\begin_inset Quotes eld
|
|
|
-\end_inset
|
|
|
-
|
|
|
-double ChIP
|
|
|
-\begin_inset Quotes erd
|
|
|
-\end_inset
|
|
|
-
|
|
|
- experiment would need to be performed, where the input DNA is first subjected
|
|
|
- to an immunoprecipitation pulldown from the anti-H3K4me2 antibody, and
|
|
|
- then the enriched material is collected, with proteins still bound, and
|
|
|
- immunoprecipitated
|
|
|
-\emph on
|
|
|
-again
|
|
|
-\emph default
|
|
|
- using the anti-H3K4me3 antibody.
|
|
|
- If this yields significant numbers of non-artifactual reads in the same
|
|
|
- regions as the individual pulldowns of the two marks, this is strong evidence
|
|
|
- that the two marks are occurring on opposite H3 subunits of the same histones.
|
|
|
+\begin_layout Section
|
|
|
+Future Directions
|
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
@@ -14047,11 +14134,9 @@ again
|
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
|
-Try to see if double ChIP-seq is actually feasible, and if not, come up
|
|
|
- with some other idea for directly detecting the mixed mod state.
|
|
|
- Oh! Actually ChIP-seq isn't required, only double ChIP followed by quantificati
|
|
|
-on.
|
|
|
- That's one possible angle.
|
|
|
+I've already done a good bit of work outside just this globin blocking thing,
|
|
|
+ so I'm not sure what to put for future directions.
|
|
|
+ Does it include the other stuff I've done but not published?
|
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
@@ -14059,20 +14144,8 @@ on.
|
|
|
|
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Section*
|
|
|
-Ch3
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\begin_layout Itemize
|
|
|
-Use CV or bootstrap to better evaluate classifiers
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\begin_layout Itemize
|
|
|
-fRMAtools could be adapted to not require equal-sized groups
|
|
|
-\end_layout
|
|
|
-
|
|
|
-\begin_layout Section*
|
|
|
-Ch4
|
|
|
+\begin_layout Chapter
|
|
|
+Future Directions
|
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
@@ -14080,9 +14153,9 @@ Ch4
|
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
|
-I've already done a good bit of work outside just this globin blocking thing,
|
|
|
- so I'm not sure what to put for future directions.
|
|
|
- Does it inculde the other stuff I've done but not published?
|
|
|
+If there are any chapter-independent future directions, put them here.
|
|
|
+ Otherwise, delete this section.
|
|
|
+ Check in the directions if this is OK.
|
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|