|
@@ -41,6 +41,9 @@
|
|
% (to avoid landscape figures breaking up text)
|
|
% (to avoid landscape figures breaking up text)
|
|
\usepackage{afterpage}
|
|
\usepackage{afterpage}
|
|
|
|
|
|
|
|
+% Consider: force floats after placement in text
|
|
|
|
+% https://tex.stackexchange.com/questions/15706/force-floats-to-be-typeset-after-their-occurrence-in-the-source-text
|
|
|
|
+
|
|
% This one breaks subfigs so it's disabled
|
|
% This one breaks subfigs so it's disabled
|
|
% https://tex.stackexchange.com/questions/65680/automatically-bold-first-sentence-of-a-floats-caption
|
|
% https://tex.stackexchange.com/questions/65680/automatically-bold-first-sentence-of-a-floats-caption
|
|
|
|
|
|
@@ -3101,6 +3104,52 @@ literal "false"
|
|
batch effect in the data.
|
|
batch effect in the data.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+Due to an error in sample preparation, the RNA from the samples for days
|
|
|
|
+ 0 and 5 was sequenced using a different kit than those for days 1 and
|
|
|
|
+ 14.
|
|
|
|
+ This induced a substantial batch effect in the data due to differences
|
|
|
|
+ in sequencing biases between the two kits, and this batch effect is unfortunate
|
|
|
|
+ly confounded with the time point variable (Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:RNA-PCA-no-batchsub"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+).
|
|
|
|
+ To do the best possible analysis with this data, this batch effect was
|
|
|
|
+ subtracted out from the data using ComBat
|
|
|
|
+\begin_inset CommandInset citation
|
|
|
|
+LatexCommand cite
|
|
|
|
+key "Johnson2007"
|
|
|
|
+literal "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+, ignoring the time point variable due to the confounding with the batch
|
|
|
|
+ variable.
|
|
|
|
+ The result is a marked improvement, but the unavoidable confounding with
|
|
|
|
+ time point means that certain real patterns of gene expression will be
|
|
|
|
+ indistinguishable from the batch effect and subtracted out as a result.
|
|
|
|
+ Specifically, any
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+zig-zag
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ pattern, such as a gene whose expression goes up on day 1, down on day
|
|
|
|
+ 5, and back up again on day 14, will be attenuated or eliminated entirely.
|
|
|
|
+ In the context of a T-cell activation time course, it is unlikely that
|
|
|
|
+ many genes of interest will follow such an expression pattern, so this
|
|
|
|
+ loss was deemed an acceptable cost for correcting the batch effect.
|
|
|
|
+\end_layout
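The loss described in this paragraph can be illustrated with a toy calculation. The sketch below is not ComBat: it only re-centers each batch on the grand mean, which is a deliberately crude stand-in, and the expression values and the helper name `subtract_batch_means` are made up for illustration. It assumes one sample per time point, with days 0 and 5 in one batch and days 1 and 14 in the other.

```python
import numpy as np

# Time points and their (confounded) batch labels:
# days 0 and 5 were run with one kit, days 1 and 14 with the other.
days = np.array([0, 1, 5, 14])
batch = np.array([0, 1, 0, 1])


def subtract_batch_means(expr, batch):
    """Shift each batch so its mean equals the grand mean.

    Hypothetical helper: a simplified stand-in for batch subtraction,
    not the empirical Bayes procedure that ComBat actually uses.
    """
    expr = np.asarray(expr, dtype=float)
    corrected = expr.copy()
    grand_mean = expr.mean()
    for b in np.unique(batch):
        mask = batch == b
        corrected[mask] += grand_mean - expr[mask].mean()
    return corrected


# A "zig-zag" gene: up on day 1, down on day 5, up again on day 14.
zigzag = np.array([0.0, 1.0, 0.0, 1.0])
# A gene that rises steadily over the time course.
monotone = np.array([0.0, 1.0, 2.0, 3.0])

zigzag_corrected = subtract_batch_means(zigzag, batch)
monotone_corrected = subtract_batch_means(monotone, batch)

print(zigzag_corrected)    # the zig-zag is flattened completely
print(monotone_corrected)  # the overall rise survives, but is distorted
```

In this toy setting the zig-zag gene becomes perfectly flat, because its pattern is exactly the batch indicator, while the steadily rising gene keeps its overall early-versus-late difference.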
|
|
|
|
+
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
\begin_inset Float figure
|
|
\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
@@ -3233,49 +3282,43 @@ PCoA plots of RNA-seq data showing effect of batch correction.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-Due to an error in sample preparation, the RNA from the samples for days
|
|
|
|
- 0 and 5 were sequenced using a different kit than those for days 1 and
|
|
|
|
- 14.
|
|
|
|
- This induced a substantial batch effect in the data due to differences
|
|
|
|
- in sequencing biases between the two kits, and this batch effect is unfortunate
|
|
|
|
-ly confounded with the time point variable (Figure
|
|
|
|
|
|
+However, removing the systematic component of the batch effect still leaves
|
|
|
|
+ the noise component.
|
|
|
|
+ The gene quantifications from the first batch are substantially noisier
|
|
|
|
+ than those in the second batch.
|
|
|
|
+ This analysis accounted for the extra noise by using
|
|
|
|
+\begin_inset Flex Code
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+limma
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+'s sample weighting method to assign lower weights to the noisy samples
|
|
|
|
+ of batch 1 (Figure
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:RNA-PCA-no-batchsub"
|
|
|
|
|
|
+reference "fig:RNA-seq-weights-vs-covars"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-).
|
|
|
|
- To do the best possible analysis with this data, this batch effect was
|
|
|
|
- subtracted out from the data using ComBat
|
|
|
|
|
|
+)
|
|
\begin_inset CommandInset citation
|
|
\begin_inset CommandInset citation
|
|
LatexCommand cite
|
|
LatexCommand cite
|
|
-key "Johnson2007"
|
|
|
|
|
|
+key "Ritchie2006,Liu2015"
|
|
literal "false"
|
|
literal "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, ignoring the time point variable due to the confounding with the batch
|
|
|
|
- variable.
|
|
|
|
- The result is a marked improvement, but the unavoidable confounding with
|
|
|
|
- time point means that certain real patterns of gene expression will be
|
|
|
|
- indistinguishable from the batch effect and subtracted out as a result.
|
|
|
|
- Specifically, any
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-zig-zag
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- pattern, such as a gene whose expression goes up on day 1, down on day
|
|
|
|
- 5, and back up again on day 14, will be attenuated or eliminated entirely.
|
|
|
|
- In the context of a T-cell activation time course, it is unlikely that
|
|
|
|
- many genes of interest will follow such an expression pattern, so this
|
|
|
|
- loss was deemed an acceptable cost for correcting the batch effect.
|
|
|
|
|
|
+.
|
|
|
|
+ The resulting analysis gives an accurate assessment of statistical significance
|
|
|
|
+ for all comparisons, which unfortunately means a loss of statistical power
|
|
|
|
+ for comparisons involving samples in batch 1.
|
|
\end_layout
|
|
\end_layout
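The trade-off between accurate significance and lost power can be seen in a small inverse-variance weighting example. This is only a sketch of the general idea of down-weighting noisy observations. It is not limma's sample weighting algorithm, and the values, variances, and the function name `weighted_mean_and_se` are assumptions chosen for illustration.

```python
import numpy as np


def weighted_mean_and_se(values, variances):
    """Inverse-variance weighted mean and its standard error.

    Hypothetical helper: noisier observations (larger variance) get
    proportionally smaller weights.
    """
    values = np.asarray(values, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    mean = np.sum(weights * values) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    return mean, se


# Two equally clean samples.
clean_mean, clean_se = weighted_mean_and_se([1.0, 3.0], [1.0, 1.0])
# Same values, but the second sample is four times as variable.
noisy_mean, noisy_se = weighted_mean_and_se([1.0, 3.0], [1.0, 4.0])

print(clean_mean, clean_se)
print(noisy_mean, noisy_se)
```

With the noisy second sample, the estimate moves toward the clean sample and the standard error grows, which is the sense in which honest weighting costs statistical power.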
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -3332,36 +3375,6 @@ RNA-seq sample weights, grouped by experimental and technical covariates.
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-However, removing the systematic component of the batch effect still leaves
|
|
|
|
- the noise component.
|
|
|
|
- The gene quantifications from the first batch are substantially noisier
|
|
|
|
- than those in the second batch.
|
|
|
|
- This analysis corrected for this by using
|
|
|
|
-\begin_inset Flex Code
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-limma
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-'s sample weighting method to assign lower weights to the noisy samples
|
|
|
|
- of batch 1
|
|
|
|
-\begin_inset CommandInset citation
|
|
|
|
-LatexCommand cite
|
|
|
|
-key "Ritchie2006,Liu2015"
|
|
|
|
-literal "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-.
|
|
|
|
- The resulting analysis gives an accurate assessment of statistical significance
|
|
|
|
- for all comparisons, which unfortunately means a loss of statistical power
|
|
|
|
- for comparisons involving samples in batch 1.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
In any case, the
|
|
In any case, the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
@@ -3490,62 +3503,28 @@ ChIP-seq differential modification analysis
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\align center
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
|
|
+\begin_inset Flex TODO Note (inline)
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/csaw/CCF-plots-noBL-PAGE2-CROP.pdf
|
|
|
|
- lyxscale 50
|
|
|
|
- height 40theight%
|
|
|
|
- groupId ccf-subfig
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Caption Standard
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
-\series bold
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:CCF-without-blacklist"
|
|
|
|
-
|
|
|
|
|
|
+Be consistent about use of
|
|
|
|
+\begin_inset Quotes eld
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-Cross-correlation plots without removing blacklisted reads.
|
|
|
|
-
|
|
|
|
-\series default
|
|
|
|
-Without blacklisting, many artifactual peaks are visible in the cross-correlatio
|
|
|
|
-ns of the ChIP-seq samples, and the peak at the true fragment size (147
|
|
|
|
-\begin_inset space ~
|
|
|
|
|
|
+differential binding
|
|
|
|
+\begin_inset Quotes erd
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-bp) is frequently overshadowed by the artifactual peak at the read length
|
|
|
|
- (100
|
|
|
|
-\begin_inset space ~
|
|
|
|
|
|
+ vs
|
|
|
|
+\begin_inset Quotes eld
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-bp).
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
|
|
+differential modification
|
|
|
|
+\begin_inset Quotes erd
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
|
|
+ throughout this chapter.
|
|
|
|
+ The latter is usually preferred.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -3553,249 +3532,73 @@ bp).
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\align center
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+Sequence reads were retrieved from
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/csaw/CCF-plots-PAGE2-CROP.pdf
|
|
|
|
- lyxscale 50
|
|
|
|
- height 40theight%
|
|
|
|
- groupId ccf-subfig
|
|
|
|
|
|
+SRA
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset citation
|
|
|
|
+LatexCommand cite
|
|
|
|
+key "Leinonen2011"
|
|
|
|
+literal "false"
|
|
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Caption Standard
|
|
|
|
|
|
+.
|
|
|
|
+
|
|
|
|
+\begin_inset Flex Glossary Term (Capital)
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-
|
|
|
|
-\series bold
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:CCF-with-blacklist"
|
|
|
|
|
|
+ChIP-seq
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-Cross-correlation plots with blacklisted reads removed.
|
|
|
|
|
|
+ (and input) reads were aligned to the GRCh38 genome assembly using Bowtie 2
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset citation
|
|
|
|
+LatexCommand cite
|
|
|
|
+key "Langmead2012,Schneider2017,gh-hg38-ref"
|
|
|
|
+literal "false"
|
|
|
|
|
|
-\series default
|
|
|
|
- After blacklisting, most ChIP-seq samples have clean-looking periodic cross-cor
|
|
|
|
-relation plots, with the largest peak around 147
|
|
|
|
-\begin_inset space ~
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-bp, the expected size for a fragment of DNA from a single nucleosome, and
|
|
|
|
- little to no peak at the read length, 100
|
|
|
|
-\begin_inset space ~
|
|
|
|
-\end_inset
|
|
|
|
|
|
+.
|
|
|
|
+ Artifact regions were annotated using a custom implementation of the
|
|
|
|
+\begin_inset Flex Code
|
|
|
|
+status open
|
|
|
|
|
|
-bp.
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+GreyListChIP
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ algorithm, and these
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
|
|
+greylists
|
|
|
|
+\begin_inset Quotes erd
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ were merged with the published
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+ENCODE
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Caption Standard
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Argument 1
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Strand cross-correlation plots for ChIP-seq data, before and after blacklisting.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:CCF-master"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\series bold
|
|
|
|
-Strand cross-correlation plots for ChIP-seq data, before and after blacklisting.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Note Note
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/ChIP-seq/H3K4me2-sample-MAplot-bins-CROP.png
|
|
|
|
- lyxscale 25
|
|
|
|
- width 100col%
|
|
|
|
- groupId colwidth-raster
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Caption Standard
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
-\series bold
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:MA-plot-bigbins"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-MA plot of H3K4me2 read counts in 10kb bins for two arbitrary samples.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Be consistent about use of
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-differential binding
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- vs
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-differential modification
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- throughout this chapter.
|
|
|
|
- The latter is usually preferred.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-Sequence reads were retrieved from
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-SRA
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\begin_inset CommandInset citation
|
|
|
|
-LatexCommand cite
|
|
|
|
-key "Leinonen2011"
|
|
|
|
-literal "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-.
|
|
|
|
-
|
|
|
|
-\begin_inset Flex Glossary Term (Capital)
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-ChIP-seq
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- (and input) reads were aligned to GRCh38 genome assembly using Bowtie 2
|
|
|
|
-
|
|
|
|
-\begin_inset CommandInset citation
|
|
|
|
-LatexCommand cite
|
|
|
|
-key "Langmead2012,Schneider2017,gh-hg38-ref"
|
|
|
|
-literal "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-.
|
|
|
|
- Artifact regions were annotated using a custom implementation of the
|
|
|
|
-\begin_inset Flex Code
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GreyListChIP
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- algorithm, and these
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-greylists
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- were merged with the published
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-ENCODE
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\end_inset
|
|
|
|
|
|
blacklists
|
|
blacklists
|
|
\begin_inset CommandInset citation
|
|
\begin_inset CommandInset citation
|
|
@@ -3911,113 +3714,83 @@ literal "false"
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-Promoters were defined by computing the distance from each annotated
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-TSS
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/csaw/CCF-plots-noBL-PAGE2-CROP.pdf
|
|
|
|
+ lyxscale 50
|
|
|
|
+ height 40theight%
|
|
|
|
+ groupId ccf-subfig
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- to the nearest called peak and examining the distribution of distances,
|
|
|
|
- observing that peaks for each histone mark were enriched within a certain
|
|
|
|
- distance of the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:CCF-without-blacklist"
|
|
|
|
+
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
- For H3K4me2 and H3K4me3, this distance was about 1
|
|
|
|
|
|
+Cross-correlation plots without removing blacklisted reads.
|
|
|
|
+
|
|
|
|
+\series default
|
|
|
|
+Without blacklisting, many artifactual peaks are visible in the cross-correlatio
|
|
|
|
+ns of the ChIP-seq samples, and the peak at the true fragment size (147
|
|
\begin_inset space ~
|
|
\begin_inset space ~
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-kb, while for H3K27me3 it was 2.5
|
|
|
|
|
|
+bp) is frequently overshadowed by the artifactual peak at the read length
|
|
|
|
+ (100
|
|
\begin_inset space ~
|
|
\begin_inset space ~
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-kb.
|
|
|
|
- These distances were used as an
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
|
|
+bp).
|
|
|
|
+\end_layout
|
|
|
|
|
|
-effective promoter radius
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- for each mark.
|
|
|
|
- The promoter region for each gene was defined as the region of the genome
|
|
|
|
- within this distance upstream or downstream of the gene's annotated
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
- For genes with multiple annotated
|
|
|
|
-\begin_inset Flex Glossary Term (pl)
|
|
|
|
-status open
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-, a promoter region was defined for each
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- individually, and any promoters that overlapped (due to multiple
|
|
|
|
-\begin_inset Flex Glossary Term (pl)
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- being closer than 2 times the radius) were merged into one large promoter.
|
|
|
|
- Thus, some genes had multiple promoters defined, which were each analyzed
|
|
|
|
- separately for differential modification.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
\begin_inset Float figure
|
|
\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status collapsed
|
|
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
\begin_inset Graphics
|
|
\begin_inset Graphics
|
|
- filename graphics/CD4-csaw/ChIP-seq/H3K4me2-PCA-raw-CROP.png
|
|
|
|
- lyxscale 25
|
|
|
|
- width 45col%
|
|
|
|
- groupId pcoa-subfig
|
|
|
|
|
|
+ filename graphics/CD4-csaw/csaw/CCF-plots-PAGE2-CROP.pdf
|
|
|
|
+ lyxscale 50
|
|
|
|
+ height 40theight%
|
|
|
|
+ groupId ccf-subfig
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
@@ -4032,37 +3805,30 @@ status collapsed
|
|
\series bold
|
|
\series bold
|
|
\begin_inset CommandInset label
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
LatexCommand label
|
|
-name "fig:PCoA-H3K4me2-bad"
|
|
|
|
|
|
+name "fig:CCF-with-blacklist"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-H3K4me2, no correction
|
|
|
|
-\end_layout
|
|
|
|
|
|
+Cross-correlation plots with blacklisted reads removed.
|
|
|
|
|
|
|
|
+\series default
|
|
|
|
+ After blacklisting, most ChIP-seq samples have clean-looking periodic cross-cor
|
|
|
|
+relation plots, with the largest peak around 147
|
|
|
|
+\begin_inset space ~
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
|
|
+bp, the expected size for a fragment of DNA from a single nucleosome, and
|
|
|
|
+ little to no peak at the read length, 100
|
|
|
|
+\begin_inset space ~
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+bp.
|
|
|
|
+\end_layout
|
|
|
|
|
|
-\begin_inset space \hfill{}
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/ChIP-seq/H3K4me2-PCA-SVsub-CROP.png
|
|
|
|
- lyxscale 25
|
|
|
|
- width 45col%
|
|
|
|
- groupId pcoa-subfig
|
|
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
@@ -4073,15 +3839,25 @@ status collapsed
|
|
\begin_inset Caption Standard
|
|
\begin_inset Caption Standard
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
|
|
+\begin_inset Argument 1
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Strand cross-correlation plots for ChIP-seq data, before and after blacklisting.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
|
|
-\series bold
|
|
|
|
\begin_inset CommandInset label
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
LatexCommand label
|
|
-name "fig:PCoA-H3K4me2-good"
|
|
|
|
|
|
+name "fig:CCF-master"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-H3K4me2, SVs subtracted
|
|
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+Strand cross-correlation plots for ChIP-seq data, before and after blacklisting.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -4094,6 +3870,10 @@ H3K4me2, SVs subtracted
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Note Note
|
|
|
|
+status open
|
|
|
|
+
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\begin_inset Float figure
|
|
\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
@@ -4103,10 +3883,10 @@ status collapsed
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
\begin_inset Graphics
|
|
\begin_inset Graphics
|
|
- filename graphics/CD4-csaw/ChIP-seq/H3K4me3-PCA-raw-CROP.png
|
|
|
|
|
|
+ filename graphics/CD4-csaw/ChIP-seq/H3K4me2-sample-MAplot-bins-CROP.png
|
|
lyxscale 25
|
|
lyxscale 25
|
|
- width 45col%
|
|
|
|
- groupId pcoa-subfig
|
|
|
|
|
|
+ width 100col%
|
|
|
|
+ groupId colwidth-raster
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
@@ -4121,11 +3901,11 @@ status collapsed
|
|
\series bold
|
|
\series bold
|
|
\begin_inset CommandInset label
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
LatexCommand label
|
|
-name "fig:PCoA-H3K4me3-bad"
|
|
|
|
|
|
+name "fig:MA-plot-bigbins"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-H3K4me3, no correction
|
|
|
|
|
|
+MA plot of H3K4me2 read counts in 10kb bins for two arbitrary samples.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -4136,137 +3916,197 @@ H3K4me3, no correction
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\begin_inset space \hfill{}
|
|
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status collapsed
|
|
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+Promoters were defined by computing the distance from each annotated
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/ChIP-seq/H3K4me3-PCA-SVsub-CROP.png
|
|
|
|
- lyxscale 25
|
|
|
|
- width 45col%
|
|
|
|
- groupId pcoa-subfig
|
|
|
|
|
|
+TSS
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
|
|
+ to the nearest called peak and examining the distribution of distances,
|
|
|
|
+ observing that peaks for each histone mark were enriched within a certain
|
|
|
|
+ distance of the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\begin_inset Caption Standard
|
|
|
|
|
|
+TSS
|
|
|
|
+\end_layout
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\series bold
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:PCoA-H3K4me3-good"
|
|
|
|
|
|
+.
|
|
|
|
+ For H3K4me2 and H3K4me3, this distance was about 1
|
|
|
|
+\begin_inset space ~
|
|
|
|
+\end_inset
|
|
|
|
|
|
|
|
+kb, while for H3K27me3 it was 2.5
|
|
|
|
+\begin_inset space ~
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-H3K4me3, SVs subtracted
|
|
|
|
-\end_layout
|
|
|
|
|
|
+kb.
|
|
|
|
+ These distances were used as an
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
|
|
|
|
+effective promoter radius
|
|
|
|
+\begin_inset Quotes erd
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ for each mark.
|
|
|
|
+ The promoter region for each gene was defined as the region of the genome
|
|
|
|
+ within this distance upstream or downstream of the gene's annotated
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+TSS
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+.
|
|
|
|
+ For genes with multiple annotated
|
|
|
|
+\begin_inset Flex Glossary Term (pl)
|
|
|
|
+status open
|
|
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+TSS
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status collapsed
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+, a promoter region was defined for each
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/ChIP-seq/H3K27me3-PCA-raw-CROP.png
|
|
|
|
- lyxscale 25
|
|
|
|
- width 45col%
|
|
|
|
- groupId pcoa-subfig
|
|
|
|
|
|
+TSS
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
|
|
+ individually, and any promoters that overlapped (due to multiple
|
|
|
|
+\begin_inset Flex Glossary Term (pl)
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\begin_inset Caption Standard
|
|
|
|
|
|
+TSS
|
|
|
|
+\end_layout
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\series bold
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:PCoA-H3K27me3-bad"
|
|
|
|
|
|
+ being closer than 2 times the radius) were merged into one large promoter.
|
|
|
|
+ Thus, some genes had multiple promoters defined, which were each analyzed
|
|
|
|
+ separately for differential modification.
|
|
|
|
+\end_layout
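The promoter definition in this paragraph amounts to building a window of one radius on each side of every TSS and merging windows that overlap. A minimal sketch follows, assuming TSS positions on a single chromosome given as plain integers. The function name `promoter_regions` and the coordinates are hypothetical, and the actual analysis may have handled strands, chromosome ends, and coordinate conventions differently.

```python
def promoter_regions(tss_positions, radius):
    """Return merged (start, end) promoter windows for one chromosome.

    Each TSS gets a window of `radius` bases on either side. Windows that
    overlap, i.e. TSSs closer together than 2 * radius, are merged.
    """
    # Clamp the start at 0 so windows never extend past the chromosome start.
    intervals = sorted((max(0, t - radius), t + radius) for t in tss_positions)
    merged = []
    for start, end in intervals:
        if merged and start < merged[-1][1]:
            # Overlaps the previous window: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Two TSSs 1.5 kb apart merge under a 1 kb radius; the third stays separate.
regions = promoter_regions([10_000, 11_500, 20_000], radius=1_000)
print(regions)  # [(9000, 12500), (19000, 21000)]
```

The strict `<` comparison mirrors the wording "closer than 2 times the radius": two TSSs exactly two radii apart produce windows that touch but are not merged.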
|
|
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+Reads in promoters, peaks, and sliding windows across the genome were counted
|
|
|
|
+ and normalized using
|
|
|
|
+\begin_inset Flex Code
|
|
|
|
+status open
|
|
|
|
|
|
-H3K27me3, no correction
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+csaw
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ and analyzed for differential modification using
|
|
|
|
+\begin_inset Flex Code
|
|
|
|
+status open
|
|
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+edgeR
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset citation
|
|
|
|
+LatexCommand cite
|
|
|
|
+key "Lun2014,Lun2015a,Lund2012,Phipson2016"
|
|
|
|
+literal "false"
|
|
|
|
|
|
-\begin_inset space \hfill{}
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status collapsed
|
|
|
|
|
|
+.
|
|
|
|
+ Unobserved confounding factors in the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/ChIP-seq/H3K27me3-PCA-SVsub-CROP.png
|
|
|
|
- lyxscale 25
|
|
|
|
- width 45col%
|
|
|
|
- groupId pcoa-subfig
|
|
|
|
|
|
+ChIP-seq
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
|
|
+ data were corrected using
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\begin_inset Caption Standard
|
|
|
|
|
|
+SVA
|
|
|
|
+\end_layout
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\series bold
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:PCoA-H3K27me3-good"
|
|
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset citation
|
|
|
|
+LatexCommand cite
|
|
|
|
+key "Leek2007,Leek2014"
|
|
|
|
+literal "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-H3K27me3, SVs subtracted
|
|
|
|
-\end_layout
|
|
|
|
|
|
+.
|
|
|
|
+ Principal coordinate plots of the promoter count data for each histone
|
|
|
|
+ mark before and after subtracting surrogate variable effects are shown
|
|
|
|
+ in Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:PCoA-ChIP"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
|
|
+.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/ChIP-seq/H3K4me2-PCA-raw-CROP.png
|
|
|
|
+ lyxscale 25
|
|
|
|
+ width 45col%
|
|
|
|
+ groupId pcoa-subfig
|
|
|
|
+
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
@@ -4276,27 +4116,15 @@ H3K27me3, SVs subtracted
|
|
\begin_inset Caption Standard
|
|
\begin_inset Caption Standard
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\begin_inset Argument 1
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-PCoA plots of ChIP-seq sliding window data, before and after subtracting
|
|
|
|
- surrogate variables (SVs).
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
|
|
|
|
|
|
+\series bold
|
|
\begin_inset CommandInset label
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
LatexCommand label
|
|
-name "fig:PCoA-ChIP"
|
|
|
|
|
|
+name "fig:PCoA-H3K4me2-bad"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
-\series bold
|
|
|
|
-PCoA plots of ChIP-seq sliding window data, before and after subtracting
|
|
|
|
- surrogate variables (SVs).
|
|
|
|
|
|
+H3K4me2, no correction
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -4307,194 +4135,135 @@ PCoA plots of ChIP-seq sliding window data, before and after subtracting
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\begin_inset space \hfill{}
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-Reads in promoters, peaks, and sliding windows across the genome were counted
|
|
|
|
- and normalized using
|
|
|
|
-\begin_inset Flex Code
|
|
|
|
-status open
|
|
|
|
|
|
+
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-csaw
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/ChIP-seq/H3K4me2-PCA-SVsub-CROP.png
|
|
|
|
+ lyxscale 25
|
|
|
|
+ width 45col%
|
|
|
|
+ groupId pcoa-subfig
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- and analyzed for differential modification using
|
|
|
|
-\begin_inset Flex Code
|
|
|
|
-status open
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-edgeR
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
|
|
-
|
|
|
|
-\begin_inset CommandInset citation
|
|
|
|
-LatexCommand cite
|
|
|
|
-key "Lun2014,Lun2015a,Lund2012,Phipson2016"
|
|
|
|
-literal "false"
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\series bold
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:PCoA-H3K4me2-good"
|
|
|
|
|
|
-.
|
|
|
|
- Unobserved confounding factors in the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-ChIP-seq
|
|
|
|
|
|
+H3K4me2, SVs subtracted
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- data were corrected using
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-SVA
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
-\begin_inset CommandInset citation
|
|
|
|
-LatexCommand cite
|
|
|
|
-key "Leek2007,Leek2014"
|
|
|
|
-literal "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-.
|
|
|
|
- Principal coordinate plots of the promoter count data for each histone
|
|
|
|
- mark before and after subtracting surrogate variable effects are shown
|
|
|
|
- in Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:PCoA-ChIP"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
|
|
|
|
-.
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-To investigate whether the location of a peak within the promoter region
|
|
|
|
- was important,
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-relative coverage profiles
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- were generated.
|
|
|
|
- First, 500-bp sliding windows were tiled around each annotated
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-TSS
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/ChIP-seq/H3K4me3-PCA-raw-CROP.png
|
|
|
|
+ lyxscale 25
|
|
|
|
+ width 45col%
|
|
|
|
+ groupId pcoa-subfig
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-: one window centered on the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- itself, and 10 windows each upstream and downstream, thus covering a 10.5-kb
|
|
|
|
- region centered on the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-TSS
|
|
|
|
-\end_layout
|
|
|
|
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\series bold
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:PCoA-H3K4me3-bad"
|
|
|
|
|
|
- with 21 windows.
|
|
|
|
- Reads in each window for each
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
|
|
+H3K4me3, no correction
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- were counted in each sample, and the counts were normalized and converted
|
|
|
|
- to
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-logCPM
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- as in the differential modification analysis.
|
|
|
|
- Then, the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-logCPM
|
|
|
|
-\end_layout
|
|
|
|
|
|
|
|
|
|
+\begin_inset space \hfill{}
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- values within each promoter were normalized to an average of zero, such
|
|
|
|
- that each window's normalized abundance now represents the relative read
|
|
|
|
- depth of that window compared to all other windows in the same promoter.
|
|
|
|
- The normalized abundance values for each window in a promoter are collectively
|
|
|
|
- referred to as that promoter's
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
|
|
|
|
-relative coverage profile
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/ChIP-seq/H3K4me3-PCA-SVsub-CROP.png
|
|
|
|
+ lyxscale 25
|
|
|
|
+ width 45col%
|
|
|
|
+ groupId pcoa-subfig
|
|
|
|
+
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
-\end_layout
|
|
|
|
|
|
|
|
-\begin_layout Subsection
|
|
|
|
-MOFA recovers biologically relevant variation from blind analysis by correlating
|
|
|
|
- across datasets
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset ERT
|
|
|
|
-status open
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
|
|
|
|
|
|
+\series bold
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:PCoA-H3K4me3-good"
|
|
|
|
|
|
-\backslash
|
|
|
|
-afterpage{
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+H3K4me3, SVs subtracted
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
|
|
+\end_inset
|
|
|
|
|
|
|
|
|
|
-\backslash
|
|
|
|
-begin{landscape}
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -4502,25 +4271,19 @@ begin{landscape}
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\begin_inset Float figure
|
|
\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status open
|
|
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
\begin_inset Graphics
|
|
\begin_inset Graphics
|
|
- filename graphics/CD4-csaw/MOFA-varExplaiend-matrix-CROP.png
|
|
|
|
|
|
+ filename graphics/CD4-csaw/ChIP-seq/H3K27me3-PCA-raw-CROP.png
|
|
lyxscale 25
|
|
lyxscale 25
|
|
width 45col%
|
|
width 45col%
|
|
- groupId mofa-subfig
|
|
|
|
|
|
+ groupId pcoa-subfig
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
@@ -4535,25 +4298,11 @@ status open
|
|
\series bold
|
|
\series bold
|
|
\begin_inset CommandInset label
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
LatexCommand label
|
|
-name "fig:mofa-varexplained"
|
|
|
|
|
|
+name "fig:PCoA-H3K27me3-bad"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-Variance explained in each data set by each latent factor estimated by MOFA.
|
|
|
|
-
|
|
|
|
-\series default
|
|
|
|
- For each LF learned by MOFA, the variance explained by that factor in each
|
|
|
|
- data set (
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-view
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-) is shown by the shading of the cells in the lower section.
|
|
|
|
- The upper section shows the total fraction of each data set's variance
|
|
|
|
- that is explained by all LFs combined.
|
|
|
|
|
|
+H3K27me3, no correction
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -4571,15 +4320,15 @@ view
|
|
\begin_inset Float figure
|
|
\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status open
|
|
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
\begin_inset Graphics
|
|
\begin_inset Graphics
|
|
- filename graphics/CD4-csaw/MOFA-LF-scatter-CROP.png
|
|
|
|
|
|
+ filename graphics/CD4-csaw/ChIP-seq/H3K27me3-PCA-SVsub-CROP.png
|
|
lyxscale 25
|
|
lyxscale 25
|
|
width 45col%
|
|
width 45col%
|
|
- groupId mofa-subfig
|
|
|
|
|
|
+ groupId pcoa-subfig
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
@@ -4594,16 +4343,11 @@ status open
|
|
\series bold
|
|
\series bold
|
|
\begin_inset CommandInset label
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
LatexCommand label
|
|
-name "fig:mofa-lf-scatter"
|
|
|
|
|
|
+name "fig:PCoA-H3K27me3-good"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-Scatter plots of specific pairs of MOFA latent factors.
|
|
|
|
-
|
|
|
|
-\series default
|
|
|
|
- LFs 1, 4, and 5 explain substantial variation in all data sets, so they
|
|
|
|
- are plotted against each other in order to reveal patterns of variation
|
|
|
|
- that are shared across all data sets.
|
|
|
|
|
|
+H3K27me3, SVs subtracted
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -4624,7 +4368,8 @@ Scatter plots of specific pairs of MOFA latent factors.
|
|
status collapsed
|
|
status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-MOFA latent factors identify shared patterns of variation.
|
|
|
|
|
|
+PCoA plots of ChIP-seq sliding window data, before and after subtracting
|
|
|
|
+ surrogate variables (SVs).
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -4632,13 +4377,14 @@ MOFA latent factors identify shared patterns of variation.
|
|
|
|
|
|
\begin_inset CommandInset label
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
LatexCommand label
|
|
-name "fig:MOFA-master"
|
|
|
|
|
|
+name "fig:PCoA-ChIP"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
\series bold
|
|
\series bold
|
|
-MOFA latent factors identify shared patterns of variation.
|
|
|
|
|
|
+PCoA plots of ChIP-seq sliding window data, before and after subtracting
|
|
|
|
+ surrogate variables (SVs).
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -4652,24 +4398,98 @@ MOFA latent factors identify shared patterns of variation.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-\begin_inset ERT
|
|
|
|
|
|
+To investigate whether the location of a peak within the promoter region
|
|
|
|
+ was important,
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+relative coverage profiles
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ were generated.
|
|
|
|
+ First, 500-bp sliding windows were tiled around each annotated
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
|
|
+TSS
|
|
|
|
+\end_layout
|
|
|
|
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\backslash
|
|
|
|
-end{landscape}
|
|
|
|
|
|
+: one window centered on the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+TSS
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ itself, and 10 windows each upstream and downstream, thus covering a 10.5-kb
|
|
|
|
+ region centered on the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+TSS
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ with 21 windows.
|
|
|
|
+ Reads in each window for each
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+TSS
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ were counted in each sample, and the counts were normalized and converted
|
|
|
|
+ to
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
|
|
+logCPM
|
|
|
|
+\end_layout
|
|
|
|
|
|
-}
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ as in the differential modification analysis.
|
|
|
|
+ Then, the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+logCPM
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ values within each promoter were normalized to an average of zero, such
|
|
|
|
+ that each window's normalized abundance now represents the relative read
|
|
|
|
+ depth of that window compared to all other windows in the same promoter.
|
|
|
|
+ The normalized abundance values for each window in a promoter are collectively
|
|
|
|
+ referred to as that promoter's
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+relative coverage profile
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
|
|
+\end_layout
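The relative coverage profile computation described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the analysis code used for this work. The function names `tss_windows` and `relative_coverage_profiles`, the `(promoters, windows, samples)` array layout, the strand-agnostic window placement, and the choice to center each promoter separately within each sample are all assumptions made for the example.

```python
import numpy as np

WINDOW_WIDTH = 500  # bp per window, as stated in the text
N_FLANK = 10        # windows on each side of the central window


def tss_windows(tss, width=WINDOW_WIDTH, n_flank=N_FLANK):
    """Return half-open (start, end) intervals for 2 * n_flank + 1 adjacent
    windows, with the middle window centered on the TSS coordinate.

    Strand is ignored here (assumption for illustration only).
    """
    offsets = np.arange(-n_flank, n_flank + 1) * width
    starts = tss + offsets - width // 2
    return np.column_stack([starts, starts + width])


def relative_coverage_profiles(logcpm):
    """Center logCPM values within each promoter.

    logcpm has shape (n_promoters, n_windows, n_samples) (hypothetical layout).
    The mean over windows is subtracted separately for every promoter and
    sample, so each profile averages to zero and each value is the window's
    read depth relative to the other windows in the same promoter.
    """
    logcpm = np.asarray(logcpm, dtype=float)
    return logcpm - logcpm.mean(axis=1, keepdims=True)
```

With these defaults, `tss_windows` yields 21 windows spanning 21 × 500 bp = 10.5 kb around the TSS, which matches the region size given in the text.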
|
|
|
|
|
|
|
|
+\begin_layout Subsection
|
|
|
|
+MOFA recovers biologically relevant variation from blind analysis by correlating
|
|
|
|
+ across datasets
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -4816,7 +4636,32 @@ noprefix "false"
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-\begin_inset Note Note
|
|
|
|
|
|
+\begin_inset ERT
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\backslash
|
|
|
|
+afterpage{
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\backslash
|
|
|
|
+begin{landscape}
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
status collapsed
|
|
status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
@@ -4828,10 +4673,10 @@ status open
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
\begin_inset Graphics
|
|
\begin_inset Graphics
|
|
- filename graphics/CD4-csaw/MOFA-batch-correct-CROP.png
|
|
|
|
|
|
+ filename graphics/CD4-csaw/MOFA-varExplaiend-matrix-CROP.png
|
|
lyxscale 25
|
|
lyxscale 25
|
|
- width 100col%
|
|
|
|
- groupId colwidth-raster
|
|
|
|
|
|
+ width 45col%
|
|
|
|
+ groupId mofa-subfig
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
@@ -4846,16 +4691,25 @@ status open
|
|
\series bold
|
|
\series bold
|
|
\begin_inset CommandInset label
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
LatexCommand label
|
|
-name "fig:mofa-batchsub"
|
|
|
|
|
|
+name "fig:mofa-varexplained"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-Result of RNA-seq batch-correction using MOFA latent factors
|
|
|
|
-\end_layout
|
|
|
|
|
|
+Variance explained in each data set by each latent factor estimated by MOFA.
|
|
|
|
|
|
|
|
+\series default
|
|
|
|
+ For each LF learned by MOFA, the variance explained by that factor in each
|
|
|
|
+ data set (
|
|
|
|
+\begin_inset Quotes eld
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+view
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
|
|
|
|
+) is shown by the shading of the cells in the lower section.
|
|
|
|
+ The upper section shows the total fraction of each data set's variance
|
|
|
|
+ that is explained by all LFs combined.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -4866,121 +4720,334 @@ Result of RNA-seq batch-correction using MOFA latent factors
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\begin_inset space \hfill{}
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Note Note
|
|
|
|
|
|
+
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Placing these floats is a challenge
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/MOFA-LF-scatter-CROP.png
|
|
|
|
+ lyxscale 25
|
|
|
|
+ width 45col%
|
|
|
|
+ groupId mofa-subfig
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Float table
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\align center
|
|
|
|
-\begin_inset Tabular
|
|
|
|
-<lyxtabular version="3" rows="11" columns="3">
|
|
|
|
-<features tabularvalignment="middle">
|
|
|
|
-<column alignment="center" valignment="top">
|
|
|
|
-<column alignment="center" valignment="top">
|
|
|
|
-<column alignment="center" valignment="top">
|
|
|
|
-<row>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Test
|
|
|
|
-\end_layout
|
|
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:mofa-lf-scatter"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Est.
|
|
|
|
- non-null
|
|
|
|
-\end_layout
|
|
|
|
|
|
+Scatter plots of specific pairs of MOFA latent factors.
|
|
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+\series default
|
|
|
|
+ LFs 1, 4, and 5 explain substantial variation in all data sets, so they
|
|
|
|
+ are plotted against each other to reveal patterns of variation
|
|
|
|
+ that are shared across all data sets.
|
|
|
|
+\end_layout
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Formula $\mathrm{FDR}\le10\%$
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-</row>
|
|
|
|
-<row>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Naïve Day 0 vs Day 1
|
|
|
|
-\end_layout
|
|
|
|
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+\end_layout
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-5992
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Argument 1
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-1613
|
|
|
|
|
|
+MOFA latent factors identify shared patterns of variation.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-</row>
|
|
|
|
-<row>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:MOFA-master"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+MOFA latent factors identify shared patterns of variation.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset ERT
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\backslash
|
|
|
|
+end{landscape}
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+}
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Note Note
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/MOFA-batch-correct-CROP.png
|
|
|
|
+ lyxscale 25
|
|
|
|
+ width 100col%
|
|
|
|
+ groupId colwidth-raster
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:mofa-batchsub"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+Result of RNA-seq batch-correction using MOFA latent factors
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Section
|
|
|
|
+Results
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Flex TODO Note (inline)
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Focus on what hypotheses were tested, then select figures that show how
|
|
|
|
+ those hypotheses were tested, even if the result is negative.
|
|
|
|
+ Not every interesting result needs to be in here.
|
|
|
|
+ Chapter should tell a story.
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Subsection
|
|
|
|
+Interpretation of RNA-seq analysis is limited by a major confounding factor
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Note Note
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Putting a float here causes an error.
|
|
|
|
+ No idea why.
|
|
|
|
+ See above for the floats that should be placed here.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+Genes called as present in the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+RNA-seq
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ data were tested for differential expression between all time points and
|
|
|
|
+ cell types.
|
|
|
|
+ The counts of differentially expressed genes are shown in Table
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "tab:Estimated-and-detected-rnaseq"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
|
|
+ Notably, all the results for Day 0 and Day 5 have substantially fewer genes
|
|
|
|
+ called differentially expressed than any of the results for other time
|
|
|
|
+ points.
|
|
|
|
+ This is an unfortunate result of the difference in sample quality between
|
|
|
|
+ the two batches of
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+RNA-seq
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ data.
|
|
|
|
+ All the samples in Batch 1, which includes all the samples from Days 0
|
|
|
|
+ and 5, have substantially more variability than the samples in Batch 2,
|
|
|
|
+ which includes the other time points.
|
|
|
|
+ This is reflected in the substantially higher weights assigned to Batch
|
|
|
|
+ 2 (Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:RNA-seq-weights-vs-covars"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+).
|
|
|
|
+ The batch effect has both a systematic component and a random noise component.
|
|
|
|
+ While the systematic component was subtracted out using ComBat (Figure
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:RNA-PCA"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+), no such correction is possible for the noise component: Batch 1 simply
|
|
|
|
+ has substantially more random noise, which reduces the statistical
|
|
|
|
+ power for any differential expression tests involving samples in that batch.
|
|
|
|
+
|
|
|
|
+\end_layout
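The distinction between the systematic and random components of a batch effect can be illustrated with a toy example. The sketch below uses plain per-batch mean centering as a stand-in for the systematic correction; it is not ComBat, and the function name `center_by_batch` and the example values are hypothetical. It shows that removing a batch-specific shift leaves the within-batch spread untouched, which is why the noisier batch still has reduced power after correction.

```python
import numpy as np


def center_by_batch(values, batch):
    """Subtract each batch's own mean from its values.

    This removes a systematic (additive) batch shift only; the within-batch
    variance is left exactly as it was.
    """
    values = np.asarray(values, dtype=float)
    batch = np.asarray(batch)
    out = values.copy()
    for b in np.unique(batch):
        mask = batch == b
        out[mask] -= values[mask].mean()
    return out


# Hypothetical values for one gene: batch 1 is both shifted and noisier.
values = np.array([10.0, 14.0, 6.0, 1.0, 1.2, 0.8])
batch = np.array([1, 1, 1, 2, 2, 2])
corrected = center_by_batch(values, batch)
```

After centering, both batches have a mean of zero, but the standard deviation of batch 1 remains much larger than that of batch 2.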
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Note Note
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Placing these floats is a challenge
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Float table
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Tabular
|
|
|
|
+<lyxtabular version="3" rows="11" columns="3">
|
|
|
|
+<features tabularvalignment="middle">
|
|
|
|
+<column alignment="center" valignment="top">
|
|
|
|
+<column alignment="center" valignment="top">
|
|
|
|
+<column alignment="center" valignment="top">
|
|
|
|
+<row>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Naïve Day 0 vs Day 5
|
|
|
|
|
|
+Test
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-3038
|
|
|
|
|
|
+Est.
|
|
|
|
+ non-null
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-32
|
|
|
|
|
|
+\begin_inset Formula $\mathrm{FDR}\le10\%$
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -4991,83 +5058,607 @@ Naïve Day 0 vs Day 5
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Naïve Day 0 vs Day 14
|
|
|
|
|
|
+Naïve Day 0 vs Day 1
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+5992
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+1613
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+</row>
|
|
|
|
+<row>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Naïve Day 0 vs Day 5
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+3038
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+32
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+</row>
|
|
|
|
+<row>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Naïve Day 0 vs Day 14
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+1870
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+190
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+</row>
|
|
|
|
+<row>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Memory Day 0 vs Day 1
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+3195
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+411
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+</row>
|
|
|
|
+<row>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Memory Day 0 vs Day 5
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+2688
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+18
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+</row>
|
|
|
|
+<row>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Memory Day 0 vs Day 14
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+1911
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+227
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+</row>
|
|
|
|
+<row>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Day 0 Naïve vs Memory
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+0
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+2
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+</row>
|
|
|
|
+<row>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Day 1 Naïve vs Memory
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+9167
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+5532
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+</row>
|
|
|
|
+<row>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Day 5 Naïve vs Memory
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+0
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+0
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+</row>
|
|
|
|
+<row>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Day 14 Naïve vs Memory
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+6446
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+2319
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+</cell>
|
|
|
|
+</row>
|
|
|
|
+</lyxtabular>
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Argument 1
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Estimated and detected differentially expressed genes.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "tab:Estimated-and-detected-rnaseq"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+Estimated and detected differentially expressed genes.
|
|
|
|
+
|
|
|
|
+\series default
|
|
|
|
+
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+Test
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+: Which sample groups were compared;
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+Est non-null
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+: Estimated number of differentially expressed genes, using the method of
|
|
|
|
+ averaging local FDR values
|
|
|
|
+\begin_inset CommandInset citation
|
|
|
|
+LatexCommand cite
|
|
|
|
+key "Phipson2013Thesis"
|
|
|
|
+literal "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+;
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\begin_inset Formula $\mathrm{FDR}\le10\%$
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+: Number of significantly differentially expressed genes at an FDR threshold
|
|
|
|
+ of 10%.
|
|
|
|
+ The total number of genes tested was 16707.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+Despite the difficulty in detecting specific differentially expressed genes,
|
|
|
|
+ there is still evidence that differential expression is present for these
|
|
|
|
+ time points.
|
|
|
|
+ In Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:rna-pca-final"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+, there is a clear separation between naïve and memory samples at Day 0,
|
|
|
|
+ despite the fact that only 2 genes were significantly differentially expressed
|
|
|
|
+ for this comparison.
|
|
|
|
+ Similarly, the small numbers of genes detected for the Day 0 vs Day 5 compariso
|
|
|
|
+ns do not reflect the large separation between these time points in Figure
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:rna-pca-final"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
|
|
+ In addition, the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+MOFA
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+LF
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ plots in Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:mofa-lf-scatter"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ point to the same conclusion.
|
|
|
|
+ This suggests that there is indeed a differential expression signal present
|
|
|
|
+ in the data for these comparisons, but the large variability in the Batch
|
|
|
|
+ 1 samples obscures this signal at the individual gene level.
|
|
|
|
+ As a result, it is impossible to make any meaningful statements about the
|
|
|
|
+
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+size
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ of the gene signature for any time point, since the number of significant
|
|
|
|
+ genes as well as the estimated number of differentially expressed genes
|
|
|
|
+ depends so strongly on the variations in sample quality in addition to
|
|
|
|
+ the size of the differential expression signal in the data.
|
|
|
|
+ Gene-set enrichment analyses are similarly impractical.
|
|
|
|
+ However, analyses looking at genome-wide patterns of expression are still
|
|
|
|
+ practical.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/RNA-seq/PCA-final-12-CROP.png
|
|
|
|
+ lyxscale 25
|
|
|
|
+ width 100col%
|
|
|
|
+ groupId colwidth-raster
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Argument 1
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+PCoA plot of RNA-seq samples after ComBat batch correction.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:rna-pca-final"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+PCoA plot of RNA-seq samples after ComBat batch correction.
|
|
|
|
+
|
|
|
|
+\series default
|
|
|
|
+Each point represents an individual sample.
|
|
|
|
+ Samples with the same combination of cell type and time point are encircled
|
|
|
|
+ with a shaded region to aid in visual identification of the sample groups.
|
|
|
|
+ Samples of the same cell type from the same donor are connected by lines
|
|
|
|
+ to indicate the
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+trajectory
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ of each donor's cells over time in PCoA space.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Subsection
|
|
|
|
+H3K4 and H3K27 methylation occur in broad regions and are enriched near
|
|
|
|
+ promoters
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Float table
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-1870
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Flex TODO Note (inline)
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-190
|
|
|
|
|
|
+Also get
|
|
|
|
+\emph on
|
|
|
|
+median
|
|
|
|
+\emph default
|
|
|
|
+ peak width and maybe other quantiles (25%, 75%)
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-</row>
|
|
|
|
-<row>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Memory Day 0 vs Day 1
|
|
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Tabular
|
|
|
|
+<lyxtabular version="3" rows="4" columns="5">
|
|
|
|
+<features tabularvalignment="middle">
|
|
|
|
+<column alignment="center" valignment="top">
|
|
|
|
+<column alignment="center" valignment="top">
|
|
|
|
+<column alignment="center" valignment="top">
|
|
|
|
+<column alignment="center" valignment="top">
|
|
|
|
+<column alignment="center" valignment="top">
|
|
|
|
+<row>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-3195
|
|
|
|
|
|
+Histone Mark
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-411
|
|
|
|
|
|
+# Peaks
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
-</row>
|
|
|
|
-<row>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Memory Day 0 vs Day 5
|
|
|
|
|
|
+Mean peak width
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-2688
|
|
|
|
|
|
+Genome coverage
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-18
|
|
|
|
|
|
+FRiP
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5078,7 +5669,7 @@ Memory Day 0 vs Day 5
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Memory Day 0 vs Day 14
|
|
|
|
|
|
+H3K4me2
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5087,27 +5678,16 @@ Memory Day 0 vs Day 14
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-1911
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-227
|
|
|
|
|
|
+14965
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
-</row>
|
|
|
|
-<row>
|
|
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Day 0 Naïve vs Memory
|
|
|
|
|
|
+3970
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5116,7 +5696,7 @@ Day 0 Naïve vs Memory
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-0
|
|
|
|
|
|
+1.92%
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5125,7 +5705,7 @@ Day 0 Naïve vs Memory
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-2
|
|
|
|
|
|
+14.2%
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5136,7 +5716,7 @@ Day 0 Naïve vs Memory
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Day 1 Naïve vs Memory
|
|
|
|
|
|
+H3K4me3
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5145,27 +5725,16 @@ Day 1 Naïve vs Memory
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-9167
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-5532
|
|
|
|
|
|
+6163
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
-</row>
|
|
|
|
-<row>
|
|
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Day 5 Naïve vs Memory
|
|
|
|
|
|
+2946
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5174,7 +5743,7 @@ Day 5 Naïve vs Memory
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-0
|
|
|
|
|
|
+0.588%
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5183,7 +5752,7 @@ Day 5 Naïve vs Memory
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-0
|
|
|
|
|
|
+6.57%
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5194,7 +5763,7 @@ Day 5 Naïve vs Memory
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Day 14 Naïve vs Memory
|
|
|
|
|
|
+H3K27me3
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5203,230 +5772,53 @@ Day 14 Naïve vs Memory
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-6446
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-2319
|
|
|
|
|
|
+18139
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
-</row>
|
|
|
|
-</lyxtabular>
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Caption Standard
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Argument 1
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Estimated and detected differentially expressed genes.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "tab:Estimated-and-detected-rnaseq"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\series bold
|
|
|
|
-Estimated and detected differentially expressed genes.
|
|
|
|
-
|
|
|
|
-\series default
|
|
|
|
-
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-Test
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-: Which sample groups were compared;
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-Est non-null
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-: Estimated number of differentially expressed genes, using the method of
|
|
|
|
- averaging local FDR values
|
|
|
|
-\begin_inset CommandInset citation
|
|
|
|
-LatexCommand cite
|
|
|
|
-key "Phipson2013Thesis"
|
|
|
|
-literal "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-;
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\begin_inset Formula $\mathrm{FDR}\le10\%$
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-: Number of significantly differentially expressed genes at an FDR threshold
|
|
|
|
- of 10%.
|
|
|
|
- The total number of genes tested was 16707.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Section
|
|
|
|
-Results
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Focus on what hypotheses were tested, then select figures that show how
|
|
|
|
- those hypotheses were tested, even if the result is a negative.
|
|
|
|
- Not every interesting result needs to be in here.
|
|
|
|
- Chapter should tell a story.
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Subsection
|
|
|
|
-Interpretation of RNA-seq analysis is limited by a major confounding factor
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Note Note
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Putting a float here causes an error.
|
|
|
|
- No idea why.
|
|
|
|
- See above for the floats that should be placed here.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-Genes called as present in the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-RNA-seq
|
|
|
|
|
|
+18967
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
|
|
- data were tested for differential expression between all time points and
|
|
|
|
- cell types.
|
|
|
|
- The counts of differentially expressed genes are shown in Table
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "tab:Estimated-and-detected-rnaseq"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+11.1%
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-
|
|
|
|
-.
|
|
|
|
- Notably, all the results for Day 0 and Day 5 have substantially fewer genes
|
|
|
|
- called differentially expressed than any of the results for other time
|
|
|
|
- points.
|
|
|
|
- This is an unfortunate result of the difference in sample quality between
|
|
|
|
- the two batches of
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-RNA-seq
|
|
|
|
|
|
+22.5%
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-
|
|
|
|
- data.
|
|
|
|
- All the samples in Batch 1, which includes all the samples from Days 0
|
|
|
|
- and 5, have substantially more variability than the samples in Batch 2,
|
|
|
|
- which includes the other time points.
|
|
|
|
- This is reflected in the substantially higher weights assigned to Batch
|
|
|
|
- 2 (Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:RNA-seq-weights-vs-covars"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
|
|
+</cell>
|
|
|
|
+</row>
|
|
|
|
+</lyxtabular>
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-).
|
|
|
|
- The batch effect has both a systematic component and a random noise component.
|
|
|
|
- While the systematic component was subtracted out using ComBat (Figure
|
|
|
|
-
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:RNA-PCA"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
|
|
|
|
-), no such correction is possible for the noise component: Batch 1 simply
|
|
|
|
- has substantially more random noise in it, which reduces the statistical
|
|
|
|
- power for any differential expression tests involving samples in that batch.
|
|
|
|
-
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status collapsed
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Flex TODO Note (inline)
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/RNA-seq/PCA-final-12-CROP.png
|
|
|
|
- lyxscale 25
|
|
|
|
- width 100col%
|
|
|
|
- groupId colwidth-raster
|
|
|
|
|
|
+Get the IDR threshold
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
@@ -5441,7 +5833,7 @@ status collapsed
|
|
status collapsed
|
|
status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-PCoA plot of RNA-seq samples after ComBat batch correction.
|
|
|
|
|
|
+Summary of peak-calling statistics.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5449,28 +5841,18 @@ PCoA plot of RNA-seq samples after ComBat batch correction.
|
|
|
|
|
|
\begin_inset CommandInset label
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
LatexCommand label
|
|
-name "fig:rna-pca-final"
|
|
|
|
|
|
+name "tab:peak-calling-summary"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
\series bold
|
|
\series bold
|
|
-PCoA plot of RNA-seq samples after ComBat batch correction.
|
|
|
|
|
|
+Summary of peak-calling statistics.
|
|
|
|
|
|
\series default
|
|
\series default
|
|
-Each point represents an individual sample.
|
|
|
|
- Samples with the same combination of cell type and time point are encircled
|
|
|
|
- with a shaded region to aid in visual identification of the sample groups.
|
|
|
|
- Samples with of same cell type from the same donor are connected by lines
|
|
|
|
- to indicate the
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-trajectory
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- of each donor's cells over time in PCoA space.
|
|
|
|
|
|
+For each histone mark, the number of peaks called using SICER at an IDR
|
|
|
|
+ threshold of ???, the mean width of those peaks, the fraction of the genome
|
|
|
|
+ covered by peaks, and the fraction of reads in peaks (FRiP).
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5484,253 +5866,250 @@ trajectory
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-Despite the difficulty in detecting specific differentially expressed genes,
|
|
|
|
- there is still evidence that differential expression is present for these
|
|
|
|
- time points.
|
|
|
|
- In Figure
|
|
|
|
|
|
+Table
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:rna-pca-final"
|
|
|
|
|
|
+reference "tab:peak-calling-summary"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, there is a clear separation between naïve and memory samples at Day 0,
|
|
|
|
- despite the fact that only 2 genes were significantly differentially expressed
|
|
|
|
- for this comparison.
|
|
|
|
- Similarly, the small numbers of genes detected for the Day 0 vs Day 5 compariso
|
|
|
|
-ns do not reflect the large separation between these time points in Figure
|
|
|
|
-
|
|
|
|
|
|
+ gives a summary of the peak calling statistics for each histone mark.
|
|
|
|
+ Consistent with previous observations, all 3 histone marks occur in broad
|
|
|
|
+ regions spanning many consecutive nucleosomes, rather than in sharp peaks
|
|
|
|
+ as would be expected for a transcription factor or other molecule that
|
|
|
|
+ binds to specific sites.
|
|
|
|
+ This conclusion is further supported by Figure
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:rna-pca-final"
|
|
|
|
|
|
+reference "fig:CCF-with-blacklist"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
- In addition, the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
+, in which a clear nucleosome-sized periodicity is visible in the cross-correlat
|
|
|
|
+ion value for each sample, indicating that each time a given mark is present
|
|
|
|
+ on one histone, it is also likely to be found on adjacent histones.
|
|
|
|
+ H3K27me3 enrichment in particular is substantially broader than either
|
|
|
|
+ H3K4 mark, with a mean peak width of almost 19,000 bp.
|
|
|
|
+ This is also reflected in the periodicity observed in Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:CCF-with-blacklist"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-MOFA
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+, which remains strong much farther out for H3K27me3 than the other marks,
|
|
|
|
+ showing that H3K27me3 especially tends to be found on long runs of consecutive
|
|
|
|
+ histones.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+All 3 histone marks tend to occur more often near promoter regions, as shown
|
|
|
|
+ in Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:near-promoter-peak-enrich"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
|
|
+.
|
|
|
|
+ The majority of each density distribution is flat, representing the background
|
|
|
|
+ density of peaks genome-wide.
|
|
|
|
+ Each distribution has a peak near zero, representing an enrichment of peaks
|
|
|
|
+ close to
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-LF
|
|
|
|
|
|
+TSS
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- plots in Figure
|
|
|
|
|
|
+ positions relative to the remainder of the genome.
|
|
|
|
+ Interestingly, the
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+radius
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ within which this enrichment occurs is not the same for every histone mark
|
|
|
|
+ (Table
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:mofa-lf-scatter"
|
|
|
|
|
|
+reference "tab:effective-promoter-radius"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
- This suggests that there is indeed a differential expression signal present
|
|
|
|
- in the data for these comparisons, but the large variability in the Batch
|
|
|
|
- 1 samples obfuscates this signal at the individual gene level.
|
|
|
|
- As a result, it is impossible to make any meaningful statements about the
|
|
|
|
-
|
|
|
|
|
|
+).
|
|
|
|
+ For H3K4me2 and H3K4me3, peaks are most enriched within 1
|
|
|
|
+\begin_inset space ~
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+kbp of
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+TSS
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ positions, while for H3K27me3, enrichment is broader, extending to 2.5
|
|
|
|
+\begin_inset space ~
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+kbp.
|
|
|
|
+ These
|
|
\begin_inset Quotes eld
|
|
\begin_inset Quotes eld
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-size
|
|
|
|
|
|
+effective promoter radii
|
|
\begin_inset Quotes erd
|
|
\begin_inset Quotes erd
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- of the gene signature for any time point, since the number of significant
|
|
|
|
- genes as well as the estimated number of differentially expressed genes
|
|
|
|
- depends so strongly on the variations in sample quality in addition to
|
|
|
|
- the size of the differential expression signal in the data.
|
|
|
|
- Gene-set enrichment analyses are similarly impractical.
|
|
|
|
- However, analyses looking at genome-wide patterns of expression are still
|
|
|
|
- practical.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Subsection
|
|
|
|
-H3K4 and H3K27 methylation occur in broad regions and are enriched near
|
|
|
|
- promoters
|
|
|
|
|
|
+ remain approximately the same across all combinations of experimental condition
|
|
|
|
+ (cell type, time point, and donor), so they appear to be a property of
|
|
|
|
+ the histone mark itself.
|
|
|
|
+ Hence, these radii were used to define the promoter regions for each histone
|
|
|
|
+ mark in all further analyses.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-\begin_inset Float table
|
|
|
|
|
|
+\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\align center
|
|
|
|
\begin_inset Flex TODO Note (inline)
|
|
\begin_inset Flex TODO Note (inline)
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Also get
|
|
|
|
-\emph on
|
|
|
|
-median
|
|
|
|
-\emph default
|
|
|
|
- peak width and maybe other quantiles (25%, 75%)
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\align center
|
|
|
|
-\begin_inset Tabular
|
|
|
|
-<lyxtabular version="3" rows="4" columns="5">
|
|
|
|
-<features tabularvalignment="middle">
|
|
|
|
-<column alignment="center" valignment="top">
|
|
|
|
-<column alignment="center" valignment="top">
|
|
|
|
-<column alignment="center" valignment="top">
|
|
|
|
-<column alignment="center" valignment="top">
|
|
|
|
-<column alignment="center" valignment="top">
|
|
|
|
-<row>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Histone Mark
|
|
|
|
|
|
+Future direction idea: Need a control: shuffle all peaks and repeat, N times.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-# Peaks
|
|
|
|
-\end_layout
|
|
|
|
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+\end_layout
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Mean peak width
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/Promoter Peak Distance Profile-PAGE1-CROP.pdf
|
|
|
|
+ lyxscale 50
|
|
|
|
+ width 80col%
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-genome coverage
|
|
|
|
-\end_layout
|
|
|
|
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+\end_layout
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-FRiP
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-</row>
|
|
|
|
-<row>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Argument 1
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-H3K4me2
|
|
|
|
|
|
+Enrichment of peaks in promoter neighborhoods.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-14965
|
|
|
|
-\end_layout
|
|
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:near-promoter-peak-enrich"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-3970
|
|
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+Enrichment of peaks in promoter neighborhoods.
|
|
|
|
+
|
|
|
|
+\series default
|
|
|
|
+This plot shows the distribution of distances from each annotated transcription
|
|
|
|
+ start site in the genome to the nearest called peak.
|
|
|
|
+ Each line represents one combination of histone mark, cell type, and time
|
|
|
|
+ point.
|
|
|
|
+ Distributions are smoothed using kernel density estimation.
|
|
|
|
+ TSSs that occur
|
|
|
|
+\emph on
|
|
|
|
+within
|
|
|
|
+\emph default
|
|
|
|
+ peaks were excluded from this plot to avoid a large spike at zero that
|
|
|
|
+ would overshadow the rest of the distribution.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-1.92%
|
|
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-14.2%
|
|
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-</row>
|
|
|
|
-<row>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Float table
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-H3K4me3
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Tabular
|
|
|
|
+<lyxtabular version="3" rows="4" columns="2">
|
|
|
|
+<features tabularvalignment="middle">
|
|
|
|
+<column alignment="center" valignment="top">
|
|
|
|
+<column alignment="center" valignment="top">
|
|
|
|
+<row>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-6163
|
|
|
|
|
|
+Histone mark
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-2946
|
|
|
|
|
|
+Effective promoter radius
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
|
|
+</row>
|
|
|
|
+<row>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-0.588%
|
|
|
|
|
|
+H3K4me2
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5739,45 +6118,38 @@ H3K4me3
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-6.57%
|
|
|
|
|
|
+1 kb
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
</row>
|
|
</row>
|
|
<row>
|
|
<row>
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-H3K27me3
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-18139
|
|
|
|
|
|
+H3K4me3
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-18967
|
|
|
|
|
|
+1 kb
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
|
|
+</row>
|
|
|
|
+<row>
|
|
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-11.1%
|
|
|
|
|
|
+H3K27me3
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5786,7 +6158,7 @@ H3K27me3
|
|
\begin_inset Text
|
|
\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-22.5%
|
|
|
|
|
|
+2.5 kb
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5797,19 +6169,6 @@ H3K27me3
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Get the IDR threshold
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
@@ -5820,7 +6179,7 @@ Get the IDR threshold
|
|
status collapsed
|
|
status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Summary of peak-calling statistics.
|
|
|
|
|
|
+Effective promoter radius for each histone mark.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5828,18 +6187,28 @@ Summary of peak-calling statistics.
|
|
|
|
|
|
\begin_inset CommandInset label
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
LatexCommand label
|
|
-name "tab:peak-calling-summary"
|
|
|
|
|
|
+name "tab:effective-promoter-radius"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
\series bold
|
|
\series bold
|
|
-Summary of peak-calling statistics.
|
|
|
|
-
|
|
|
|
|
|
+Effective promoter radius for each histone mark.
|
|
|
|
+
|
|
\series default
|
|
\series default
|
|
-For each histone mark, the number of peaks called using SICER at an IDR
|
|
|
|
- threshold of ???, the mean width of those peaks, the fraction of the genome
|
|
|
|
- covered by peaks, and the fraction of reads in peaks (FRiP).
|
|
|
|
|
|
+ These values represent the approximate distance from transcription start
|
|
|
|
+ site positions within which an excess of peaks is found, as shown in Figure
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:near-promoter-peak-enrich"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5853,119 +6222,112 @@ For each histone mark, the number of peaks called using SICER at an IDR
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-Table
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "tab:peak-calling-summary"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
|
|
+\begin_inset Flex TODO Note (inline)
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Consider also showing figure for distance to nearest peak center, and reference
|
|
|
|
+ median peak size once that is known.
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- gives a summary of the peak calling statistics for each histone mark.
|
|
|
|
- Consistent with previous observations, all 3 histone marks occur in broad
|
|
|
|
- regions spanning many consecutive nucleosomes, rather than in sharp peaks
|
|
|
|
- as would be expected for a transcription factor or other molecule that
|
|
|
|
- binds to specific sites.
|
|
|
|
- This conclusion is further supported by Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:CCF-with-blacklist"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Subsection
|
|
|
|
+H3K4 and H3K27 promoter methylation has broadly the expected correlation
|
|
|
|
+ with gene expression
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+H3K4me2 and H3K4me3 have previously been reported as activating marks whose
|
|
|
|
+ presence in a gene's promoter is associated with higher gene expression,
|
|
|
|
+ while H3K27me3 has been reported as inactivating
|
|
|
|
+\begin_inset CommandInset citation
|
|
|
|
+LatexCommand cite
|
|
|
|
+key "LaMere2016,LaMere2017"
|
|
|
|
+literal "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, in which a clear nucleosome-sized periodicity is visible in the cross-correlat
|
|
|
|
-ion value for each sample, indicating that each time a given mark is present
|
|
|
|
- on one histone, it is also likely to be found on adjacent histones as well.
|
|
|
|
- H3K27me3 enrichment in particular is substantially more broad than either
|
|
|
|
- H3K4 mark, with a mean peak width of almost 19,000 bp.
|
|
|
|
- This is also reflected in the periodicity observed in Figure
|
|
|
|
|
|
+.
|
|
|
|
+ The data are consistent with this characterization: genes whose promoters
|
|
|
|
+ (as defined by the radii for each histone mark listed in
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:CCF-with-blacklist"
|
|
|
|
|
|
+reference "tab:effective-promoter-radius"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, which remains strong much farther out for H3K27me3 than the other marks,
|
|
|
|
- showing H3K27me3 especially tends to be found on long runs of consecutive
|
|
|
|
- histones.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Ensure this figure uses the peak calls from the new analysis.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
|
|
+) overlap with an H3K4me2 or H3K4me3 peak tend to have higher expression
|
|
|
|
+ than those that don't, while H3K27me3 is likewise associated with lower
|
|
|
|
+ gene expression, as shown in
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:fpkm-by-peak"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+.
|
|
|
|
+ This pattern holds across all combinations of cell type and time point
|
|
|
|
+ (Welch's
|
|
|
|
+\emph on
|
|
|
|
+t
|
|
|
|
+\emph default
|
|
|
|
+-test, all
|
|
|
|
+\begin_inset Formula $p\mathrm{-values}\ll2.2\times10^{-16}$
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\end_layout
|
|
|
|
|
|
+).
|
|
|
|
+ The difference in average
|
|
|
|
+\begin_inset Formula $\log_{2}$
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
|
|
+
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Need a control: shuffle all peaks and repeat, N times.
|
|
|
|
- Do real vs shuffled control both in a top/bottom arrangement.
|
|
|
|
|
|
+FPKM
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ values when a peak overlaps the promoter is about
|
|
|
|
+\begin_inset Formula $+5.67$
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Consider counting TSS inside peaks as negative number indicating how far
|
|
|
|
-
|
|
|
|
-\emph on
|
|
|
|
-inside
|
|
|
|
-\emph default
|
|
|
|
- the peak the TSS is (i.e.
|
|
|
|
- distance to nearest non-peak area).
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
|
|
+ for H3K4me2,
|
|
|
|
+\begin_inset Formula $+5.76$
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ for H3K4me3, and
|
|
|
|
+\begin_inset Formula $-4.00$
|
|
|
|
+\end_inset
|
|
|
|
|
|
|
|
+ for H3K27me3.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\begin_inset Flex TODO Note (inline)
|
|
\begin_inset Flex TODO Note (inline)
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-The H3K4 part of this figure is included in
|
|
|
|
-\begin_inset CommandInset citation
|
|
|
|
-LatexCommand cite
|
|
|
|
-key "LaMere2016"
|
|
|
|
-literal "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- as Fig.
|
|
|
|
- S2.
|
|
|
|
- Do I need to do anything about that?
|
|
|
|
|
|
+This figure is generated from the old analysis.
|
|
|
|
+ Either note that in some way or re-generate it from the new peak calls.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -5976,9 +6338,9 @@ literal "false"
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
\begin_inset Graphics
|
|
\begin_inset Graphics
|
|
- filename graphics/CD4-csaw/Promoter Peak Distance Profile-PAGE1-CROP.pdf
|
|
|
|
|
|
+ filename graphics/CD4-csaw/FPKM by Peak Violin Plots-CROP.pdf
|
|
lyxscale 50
|
|
lyxscale 50
|
|
- width 80col%
|
|
|
|
|
|
+ width 100col%
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
@@ -5993,7 +6355,7 @@ literal "false"
|
|
status collapsed
|
|
status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Enrichment of peaks in promoter neighborhoods.
|
|
|
|
|
|
+Expression distributions of genes with and without promoter peaks.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -6001,26 +6363,13 @@ Enrichment of peaks in promoter neighborhoods.
|
|
|
|
|
|
\begin_inset CommandInset label
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
LatexCommand label
|
|
-name "fig:near-promoter-peak-enrich"
|
|
|
|
|
|
+name "fig:fpkm-by-peak"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
\series bold
|
|
\series bold
|
|
-Enrichment of peaks in promoter neighborhoods.
|
|
|
|
-
|
|
|
|
-\series default
|
|
|
|
-This plot shows the distribution of distances from each annotated transcription
|
|
|
|
- start site in the genome to the nearest called peak.
|
|
|
|
- Each line represents one combination of histone mark, cell type, and time
|
|
|
|
- point.
|
|
|
|
- Distributions are smoothed using kernel density estimation.
|
|
|
|
- TSSs that occur
|
|
|
|
-\emph on
|
|
|
|
-within
|
|
|
|
-\emph default
|
|
|
|
- peaks were excluded from this plot to avoid a large spike at zero that
|
|
|
|
- would overshadow the rest of the distribution.
|
|
|
|
|
|
+Expression distributions of genes with and without promoter peaks.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -6033,100 +6382,148 @@ within
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Subsection
|
|
|
|
+Gene expression and promoter histone methylation patterns in naïve and memory
|
|
|
|
+ show convergence at day 14
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-\begin_inset Float table
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status collapsed
|
|
|
|
|
|
+We hypothesized that if naïve cells had differentiated into memory cells
|
|
|
|
+ by Day 14, then their patterns of expression and histone modification should
|
|
|
|
+ converge with those of memory cells at Day 14.
|
|
|
|
+ Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:PCoA-promoters"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\align center
|
|
|
|
-\begin_inset Tabular
|
|
|
|
-<lyxtabular version="3" rows="4" columns="2">
|
|
|
|
-<features tabularvalignment="middle">
|
|
|
|
-<column alignment="center" valignment="top">
|
|
|
|
-<column alignment="center" valignment="top">
|
|
|
|
-<row>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ shows the patterns of variation in all 3 histone marks in the promoter
|
|
|
|
+ regions of the genome using
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Histone mark
|
|
|
|
|
|
+PCoA
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Effective promoter radius
|
|
|
|
-\end_layout
|
|
|
|
|
|
+.
|
|
|
|
+ All 3 marks show a noticeable convergence between the naïve and memory
|
|
|
|
+ samples at day 14, visible as an overlapping of the day 14 groups on each
|
|
|
|
+ plot.
|
|
|
|
+ This is consistent with the counts of significantly differentially modified
|
|
|
|
+ promoters and estimates of the total numbers of differentially modified
|
|
|
|
+ promoters shown in Table
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "tab:Number-signif-promoters"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-</row>
|
|
|
|
-<row>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+
|
|
|
|
+.
|
|
|
|
+ For all histone marks, evidence of differential modification between naïve
|
|
|
|
+ and memory samples was detected at every time point except day 14.
|
|
|
|
+ The day 14 convergence pattern is also present in the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-H3K4me2
|
|
|
|
|
|
+RNA-seq
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-1 kb
|
|
|
|
-\end_layout
|
|
|
|
|
|
+ data (Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:RNA-PCA-group"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-</row>
|
|
|
|
-<row>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+
|
|
|
|
+), albeit in the 2nd and 3rd principal coordinates, indicating that it is
|
|
|
|
+ not the dominant pattern driving gene expression.
|
|
|
|
+ Taken together, the data show that promoter histone methylation for these
|
|
|
|
+ 3 histone marks and RNA expression for naïve and memory cells are most
|
|
|
|
+ similar at day 14, the furthest time point after activation.
|
|
|
|
+
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-H3K4me3
|
|
|
|
|
|
+MOFA
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+
|
|
|
|
+ was also able to capture this day 14 convergence pattern in
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-1 kb
|
|
|
|
|
|
+LF
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-</row>
|
|
|
|
-<row>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-H3K27me3
|
|
|
|
-\end_layout
|
|
|
|
|
|
+5 (Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:mofa-lf-scatter"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
|
|
+
|
|
|
|
+), which accounts for shared variation across all 3 histone marks and the
|
|
|
|
+
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-2.5 kb
|
|
|
|
|
|
+RNA-seq
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
-</cell>
|
|
|
|
-</row>
|
|
|
|
-</lyxtabular>
|
|
|
|
|
|
+
|
|
|
|
+ data, confirming that this convergence is a coordinated pattern across
|
|
|
|
+ all 4 data sets.
|
|
|
|
+ While this observation does not prove that the naïve cells have differentiated
|
|
|
|
+ into memory cells at Day 14, it is consistent with that hypothesis.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+placement p
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/ChIP-seq/H3K4me2-promoter-PCA-group-CROP.png
|
|
|
|
+ lyxscale 25
|
|
|
|
+ width 45col%
|
|
|
|
+ groupId pcoa-prom-subfig
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
@@ -6137,41 +6534,41 @@ H3K27me3
|
|
\begin_inset Caption Standard
|
|
\begin_inset Caption Standard
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\begin_inset Argument 1
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Effective promoter radius for each histone mark.
|
|
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:PCoA-H3K4me2-prom"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+PCoA plot of H3K4me2 promoters, after subtracting surrogate variables
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "tab:effective-promoter-radius"
|
|
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\series bold
|
|
|
|
-Effective promoter radius for each histone mark.
|
|
|
|
|
|
+\begin_inset space \hfill{}
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\series default
|
|
|
|
- These values represent the approximate distance from transcription start
|
|
|
|
- site positions within which an excess of peaks are found, as shown in Figure
|
|
|
|
-
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:near-promoter-peak-enrich"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status collapsed
|
|
|
|
|
|
-.
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/ChIP-seq/H3K4me3-promoter-PCA-group-CROP.png
|
|
|
|
+ lyxscale 25
|
|
|
|
+ width 45col%
|
|
|
|
+ groupId pcoa-prom-subfig
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
@@ -6179,140 +6576,89 @@ noprefix "false"
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\series bold
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:PCoA-H3K4me3-prom"
|
|
|
|
|
|
|
|
+\end_inset
|
|
|
|
|
|
|
|
+PCoA plot of H3K4me3 promoters, after subtracting surrogate variables
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-All 3 histone marks tend to occur more often near promoter regions, as shown
|
|
|
|
- in Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:near-promoter-peak-enrich"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
- The majority of each density distribution is flat, representing the background
|
|
|
|
- density of peaks genome-wide.
|
|
|
|
- Each distribution has a peak near zero, representing an enrichment of peaks
|
|
|
|
- close to
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- positions relative to the remainder of the genome.
|
|
|
|
- Interestingly, the
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
|
|
|
|
-radius
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\end_layout
|
|
|
|
|
|
- within which this enrichment occurs is not the same for every histone mark
|
|
|
|
- (Table
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "tab:effective-promoter-radius"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status collapsed
|
|
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/ChIP-seq/H3K27me3-promoter-PCA-group-CROP.png
|
|
|
|
+ lyxscale 25
|
|
|
|
+ width 45col%
|
|
|
|
+ groupId pcoa-prom-subfig
|
|
|
|
|
|
-).
|
|
|
|
- For H3K4me2 and H3K4me3, peaks are most enriched within 1
|
|
|
|
-\begin_inset space ~
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-kbp of
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
|
|
- positions, while for H3K27me3, enrichment is broader, extending to 2.5
|
|
|
|
-\begin_inset space ~
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
|
|
-kbp.
|
|
|
|
- These
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\series bold
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:PCoA-H3K27me3-prom"
|
|
|
|
|
|
-effective promoter radii
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- remain approximately the same across all combinations of experimental condition
|
|
|
|
- (cell type, time point, and donor), so they appear to be a property of
|
|
|
|
- the histone mark itself.
|
|
|
|
- Hence, these radii were used to define the promoter regions for each histone
|
|
|
|
- mark in all further analyses.
|
|
|
|
|
|
+PCoA plot of H3K27me3 promoters, after subtracting surrogate variables
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
-status open
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Consider also showing figure for distance to nearest peak center, and reference
|
|
|
|
- median peak size once that is known.
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\begin_inset space \hfill{}
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Subsection
|
|
|
|
-H3K4 and H3K27 promoter methylation has broadly the expected correlation
|
|
|
|
- with gene expression
|
|
|
|
-\end_layout
|
|
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
\begin_inset Float figure
|
|
\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
status collapsed
|
|
status collapsed
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-This figure is generated from the old analysis.
|
|
|
|
- Either note that in some way or re-generate it from the new peak calls.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
\begin_inset Graphics
|
|
\begin_inset Graphics
|
|
- filename graphics/CD4-csaw/FPKM by Peak Violin Plots-CROP.pdf
|
|
|
|
- lyxscale 50
|
|
|
|
- width 100col%
|
|
|
|
|
|
+ filename graphics/CD4-csaw/RNA-seq/PCA-final-23-CROP.png
|
|
|
|
+ lyxscale 25
|
|
|
|
+ width 45col%
|
|
|
|
+ groupId pcoa-prom-subfig
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
@@ -6323,25 +6669,15 @@ This figure is generated from the old analysis.
|
|
\begin_inset Caption Standard
|
|
\begin_inset Caption Standard
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\begin_inset Argument 1
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Expression distributions of genes with and without promoter peaks.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
|
|
|
|
|
|
+\series bold
|
|
\begin_inset CommandInset label
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
LatexCommand label
|
|
-name "fig:fpkm-by-peak"
|
|
|
|
|
|
+name "fig:RNA-PCA-group"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
-\series bold
|
|
|
|
-Expression distributions of genes with and without promoter peaks.
|
|
|
|
|
|
+RNA-seq PCoA showing principal coordinates 2 and 3.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -6354,84 +6690,39 @@ Expression distributions of genes with and without promoter peaks.
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-H3K4me2 and H3K4me2 have previously been reported as activating marks whose
|
|
|
|
- presence in a gene's promoter is associated with higher gene expression,
|
|
|
|
- while H3K27me3 has been reported as inactivating
|
|
|
|
-\begin_inset CommandInset citation
|
|
|
|
-LatexCommand cite
|
|
|
|
-key "LaMere2016,LaMere2017"
|
|
|
|
-literal "false"
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Argument 1
|
|
|
|
+status collapsed
|
|
|
|
|
|
-.
|
|
|
|
- The data are consistent with this characterization: genes whose promoters
|
|
|
|
- (as defined by the radii for each histone mark listed in
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "tab:effective-promoter-radius"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+PCoA plots for promoter ChIP-seq and expression RNA-seq data
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-) overlap with a H3K4me2 or H3K4me3 peak tend to have higher expression
|
|
|
|
- than those that don't, while H3K27me3 is likewise associated with lower
|
|
|
|
- gene expression, as shown in
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:fpkm-by-peak"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
|
|
|
|
-.
|
|
|
|
- This pattern holds across all combinations of cell type and time point
|
|
|
|
- (Welch's
|
|
|
|
-\emph on
|
|
|
|
-t
|
|
|
|
-\emph default
|
|
|
|
--test, all
|
|
|
|
-\begin_inset Formula $p\mathrm{-values}\ll2.2\times10^{-16}$
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:PCoA-promoters"
|
|
|
|
|
|
-).
|
|
|
|
- The difference in average
|
|
|
|
-\begin_inset Formula $\log_{2}$
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-FPKM
|
|
|
|
|
|
+\series bold
|
|
|
|
+PCoA plots for promoter ChIP-seq and expression RNA-seq data
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- values when a peak overlaps the promoter is about
|
|
|
|
-\begin_inset Formula $+5.67$
|
|
|
|
-\end_inset
|
|
|
|
|
|
|
|
- for H3K4me2,
|
|
|
|
-\begin_inset Formula $+5.76$
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\end_layout
|
|
|
|
|
|
- for H3K4me2, and
|
|
|
|
-\begin_inset Formula $-4.00$
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- for H3K27me3.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Subsection
|
|
|
|
-Gene expression and promoter histone methylation patterns in naïve and memory
|
|
|
|
- show convergence at day 14
|
|
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -6461,7 +6752,7 @@ begin{landscape}
|
|
\begin_inset Float table
|
|
\begin_inset Float table
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status open
|
|
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
@@ -6839,312 +7130,78 @@ Day 14
|
|
\end_inset
|
|
\end_inset
|
|
</cell>
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
-\begin_inset Text
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-0
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-0
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
-\begin_inset Text
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-0
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-</cell>
|
|
|
|
-</row>
|
|
|
|
-</lyxtabular>
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Caption Standard
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Argument 1
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Number of differentially modified promoters between naïve and memory cells
|
|
|
|
- at each time point after activation.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "tab:Number-signif-promoters"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\series bold
|
|
|
|
-Number of differentially modified promoters between naïve and memory cells
|
|
|
|
- at each time point after activation.
|
|
|
|
-
|
|
|
|
-\series default
|
|
|
|
-This table shows both the number of differentially modified promoters detected
|
|
|
|
- at a 10% FDR threshold (left half), and the total number of differentially
|
|
|
|
- modified promoters as estimated using the method of
|
|
|
|
-\begin_inset CommandInset citation
|
|
|
|
-LatexCommand cite
|
|
|
|
-key "Phipson2013"
|
|
|
|
-literal "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- (right half).
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset ERT
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\backslash
|
|
|
|
-end{landscape}
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
-}
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-We hypothesized that if naïve cells had differentiated into memory cells
|
|
|
|
- by Day 14, then their patterns of expression and histone modification should
|
|
|
|
- converge with those of memory cells at Day 14.
|
|
|
|
- Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:PCoA-promoters"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- shows the patterns of variation in all 3 histone marks in the promoter
|
|
|
|
- regions of the genome using
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-PCoA
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-.
|
|
|
|
- All 3 marks show a noticeable convergence between the naïve and memory
|
|
|
|
- samples at day 14, visible as an overlapping of the day 14 groups on each
|
|
|
|
- plot.
|
|
|
|
- This is consistent with the counts of significantly differentially modified
|
|
|
|
- promoters and estimates of the total numbers of differentially modified
|
|
|
|
- promoters shown in Table
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "tab:Number-signif-promoters"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-.
|
|
|
|
- For all histone marks, evidence of differential modification between naïve
|
|
|
|
- and memory samples was detected at every time point except day 14.
|
|
|
|
- The day 14 convergence pattern is also present in the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-RNA-seq
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- data (Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:RNA-PCA-group"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-), albeit in the 2nd and 3rd principal coordinates, indicating that it is
|
|
|
|
- not the most dominant pattern driving gene expression.
|
|
|
|
- Taken together, the data show that promoter histone methylation for these
|
|
|
|
- 3 histone marks and RNA expression for naïve and memory cells are most
|
|
|
|
- similar at day 14, the furthest time point after activation.
|
|
|
|
-
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-MOFA
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- was also able to capture this day 14 convergence pattern in
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-LF
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-5 (Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:mofa-lf-scatter"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-), which accounts for shared variation across all 3 histone marks and the
|
|
|
|
-
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-RNA-seq
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- data, confirming that this convergence is a coordinated pattern across
|
|
|
|
- all 4 data sets.
|
|
|
|
- While this observation does not prove that the naïve cells have differentiated
|
|
|
|
- into memory cells at Day 14, it is consistent with that hypothesis.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-placement p
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\align center
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/ChIP-seq/H3K4me2-promoter-PCA-group-CROP.png
|
|
|
|
- lyxscale 25
|
|
|
|
- width 45col%
|
|
|
|
- groupId pcoa-prom-subfig
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Caption Standard
|
|
|
|
|
|
+\begin_inset Text
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-
|
|
|
|
-\series bold
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:PCoA-H3K4me2-prom"
|
|
|
|
|
|
+0
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
|
|
-PCoA plot of H3K4me2 promoters, after subtracting surrogate variables
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+0
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
+</cell>
|
|
|
|
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
|
|
+\begin_inset Text
|
|
|
|
|
|
-
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+0
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
+</cell>
|
|
|
|
+</row>
|
|
|
|
+</lyxtabular>
|
|
|
|
|
|
-
|
|
|
|
-\begin_inset space \hfill{}
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Argument 1
|
|
status collapsed
|
|
status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/ChIP-seq/H3K4me3-promoter-PCA-group-CROP.png
|
|
|
|
- lyxscale 25
|
|
|
|
- width 45col%
|
|
|
|
- groupId pcoa-prom-subfig
|
|
|
|
|
|
+Number of differentially modified promoters between naïve and memory cells
|
|
|
|
+ at each time point after activation.
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "tab:Number-signif-promoters"
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Caption Standard
|
|
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
|
|
|
|
\series bold
|
|
\series bold
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:PCoA-H3K4me3-prom"
|
|
|
|
|
|
+Number of differentially modified promoters between naïve and memory cells
|
|
|
|
+ at each time point after activation.
|
|
|
|
+
|
|
|
|
+\series default
|
|
|
|
+This table shows both the number of differentially modified promoters detected
|
|
|
|
+ at a 10% FDR threshold (left half), and the total number of differentially
|
|
|
|
+ modified promoters as estimated using the method of
|
|
|
|
+\begin_inset CommandInset citation
|
|
|
|
+LatexCommand cite
|
|
|
|
+key "Phipson2013"
|
|
|
|
+literal "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-PCoA plot of H3K4me3 promoters, after subtracting surrogate variables
|
|
|
|
|
|
+ (right half).
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -7157,177 +7214,256 @@ PCoA plot of H3K4me3 promoters, after subtracting surrogate variables
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\align center
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status collapsed
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset ERT
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/ChIP-seq/H3K27me3-promoter-PCA-group-CROP.png
|
|
|
|
- lyxscale 25
|
|
|
|
- width 45col%
|
|
|
|
- groupId pcoa-prom-subfig
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
|
|
|
|
|
|
|
|
|
|
+\backslash
|
|
|
|
+end{landscape}
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Caption Standard
|
|
|
|
-
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
|
|
|
|
-\series bold
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:PCoA-H3K27me3-prom"
|
|
|
|
|
|
+}
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-PCoA plot of H3K27me3 promoters, after subtracting surrogate variables
|
|
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\begin_layout Subsection
|
|
|
|
+Effect of H3K4me2 and H3K4me3 promoter coverage upstream vs downstream of
|
|
|
|
+ TSS
|
|
|
|
+\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Flex TODO Note (inline)
|
|
|
|
+status open
|
|
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Need a better section title, for this and the next one.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\begin_inset space \hfill{}
|
|
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Flex TODO Note (inline)
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Make sure use of coverage/abundance/whatever is consistent.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status collapsed
|
|
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Flex TODO Note (inline)
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/RNA-seq/PCA-final-23-CROP.png
|
|
|
|
- lyxscale 25
|
|
|
|
- width 45col%
|
|
|
|
- groupId pcoa-prom-subfig
|
|
|
|
|
|
+For the figures in this section and the next, the group labels are arbitrary,
|
|
|
|
+ so if time allows, it would be good to manually reorder them in a logical
|
|
|
|
+ way, e.g.
|
|
|
|
+ most upstream to most downstream.
|
|
|
|
+ If this is done, make sure to update the text with the correct group labels.
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Caption Standard
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+To test whether the position of a histone mark relative to a gene's
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-
|
|
|
|
-\series bold
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:RNA-PCA-group"
|
|
|
|
|
|
+TSS
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-RNA-seq PCoA showing principal coordinates 2 and 3.
|
|
|
|
-\end_layout
|
|
|
|
|
|
+ was important, we looked at the
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
|
|
|
|
+landscape
|
|
|
|
+\begin_inset Quotes erd
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ of
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+ChIP-seq
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ read coverage in naïve Day 0 samples within 5 kb of each gene's
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+TSS
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Caption Standard
|
|
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Argument 1
|
|
|
|
-status collapsed
|
|
|
|
|
|
+ by binning reads into 500-bp windows tiled across each promoter.
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-PCoA plots for promoter ChIP-seq and expression RNA-seq data
|
|
|
|
|
|
+logCPM
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ values were calculated for the bins in each promoter and then the average
|
|
|
|
+
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:PCoA-promoters"
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+logCPM
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ for each promoter's bins was normalized to zero, such that the values represent
|
|
|
|
+ coverage relative to other regions of the same promoter rather than being
|
|
|
|
+ proportional to absolute read count.
|
|
|
|
+ The promoters were then clustered based on the normalized bin abundances
|
|
|
|
+ using
|
|
|
|
+\begin_inset Formula $k$
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\series bold
|
|
|
|
-PCoA plots for promoter ChIP-seq and expression RNA-seq data
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
|
|
+-means clustering with
|
|
|
|
+\begin_inset Formula $K=6$
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+.
|
|
|
|
+ Different values of
|
|
|
|
+\begin_inset Formula $K$
|
|
|
|
+\end_inset
|
|
|
|
|
|
|
|
+ were also tested, but did not substantially change the interpretation of
|
|
|
|
+ the data.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+For H3K4me2, plotting the average bin abundances for each cluster reveals
|
|
|
|
+ a simple pattern (Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:H3K4me2-neighborhood-clusters"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+): Cluster 5 represents a completely flat promoter coverage profile, likely
|
|
|
|
+ consisting of genes with no H3K4me2 methylation in the promoter.
|
|
|
|
+ All the other clusters represent a continuum of peak positions relative
|
|
|
|
+ to the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+TSS
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Subsection
|
|
|
|
-Effect of H3K4me2 and H3K4me3 promoter coverage upstream vs downstream of
|
|
|
|
- TSS
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
|
|
+.
|
|
|
|
+ In order from most upstream to most downstream, they are Clusters 6, 4,
|
|
|
|
+ 3, 1, and 2.
|
|
|
|
+ There do not appear to be any clusters representing coverage patterns other
|
|
|
|
+ than lone peaks, such as coverage troughs or double peaks.
|
|
|
|
+ Next, all promoters were plotted in a
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Need a better section title, for this and the next one.
|
|
|
|
|
|
+PCA
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ plot based on the same relative bin abundance data, and colored based on
|
|
|
|
+ cluster membership (Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:H3K4me2-neighborhood-pca"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
|
|
+).
|
|
|
|
+ The
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Make sure use of coverage/abundance/whatever is consistent.
|
|
|
|
|
|
+PCA
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ plot shows Cluster 5 (the
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\end_layout
|
|
|
|
|
|
+no peak
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
-status open
|
|
|
|
|
|
+ cluster) at the center, with the other clusters arranged in a counter-clockwise
|
|
|
|
+ arc around it in the order noted above, from most upstream peak to most
|
|
|
|
+ downstream.
|
|
|
|
+ Notably, the
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-For the figures in this section and the next, the group labels are arbitrary,
|
|
|
|
- so if time allows, it would be good to manually reorder them in a logical
|
|
|
|
- way, e.g.
|
|
|
|
- most upstream to most downstream.
|
|
|
|
- If this is done, make sure to update the text with the correct group labels.
|
|
|
|
-\end_layout
|
|
|
|
|
|
+clusters
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ form a single large
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
|
|
|
|
+cloud
|
|
|
|
+\begin_inset Quotes erd
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ with no apparent separation between them, further supporting the conclusion
|
|
|
|
+ that these clusters represent an arbitrary partitioning of a continuous
|
|
|
|
+ distribution of promoter coverage landscapes.
|
|
|
|
+ While the clusters are a useful abstraction that aids in visualization,
|
|
|
|
+ they are ultimately not an accurate representation of the data.
|
|
|
|
+ The continuous nature of the distribution also explains why different values
|
|
|
|
+ of
|
|
|
|
+\begin_inset Formula $K$
|
|
|
|
+\end_inset
|
|
|
|
|
|
|
|
+ led to similar conclusions.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -7437,7 +7573,6 @@ name "fig:H3K4me2-neighborhood-pca"
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
PCA of relative coverage depth, colored by K-means cluster membership.
|
|
PCA of relative coverage depth, colored by K-means cluster membership.
|
|
-
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -7599,189 +7734,6 @@ end{landscape}
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-To test whether the position of a histone mark relative to a gene's
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- was important, we looked at the
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-landscape
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- of
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-ChIP-seq
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- read coverage in naïve Day 0 samples within 5 kb of each gene's
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- by binning reads into 500-bp windows tiled across each promoter
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-logCPM
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- values were calculated for the bins in each promoter and then the average
|
|
|
|
-
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-logCPM
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- for each promoter's bins was normalized to zero, such that the values represent
|
|
|
|
- coverage relative to other regions of the same promoter rather than being
|
|
|
|
- proportional to absolute read count.
|
|
|
|
- The promoters were then clustered based on the normalized bin abundances
|
|
|
|
- using
|
|
|
|
-\begin_inset Formula $k$
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
--means clustering with
|
|
|
|
-\begin_inset Formula $K=6$
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-.
|
|
|
|
- Different values of
|
|
|
|
-\begin_inset Formula $K$
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- were also tested, but did not substantially change the interpretation of
|
|
|
|
- the data.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-For H3K4me2, plotting the average bin abundances for each cluster reveals
|
|
|
|
- a simple pattern (Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:H3K4me2-neighborhood-clusters"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-): Cluster 5 represents a completely flat promoter coverage profile, likely
|
|
|
|
- consisting of genes with no H3K4me2 methylation in the promoter.
|
|
|
|
- All the other clusters represent a continuum of peak positions relative
|
|
|
|
- to the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-.
|
|
|
|
- In order from must upstream to most downstream, they are Clusters 6, 4,
|
|
|
|
- 3, 1, and 2.
|
|
|
|
- There do not appear to be any clusters representing coverage patterns other
|
|
|
|
- than lone peaks, such as coverage troughs or double peaks.
|
|
|
|
- Next, all promoters were plotted in a
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-PCA
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- plot based on the same relative bin abundance data, and colored based on
|
|
|
|
- cluster membership (Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:H3K4me2-neighborhood-pca"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-).
|
|
|
|
- The
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-PCA
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- plot shows Cluster 5 (the
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-no peak
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- cluster) at the center, with the other clusters arranged in a counter-clockwise
|
|
|
|
- arc around it in the order noted above, from most upstream peak to most
|
|
|
|
- downstream.
|
|
|
|
- Notably, the
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-clusters
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- form a single large
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-cloud
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- with no apparent separation between them, further supporting the conclusion
|
|
|
|
- that these clusters represent an arbitrary partitioning of a continuous
|
|
|
|
- distribution of promoter coverage landscapes.
|
|
|
|
- While the clusters are a useful abstraction that aids in visualization,
|
|
|
|
- they are ultimately not an accurate representation of the data.
|
|
|
|
- The continuous nature of the distribution also explains why different values
|
|
|
|
- of
|
|
|
|
-\begin_inset Formula $K$
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- led to similar conclusions.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
\begin_inset Flex TODO Note (inline)
|
|
\begin_inset Flex TODO Note (inline)
|
|
status open
|
|
status open
|
|
@@ -7925,23 +7877,49 @@ radius
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-TSS
|
|
|
|
|
|
+TSS
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ may have a different degree of influence depending on whether it is upstream
|
|
|
|
+ or downstream of the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+TSS
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+All observations described above for H3K4me2
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+ChIP-seq
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- may have a different degree of influence depending on whether it is upstream
|
|
|
|
- or downstream of the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
-\end_layout
|
|
|
|
|
|
+ also appear to hold for H3K4me3 (Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:H3K4me3-neighborhood"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
|
|
+).
|
|
|
|
+ This is expected, since there is a high correlation between the positions
|
|
|
|
+ where both histone marks occur.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -8212,21 +8190,49 @@ end{landscape}
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Subsection
|
|
|
|
+Promoter coverage H3K27me3
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-All observations described above for H3K4me2
|
|
|
|
|
|
+Unlike both H3K4 marks, whose main patterns of variation appear directly
|
|
|
|
+ related to the size and position of a single peak within the promoter,
|
|
|
|
+ the patterns of H3K27me3 methylation in promoters are more complex (Figure
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:H3K27me3-neighborhood"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+).
|
|
|
|
+ Once again looking at the relative coverage in 500-bp wide bins in a
|
|
|
|
+ 5 kb radius around each
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-ChIP-seq
|
|
|
|
|
|
+TSS
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- also appear to hold for H3K4me3 as well (Figure
|
|
|
|
|
|
+, promoters were clustered based on the normalized relative coverage values
|
|
|
|
+ in each bin using
|
|
|
|
+\begin_inset Formula $k$
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+-means clustering with
|
|
|
|
+\begin_inset Formula $K=6$
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ (Figure
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:H3K4me3-neighborhood"
|
|
|
|
|
|
+reference "fig:H3K27me3-neighborhood-clusters"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
@@ -8234,12 +8240,106 @@ noprefix "false"
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
).
|
|
).
|
|
- This is expected, since there is a high correlation between the positions
|
|
|
|
- where both histone marks occur.
|
|
|
|
|
|
+ This time, 3
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+axes
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ of variation can be observed, each represented by 2 clusters with opposing
|
|
|
|
+ patterns.
|
|
|
|
+ The first axis is greater upstream coverage (Cluster 1) vs.
|
|
|
|
+ greater downstream coverage (Cluster 3); the second axis is the coverage
|
|
|
|
+ at the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+TSS
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Subsection
|
|
|
|
-Promoter coverage H3K27me3
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ itself: peak (Cluster 4) or trough (Cluster 2); lastly, the third axis
|
|
|
|
+ represents a trough upstream of the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+TSS
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ (Cluster 5) vs.
|
|
|
|
+ downstream of the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+TSS
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ (Cluster 6).
|
|
|
|
+ Referring to these opposing pairs of clusters as axes of variation is justified
|
|
|
|
+, because they correspond precisely to the first 3
|
|
|
|
+\begin_inset Flex Glossary Term (pl)
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+PC
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ in the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+PCA
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ plot of the relative coverage values (Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:H3K27me3-neighborhood-pca"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+).
|
|
|
|
+ The
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+PCA
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ plot reveals that as in the case of H3K4me2, all the
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+clusters
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ are really just sections of a single connected cloud rather than discrete
|
|
|
|
+ clusters.
|
|
|
|
+ The cloud is approximately ellipsoid-shaped, with each PC being an axis
|
|
|
|
+ of the ellipsoid, and each cluster consisting of a pyramidal section of the
|
|
|
|
+ ellipsoid.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -8469,214 +8569,66 @@ kbp downstream, and the logCPM values were normalized within each promoter
|
|
\begin_inset Formula $K=6$
|
|
\begin_inset Formula $K=6$
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-,
|
|
|
|
-\series bold
|
|
|
|
-
|
|
|
|
-\series default
|
|
|
|
-and the average bin values were plotted for each cluster (a).
|
|
|
|
- The
|
|
|
|
-\begin_inset Formula $x$
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
--axis is the genomic coordinate of each bin relative to the the transcription
|
|
|
|
- start site, and the
|
|
|
|
-\begin_inset Formula $y$
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
--axis is the mean relative coverage depth of that bin across all promoters
|
|
|
|
- in the cluster.
|
|
|
|
- Each line represents the average
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-shape
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- of the promoter coverage for promoters in that cluster.
|
|
|
|
- PCA was performed on the same data, and the first two PCs were plotted,
|
|
|
|
- coloring each point by its K-means cluster identity (b).
|
|
|
|
- For each cluster, the distribution of gene expression values was plotted
|
|
|
|
- (c).
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset ERT
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\backslash
|
|
|
|
-end{landscape}
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
-}
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-Unlike both H3K4 marks, whose main patterns of variation appear directly
|
|
|
|
- related to the size and position of a single peak within the promoter,
|
|
|
|
- the patterns of H3K27me3 methylation in promoters are more complex (Figure
|
|
|
|
-
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:H3K27me3-neighborhood"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-).
|
|
|
|
- Once again looking at the relative coverage in a 500-bp wide bins in a
|
|
|
|
- 5kb radius around each
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-, promoters were clustered based on the normalized relative coverage values
|
|
|
|
- in each bin using
|
|
|
|
-\begin_inset Formula $k$
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
--means clustering with
|
|
|
|
-\begin_inset Formula $K=6$
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- (Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:H3K27me3-neighborhood-clusters"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
|
|
+,
|
|
|
|
+\series bold
|
|
|
|
+
|
|
|
|
+\series default
|
|
|
|
+and the average bin values were plotted for each cluster (a).
|
|
|
|
+ The
|
|
|
|
+\begin_inset Formula $x$
|
|
|
|
+\end_inset
|
|
|
|
|
|
|
|
+-axis is the genomic coordinate of each bin relative to the transcription
|
|
|
|
+ start site, and the
|
|
|
|
+\begin_inset Formula $y$
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-).
|
|
|
|
- This time, 3
|
|
|
|
|
|
+-axis is the mean relative coverage depth of that bin across all promoters
|
|
|
|
+ in the cluster.
|
|
|
|
+ Each line represents the average
|
|
\begin_inset Quotes eld
|
|
\begin_inset Quotes eld
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-axes
|
|
|
|
|
|
+shape
|
|
\begin_inset Quotes erd
|
|
\begin_inset Quotes erd
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- of variation can be observed, each represented by 2 clusters with opposing
|
|
|
|
- patterns.
|
|
|
|
- The first axis is greater upstream coverage (Cluster 1) vs.
|
|
|
|
- greater downstream coverage (Cluster 3); the second axis is the coverage
|
|
|
|
- at the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
|
|
+ of the promoter coverage for promoters in that cluster.
|
|
|
|
+ PCA was performed on the same data, and the first two PCs were plotted,
|
|
|
|
+ coloring each point by its K-means cluster identity (b).
|
|
|
|
+ For each cluster, the distribution of gene expression values was plotted
|
|
|
|
+ (c).
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- itself: peak (Cluster 4) or trough (Cluster 2); lastly, the third axis
|
|
|
|
- represents a trough upstream of the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- (Cluster 5) vs.
|
|
|
|
- downstream of the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-TSS
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- (Cluster 6).
|
|
|
|
- Referring to these opposing pairs of clusters as axes of variation is justified
|
|
|
|
-, because they correspond precisely to the first 3
|
|
|
|
-\begin_inset Flex Glossary Term (pl)
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset ERT
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-PC
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
|
|
|
|
- in the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-PCA
|
|
|
|
|
|
+\backslash
|
|
|
|
+end{landscape}
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- plot of the relative coverage values (Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:H3K27me3-neighborhood-pca"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-).
|
|
|
|
- The
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-PCA
|
|
|
|
-\end_layout
|
|
|
|
|
|
|
|
-\end_inset
|
|
|
|
|
|
+}
|
|
|
|
+\end_layout
|
|
|
|
|
|
- plot reveals that as in the case of H3K4me2, all the
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-clusters
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
|
|
|
|
- are really just sections of a single connected cloud rather than discrete
|
|
|
|
- clusters.
|
|
|
|
- The cloud is approximately ellipsoid-shaped, with each PC being an axis
|
|
|
|
- of the ellipse, and each cluster consisting of a pyramidal section of the
|
|
|
|
- ellipsoid.
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -9077,83 +9029,6 @@ LF
|
|
would not be expected to converge in this way after activation.
|
|
would not be expected to converge in this way after activation.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/LaMere2016_fig8.pdf
|
|
|
|
- lyxscale 50
|
|
|
|
- width 60col%
|
|
|
|
- groupId colwidth
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Caption Standard
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Argument 1
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Lamere 2016 Figure 8 “Model for the role of H3K4 methylation during CD4
|
|
|
|
- T-cell activation.
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:Lamere2016-Fig8"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\series bold
|
|
|
|
-Lamere 2016 Figure 8
|
|
|
|
-\begin_inset CommandInset citation
|
|
|
|
-LatexCommand cite
|
|
|
|
-key "LaMere2016"
|
|
|
|
-literal "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-,
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-Model for the role of H3K4 methylation during CD4 T-cell activation.
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\series default
|
|
|
|
-Reproduced with permission.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
In H3K4me2, H3K4me3, and
|
|
In H3K4me2, H3K4me3, and
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
@@ -9252,14 +9127,91 @@ SVA
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-PCoA
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+PCoA
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ to reveal interesting behaviors in the data that were previously only detectabl
|
|
|
|
+e by a detailed manual analysis.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/LaMere2016_fig8.pdf
|
|
|
|
+ lyxscale 50
|
|
|
|
+ width 60col%
|
|
|
|
+ groupId colwidth
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Argument 1
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Lamere 2016 Figure 8 “Model for the role of H3K4 methylation during CD4
|
|
|
|
+ T-cell activation.
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:Lamere2016-Fig8"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+Lamere 2016 Figure 8
|
|
|
|
+\begin_inset CommandInset citation
|
|
|
|
+LatexCommand cite
|
|
|
|
+key "LaMere2016"
|
|
|
|
+literal "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+,
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+Model for the role of H3K4 methylation during CD4 T-cell activation.
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\series default
|
|
|
|
+Reproduced with permission.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- to reveal interesting behaviors in the data that were previously only detectabl
|
|
|
|
-e by a detailed manual analysis.
|
|
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -9464,104 +9416,6 @@ TSS
|
|
Workflow
|
|
Workflow
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset ERT
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\backslash
|
|
|
|
-afterpage{
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\backslash
|
|
|
|
-begin{landscape}
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/rulegraphs/rulegraph-all.pdf
|
|
|
|
- lyxscale 50
|
|
|
|
- width 100col%
|
|
|
|
- height 95theight%
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Caption Standard
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Argument 1
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Dependency graph of steps in reproducible workflow.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:rulegraph"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\series bold
|
|
|
|
-Dependency graph of steps in reproducible workflow.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset ERT
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\backslash
|
|
|
|
-end{landscape}
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
-}
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
The analyses described in this chapter were organized into a reproducible
|
|
The analyses described in this chapter were organized into a reproducible
|
|
workflow using the Snakemake workflow management system
|
|
workflow using the Snakemake workflow management system
|
|
@@ -9698,6 +9552,104 @@ noprefix "false"
|
|
have completed, thereby automating the entire workflow from start to finish.
|
|
have completed, thereby automating the entire workflow from start to finish.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset ERT
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\backslash
|
|
|
|
+afterpage{
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\backslash
|
|
|
|
+begin{landscape}
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/rulegraphs/rulegraph-all.pdf
|
|
|
|
+ lyxscale 50
|
|
|
|
+ width 100col%
|
|
|
|
+ height 95theight%
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Argument 1
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Dependency graph of steps in reproducible workflow.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:rulegraph"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+Dependency graph of steps in reproducible workflow.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset ERT
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\backslash
|
|
|
|
+end{landscape}
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+}
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
In addition to simply making it easier to organize the steps in the analysis,
|
|
In addition to simply making it easier to organize the steps in the analysis,
|
|
structuring the analysis as a workflow allowed for some analysis strategies
|
|
structuring the analysis as a workflow allowed for some analysis strategies
|
|
@@ -10678,6 +10630,49 @@ DNA methylation arrays are a relatively new kind of assay that uses microarrays
|
|
by thymidines and interrogates the level of unmethylated DNA.
|
|
by thymidines and interrogates the level of unmethylated DNA.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+After normalization, these two probe intensities are summarized in one of
|
|
|
|
+ two ways, each with advantages and disadvantages.
|
|
|
|
+ β
|
|
|
|
+\series bold
|
|
|
|
+
|
|
|
|
+\series default
|
|
|
|
+values, interpreted as fraction of DNA copies methylated, range from 0 to
|
|
|
|
+ 1.
|
|
|
|
+ β
|
|
|
|
+\series bold
|
|
|
|
+
|
|
|
|
+\series default
|
|
|
|
+values are conceptually easy to interpret, but the constrained range makes
|
|
|
|
+ them unsuitable for linear modeling, and their error distributions are
|
|
|
|
+ highly non-normal, which also frustrates linear modeling.
|
|
|
|
+ M-values, interpreted as the log ratio of methylated to unmethylated copies,
|
|
|
|
+ are computed by mapping the β values from
|
|
|
|
+\begin_inset Formula $[0,1]$
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ onto
|
|
|
|
+\begin_inset Formula $(-\infty,+\infty)$
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ using a sigmoid curve (Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:Sigmoid-beta-m-mapping"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+).
|
|
|
|
+ This transformation results in values with better statistical properties:
|
|
|
|
+ the unconstrained range is suitable for linear modeling, and the error
|
|
|
|
+ distributions are more normal.
|
|
|
|
+ Hence, most linear modeling and other statistical testing on methylation
|
|
|
|
+ arrays is performed using M-values.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
\begin_inset Float figure
|
|
\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
@@ -10713,18 +10708,13 @@ Sigmoid shape of the mapping between β and M values.
|
|
|
|
|
|
\begin_inset CommandInset label
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
LatexCommand label
|
|
-name "fig:Sigmoid-beta-m-mapping"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\series bold
|
|
|
|
-Sigmoid shape of the mapping between β and M values.
|
|
|
|
-\end_layout
|
|
|
|
|
|
+name "fig:Sigmoid-beta-m-mapping"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
|
|
+\series bold
|
|
|
|
+Sigmoid shape of the mapping between β and M values.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -10732,47 +10722,9 @@ Sigmoid shape of the mapping between β and M values.
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-After normalization, these two probe intensities are summarized in one of
|
|
|
|
- two ways, each with advantages and disadvantages.
|
|
|
|
- β
|
|
|
|
-\series bold
|
|
|
|
-
|
|
|
|
-\series default
|
|
|
|
-values, interpreted as fraction of DNA copies methylated, range from 0 to
|
|
|
|
- 1.
|
|
|
|
- β
|
|
|
|
-\series bold
|
|
|
|
-
|
|
|
|
-\series default
|
|
|
|
-values are conceptually easy to interpret, but the constrained range makes
|
|
|
|
- them unsuitable for linear modeling, and their error distributions are
|
|
|
|
- highly non-normal, which also frustrates linear modeling.
|
|
|
|
- M-values, interpreted as the log ratio of methylated to unmethylated copies,
|
|
|
|
- are computed by mapping the beta values from
|
|
|
|
-\begin_inset Formula $[0,1]$
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- onto
|
|
|
|
-\begin_inset Formula $(-\infty,+\infty)$
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- using a sigmoid curve (Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:Sigmoid-beta-m-mapping"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
|
|
|
|
-).
|
|
|
|
- This transformation results in values with better statistical properties:
|
|
|
|
- the unconstrained range is suitable for linear modeling, and the error
|
|
|
|
- distributions are more normal.
|
|
|
|
- Hence, most linear modeling and other statistical testing on methylation
|
|
|
|
- arrays is performed using M-values.
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -12028,11 +11980,78 @@ Reconsider subsection organization?
|
|
Separate normalization with RMA introduces unwanted biases in classification
|
|
Separate normalization with RMA introduces unwanted biases in classification
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+To demonstrate the problem with non-single-channel normalization methods,
|
|
|
|
+ we considered the problem of training a classifier to distinguish
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+TX
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ from
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+AR
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ using the samples from the internal set as training data, evaluating performanc
|
|
|
|
+e on the external set.
|
|
|
|
+ First, training and evaluation were performed after normalizing all array
|
|
|
|
+ samples together as a single set using
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+RMA
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+, and second, the internal samples were normalized separately from the external
|
|
|
|
+ samples and the training and evaluation were repeated.
|
|
|
|
+ For each sample in the validation set, the classifier probabilities from
|
|
|
|
+ both classifiers were plotted against each other (Figure
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:Classifier-probabilities-RMA"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+).
|
|
|
|
+ As expected, separate normalization biases the classifier probabilities,
|
|
|
|
+ resulting in several misclassifications.
|
|
|
|
+ In this case, the bias from separate normalization causes the classifier
|
|
|
|
+ to assign a lower probability of
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+AR
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ to every sample.
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
\begin_inset Float figure
|
|
\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status open
|
|
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
@@ -12096,32 +12115,55 @@ The PAM classifier algorithm was trained on the training set of arrays to
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Subsection
|
|
|
|
+fRMA and SCAN maintain classification performance while eliminating dependence
|
|
|
|
+ on normalization strategy
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-To demonstrate the problem with non-single-channel normalization methods,
|
|
|
|
- we considered the problem of training a classifier to distinguish
|
|
|
|
|
|
+For internal validation, the 6 methods' AUC values ranged from 0.816 to 0.891,
|
|
|
|
+ as shown in Table
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "tab:AUC-PAM"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
|
|
+ Among the non-single-channel normalizations, dChip outperformed
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-TX
|
|
|
|
|
|
+RMA
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- from
|
|
|
|
|
|
+, while
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-AR
|
|
|
|
|
|
+GRSN
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- using the samples from the internal set as training data, evaluating performanc
|
|
|
|
-e on the external set.
|
|
|
|
- First, training and evaluation were performed after normalizing all array
|
|
|
|
- samples together as a single set using
|
|
|
|
|
|
+ reduced the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+AUC
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ values for both dChip and
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -12131,48 +12173,147 @@ RMA
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, and second, the internal samples were normalized separately from the external
|
|
|
|
- samples and the training and evaluation were repeated.
|
|
|
|
- For each sample in the validation set, the classifier probabilities from
|
|
|
|
- both classifiers were plotted against each other (Fig.
|
|
|
|
-
|
|
|
|
|
|
+.
|
|
|
|
+ Both single-channel methods,
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+fRMA
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ and
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+SCAN
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+, slightly outperformed
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+RMA
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+, with
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+fRMA
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ ahead of
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+SCAN
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
|
|
+ However, the difference between
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+RMA
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ and
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+fRMA
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ is still quite small.
|
|
|
|
+ Figure
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:Classifier-probabilities-RMA"
|
|
|
|
|
|
+reference "fig:ROC-PAM-int"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-).
|
|
|
|
- As expected, separate normalization biases the classifier probabilities,
|
|
|
|
- resulting in several misclassifications.
|
|
|
|
- In this case, the bias from separate normalization causes the classifier
|
|
|
|
- to assign a lower probability of
|
|
|
|
|
|
+ shows that the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+ROC
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ curves for
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+RMA
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+, dChip, and
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+fRMA
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ look very similar and relatively smooth, while both
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-AR
|
|
|
|
|
|
+GRSN
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- to every sample.
|
|
|
|
-
|
|
|
|
|
|
+ curves and the curve for
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+SCAN
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Subsection
|
|
|
|
-fRMA and SCAN maintain classification performance while eliminating dependence
|
|
|
|
- on normalization strategy
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ have a more jagged appearance.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
\begin_inset Float figure
|
|
\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status open
|
|
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
@@ -12307,7 +12448,7 @@ ROC curves were generated for PAM classification of AR vs TX after 6 different
|
|
\begin_inset Float table
|
|
\begin_inset Float table
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status open
|
|
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
@@ -12904,49 +13045,39 @@ noprefix "false"
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-For internal validation, the 6 methods' AUC values ranged from 0.816 to 0.891,
|
|
|
|
- as shown in Table
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "tab:AUC-PAM"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-.
|
|
|
|
- Among the non-single-channel normalizations, dChip outperformed
|
|
|
|
|
|
+For external validation, as expected, all the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-RMA
|
|
|
|
|
|
+AUC
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, while
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GRSN
|
|
|
|
-\end_layout
|
|
|
|
|
|
+ values are lower than the internal validations, ranging from 0.642 to 0.750
|
|
|
|
+ (Table
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "tab:AUC-PAM"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- reduced the
|
|
|
|
|
|
+).
|
|
|
|
+ With or without
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-AUC
|
|
|
|
|
|
+GRSN
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- values for both dChip and
|
|
|
|
|
|
+,
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -12956,28 +13087,29 @@ RMA
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
- Both single-channel methods,
|
|
|
|
|
|
+ shows its dominance over dChip in this more challenging test.
|
|
|
|
+ Unlike in the internal validation,
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-fRMA
|
|
|
|
|
|
+GRSN
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- and
|
|
|
|
|
|
+ actually improves the classifier performance for
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-SCAN
|
|
|
|
|
|
+RMA
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, slightly outperformed
|
|
|
|
|
|
+, although it does not for dChip.
|
|
|
|
+ Once again, both single-channel methods perform about on par with
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -12997,7 +13129,7 @@ fRMA
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- ahead of
|
|
|
|
|
|
+ performing slightly better and
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -13007,39 +13139,18 @@ SCAN
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
- However, the difference between
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-RMA
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- and
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-fRMA
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- is still quite small.
|
|
|
|
|
|
+ performing a bit worse.
|
|
Figure
|
|
Figure
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:ROC-PAM-int"
|
|
|
|
|
|
+reference "fig:ROC-PAM-ext"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- shows that the
|
|
|
|
|
|
+ shows the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -13049,115 +13160,20 @@ ROC
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- curves for
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-RMA
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-, dChip, and
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-fRMA
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- look very similar and relatively smooth, while both
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GRSN
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- curves and the curve for
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-SCAN
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- have a more jagged appearance.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-For external validation, as expected, all the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-AUC
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- values are lower than the internal validations, ranging from 0.642 to 0.750
|
|
|
|
- (Table
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "tab:AUC-PAM"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-).
|
|
|
|
- With or without
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GRSN
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-,
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-RMA
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- shows its dominance over dChip in this more challenging test.
|
|
|
|
- Unlike in the internal validation,
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GRSN
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- actually improves the classifier performance for
|
|
|
|
|
|
+ curves for the external validation test.
|
|
|
|
+ As expected, none of them are as clean-looking as the internal validation
|
|
|
|
+
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-RMA
|
|
|
|
|
|
+ROC
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, although it does not for dChip.
|
|
|
|
- Once again, both single-channel methods perform about on par with
|
|
|
|
|
|
+ curves.
|
|
|
|
+ The curves for
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -13167,7 +13183,7 @@ RMA
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, with
|
|
|
|
|
|
+, RMA+GRSN, and
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -13177,83 +13193,94 @@ fRMA
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- performing slightly better and
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-SCAN
|
|
|
|
|
|
+ all look similar, while the other curves look more divergent.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- performing a bit worse.
|
|
|
|
- Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:ROC-PAM-ext"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- shows the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-ROC
|
|
|
|
|
|
+\begin_layout Subsection
|
|
|
|
+fRMA with custom-generated vectors enables single-channel normalization
|
|
|
|
+ on hthgu133pluspm platform
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- curves for the external validation test.
|
|
|
|
- As expected, none of them are as clean-looking as the internal validation
|
|
|
|
-
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+In order to enable use of
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-ROC
|
|
|
|
|
|
+fRMA
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- curves.
|
|
|
|
- The curves for
|
|
|
|
|
|
+ to normalize hthgu133pluspm, a custom set of
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-RMA
|
|
|
|
|
|
+fRMA
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, RMA+GRSN, and
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
+ vectors was created.
|
|
|
|
+ First, an appropriate batch size was chosen by looking at the number of
|
|
|
|
+ batches and number of samples included as a function of batch size (Figure
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:frmatools-batch-size"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-fRMA
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+).
|
|
|
|
+ For a given batch size, all batches with fewer samples than the chosen
|
|
|
|
+ size must be ignored during training, while larger batches must be randomly
|
|
|
|
+ downsampled to the chosen size.
|
|
|
|
+ Hence, the number of samples included for a given batch size equals the
|
|
|
|
+ batch size times the number of batches with at least that many samples.
|
|
|
|
+ From Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:batch-size-samples"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- all look similar, while the other curves look more divergent.
|
|
|
|
-\end_layout
|
|
|
|
|
|
+, it is apparent that a batch size of 8 maximizes the number of samples
|
|
|
|
+ included in training.
|
|
|
|
+ Increasing the batch size beyond this causes too many smaller batches to
|
|
|
|
+ be excluded, reducing the total number of samples for both tissue types.
|
|
|
|
+ However, a batch size of 8 is not necessarily optimal.
|
|
|
|
+ The article introducing frmaTools concluded that it was highly advantageous
|
|
|
|
+ to use a smaller batch size in order to include more batches, even at the
|
|
|
|
+ expense of including fewer total samples in training
|
|
|
|
+\begin_inset CommandInset citation
|
|
|
|
+LatexCommand cite
|
|
|
|
+key "McCall2011"
|
|
|
|
+literal "false"
|
|
|
|
|
|
-\begin_layout Subsection
|
|
|
|
-fRMA with custom-generated vectors enables single-channel normalization
|
|
|
|
- on hthgu133pluspm platform
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
|
|
+ To strike an appropriate balance between more batches and more samples,
|
|
|
|
+ a batch size of 5 was chosen.
|
|
|
|
+ For both blood and biopsy samples, this increased the number of batches
|
|
|
|
+ included by 10, with only a modest reduction in the number of samples compared
|
|
|
|
+ to a batch size of 8.
|
|
|
|
+ With a batch size of 5, 26 batches of biopsy samples and 46 batches of
|
|
|
|
+ blood samples were available.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
\begin_inset Float figure
|
|
\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status open
|
|
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
@@ -13393,7 +13420,7 @@ For batch sizes ranging from 3 to 15, the number of batches (a) and samples
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-In order to enable use of
|
|
|
|
|
|
+Since
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -13403,7 +13430,14 @@ fRMA
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- to normalize hthgu133pluspm, a custom set of
|
|
|
|
|
|
+ training requires equal-size batches, larger batches are downsampled randomly.
|
|
|
|
+ This introduces a nondeterministic step in the generation of normalization
|
|
|
|
+ vectors.
|
|
|
|
+ To show that this randomness does not substantially change the outcome,
|
|
|
|
+ the random downsampling and subsequent vector learning were repeated 5 times,
|
|
|
|
+ with a different random seed each time.
|
|
|
|
+ 20 samples were selected at random as a test set and normalized with each
|
|
|
|
+ of the 5 sets of
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -13413,58 +13447,67 @@ fRMA
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- vectors was created.
|
|
|
|
- First, an appropriate batch size was chosen by looking at the number of
|
|
|
|
- batches and number of samples included as a function of batch size (Figure
|
|
|
|
-
|
|
|
|
|
|
+ normalization vectors as well as ordinary RMA, and the normalized expression
|
|
|
|
+ values were compared across normalizations.
|
|
|
|
+ Figure
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:frmatools-batch-size"
|
|
|
|
|
|
+reference "fig:m-bx-violin"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-).
|
|
|
|
- For a given batch size, all batches with fewer samples that the chosen
|
|
|
|
- size must be ignored during training, while larger batches must be randomly
|
|
|
|
- downsampled to the chosen size.
|
|
|
|
- Hence, the number of samples included for a given batch size equals the
|
|
|
|
- batch size times the number of batches with at least that many samples.
|
|
|
|
- From Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:batch-size-samples"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
|
|
+ shows a summary of these comparisons for biopsy samples.
|
|
|
|
+ Comparing RMA to each of the 5
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+fRMA
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, it is apparent that that a batch size of 8 maximizes the number of samples
|
|
|
|
- included in training.
|
|
|
|
- Increasing the batch size beyond this causes too many smaller batches to
|
|
|
|
- be excluded, reducing the total number of samples for both tissue types.
|
|
|
|
- However, a batch size of 8 is not necessarily optimal.
|
|
|
|
- The article introducing frmaTools concluded that it was highly advantageous
|
|
|
|
- to use a smaller batch size in order to include more batches, even at the
|
|
|
|
- expense of including fewer total samples in training
|
|
|
|
-\begin_inset CommandInset citation
|
|
|
|
-LatexCommand cite
|
|
|
|
-key "McCall2011"
|
|
|
|
-literal "false"
|
|
|
|
|
|
+ normalizations, the distribution of log ratios is somewhat wide, indicating
|
|
|
|
+ that the normalizations disagree on the expression values of a fair number
|
|
|
|
+ of probe sets.
|
|
|
|
+ In contrast, comparisons of
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+fRMA
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
- To strike an appropriate balance between more batches and more samples,
|
|
|
|
- a batch size of 5 was chosen.
|
|
|
|
- For both blood and biopsy samples, this increased the number of batches
|
|
|
|
- included by 10, with only a modest reduction in the number of samples compared
|
|
|
|
- to a batch size of 8.
|
|
|
|
- With a batch size of 5, 26 batches of biopsy samples and 46 batches of
|
|
|
|
- blood samples were available.
|
|
|
|
|
|
+ against
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+fRMA
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+, the vast majority of probe sets have very small log ratios, indicating
|
|
|
|
+ a very high agreement between the normalized values generated by the two
|
|
|
|
+ normalizations.
|
|
|
|
+ This shows that the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+fRMA
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ normalization's behavior is not very sensitive to the random downsampling
|
|
|
|
+ of larger batches during training.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -13505,7 +13548,6 @@ name "fig:m-bx-violin"
|
|
|
|
|
|
\series bold
|
|
\series bold
|
|
Violin plot of inter-normalization log ratios for biopsy samples.
|
|
Violin plot of inter-normalization log ratios for biopsy samples.
|
|
-
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -13551,7 +13593,6 @@ name "fig:m-pax-violin"
|
|
|
|
|
|
\series bold
|
|
\series bold
|
|
Violin plot of inter-normalization log ratios for blood samples.
|
|
Violin plot of inter-normalization log ratios for blood samples.
|
|
-
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -13606,24 +13647,44 @@ Each of 20 randomly selected samples was normalized with RMA and with 5
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-Since
|
|
|
|
|
|
+Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:ma-bx-rma-frma"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ shows an MA plot of the RMA-normalized values against the fRMA-normalized
|
|
|
|
+ values for the same probe sets and arrays, corresponding to the first row
|
|
|
|
+ of Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:m-bx-violin"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
|
|
+ This MA plot shows that not only is there a wide distribution of M-values,
|
|
|
|
+ but the trend of M-values is dependent on the average normalized intensity.
|
|
|
|
+ This is expected, since the overall trend represents the differences in
|
|
|
|
+ the quantile normalization step.
|
|
|
|
+ When running
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-fRMA
|
|
|
|
|
|
+RMA
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- training requires equal-size batches, larger batches are downsampled randomly.
|
|
|
|
- This introduces a nondeterministic step in the generation of normalization
|
|
|
|
- vectors.
|
|
|
|
- To show that this randomness does not substantially change the outcome,
|
|
|
|
- the random downsampling and subsequent vector learning was repeated 5 times,
|
|
|
|
- with a different random seed each time.
|
|
|
|
- 20 samples were selected at random as a test set and normalized with each
|
|
|
|
- of the 5 sets of
|
|
|
|
|
|
+, only the quantiles for these specific 20 arrays are used, while for
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -13633,20 +13694,18 @@ fRMA
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- normalization vectors as well as ordinary RMA, and the normalized expression
|
|
|
|
- values were compared across normalizations.
|
|
|
|
|
|
+ the quantile distribution is taken from all arrays used in training.
|
|
Figure
|
|
Figure
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:m-bx-violin"
|
|
|
|
|
|
+reference "fig:ma-bx-frma-frma"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- shows a summary of these comparisons for biopsy samples.
|
|
|
|
- Comparing RMA to each of the 5
|
|
|
|
|
|
+ shows a similar MA plot comparing 2 different
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -13656,20 +13715,54 @@ fRMA
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- normalizations, the distribution of log ratios is somewhat wide, indicating
|
|
|
|
- that the normalizations disagree on the expression values of a fair number
|
|
|
|
- of probe sets.
|
|
|
|
- In contrast, comparisons of
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
+ normalizations, corresponding to the 6th row of Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:m-bx-violin"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-fRMA
|
|
|
|
-\end_layout
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
|
|
+ The MA plot is very tightly centered around zero with no visible trend.
|
|
|
|
+ Figures
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:m-pax-violin"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+,
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:MA-PAX-rma-frma"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+, and
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:ma-bx-frma-frma"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- against
|
|
|
|
|
|
+ show exactly the same information for the blood samples, once again comparing
|
|
|
|
+ the normalized expression values between normalizations for all probe sets
|
|
|
|
+ across 20 randomly selected test arrays.
|
|
|
|
+ Once again, there is a wider distribution of log ratios between RMA-normalized
|
|
|
|
+ values and fRMA-normalized values, and a much tighter distribution when comparing
|
|
|
|
+ different
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -13679,10 +13772,7 @@ fRMA
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, the vast majority of probe sets have very small log ratios, indicating
|
|
|
|
- a very high agreement between the normalized values generated by the two
|
|
|
|
- normalizations.
|
|
|
|
- This shows that the
|
|
|
|
|
|
+ normalizations to each other, indicating that the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -13692,15 +13782,15 @@ fRMA
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- normalization's behavior is not very sensitive to the random downsampling
|
|
|
|
- of larger batches during training.
|
|
|
|
|
|
+ training process is robust to random batch downsampling for the blood samples
|
|
|
|
+ as well.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
\begin_inset Float figure
|
|
\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status open
|
|
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
@@ -13777,7 +13867,6 @@ name "fig:ma-bx-frma-frma"
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
fRMA vs fRMA for biopsy samples.
|
|
fRMA vs fRMA for biopsy samples.
|
|
-
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -13929,148 +14018,51 @@ fRMA vs fRMA
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:ma-bx-rma-frma"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- shows an MA plot of the RMA-normalized values against the fRMA-normalized
|
|
|
|
- values for the same probe sets and arrays, corresponding to the first row
|
|
|
|
- of Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:m-bx-violin"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-.
|
|
|
|
- This MA plot shows that not only is there a wide distribution of M-values,
|
|
|
|
- but the trend of M-values is dependent on the average normalized intensity.
|
|
|
|
- This is expected, since the overall trend represents the differences in
|
|
|
|
- the quantile normalization step.
|
|
|
|
- When running
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-RMA
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-, only the quantiles for these specific 20 arrays are used, while for
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-fRMA
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- the quantile distribution is taking from all arrays used in training.
|
|
|
|
- Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:ma-bx-frma-frma"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- shows a similar MA plot comparing 2 different
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-fRMA
|
|
|
|
|
|
+\begin_layout Subsection
|
|
|
|
+SVA, voom, and array weights improve model fit for methylation array data
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- normalizations, corresponding to the 6th row of Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:m-bx-violin"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-.
|
|
|
|
- The MA plot is very tightly centered around zero with no visible trend.
|
|
|
|
- Figures
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:m-pax-violin"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-,
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+Figure
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:MA-PAX-rma-frma"
|
|
|
|
|
|
+reference "fig:meanvar-basic"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, and
|
|
|
|
|
|
+ shows the relationship between the mean M-value and the standard deviation
|
|
|
|
+ calculated for each probe in the methylation array data set.
|
|
|
|
+ A few features of the data are apparent.
|
|
|
|
+ First, the data are very strongly bimodal, with peaks in the density around
|
|
|
|
+ M-values of +4 and -4.
|
|
|
|
+ These modes correspond to methylation sites that are nearly 100% methylated
|
|
|
|
+ and nearly 100% unmethylated, respectively.
|
|
|
|
+ The strong bimodality indicates that a majority of probes interrogate sites
|
|
|
|
+ that fall into one of these two categories.
|
|
|
|
+ The points in between these modes represent sites that are either partially
|
|
|
|
+ methylated in many samples, or are fully methylated in some samples and
|
|
|
|
+ fully unmethylated in other samples, or some combination.
|
|
|
|
+ The next visible feature of the data is the W-shaped variance trend.
|
|
|
|
+ The upticks in the variance trend on either side are expected, based on
|
|
|
|
+ the sigmoid transformation exaggerating small differences at extreme M-values
|
|
|
|
+ (Figure
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:ma-bx-frma-frma"
|
|
|
|
|
|
+reference "fig:Sigmoid-beta-m-mapping"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- show exactly the same information for the blood samples, once again comparing
|
|
|
|
- the normalized expression values between normalizations for all probe sets
|
|
|
|
- across 20 randomly selected test arrays.
|
|
|
|
- Once again, there is a wider distribution of log ratios between RMA-normalized
|
|
|
|
- values and fRMA-normalized, and a much tighter distribution when comparing
|
|
|
|
- different
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-fRMA
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- normalizations to each other, indicating that the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-fRMA
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- training process is robust to random batch downsampling for the blood samples
|
|
|
|
- as well.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Subsection
|
|
|
|
-SVA, voom, and array weights improve model fit for methylation array data
|
|
|
|
|
|
+).
|
|
|
|
+ However, the uptick in the center is interesting: it indicates that sites
|
|
|
|
+ that are not constitutively methylated or unmethylated have a higher variance.
|
|
|
|
+ This could be a genuine biological effect, or it could be spurious noise
|
|
|
|
+ that is only observable at sites with varying methylation.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -14100,7 +14092,7 @@ begin{landscape}
|
|
\begin_inset Float figure
|
|
\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status open
|
|
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\begin_inset Flex TODO Note (inline)
|
|
\begin_inset Flex TODO Note (inline)
|
|
@@ -14319,49 +14311,6 @@ end{landscape}
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:meanvar-basic"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- shows the relationship between the mean M-value and the standard deviation
|
|
|
|
- calculated for each probe in the methylation array data set.
|
|
|
|
- A few features of the data are apparent.
|
|
|
|
- First, the data are very strongly bimodal, with peaks in the density around
|
|
|
|
- M-values of +4 and -4.
|
|
|
|
- These modes correspond to methylation sites that are nearly 100% methylated
|
|
|
|
- and nearly 100% unmethylated, respectively.
|
|
|
|
- The strong bimodality indicates that a majority of probes interrogate sites
|
|
|
|
- that fall into one of these two categories.
|
|
|
|
- The points in between these modes represent sites that are either partially
|
|
|
|
- methylated in many samples, or are fully methylated in some samples and
|
|
|
|
- fully unmethylated in other samples, or some combination.
|
|
|
|
- The next visible feature of the data is the W-shaped variance trend.
|
|
|
|
- The upticks in the variance trend on either side are expected, based on
|
|
|
|
- the sigmoid transformation exaggerating small differences at extreme M-values
|
|
|
|
- (Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:Sigmoid-beta-m-mapping"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-).
|
|
|
|
- However, the uptick in the center is interesting: it indicates that sites
|
|
|
|
- that are not constitutively methylated or unmethylated have a higher variance.
|
|
|
|
- This could be a genuine biological effect, or it could be spurious noise
|
|
|
|
- that is only observable at sites with varying methylation.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
In Figure
|
|
In Figure
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
@@ -14416,42 +14365,109 @@ absorbed
|
|
Figure
|
|
Figure
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:meanvar-sva-voomaw"
|
|
|
|
|
|
+reference "fig:meanvar-sva-voomaw"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ shows the mean-variance trend after fitting the model with the observation
|
|
|
|
+ weights assigned by voom based on the mean-variance trend shown in Figure
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:meanvar-sva-aw"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
|
|
+ As expected, the weights exactly counteract the trend in the data, resulting
|
|
|
|
+ in a nearly flat trend centered vertically at 1 (i.e.
|
|
|
|
+ 0 on the log scale).
|
|
|
|
+ This shows that the observations with extreme M-values have been appropriately
|
|
|
|
+ down-weighted to account for the fact that the noise in those observations
|
|
|
|
+ has been amplified by the non-linear M-value transformation.
|
|
|
|
+ In turn, this gives relatively more weight to observations in the middle
|
|
|
|
+ region, which are more likely to correspond to probes measuring interesting
|
|
|
|
+ biology (not constitutively methylated or unmethylated).
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+To determine whether any of the known experimental factors had an impact
|
|
|
|
+ on data quality, the sample quality weights estimated from the data were
|
|
|
|
+ tested for association with each of the experimental factors (Table
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "tab:weight-covariate-tests"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- shows the mean-variance trend after fitting the model with the observation
|
|
|
|
- weights assigned by voom based on the mean-variance trend shown in Figure
|
|
|
|
-
|
|
|
|
|
|
+).
|
|
|
|
+ Diabetes diagnosis was found to have a potentially significant association
|
|
|
|
+ with the sample weights, with a t-test p-value of
|
|
|
|
+\begin_inset Formula $1.06\times10^{-3}$
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
|
|
+ Figure
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:meanvar-sva-aw"
|
|
|
|
|
|
+reference "fig:diabetes-sample-weights"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ shows the distribution of sample weights grouped by diabetes diagnosis.
|
|
|
|
+ The samples from patients with
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+T2D
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ were assigned significantly lower weights than those from patients with
|
|
|
|
+
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+T1D
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
.
|
|
.
|
|
- As expected, the weights exactly counteract the trend in the data, resulting
|
|
|
|
- in a nearly flat trend centered vertically at 1 (i.e.
|
|
|
|
- 0 on the log scale).
|
|
|
|
- This shows that the observations with extreme M-values have been appropriately
|
|
|
|
- down-weighted to account for the fact that the noise in those observations
|
|
|
|
- has been amplified by the non-linear M-value transformation.
|
|
|
|
- In turn, this gives relatively more weight to observations in the middle
|
|
|
|
- region, which are more likely to correspond to probes measuring interesting
|
|
|
|
- biology (not constitutively methylated or unmethylated).
|
|
|
|
|
|
+ This indicates that the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+T2D
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ samples had an overall higher variance on average across all probes.
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
\begin_inset Float table
|
|
\begin_inset Float table
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status open
|
|
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
@@ -14675,7 +14691,7 @@ t
|
|
\begin_inset Float figure
|
|
\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status open
|
|
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\begin_inset Flex TODO Note (inline)
|
|
\begin_inset Flex TODO Note (inline)
|
|
@@ -14744,10 +14760,6 @@ literal "false"
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -14756,77 +14768,102 @@ literal "false"
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-To determine whether any of the known experimental factors had an impact
|
|
|
|
- on data quality, the sample quality weights estimated from the data were
|
|
|
|
- tested for association with each of the experimental factors (Table
|
|
|
|
|
|
+Table
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "tab:weight-covariate-tests"
|
|
|
|
|
|
+reference "tab:methyl-num-signif"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-).
|
|
|
|
- Diabetes diagnosis was found to have a potentially significant association
|
|
|
|
- with the sample weights, with a t-test p-value of
|
|
|
|
-\begin_inset Formula $1.06\times10^{-3}$
|
|
|
|
|
|
+ shows the number of significantly differentially methylated probes reported
|
|
|
|
+ by each analysis for each comparison of interest at an
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+FDR
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
- Figure
|
|
|
|
|
|
+ of 10%.
|
|
|
|
+ As expected, the more elaborate analyses, B and C, report more significant
|
|
|
|
+ probes than the more basic analysis A, consistent with the conclusions
|
|
|
|
+ above that the data contain hidden systematic variations that must be modeled.
|
|
|
|
+ Table
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:diabetes-sample-weights"
|
|
|
|
|
|
+reference "tab:methyl-est-nonnull"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- shows the distribution of sample weights grouped by diabetes diagnosis.
|
|
|
|
- The samples from patients with
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-T2D
|
|
|
|
-\end_layout
|
|
|
|
|
|
+ shows the estimated number of differentially methylated probes for each test
|
|
|
|
+ from each analysis.
|
|
|
|
+ This was computed by estimating the proportion of null hypotheses that
|
|
|
|
+ were true using the method of
|
|
|
|
+\begin_inset CommandInset citation
|
|
|
|
+LatexCommand cite
|
|
|
|
+key "Phipson2013Thesis"
|
|
|
|
+literal "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- were assigned significantly lower weights than those from patients with
|
|
|
|
-
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-T1D
|
|
|
|
-\end_layout
|
|
|
|
|
|
+ and subtracting that fraction from the total number of probes, yielding
|
|
|
|
+ an estimate of the number of null hypotheses that are false based on the
|
|
|
|
+ distribution of p-values across the entire dataset.
|
|
|
|
+ Note that this does not identify which null hypotheses should be rejected
|
|
|
|
+ (i.e.
|
|
|
|
+ which probes are significant); it only estimates the true number of such
|
|
|
|
+ probes.
|
|
|
|
+ Once again, analyses B and C result in much larger estimates for the number
|
|
|
|
+ of differentially methylated probes.
|
|
|
|
+ In this case, analysis C, the only analysis that includes voom, estimates
|
|
|
|
+ the largest number of differentially methylated probes for all 3 contrasts.
|
|
|
|
+ If the assumptions of all the methods employed hold, then this represents
|
|
|
|
+ a gain in statistical power over the simpler analysis A.
|
|
|
|
+ Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:meth-p-value-histograms"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
- This indicates that the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-T2D
|
|
|
|
-\end_layout
|
|
|
|
|
|
+ shows the p-value distributions for each test, from which the numbers in
|
|
|
|
+ Table
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "tab:methyl-est-nonnull"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- samples had an overall higher variance on average across all probes.
|
|
|
|
-
|
|
|
|
|
|
+ were generated.
|
|
|
|
+ The distributions for analysis A all have a dip in density near zero, which
|
|
|
|
+ is a strong sign of a poor model fit.
|
|
|
|
+ The histograms for analyses B and C are more well-behaved, with a uniform
|
|
|
|
+ component stretching all the way from 0 to 1 representing the probes for
|
|
|
|
+ which the null hypothesis is true (no differential methylation), and a
|
|
|
|
+ zero-biased component representing the probes for which the null hypothesis
|
|
|
|
+ is false (differentially methylated).
|
|
|
|
+ These histograms do not indicate any major issues with the model fit.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
\begin_inset Float table
|
|
\begin_inset Float table
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status open
|
|
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
@@ -15409,10 +15446,6 @@ AR vs.
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -15790,113 +15823,21 @@ literal "false"
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
.
|
|
.
|
|
- the blue line is only shown in each plot if the estimate of
|
|
|
|
-\begin_inset Formula $\hat{\pi}_{0}$
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- for that p-value distribution is different from 1.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-Table
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "tab:methyl-num-signif"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- shows the number of significantly differentially methylated probes reported
|
|
|
|
- by each analysis for each comparison of interest at an
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-FDR
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- of 10%.
|
|
|
|
- As expected, the more elaborate analyses, B and C, report more significant
|
|
|
|
- probes than the more basic analysis A, consistent with the conclusions
|
|
|
|
- above that the data contain hidden systematic variations that must be modeled.
|
|
|
|
- Table
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "tab:methyl-est-nonnull"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- shows the estimated number differentially methylated probes for each test
|
|
|
|
- from each analysis.
|
|
|
|
- This was computed by estimating the proportion of null hypotheses that
|
|
|
|
- were true using the method of
|
|
|
|
-\begin_inset CommandInset citation
|
|
|
|
-LatexCommand cite
|
|
|
|
-key "Phipson2013Thesis"
|
|
|
|
-literal "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- and subtracting that fraction from the total number of probes, yielding
|
|
|
|
- an estimate of the number of null hypotheses that are false based on the
|
|
|
|
- distribution of p-values across the entire dataset.
|
|
|
|
- Note that this does not identify which null hypotheses should be rejected
|
|
|
|
- (i.e.
|
|
|
|
- which probes are significant); it only estimates the true number of such
|
|
|
|
- probes.
|
|
|
|
- Once again, analyses B and C result it much larger estimates for the number
|
|
|
|
- of differentially methylated probes.
|
|
|
|
- In this case, analysis C, the only analysis that includes voom, estimates
|
|
|
|
- the largest number of differentially methylated probes for all 3 contrasts.
|
|
|
|
- If the assumptions of all the methods employed hold, then this represents
|
|
|
|
- a gain in statistical power over the simpler analysis A.
|
|
|
|
- Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:meth-p-value-histograms"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
|
|
+ the blue line is only shown in each plot if the estimate of
|
|
|
|
+\begin_inset Formula $\hat{\pi}_{0}$
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ for that p-value distribution is different from 1.
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- shows the p-value distributions for each test, from which the numbers in
|
|
|
|
- Table
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "tab:methyl-est-nonnull"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- were generated.
|
|
|
|
- The distributions for analysis A all have a dip in density near zero, which
|
|
|
|
- is a strong sign of a poor model fit.
|
|
|
|
- The histograms for analyses B and C are more well-behaved, with a uniform
|
|
|
|
- component stretching all the way from 0 to 1 representing the probes for
|
|
|
|
- which the null hypotheses is true (no differential methylation), and a
|
|
|
|
- zero-biased component representing the probes for which the null hypothesis
|
|
|
|
- is false (differentially methylated).
|
|
|
|
- These histograms do not indicate any major issues with the model fit.
|
|
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -17824,6 +17765,120 @@ Results
|
|
Globin blocking yields a larger and more consistent fraction of useful reads
|
|
Globin blocking yields a larger and more consistent fraction of useful reads
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+The objective of the present study was to validate a new protocol for deep
|
|
|
|
+
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+RNA-seq
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ of whole blood drawn into PaxGene tubes from cynomolgus monkeys undergoing
|
|
|
|
+ islet transplantation, with particular focus on minimizing the loss of
|
|
|
|
+ useful sequencing space to uninformative globin reads.
|
|
|
|
+ The details of the analysis with respect to transplant outcomes and the
|
|
|
|
+ impact of mesenchymal stem cell treatment will be reported in a separate
|
|
|
|
+ manuscript (in preparation).
|
|
|
|
+ To focus on the efficacy of our
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+GB
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ protocol, 37 blood samples, 16 from pre-transplant and 21 from post-transplant
|
|
|
|
+ time points, were each prepped once with and once without
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+GB
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\begin_inset Flex Glossary Term (pl)
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+oligo
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+, and were then sequenced on an Illumina NextSeq500 instrument.
|
|
|
|
+ The number of reads aligning to each gene in the cynomolgus genome was
|
|
|
|
+ counted.
|
|
|
|
+ Table
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "tab:Fractions-of-reads"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ summarizes the distribution of read fractions among the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+GB
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ and non-GB libraries.
|
|
|
|
+ In the libraries with no
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+GB
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+, globin reads made up an average of 44.6% of total input reads, while reads
|
|
|
|
+ assigned to all other genes made up an average of 26.3%.
|
|
|
|
+ The remaining reads either aligned to intergenic regions (which include
|
|
|
|
+ long non-coding RNAs) or did not align with any annotated transcripts in
|
|
|
|
+ the current build of the cynomolgus genome.
|
|
|
|
+ In the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+GB
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ libraries, globin reads made up only 3.48% and reads assigned to all other
|
|
|
|
+ genes increased to 50.4%.
|
|
|
|
+ Thus,
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+GB
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ resulted in a 92.2% reduction in globin reads and a 91.6% increase in yield
|
|
|
|
+ of useful non-globin reads.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
\begin_inset ERT
|
|
\begin_inset ERT
|
|
status open
|
|
status open
|
|
@@ -17852,7 +17907,7 @@ begin{landscape}
|
|
placement p
|
|
placement p
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status open
|
|
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
@@ -18479,35 +18534,43 @@ end{landscape}
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-The objective of the present study was to validate a new protocol for deep
|
|
|
|
|
|
+This reduction is not quite as efficient as that previously reported for
|
|
|
|
+ human samples analyzed by DeepSAGE (<0.4% globin reads after globin reduction)
|
|
|
|
|
|
|
|
+\begin_inset CommandInset citation
|
|
|
|
+LatexCommand cite
|
|
|
|
+key "Mastrokolias2012"
|
|
|
|
+literal "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
|
|
+ Nonetheless, this degree of globin reduction is sufficient to nearly double
|
|
|
|
+ the yield of useful reads.
|
|
|
|
+ Thus,
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-RNA-seq
|
|
|
|
|
|
+GB
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- of whole blood drawn into PaxGene tubes from cynomolgus monkeys undergoing
|
|
|
|
- islet transplantation, with particular focus on minimizing the loss of
|
|
|
|
- useful sequencing space to uninformative globin reads.
|
|
|
|
- The details of the analysis with respect to transplant outcomes and the
|
|
|
|
- impact of mesenchymal stem cell treatment will be reported in a separate
|
|
|
|
- manuscript (in preparation).
|
|
|
|
- To focus on the efficacy of our
|
|
|
|
|
|
+ cuts the required sequencing effort (and costs) to achieve a target coverage
|
|
|
|
+ depth by almost 50%.
|
|
|
|
+ Consistent with this near doubling of yield, the average difference in
|
|
|
|
+ un-normalized
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-GB
|
|
|
|
|
|
+logCPM
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- protocol, 37 blood samples, 16 from pre-transplant and 21 from post-transplant
|
|
|
|
- time points, were each prepped once with and once without
|
|
|
|
|
|
+ across all genes between the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -18517,20 +18580,24 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
-\begin_inset Flex Glossary Term (pl)
|
|
|
|
|
|
+ libraries and non-GB libraries is approximately 1 (mean = 1.01, median =
|
|
|
|
+ 1.08), an overall 2-fold increase.
|
|
|
|
+ Un-normalized values are used here because the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-oligo
|
|
|
|
|
|
+TMM
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, and were then sequenced on an Illumina NextSeq500 instrument.
|
|
|
|
- The number of reads aligning to each gene in the cynomolgus genome was
|
|
|
|
- counted.
|
|
|
|
- Table
|
|
|
|
|
|
+ normalization correctly identifies this 2-fold difference as biologically
|
|
|
|
+ irrelevant and removes it.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+Another important aspect is that the standard deviations in Table
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
reference "tab:Fractions-of-reads"
|
|
reference "tab:Fractions-of-reads"
|
|
@@ -18540,33 +18607,7 @@ noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- summarizes the distribution of read fractions among the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GB
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- and non-GB libraries.
|
|
|
|
- In the libraries with no
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GB
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-, globin reads made up an average of 44.6% of total input reads, while reads
|
|
|
|
- assigned to all other genes made up an average of 26.3%.
|
|
|
|
- The remaining reads either aligned to intergenic regions (that include
|
|
|
|
- long non-coding RNAs) or did not align with any annotated transcripts in
|
|
|
|
- the current build of the cynomolgus genome.
|
|
|
|
- In the
|
|
|
|
|
|
+ are uniformly smaller in the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -18576,9 +18617,11 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- libraries, globin reads made up only 3.48% and reads assigned to all other
|
|
|
|
- genes increased to 50.4%.
|
|
|
|
- Thus,
|
|
|
|
|
|
+ samples than the non-GB ones, indicating much greater consistency of yield.
|
|
|
|
+ This is best seen in the percentage of non-globin reads as a fraction of
|
|
|
|
+ total reads aligned to annotated genes (genic reads).
|
|
|
|
+ For the non-GB samples, this measure ranges from 10.9% to 80.9%, while for
|
|
|
|
+ the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -18588,48 +18631,31 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- resulted in a 92.2% reduction in globin reads and a 91.6% increase in yield
|
|
|
|
- of useful non-globin reads.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-This reduction is not quite as efficient as the previous analysis showed
|
|
|
|
- for human samples by DeepSAGE (<0.4% globin reads after globin reduction)
|
|
|
|
-
|
|
|
|
-\begin_inset CommandInset citation
|
|
|
|
-LatexCommand cite
|
|
|
|
-key "Mastrokolias2012"
|
|
|
|
-literal "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-.
|
|
|
|
- Nonetheless, this degree of globin reduction is sufficient to nearly double
|
|
|
|
- the yield of useful reads.
|
|
|
|
- Thus,
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GB
|
|
|
|
-\end_layout
|
|
|
|
|
|
+ samples it ranges from 81.9% to 99.9% (Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:Fraction-of-genic-reads"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- cuts the required sequencing effort (and costs) to achieve a target coverage
|
|
|
|
- depth by almost 50%.
|
|
|
|
- Consistent with this near doubling of yield, the average difference in
|
|
|
|
- un-normalized
|
|
|
|
|
|
+).
|
|
|
|
+ This means that for applications where it is critical that each sample
|
|
|
|
+ achieve a specified minimum coverage in order to provide useful information,
|
|
|
|
+ it would be necessary to budget up to 10 times the sequencing depth per
|
|
|
|
+ sample without
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-logCPM
|
|
|
|
|
|
+GB
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- across all genes between the
|
|
|
|
|
|
+, even though the average yield improvement for
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -18639,20 +18665,21 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- libraries and non-GB libraries is approximately 1 (mean = 1.01, median =
|
|
|
|
- 1.08), an overall 2-fold increase.
|
|
|
|
- Un-normalized values are used here because the
|
|
|
|
|
|
+ is only 2-fold, because every sample has a chance of being 90% globin and
|
|
|
|
+ 10% useful reads.
|
|
|
|
+ Hence, the more consistent behavior of
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-TMM
|
|
|
|
|
|
+GB
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- normalization correctly identifies this 2-fold difference as biologically
|
|
|
|
- irrelevant and removes it.
|
|
|
|
|
|
+ samples makes planning an experiment easier and more efficient because
|
|
|
|
+ it eliminates the need to over-sequence every sample in order to guard
|
|
|
|
+ against the worst case of a high-globin fraction.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -18723,18 +18750,26 @@ Fraction of genic reads in each sample aligned to non-globin genes, with
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Subsection
|
|
|
|
+Globin blocking lowers the noise floor and allows detection of about 2000
|
|
|
|
+ more low-expression genes
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-Another important aspect is that the standard deviations in Table
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "tab:Fractions-of-reads"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
|
|
+\begin_inset Flex TODO Note (inline)
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Remove redundant titles from figures
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- are uniformly smaller in the
|
|
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+Since
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -18744,24 +18779,26 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- samples than the non-GB ones, indicating much greater consistency of yield.
|
|
|
|
- This is best seen in the percentage of non-globin reads as a fraction of
|
|
|
|
- total reads aligned to annotated genes (genic reads).
|
|
|
|
- For the non-GB samples, this measure ranges from 10.9% to 80.9%, while for
|
|
|
|
- the
|
|
|
|
|
|
+ yields more usable sequencing depth, it should also allow detection of
|
|
|
|
+ more genes at any given threshold.
|
|
|
|
+ When we looked at the distribution of average normalized
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-GB
|
|
|
|
|
|
+logCPM
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- samples it ranges from 81.9% to 99.9% (Figure
|
|
|
|
|
|
+ values across all libraries for genes with at least one read assigned to
|
|
|
|
+ them, we observed the expected bimodal distribution, with a high-abundance
|
|
|
|
+ "signal" peak representing detected genes and a low-abundance "noise" peak
|
|
|
|
+ representing genes whose read count did not rise above the noise floor
|
|
|
|
+ (Figure
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:Fraction-of-genic-reads"
|
|
|
|
|
|
+reference "fig:logcpm-dists"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
@@ -18769,10 +18806,8 @@ noprefix "false"
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
).
|
|
).
|
|
- This means that for applications where it is critical that each sample
|
|
|
|
- achieve a specified minimum coverage in order to provide useful information,
|
|
|
|
- it would be necessary to budget up to 10 times the sequencing depth per
|
|
|
|
- sample without
|
|
|
|
|
|
+ Consistent with the 2-fold increase in raw counts assigned to non-globin
|
|
|
|
+ genes, the signal peak for
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -18782,7 +18817,10 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, even though the average yield improvement for
|
|
|
|
|
|
+ samples is shifted to the right relative to the non-GB signal peak.
|
|
|
|
+ When all the samples are normalized together, this difference is normalized
|
|
|
|
+ out, lining up the signal peaks, and this reveals that, as expected, the
|
|
|
|
+ noise floor for the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -18792,9 +18830,8 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- is only 2-fold, because every sample has a chance of being 90% globin and
|
|
|
|
- 10% useful reads.
|
|
|
|
- Hence, the more consistent behavior of
|
|
|
|
|
|
+ samples is about 2-fold lower.
|
|
|
|
+ This greater separation between signal and noise peaks in the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -18804,27 +18841,8 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- samples makes planning an experiment easier and more efficient because
|
|
|
|
- it eliminates the need to over-sequence every sample in order to guard
|
|
|
|
- against the worst case of a high-globin fraction.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Subsection
|
|
|
|
-Globin blocking lowers the noise floor and allows detection of about 2000
|
|
|
|
- more low-expression genes
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Remove redundant titles from figures
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
|
|
+ samples means that low-expression genes should be more easily detected
|
|
|
|
+ and more precisely quantified than in the non-GB samples.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -18895,10 +18913,6 @@ Distributions of average group gene abundances when normalized separately
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -18907,7 +18921,17 @@ Distributions of average group gene abundances when normalized separately
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-Since
|
|
|
|
|
|
+Based on these distributions, we selected a detection threshold of
|
|
|
|
+\begin_inset Formula $-1$
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+, which is approximately the leftmost edge of the trough between the signal
|
|
|
|
+ and noise peaks.
|
|
|
|
+ This represents the most liberal possible detection threshold that does not
|
|
|
|
+ call substantial numbers of noise genes as detected.
|
|
|
|
+ Among the full dataset, 13429 genes were detected at this threshold, and
|
|
|
|
+ 22276 were not.
|
|
|
|
+ When considering the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -18917,35 +18941,20 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- yields more usable sequencing depth, it should also allow detection of
|
|
|
|
- more genes at any given threshold.
|
|
|
|
- When we looked at the distribution of average normalized
|
|
|
|
|
|
+ libraries and non-GB libraries separately and re-computing normalization
|
|
|
|
+ factors independently within each group, we detected 14535 genes in the
|
|
|
|
+
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-logCPM
|
|
|
|
|
|
+GB
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- values across all libraries for genes with at least one read assigned to
|
|
|
|
- them, we observed the expected bimodal distribution, with a high-abundance
|
|
|
|
- "signal" peak representing detected genes and a low-abundance "noise" peak
|
|
|
|
- representing genes whose read count did not rise above the noise floor
|
|
|
|
- (Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:logcpm-dists"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-).
|
|
|
|
- Consistent with the 2-fold increase in raw counts assigned to non-globin
|
|
|
|
- genes, the signal peak for
|
|
|
|
|
|
+ libraries but only 12460 in the non-GB libraries.
|
|
|
|
+ Thus,
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -18955,10 +18964,8 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- samples is shifted to the right relative to the non-GB signal peak.
|
|
|
|
- When all the samples are normalized together, this difference is normalized
|
|
|
|
- out, lining up the signal peaks, and this reveals that, as expected, the
|
|
|
|
- noise floor for the
|
|
|
|
|
|
+ allowed the detection of about 2000 additional genes that were buried under the noise
|
|
|
|
+ floor without
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -18968,8 +18975,8 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- samples is about 2-fold lower.
|
|
|
|
- This greater separation between signal and noise peaks in the
|
|
|
|
|
|
+.
|
|
|
|
+ This pattern of at least 2000 additional genes detected with
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -18979,8 +18986,18 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- samples means that low-expression genes should be more easily detected
|
|
|
|
- and more precisely quantified than in the non-GB samples.
|
|
|
|
|
|
+ was also consistent across a wide range of possible detection thresholds,
|
|
|
|
+ from -2 to 3 (see Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:Gene-detections"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+).
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -19052,8 +19069,51 @@ noprefix "false"
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Subsection
|
|
|
|
+Globin blocking does not add significant additional noise or decrease sample
|
|
|
|
+ quality
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+One potential worry is that the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+GB
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ protocol could perturb the levels of non-globin genes.
|
|
|
|
+ There are two kinds of possible perturbations: systematic and random.
|
|
|
|
+ The former is not a major concern for detection of differential expression,
|
|
|
|
+ since a 2-fold change in every sample has no effect on the relative fold
|
|
|
|
+ change between samples.
|
|
|
|
+ In contrast, random perturbations would increase the noise and obscure
|
|
|
|
+ the signal in the dataset, reducing the capacity to detect differential
|
|
|
|
+ expression.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Flex TODO Note (inline)
|
|
|
|
+status open
|
|
|
|
+
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
|
|
+Standardize on
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
|
|
|
|
+log2
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ notation
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -19062,40 +19122,53 @@ noprefix "false"
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-Based on these distributions, we selected a detection threshold of
|
|
|
|
-\begin_inset Formula $-1$
|
|
|
|
|
|
+The data do indeed show small systematic perturbations in gene levels (Figure
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:MA-plot"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-, which is approximately the leftmost edge of the trough between the signal
|
|
|
|
- and noise peaks.
|
|
|
|
- This represents the most liberal possible detection threshold that doesn't
|
|
|
|
- call substantial numbers of noise genes as detected.
|
|
|
|
- Among the full dataset, 13429 genes were detected at this threshold, and
|
|
|
|
- 22276 were not.
|
|
|
|
- When considering the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
|
|
+).
|
|
|
|
+ Other than the 3 designated alpha and beta globin genes, two other genes
|
|
|
|
+ stand out as having especially large negative
|
|
|
|
+\begin_inset Flex Glossary Term (pl)
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-GB
|
|
|
|
|
|
+logFC
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+: HBD and LOC1021365.
|
|
|
|
+ HBD, delta globin, is most likely targeted by the blocking
|
|
|
|
+\begin_inset Flex Glossary Term (pl)
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+oligo
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- libraries and non-GB libraries separately and re-computing normalization
|
|
|
|
- factors independently within each group, 14535 genes were detected in the
|
|
|
|
-
|
|
|
|
|
|
+ due to high sequence homology with the other globin genes.
|
|
|
|
+ LOC1021365 is the aforementioned
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-GB
|
|
|
|
|
|
+ncRNA
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- libraries while only 12460 were detected in the non-GB libraries.
|
|
|
|
- Thus,
|
|
|
|
|
|
+ that is reverse-complementary to one of the alpha-like genes and that would
|
|
|
|
+ be expected to be removed during the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -19105,19 +19178,21 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- allowed the detection of 2000 extra genes that were buried under the noise
|
|
|
|
- floor without
|
|
|
|
|
|
+ step.
|
|
|
|
+ All other genes appear in a cluster centered vertically at 0, and the vast
|
|
|
|
+ majority of genes in this cluster show an absolute
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-GB
|
|
|
|
|
|
+logFC
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
- This pattern of at least 2000 additional genes detected with
|
|
|
|
|
|
+ of 0.5 or less.
|
|
|
|
+ Nevertheless, many of these small perturbations are still statistically
|
|
|
|
+ significant, indicating that the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -19127,44 +19202,18 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- was also consistent across a wide range of possible detection thresholds,
|
|
|
|
- from -2 to 3 (see Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:Gene-detections"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-).
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Subsection
|
|
|
|
-Globin blocking does not add significant additional noise or decrease sample
|
|
|
|
- quality
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-One potential worry is that the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
|
|
+
|
|
|
|
+\begin_inset Flex Glossary Term (pl)
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-GB
|
|
|
|
|
|
+oligo
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- protocol could perturb the levels of non-globin genes.
|
|
|
|
- There are two kinds of possible perturbations: systematic and random.
|
|
|
|
- The former is not a major concern for detection of differential expression,
|
|
|
|
- since a 2-fold change in every sample has no effect on the relative fold
|
|
|
|
- change between samples.
|
|
|
|
- In contrast, random perturbations would increase the noise and obscure
|
|
|
|
- the signal in the dataset, reducing the capacity to detect differential
|
|
|
|
- expression.
|
|
|
|
|
|
+ likely cause very small but non-zero systematic perturbations in measured
|
|
|
|
+ gene expression levels.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -19282,28 +19331,50 @@ edgeR
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Standardize on
|
|
|
|
-\begin_inset Quotes eld
|
|
|
|
|
|
+Give these numbers the LaTeX math treatment
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-log2
|
|
|
|
-\begin_inset Quotes erd
|
|
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+To evaluate the possibility of
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+GB
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- notation
|
|
|
|
|
|
+ causing random perturbations and reducing sample quality, we computed the
|
|
|
|
+ Pearson correlation between
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+logCPM
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ values for every pair of samples with and without
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+GB
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-The data do indeed show small systematic perturbations in gene levels (Figure
|
|
|
|
-
|
|
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ and plotted them against each other (Figure
|
|
\begin_inset CommandInset ref
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
LatexCommand ref
|
|
-reference "fig:MA-plot"
|
|
|
|
|
|
+reference "fig:gene-abundance-correlations"
|
|
plural "false"
|
|
plural "false"
|
|
caps "false"
|
|
caps "false"
|
|
noprefix "false"
|
|
noprefix "false"
|
|
@@ -19311,41 +19382,54 @@ noprefix "false"
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
).
|
|
).
|
|
- Other than the 3 designated alpha and beta globin genes, two other genes
|
|
|
|
- stand out as having especially large negative
|
|
|
|
-\begin_inset Flex Glossary Term (pl)
|
|
|
|
|
|
+ The plot indicated that the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-logFC
|
|
|
|
|
|
+GB
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-: HBD and LOC1021365.
|
|
|
|
- HBD, delta globin, is most likely targeted by the blocking
|
|
|
|
-\begin_inset Flex Glossary Term (pl)
|
|
|
|
|
|
+ libraries have higher sample-to-sample correlations than the non-GB libraries.
|
|
|
|
+ Parametric and nonparametric tests for differences between the correlations
|
|
|
|
+ with and without
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-oligo
|
|
|
|
|
|
+GB
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- due to high sequence homology with the other globin genes.
|
|
|
|
- LOC1021365 is the aforementioned
|
|
|
|
|
|
+ both confirmed that this difference was highly significant (2-sided paired
|
|
|
|
+ t-test: t = 37.2, df = 665, P ≪ 2.2e-16; 2-sided Wilcoxon signed-rank test:
|
|
|
|
+ V = 2195, P ≪ 2.2e-16).
|
|
|
|
+ Performing the same tests on the Spearman correlations gave the same conclusion
|
|
|
|
+ (t-test: t = 26.8, df = 665, P ≪ 2.2e-16; signed-rank test: V = 8781, P ≪ 2.2e-16).
|
|
|
|
+ The
|
|
|
|
+\begin_inset Flex Code
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+edgeR
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ package was used to compute the overall
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-ncRNA
|
|
|
|
|
|
+BCV
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- that is reverse-complementary to one of the alpha-like genes and that would
|
|
|
|
- be expected to be removed during the
|
|
|
|
|
|
+ for
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -19355,42 +19439,72 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- step.
|
|
|
|
- All other genes appear in a cluster centered vertically at 0, and the vast
|
|
|
|
- majority of genes in this cluster show an absolute
|
|
|
|
|
|
+ and non-GB libraries, and found that
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-logFC
|
|
|
|
|
|
+GB
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- of 0.5 or less.
|
|
|
|
- Nevertheless, many of these small perturbations are still statistically
|
|
|
|
- significant, indicating that the
|
|
|
|
|
|
+ resulted in a negligible increase in the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-GB
|
|
|
|
|
|
+BCV
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
-\begin_inset Flex Glossary Term (pl)
|
|
|
|
|
|
+ (0.417 with GB vs.
|
|
|
|
+ 0.400 without).
|
|
|
|
+ The near equality of the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-oligo
|
|
|
|
|
|
+BCV
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- likely cause very small but non-zero systematic perturbations in measured
|
|
|
|
- gene expression levels.
|
|
|
|
|
|
+ for both sets indicates that the higher correlations in the GB libraries
|
|
|
|
+ are most likely a result of the increased yield of useful reads, which
|
|
|
|
+ reduces the contribution of Poisson counting uncertainty to the overall
|
|
|
|
+ variance of the
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+logCPM
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ values
|
|
|
|
+\begin_inset CommandInset citation
|
|
|
|
+LatexCommand cite
|
|
|
|
+key "McCarthy2012"
|
|
|
|
+literal "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
|
|
+ This improves the precision of expression measurements and more than offsets
|
|
|
|
+ the negligible increase in
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+BCV
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -19435,129 +19549,63 @@ name "fig:gene-abundance-correlations"
|
|
|
|
|
|
\series bold
|
|
\series bold
|
|
Comparison of inter-sample gene abundance correlations with and without
|
|
Comparison of inter-sample gene abundance correlations with and without
|
|
- GB.
|
|
|
|
-
|
|
|
|
-\series default
|
|
|
|
- All libraries were normalized together as described in Figure 2, and genes
|
|
|
|
- with an average logCPM less than
|
|
|
|
-\begin_inset Formula $-1$
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- were filtered out.
|
|
|
|
- Each gene’s logCPM was computed in each library using
|
|
|
|
-\begin_inset Flex Code
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-edgeR
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-'s
|
|
|
|
-\begin_inset Flex Code
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-cpm
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- function.
|
|
|
|
- For each pair of biological samples, the Pearson correlation between those
|
|
|
|
- samples' GB libraries was plotted against the correlation between the same
|
|
|
|
- samples’ non-GB libraries.
|
|
|
|
- Each point represents an unique pair of samples.
|
|
|
|
- The solid gray line shows a quantile-quantile plot of distribution of GB
|
|
|
|
- correlations vs.
|
|
|
|
- that of non-GB correlations.
|
|
|
|
- The thin dashed line is the identity line, provided for reference.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Give these numbers the LaTeX math treatment
|
|
|
|
-\end_layout
|
|
|
|
|
|
+ GB.
|
|
|
|
|
|
|
|
+\series default
|
|
|
|
+ All libraries were normalized together as described in Figure 2, and genes
|
|
|
|
+ with an average logCPM less than
|
|
|
|
+\begin_inset Formula $-1$
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-To evaluate the possibility of
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
|
|
+ were filtered out.
|
|
|
|
+ Each gene’s logCPM was computed in each library using
|
|
|
|
+\begin_inset Flex Code
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-GB
|
|
|
|
|
|
+edgeR
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- causing random perturbations and reducing sample quality, we computed the
|
|
|
|
- Pearson correlation between
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
|
|
+'s
|
|
|
|
+\begin_inset Flex Code
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-logCPM
|
|
|
|
|
|
+cpm
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- values for every pair of samples with and without
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GB
|
|
|
|
|
|
+ function.
|
|
|
|
+ For each pair of biological samples, the Pearson correlation between those
|
|
|
|
+ samples’ GB libraries was plotted against the correlation between the same
|
|
|
|
+ samples’ non-GB libraries.
|
|
|
|
+ Each point represents a unique pair of samples.
|
|
|
|
+ The solid gray line shows a quantile-quantile plot of the distribution of GB
|
|
|
|
+ correlations vs.
|
|
|
|
+ that of non-GB correlations.
|
|
|
|
+ The thin dashed line is the identity line, provided for reference.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- and plotted them against each other (Figure
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "fig:gene-abundance-correlations"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-).
|
|
|
|
- The plot indicated that the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GB
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\end_inset
|
|
|
|
|
|
+\begin_layout Subsection
|
|
|
|
+More differentially expressed genes are detected with globin blocking
|
|
|
|
+\end_layout
|
|
|
|
|
|
- libraries have higher sample-to-sample correlations than the non-GB libraries.
|
|
|
|
- Parametric and nonparametric tests for differences between the correlations
|
|
|
|
- with and without
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+To compare performance on differential gene expression tests, we took subsets
|
|
|
|
+ of both the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -19567,32 +19615,35 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- both confirmed that this difference was highly significant (2-sided paired
|
|
|
|
- t-test: t = 37.2, df = 665, P ≪ 2.2e-16; 2-sided Wilcoxon sign-rank test:
|
|
|
|
- V = 2195, P ≪ 2.2e-16).
|
|
|
|
- Performing the same tests on the Spearman correlations gave the same conclusion
|
|
|
|
- (t-test: t = 26.8, df = 665, P ≪ 2.2e-16; sign-rank test: V = 8781, P ≪ 2.2e-16).
|
|
|
|
- The
|
|
|
|
-\begin_inset Flex Code
|
|
|
|
|
|
+ and non-GB libraries with exactly one pre-transplant and one post-transplant
|
|
|
|
+ sample for each animal that had paired samples available for analysis (N=7
|
|
|
|
+ animals, N=14 samples in each subset).
|
|
|
|
+ The same test for pre- vs.
|
|
|
|
+ post-transplant differential gene expression was performed on the same
|
|
|
|
+ 7 pairs of samples from
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-edgeR
|
|
|
|
|
|
+GB
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- package was used to compute the overall
|
|
|
|
|
|
+ libraries and non-GB libraries, in each case using an
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-BCV
|
|
|
|
|
|
+FDR
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- for
|
|
|
|
|
|
+ of 10% as the threshold of significance.
|
|
|
|
+ Out of 12954 genes that passed the detection threshold in both subsets,
|
|
|
|
+ 358 were called significantly differentially expressed in the same direction
|
|
|
|
+ in both sets; 1063 were differentially expressed in the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -19602,7 +19653,8 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- and non-GB libraries, and found that
|
|
|
|
|
|
+ set only; 296 were differentially expressed in the non-GB set only; 2 genes
|
|
|
|
+ were called significantly up in the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -19612,19 +19664,20 @@ GB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- resulted in a negligible increase in the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-BCV
|
|
|
|
-\end_layout
|
|
|
|
|
|
+ set but significantly down in the non-GB set; and the remaining 11235 were
|
|
|
|
+ not called differentially expressed in either set.
|
|
|
|
+ These data are summarized in Table
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "tab:Comparison-of-significant"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- (0.417 with GB vs.
|
|
|
|
- 0.400 without).
|
|
|
|
- The near equality of the
|
|
|
|
|
|
+.
|
|
|
|
+ The differences in
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -19634,44 +19687,31 @@ BCV
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- for both sets indicates that the higher correlations in the GB libraries
|
|
|
|
- are most likely a result of the increased yield of useful reads, which
|
|
|
|
- reduces the contribution of Poisson counting uncertainty to the overall
|
|
|
|
- variance of the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
|
|
+ calculated by
|
|
|
|
+\begin_inset Flex Code
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-logCPM
|
|
|
|
|
|
+edgeR
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- values
|
|
|
|
-\begin_inset CommandInset citation
|
|
|
|
-LatexCommand cite
|
|
|
|
-key "McCarthy2012"
|
|
|
|
-literal "false"
|
|
|
|
-
|
|
|
|
|
|
+ for these subsets of samples were negligible (
|
|
|
|
+\begin_inset Formula $\textrm{BCV}=0.302$
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
- This improves the precision of expression measurements and more than offsets
|
|
|
|
- the negligible increase in
|
|
|
|
|
|
+ for
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-BCV
|
|
|
|
|
|
+GB
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Subsection
|
|
|
|
-More differentially expressed genes are detected with globin blocking
|
|
|
|
|
|
+ and 0.297 for non-GB).
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -20103,124 +20143,9 @@ Comparison of significantly differentially expressed genes with and without
|
|
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-To compare performance on differential gene expression tests, we took subsets
|
|
|
|
- of both the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GB
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- and non-GB libraries with exactly one pre-transplant and one post-transplant
|
|
|
|
- sample for each animal that had paired samples available for analysis (N=7
|
|
|
|
- animals, N=14 samples in each subset).
|
|
|
|
- The same test for pre- vs.
|
|
|
|
- post-transplant differential gene expression was performed on the same
|
|
|
|
- 7 pairs of samples from
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GB
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- libraries and non-GB libraries, in each case using an
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-FDR
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- of 10% as the threshold of significance.
|
|
|
|
- Out of 12954 genes that passed the detection threshold in both subsets,
|
|
|
|
- 358 were called significantly differentially expressed in the same direction
|
|
|
|
- in both sets; 1063 were differentially expressed in the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GB
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- set only; 296 were differentially expressed in the non-GB set only; 2 genes
|
|
|
|
- were called significantly up in the
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GB
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- set but significantly down in the non-GB set; and the remaining 11235 were
|
|
|
|
- not called differentially expressed in either set.
|
|
|
|
- These data are summarized in Table
|
|
|
|
-\begin_inset CommandInset ref
|
|
|
|
-LatexCommand ref
|
|
|
|
-reference "tab:Comparison-of-significant"
|
|
|
|
-plural "false"
|
|
|
|
-caps "false"
|
|
|
|
-noprefix "false"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-.
|
|
|
|
- The differences in
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-BCV
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- calculated by
|
|
|
|
-\begin_inset Flex Code
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-edgeR
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- for these subsets of samples were negligible (
|
|
|
|
-\begin_inset Formula $\textrm{BCV}=0.302$
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- for
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GB
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
|
|
|
|
- and 0.297 for non-GB).
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -20418,7 +20343,7 @@ literal "false"
|
|
The DeepSAGE method involves two different restriction enzymes that purify
|
|
The DeepSAGE method involves two different restriction enzymes that purify
|
|
and then tag small fragments of transcripts at specific locations and thus
|
|
and then tag small fragments of transcripts at specific locations and thus
|
|
significantly reduces the complexity of the transcriptome.
|
|
significantly reduces the complexity of the transcriptome.
|
|
- Therefore, we could not determine how DeepSAGE results would translate
|
|
|
|
|
|
+ Therefore, we could not assume that the DeepSAGE result would translate
|
|
to the common strategy in the field for assaying the entire transcript
|
|
to the common strategy in the field for assaying the entire transcript
|
|
population by whole-transcriptome
|
|
population by whole-transcriptome
|
|
\begin_inset Formula $3^{\prime}$
|
|
\begin_inset Formula $3^{\prime}$
|