|
@@ -47,7 +47,7 @@
|
|
% This one breaks subfigs so it's disabled
|
|
% This one breaks subfigs so it's disabled
|
|
% https://tex.stackexchange.com/questions/65680/automatically-bold-first-sentence-of-a-floats-caption
|
|
% https://tex.stackexchange.com/questions/65680/automatically-bold-first-sentence-of-a-floats-caption
|
|
|
|
|
|
-\usepackage[automake,nonumberlist,nohypertypes={abbreviation}]{glossaries-extra}
|
|
|
|
|
|
+\usepackage[automake=immediate,nonumberlist,nohypertypes={abbreviation}]{glossaries-extra}
|
|
\setabbreviationstyle{long-short}
|
|
\setabbreviationstyle{long-short}
|
|
\loadglsentries{abbrevs.tex}
|
|
\loadglsentries{abbrevs.tex}
|
|
\makeglossaries
|
|
\makeglossaries
|
|
@@ -637,6 +637,32 @@ Thanks again for your help, and happy reading!
|
|
Introduction
|
|
Introduction
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset ERT
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\backslash
|
|
|
|
+glsresetall
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\begin_inset Note Note
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Reintroduce all abbreviations
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\begin_layout Section
|
|
\begin_layout Section
|
|
\begin_inset CommandInset label
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
LatexCommand label
|
|
@@ -1234,21 +1260,38 @@ RNA-seq
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- reads for each annotated gene.
|
|
|
|
- In abstract terms, each dependent variable being measured is referred to
|
|
|
|
- as a feature.
|
|
|
|
- The simplest approach to analyzing such data would be to fit the same model
|
|
|
|
|
|
+ reads for each annotated gene, and there are tens of thousands of genes
|
|
|
|
+ in the human genome.
|
|
|
|
+ Since many assays measure things other than gene expression, the abstract
|
|
|
|
+ term
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+feature
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ is used to refer to each dependent variable being measured, which may include
|
|
|
|
+ any genomic element, such as genes, promoters, peaks, enhancers, and
|
|
|
|
+ exons.
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+The simplest approach to analyzing such data would be to fit the same model
|
|
independently to each feature.
|
|
independently to each feature.
|
|
However, this is undesirable for most genomics data sets.
|
|
However, this is undesirable for most genomics data sets.
|
|
Genomics assays like high-throughput sequencing are expensive, and often
|
|
Genomics assays like high-throughput sequencing are expensive, and often
|
|
the process of generating the samples is also quite expensive and time-consumin
|
|
the process of generating the samples is also quite expensive and time-consumin
|
|
g.
|
|
g.
|
|
This expense limits the sample sizes typically employed in genomics experiments
|
|
This expense limits the sample sizes typically employed in genomics experiments
|
|
-, and as a result the statistical power of the linear model for each individual
|
|
|
|
- feature is likewise limited.
|
|
|
|
- However, because thousands of features from the same samples are analyzed
|
|
|
|
- together, there is an opportunity to improve the statistical power of the
|
|
|
|
- analysis by exploiting shared patterns of variation across features.
|
|
|
|
|
|
+, so a typical genomic data set has far more features being measured than
|
|
|
|
+ observations (samples) per feature.
|
|
|
|
+ As a result, the statistical power of the linear model for each individual
|
|
|
|
+ feature is likewise limited by the small number of samples.
|
|
|
|
+ However, because thousands of features from the same set of samples are
|
|
|
|
+ analyzed together, there is an opportunity to improve the statistical power
|
|
|
|
+ of the analysis by exploiting shared patterns of variation across features.
|
|
This is the core feature of
|
|
This is the core feature of
|
|
\begin_inset Flex Code
|
|
\begin_inset Flex Code
|
|
status open
|
|
status open
|
|
@@ -1285,19 +1328,6 @@ RNA-seq
|
|
modeling is appropriate.
|
|
modeling is appropriate.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-Include an eBayes example figure
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
The central challenge when fitting a linear model is to estimate the variance
|
|
The central challenge when fitting a linear model is to estimate the variance
|
|
of the data accurately.
|
|
of the data accurately.
|
|
@@ -1330,7 +1360,17 @@ squeeze
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
the distribution of estimated variances toward a single common value that
|
|
the distribution of estimated variances toward a single common value that
|
|
- represents the variance of an average feature in the data
|
|
|
|
|
|
+ represents the variance of an average feature in the data (Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:ebayes-example"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+)
|
|
\begin_inset CommandInset citation
|
|
\begin_inset CommandInset citation
|
|
LatexCommand cite
|
|
LatexCommand cite
|
|
key "Smyth2004"
|
|
key "Smyth2004"
|
|
@@ -1359,9 +1399,80 @@ limma
|
|
|
|
|
|
assumes that extreme variances are less common than variances close to
|
|
assumes that extreme variances are less common than variances close to
|
|
the common value.
|
|
the common value.
|
|
- The variance estimates from this empirical Bayes procedure are shown empiricall
|
|
|
|
-y to yield greater statistical power than either the individual feature
|
|
|
|
- variances or the single common value.
|
|
|
|
|
|
+ The squeezed variance estimates from this empirical Bayes procedure have been
|
|
|
|
+ shown empirically to yield greater statistical power than either the individual
|
|
|
|
+ feature variances or the single common value.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/Intro/eBayes.pdf
|
|
|
|
+ lyxscale 50
|
|
|
|
+ width 100col%
|
|
|
|
+ groupId colfullwidth
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Argument 1
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Example of empirical Bayes squeezing of per-gene variances.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:ebayes-example"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+Example of empirical Bayes squeezing of per-gene variances.
|
|
|
|
+
|
|
|
|
+\series default
|
|
|
|
+ A smooth trend line (red) is fitted to the individual gene variances (light
|
|
|
|
+ blue) as a function of average gene abundance (logCPM).
|
|
|
|
+ Then the individual gene variances are
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+squeezed
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ toward the trend (dark blue).
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -1614,7 +1725,6 @@ literal "false"
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
.
|
|
.
|
|
-
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -1703,8 +1813,8 @@ RNA-seq
|
|
\begin_inset Formula $n$
|
|
\begin_inset Formula $n$
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- is held constant, then the resulting distribution is a gamma-distributed
|
|
|
|
- mixture of Poisson distributions, which is equivalent to the
|
|
|
|
|
|
+ is held constant, then the result is a gamma-distributed mixture of Poisson
|
|
|
|
+ distributions, which is equivalent to the
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -1715,7 +1825,7 @@ NB
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
distribution.
|
|
distribution.
|
|
- The choice of a gamma distribution for the mixing weights is arbitrary,
|
|
|
|
|
|
+ The assumption of a gamma distribution for the mixing weights is arbitrary,
|
|
motivated by the convenience of the numerically tractable
|
|
motivated by the convenience of the numerically tractable
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
@@ -1726,6 +1836,10 @@ NB
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
+ distribution and the need to select
|
|
|
|
+\emph on
|
|
|
|
+some
|
|
|
|
+\emph default
|
|
distribution, since the true shape of the distribution of biological variance
|
|
distribution, since the true shape of the distribution of biological variance
|
|
is unknown.
|
|
is unknown.
|
|
\end_layout
|
|
\end_layout
|
|
@@ -2125,8 +2239,8 @@ not
|
|
\emph default
|
|
\emph default
|
|
also be identified in a second replicate.
|
|
also be identified in a second replicate.
|
|
Where the more familiar false discovery rate measures the degree of corresponde
|
|
Where the more familiar false discovery rate measures the degree of corresponde
|
|
-nce between a data-derived ranked list and the true list of significant
|
|
|
|
- features,
|
|
|
|
|
|
+nce between a data-derived ranked list and the (unknown) true list of significan
|
|
|
|
+t features,
|
|
\begin_inset Flex Glossary Term
|
|
\begin_inset Flex Glossary Term
|
|
status open
|
|
status open
|
|
|
|
|
|
@@ -2178,7 +2292,89 @@ crossover point
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
between the signal and the noise by determining how far down the list the
|
|
between the signal and the noise by determining how far down the list the
|
|
- correspondence between feature ranks breaks down.
|
|
|
|
|
|
+ rank consistency breaks down into randomness (Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:Example-IDR"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+).
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/IDR/D4659vsD5053_epic-PAGE1-CROP.pdf
|
|
|
|
+ lyxscale 50
|
|
|
|
+ width 100col%
|
|
|
|
+ groupId colfullwidth
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Argument 1
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Example IDR consistency plot.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:Example-IDR"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+Example IDR consistency plot.
|
|
|
|
+
|
|
|
|
+\series default
|
|
|
|
+ Peak calls in two replicates are ranked from highest score (top and right)
|
|
|
|
+ to lowest score (bottom and left).
|
|
|
|
+ IDR identifies reproducible peaks, which rank highly in both replicates
|
|
|
|
+ (light blue), separating them from
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+noise
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ peak calls whose ranking is not reproducible between replicates (dark blue).
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -2428,6 +2624,32 @@ literal "false"
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
.
|
|
.
|
|
|
|
+ The effect of such normalizations is to center the distribution of
|
|
|
|
+\begin_inset Flex Glossary Term (pl)
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+logFC
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ at zero.
|
|
|
|
+ Note that if a true global difference in gene expression is present in
|
|
|
|
+ the data, this difference will be normalized out as well, since it is indisting
|
|
|
|
+uishable from composition bias.
|
|
|
|
+ In other words,
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+RNA-seq
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ cannot measure absolute gene expression, only gene expression as a fraction
|
|
|
|
+ of total reads.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -2475,8 +2697,18 @@ ChIP-seq
|
|
sample has a bimodal distribution of read counts: a low-abundance mode
|
|
sample has a bimodal distribution of read counts: a low-abundance mode
|
|
representing background regions and a high-abundance mode representing
|
|
representing background regions and a high-abundance mode representing
|
|
signal regions.
|
|
signal regions.
|
|
- This offers two potential normalization targets: equalizing background
|
|
|
|
- coverage or equalizing signal coverage.
|
|
|
|
|
|
+ This offers two mutually incompatible normalization strategies: equalizing
|
|
|
|
+ background coverage or equalizing signal coverage (Figure
|
|
|
|
+\begin_inset CommandInset ref
|
|
|
|
+LatexCommand ref
|
|
|
|
+reference "fig:chipseq-norm-example"
|
|
|
|
+plural "false"
|
|
|
|
+caps "false"
|
|
|
|
+noprefix "false"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+).
|
|
If the experiment is well controlled and ChIP efficiency is known to be
|
|
If the experiment is well controlled and ChIP efficiency is known to be
|
|
consistent across all samples, then normalizing the background coverage
|
|
consistent across all samples, then normalizing the background coverage
|
|
to be equal across all samples is a reasonable strategy.
|
|
to be equal across all samples is a reasonable strategy.
|
|
@@ -2517,9 +2749,68 @@ logFC
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
- is zero across all abundance levels.
|
|
|
|
- Hence, the simpler scaling normalization based on background or signal
|
|
|
|
- regions are generally preferred whenever possible.
|
|
|
|
|
|
+ is zero across all abundance levels.
|
|
|
|
+ Hence, the simpler scaling normalizations based on background or signal
|
|
|
|
+ regions are generally preferred whenever possible.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/CD4-csaw/ChIP-seq/H3K4me2-sample-MAplot-bins-CROP.png
|
|
|
|
+ lyxscale 25
|
|
|
|
+ width 100col%
|
|
|
|
+ groupId colwidth-raster
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:chipseq-norm-example"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+Example MA plot of ChIP-seq read counts in 10 kb bins for two arbitrary samples.
|
|
|
|
+
|
|
|
|
+\series default
|
|
|
|
+The distribution of bins is bimodal along the x axis (average abundance),
|
|
|
|
+ with the left mode representing
|
|
|
|
+\begin_inset Quotes eld
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+background
|
|
|
|
+\begin_inset Quotes erd
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ regions with no protein binding and the right mode representing bound regions.
|
|
|
|
+ The modes are also separated on the y axis (logFC), motivating two conflicting
|
|
|
|
+ normalization strategies: background normalization (red) and signal normalizati
|
|
|
|
+on (blue and green, two similar signal normalizations).
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Subsection
|
|
\begin_layout Subsection
|
|
@@ -2660,11 +2951,42 @@ Benjamini-Hochberg + pval dist
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
-\begin_inset Flex TODO Note (inline)
|
|
|
|
|
|
+\begin_inset Float figure
|
|
|
|
+wide false
|
|
|
|
+sideways false
|
|
status open
|
|
status open
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
-Include figure showing uniform and non-uniform components of p-value dist
|
|
|
|
|
|
+\align center
|
|
|
|
+\begin_inset Graphics
|
|
|
|
+ filename graphics/Intro/med-pval-hist-colored-CROP.pdf
|
|
|
|
+ lyxscale 50
|
|
|
|
+ width 100col%
|
|
|
|
+ groupId colfullwidth
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset Caption Standard
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset CommandInset label
|
|
|
|
+LatexCommand label
|
|
|
|
+name "fig:Example-pval-hist"
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\series bold
|
|
|
|
+Example p-value histogram.
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -2739,6 +3061,16 @@ glsresetall
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
|
|
+\begin_inset Note Note
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Reintroduce all abbreviations
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -4122,59 +4454,6 @@ Strand cross-correlation plots for ChIP-seq data, before and after blacklisting.
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Standard
|
|
|
|
-\begin_inset Note Note
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Float figure
|
|
|
|
-wide false
|
|
|
|
-sideways false
|
|
|
|
-status collapsed
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\align center
|
|
|
|
-\begin_inset Graphics
|
|
|
|
- filename graphics/CD4-csaw/ChIP-seq/H3K4me2-sample-MAplot-bins-CROP.png
|
|
|
|
- lyxscale 25
|
|
|
|
- width 100col%
|
|
|
|
- groupId colwidth-raster
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-\begin_inset Caption Standard
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-
|
|
|
|
-\series bold
|
|
|
|
-\begin_inset CommandInset label
|
|
|
|
-LatexCommand label
|
|
|
|
-name "fig:MA-plot-bigbins"
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-MA plot of H3K4me2 read counts in 10kb bins for two arbitrary samples.
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-
|
|
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -10633,6 +10912,16 @@ glsresetall
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
|
|
+\begin_inset Note Note
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Reintroduce all abbreviations
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Section
|
|
\begin_layout Section
|
|
@@ -17029,6 +17318,16 @@ glsresetall
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
|
|
+\begin_inset Note Note
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Reintroduce all abbreviations
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
@@ -17038,7 +17337,11 @@ status open
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
Choose between above and the paper title: Optimizing yield of deep RNA sequencin
|
|
Choose between above and the paper title: Optimizing yield of deep RNA sequencin
|
|
g for gene expression profiling by globin reduction of peripheral blood
|
|
g for gene expression profiling by globin reduction of peripheral blood
|
|
- samples from cynomolgus monkeys (Macaca fascicularis).
|
|
|
|
|
|
+ samples from cynomolgus monkeys (
|
|
|
|
+\emph on
|
|
|
|
+Macaca fascicularis
|
|
|
|
+\emph default
|
|
|
|
+).
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
@@ -19273,52 +19576,11 @@ noprefix "false"
|
|
|
|
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
-).
|
|
|
|
- This means that for applications where it is critical that each sample
|
|
|
|
- achieve a specified minimum coverage in order to provide useful information,
|
|
|
|
- it would be necessary to budget up to 10 times the sequencing depth per
|
|
|
|
- sample without
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GB
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
-, even though the average yield improvement for
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GB
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- is only 2-fold, because every sample has a chance of being 90% globin and
|
|
|
|
- 10% useful reads.
|
|
|
|
- Hence, the more consistent behavior of
|
|
|
|
-\begin_inset Flex Glossary Term
|
|
|
|
-status open
|
|
|
|
-
|
|
|
|
-\begin_layout Plain Layout
|
|
|
|
-GB
|
|
|
|
-\end_layout
|
|
|
|
-
|
|
|
|
-\end_inset
|
|
|
|
-
|
|
|
|
- samples makes planning an experiment easier and more efficient because
|
|
|
|
- it eliminates the need to over-sequence every sample in order to guard
|
|
|
|
- against the worst case of a high-globin fraction.
|
|
|
|
-\end_layout
|
|
|
|
|
|
|
|
-\begin_layout Standard
|
|
|
|
\begin_inset Float figure
|
|
\begin_inset Float figure
|
|
wide false
|
|
wide false
|
|
sideways false
|
|
sideways false
|
|
-status open
|
|
|
|
|
|
+status collapsed
|
|
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\align center
|
|
@@ -19381,6 +19643,54 @@ Fraction of genic reads in each sample aligned to non-globin genes, with
|
|
\end_inset
|
|
\end_inset
|
|
|
|
|
|
|
|
|
|
|
|
+\begin_inset Note Note
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Float lost issues
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+).
|
|
|
|
+ This means that for applications where it is critical that each sample
|
|
|
|
+ achieve a specified minimum coverage in order to provide useful information,
|
|
|
|
+ it would be necessary to budget up to 10 times the sequencing depth per
|
|
|
|
+ sample without
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+GB
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+, even though the average yield improvement for
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+GB
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ is only 2-fold, because every sample has a chance of being 90% globin and
|
|
|
|
+ 10% useful reads.
|
|
|
|
+ Hence, the more consistent behavior of
|
|
|
|
+\begin_inset Flex Glossary Term
|
|
|
|
+status open
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+GB
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+ samples makes planning an experiment easier and more efficient because
|
|
|
|
+ it eliminates the need to over-sequence every sample in order to guard
|
|
|
|
+ against the worst case of a high-globin fraction.
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
\begin_layout Subsection
|
|
\begin_layout Subsection
|
|
@@ -21242,6 +21552,32 @@ status open
|
|
Future Directions
|
|
Future Directions
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+\begin_inset ERT
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\backslash
|
|
|
|
+glsresetall
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\begin_inset Note Note
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Reintroduce all abbreviations
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\begin_layout Plain Layout
|
|
\begin_layout Plain Layout
|
|
\begin_inset Flex TODO Note (inline)
|
|
\begin_inset Flex TODO Note (inline)
|
|
status open
|
|
status open
|
|
@@ -21265,6 +21601,32 @@ If there are any chapter-independent future directions, put them here.
|
|
Closing remarks
|
|
Closing remarks
|
|
\end_layout
|
|
\end_layout
|
|
|
|
|
|
|
|
+\begin_layout Standard
|
|
|
|
+\begin_inset ERT
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\backslash
|
|
|
|
+glsresetall
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\begin_inset Note Note
|
|
|
|
+status collapsed
|
|
|
|
+
|
|
|
|
+\begin_layout Plain Layout
|
|
|
|
+Reintroduce all abbreviations
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
|
|
+\end_inset
|
|
|
|
+
|
|
|
|
+
|
|
|
|
+\end_layout
|
|
|
|
+
|
|
\begin_layout Standard
|
|
\begin_layout Standard
|
|
\align center
|
|
\align center
|
|
\begin_inset ERT
|
|
\begin_inset ERT
|