First draft complete

Ryan C. Thompson 5 years ago
parent
commit
16732a33b8
1 changed files with 84 additions and 1 deletions

+ 84 - 1
thesis.lyx

@@ -1124,13 +1124,96 @@ literal "false"
 \end_layout
 
 \begin_layout Subsubsection
-sva and ComBat for batch correction
+ComBat and SVA for correction of known and unknown batch effects
+\end_layout
+
+\begin_layout Standard
+In addition to well-understood effects that can be easily normalized out,
+ a data set often contains confounding biological effects that must be accounted
+ for in the modeling step.
+ For instance, in an experiment with pre-treatment and post-treatment samples
+ of cells from several different donors, donor variability represents a
+ known batch effect.
+ The most straightforward correction for known batches is to estimate the
+ mean for each batch independently and subtract out the differences, so
+ that all batches have identical means for each feature.
+ However, as with variance estimation, estimating the differences in batch
+ means is not necessarily robust at the feature level, so the ComBat method
+ adds empirical Bayes squeezing of the batch mean differences toward a common
+ value, analogous to limma's empirical Bayes squeezing of feature variance
+ estimates 
+\begin_inset CommandInset citation
+LatexCommand cite
+key "Johnson2007"
+literal "false"
+
+\end_inset
+
+.
+ Effectively, ComBat assumes that modest differences between batch means
+ are real batch effects, but extreme differences between batch means are
+ more likely to be the result of outlier observations that happen to line
+ up with the batches rather than a genuine batch effect.
+ The result is a batch correction that is more robust against outliers than
+ simple subtraction of mean differences.
+\end_layout
+
+\begin_layout Standard
+In some data sets, unknown batch effects may be present due to inherent
+ variability in the data, caused by either technical or biological effects.
+ Examples of unknown batch effects include variations in enrichment efficiency
+ between ChIP-seq samples, variations in populations of different cell types,
+ and the effects of uncontrolled environmental factors on gene expression
+ in humans or live animals.
+ In an ordinary linear model context, unknown batch effects cannot be inferred
+ and must be treated as random noise.
+ However, in high-throughput experiments, once again information can be
+ shared across features to identify patterns of un-modeled variation that
+ are repeated in many features.
+ One attractive strategy would be to perform singular value decomposition
+ (SVD) on the matrix of linear model residuals (which contain all the un-modeled
+ variation in the data) and take the first few singular vectors as batch
+ effects.
+ While this can be effective, it assumes that all batch effects are uncorrelated
+ with the effects being modeled, which is often unrealistic.
+ Surrogate variable analysis (SVA) starts with this approach, but takes
+ additional steps to identify batch effects in the full data that are
+ strongly correlated with the singular vectors of the residuals but only weakly
+ correlated with the effects of interest 
+\begin_inset CommandInset citation
+LatexCommand cite
+key "Leek2007"
+literal "false"
+
+\end_inset
+
+.
+ Since the final batch effects are estimated from the full data, moderate
+ correlations between the batch effects and effects of interest are allowed,
+ which gives SVA much more freedom to estimate the true extent of the batch
+ effects compared to simple residual SVD.
+ Once the surrogate variables are estimated, they can be included as covariates
+ in the linear model in a similar fashion to known batch effects in order
+ to subtract out their effects on each feature's abundance.
 \end_layout
 
 \begin_layout Subsubsection
 Factor analysis: PCA, MDS, MOFA
 \end_layout
 
+\begin_layout Standard
+\begin_inset Flex TODO Note (inline)
+status open
+
+\begin_layout Plain Layout
+Not sure if this merits a subsection here.
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
 \begin_layout Itemize
 Batch-corrected PCA is informative, but careful application is required
  to avoid bias