
Finish Chapter 3 discussion

Ryan C. Thompson, 6 years ago
parent
commit
7749458525
1 changed file with 137 additions and 7 deletions

thesis.lyx

@@ -4840,16 +4840,146 @@ How to bring up that these custom vectors were used in another project by
 \end_layout
 
 \begin_layout Subsection
-voom
+Methylation array data can be successfully analyzed using existing techniques,
+ but machine learning poses additional challenges
 \end_layout
 
-\begin_layout Itemize
-Methods like voom designed for RNA-seq can also help with array analysis
-\end_layout
+\begin_layout Standard
+Analysis strategies B and C both yield a reasonable analysis, with
+ a mean-variance trend that matches the expected behavior for the non-linear
+ M-value transformation (Figure 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:meanvar-sva-aw"
+plural "false"
+caps "false"
+noprefix "false"
 
-\begin_layout Itemize
-Extracting and modeling confounders common to many features improves model
- correspondence to known biology
+\end_inset
+
+) and well-behaved p-value distributions (Figure 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:meth-p-value-histograms"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+).
+ These two analyses also yield similar numbers of significant probes (Table
+ 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "tab:methyl-num-signif"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+) and similar estimates of the number of differentially methylated probes
+ (Table 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "tab:methyl-est-nonnull"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+).
+ The main difference between these two analyses is the method used to account
+ for the mean-variance trend.
+ In analysis B, the trend is estimated and applied at the probe level: each
+ probe's estimated variance is squeezed toward the trend using an empirical
+ Bayes procedure (Figure 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:meanvar-sva-aw"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+).
+ In analysis C, the trend is still estimated at the probe level, but instead
+ of estimating a single variance value shared across all observations for
+ a given probe, the voom method computes an initial estimate of the variance
+ for each observation individually based on where its model-fitted M-value
+ falls on the trend line and then assigns inverse-variance weights to model
+ the difference in variance between observations.
+ An overall variance is still estimated for each probe using the same empirical
+ Bayes method, but now the residual trend is flat (Figure 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:meanvar-sva-voomaw"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+), and the mean-variance trend is modeled by scaling the probe's estimated
+ variance for each observation using the weights computed by voom.
+ The difference between these two methods is analogous to the difference
+ between a t-test with equal variance and a t-test with unequal variance,
+ except that the unequal group variances used in the latter test are estimated
+ based on the mean-variance trend from all the probes rather than the data
+ for the specific probe being tested, thus stabilizing the group variance
+ estimates by sharing information between probes.
+ Modeling the variance through observation weights in this way allows the
+ linear model fit to concentrate statistical power on the observations that
+ are most informative.
+ For example, if a particular probe's M-values are always at the extreme
+ of the M-value range (e.g.
+ less than -4) for ADNR samples, but the M-values for that probe in TX and
+ CAN samples are within the flat region of the mean-variance trend (between
+ -3 and +3), voom is able to down-weight the contribution of the high-variance
+ M-values from the ADNR samples in order to gain more statistical power
+ while testing for differential methylation between TX and CAN.
+ In contrast, modeling the mean-variance trend only at the probe level would
+ combine the high-variance ADNR samples and lower-variance samples from
+ other conditions and estimate an intermediate variance for this probe.
+ Analysis B shows that this approach is adequate in practice, but the voom
+ approach in analysis C is at least as good on all model fit criteria and
+ yields a larger estimate for the number of differentially methylated probes.
+\end_layout
+
+\begin_layout Standard
+The significant association of diabetes diagnosis with sample quality is
+ notable.
+ The samples with Type 2 diabetes tended to have more variation, averaged
+ across all probes, than those with Type 1 diabetes.
+ This is consistent with the consensus that Type 2 diabetes and the associated
+ metabolic syndrome represent a broad dysregulation of the body's endocrine
+ signalling related to metabolism [citation needed].
+ This dysregulation could plausibly manifest as a greater degree of variation
+ in the DNA methylation patterns of affected tissues.
+ In contrast, Type 1 diabetes has a more specific cause and effect, so a
+ less variable methylation signature would be expected.
+\end_layout
+
+\begin_layout Standard
+This preliminary analysis suggests that some degree of differential methylation
+ exists between TX and each of the three types of transplant dysfunction
+ studied.
+ Hence, it may be feasible to train a classifier to diagnose transplant
+ dysfunction from DNA methylation array data.
+ However, proper modeling of these data depends heavily on both SVA and
+ sample quality weighting, which poses significant challenges for any
+ attempt at machine learning on data of similar quality.
+ Both methods are straightforward to apply in a modeling context with full
+ sample information, but neither is directly applicable in a machine learning
+ context, where the diagnosis is not known ahead of time and therefore cannot
+ inform the estimation of surrogate variables or sample weights.
+ If a machine learning approach for methylation-based diagnosis is to be
+ pursued, it will require either machine-learning-friendly methods that address
+ the same systematic trends that SVA and sample quality weighting address,
+ or higher quality data with substantially less systematic perturbation.
 \end_layout
 
 \begin_layout Chapter
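The contrast between analyses B and C described in this commit can be sketched for a single probe with NumPy. This is a simplified illustration under stated assumptions, not the implementation used in the thesis: the trend shape (`trend_variance`), the prior degrees of freedom `d0`, the flat prior variance of 1.0 used in the weighted fit, and the M-values are all hypothetical, and real voom fits a lowess trend across all probes rather than using a fixed function. The helpers `squeeze` and `group_t` are illustrative names, not library functions.

```python
import numpy as np


def trend_variance(m):
    # Hypothetical mean-variance trend for M-values (an assumed shape, not fitted to
    # real data): flat variance inside [-3, 3], rising quadratically outside it.
    excess = np.maximum(np.abs(m) - 3.0, 0.0)
    return 0.25 + excess ** 2


def squeeze(s2, d, s0_2, d0):
    # Empirical Bayes moderation in the style of limma: a d.f.-weighted average of the
    # probe's residual variance s2 (d d.f.) and a prior variance s0_2 (d0 prior d.f.).
    return (d0 * s0_2 + d * s2) / (d0 + d)


def group_t(y, groups, weights, a, b, s0_2, d0):
    """Weighted cell-means fit for one probe; returns the moderated t-statistic
    for (group a - group b). Unit weights give the ordinary shared-variance fit."""
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    means, wsum, rss = {}, {}, 0.0
    for g in labels:
        in_g = groups == g
        wsum[g] = weights[in_g].sum()
        means[g] = np.sum(weights[in_g] * y[in_g]) / wsum[g]
        rss += np.sum(weights[in_g] * (y[in_g] - means[g]) ** 2)
    d = len(y) - len(labels)  # residual degrees of freedom
    s2_mod = squeeze(rss / d, d, s0_2, d0)
    se = np.sqrt(s2_mod * (1.0 / wsum[a] + 1.0 / wsum[b]))
    return (means[a] - means[b]) / se


# Hypothetical M-values for one probe: the ADNR samples sit in the high-variance tail.
groups = np.array(["TX"] * 5 + ["CAN"] * 5 + ["ADNR"] * 5)
y = np.array([0.0, 0.5, -0.5, 0.2, -0.2,
              1.0, 1.5, 0.5, 1.2, 0.8,
              -5.0, -7.0, -3.5, -6.5, -3.0])
d0 = 4.0  # assumed prior d.f.; limma estimates this from all probes

# Analysis B style: equal weights, with the probe variance squeezed toward the
# trend value at the probe's average M-value.
t_b = group_t(y, groups, np.ones_like(y), "TX", "CAN", trend_variance(y.mean()), d0)

# Analysis C style: voom-like weights = 1 / trend variance at each observation's
# fitted value (here its group mean); the weighted residual variance is then
# squeezed toward a flat prior, assumed to be 1.0.
fitted = np.array([y[groups == g].mean() for g in groups])
weights = 1.0 / trend_variance(fitted)
t_c = group_t(y, groups, weights, "TX", "CAN", 1.0, d0)

print(f"analysis B t = {t_b:.2f}, analysis C t = {t_c:.2f}")
# prints: analysis B t = -1.65, analysis C t = -3.72
```

With equal weights, the high-variance ADNR samples inflate the single residual variance shared by all groups, so the TX vs CAN difference is tested against an intermediate variance. The inverse-trend weights shrink the ADNR contribution instead, giving a larger t-statistic for the same difference in means, which mirrors the equal- vs unequal-variance t-test analogy.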