6 years ago · 7749458525
--- a/thesis.lyx
+++ b/thesis.lyx
@@ -4840,16 +4840,146 @@ How to bring up that these custom vectors were used in another project by
 
				 \end_layout
			
 
				 
			
 
				 \begin_layout Subsection
			
 
				-voom
			
 
				+Methylation array data can be successfully analyzed using existing techniques,
			
 
				+ but machine learning poses additional challenges
			
 
				 \end_layout
			
 
				 
			
 
				-\begin_layout Itemize
			
 
				-Methods like voom designed for RNA-seq can also help with array analysis
			
 
				-\end_layout
			
 
				+\begin_layout Standard
			
 
				+Analysis strategies B and C both yield a reasonable analysis, with
			
 
				+ a mean-variance trend that matches the expected behavior for the non-linear
			
 
				+ M-value transformation (Figure 
			
 
				+\begin_inset CommandInset ref
			
 
				+LatexCommand ref
			
 
				+reference "fig:meanvar-sva-aw"
			
 
				+plural "false"
			
 
				+caps "false"
			
 
				+noprefix "false"
			
 
				 
			
 
				-\begin_layout Itemize
			
 
				-Extracting and modeling confounders common to many features improves model
			
 
				- correspondence to known biology
			
 
				+\end_inset
			
 
				+
			
 
				+) and well-behaved p-value distributions (Figure 
			
 
				+\begin_inset CommandInset ref
			
 
				+LatexCommand ref
			
 
				+reference "fig:meth-p-value-histograms"
			
 
				+plural "false"
			
 
				+caps "false"
			
 
				+noprefix "false"
			
 
				+
			
 
				+\end_inset
			
 
				+
			
 
				+).
			
 
				+ These two analyses also yield similar numbers of significant probes (Table
			
 
				+ 
			
 
				+\begin_inset CommandInset ref
			
 
				+LatexCommand ref
			
 
				+reference "tab:methyl-num-signif"
			
 
				+plural "false"
			
 
				+caps "false"
			
 
				+noprefix "false"
			
 
				+
			
 
				+\end_inset
			
 
				+
			
 
				+) and similar estimates of the number of differentially methylated probes
			
 
				+ (Table 
			
 
				+\begin_inset CommandInset ref
			
 
				+LatexCommand ref
			
 
				+reference "tab:methyl-est-nonnull"
			
 
				+plural "false"
			
 
				+caps "false"
			
 
				+noprefix "false"
			
 
				+
			
 
				+\end_inset
			
 
				+
			
 
				+).
			
 
				+ The main difference between these two analyses is the method used to account
			
 
				+ for the mean-variance trend.
			
 
				+ In analysis B, the trend is estimated and applied at the probe level: each
			
 
				+ probe's estimated variance is squeezed toward the trend using an empirical
			
 
				+ Bayes procedure (Figure 
			
 
				+\begin_inset CommandInset ref
			
 
				+LatexCommand ref
			
 
				+reference "fig:meanvar-sva-aw"
			
 
				+plural "false"
			
 
				+caps "false"
			
 
				+noprefix "false"
			
 
				+
			
 
				+\end_inset
			
 
				+
			
 
				+).
			
 
				+ In analysis C, the trend is still estimated at the probe level, but instead
			
 
				+ of estimating a single variance value shared across all observations for
			
 
				+ a given probe, the voom method computes an initial estimate of the variance
			
 
				+ for each observation individually based on where its model-fitted M-value
			
 
				+ falls on the trend line and then assigns inverse-variance weights to model
			
 
				+ the difference in variance between observations.
			
 
				+ An overall variance is still estimated for each probe using the same empirical
			
 
				+ Bayes method, but now the residual trend is flat (Figure 
			
 
				+\begin_inset CommandInset ref
			
 
				+LatexCommand ref
			
 
				+reference "fig:meanvar-sva-voomaw"
			
 
				+plural "false"
			
 
				+caps "false"
			
 
				+noprefix "false"
			
 
				+
			
 
				+\end_inset
			
 
				+
			
 
				+), and the mean-variance trend is modeled by scaling the probe's estimated
			
 
				+ variance for each observation using the weights computed by voom.
			
 
				+ The difference between these two methods is analogous to the difference
			
 
				+ between a t-test with equal variance and a t-test with unequal variance,
			
 
				+ except that the unequal group variances used in the latter test are estimated
			
 
				+ based on the mean-variance trend from all the probes rather than the data
			
 
				+ for the specific probe being tested, thus stabilizing the group variance
			
 
				+ estimates by sharing information between probes.
			
 
				+ In practice, allowing voom to model the variance using observation weights
			
 
				+ in this manner allows the linear model fit to concentrate statistical power
			
 
				+ where it will do the most good.
			
 
				+ For example, if a particular probe's M-values are always at the extreme
			
 
				+ of the M-value range (e.g.
			
 
				+ less than -4) for ADNR samples, but the M-values for that probe in TX and
			
 
				+ CAN samples are within the flat region of the mean-variance trend (between
			
 
				+ -3 and +3), voom is able to down-weight the contribution of the high-variance
			
 
				+ M-values from the ADNR samples in order to gain more statistical power
			
 
				+ while testing for differential methylation between TX and CAN.
			
 
				+ In contrast, modeling the mean-variance trend only at the probe level would
			
 
				+ combine the high-variance ADNR samples and lower-variance samples from
			
 
				+ other conditions and estimate an intermediate variance for this probe.
			
 
				+ Analysis B shows that this probe-level approach is adequate, but the voom
			
 
				+ approach in analysis C is at least as good on all model fit criteria and
			
 
				+ yields a larger estimate for the number of differentially methylated probes.
			
 
				+\end_layout
			
 
				+
			
 
				+\begin_layout Standard
			
 
				+The significant association of diabetes diagnosis with sample quality is
			
 
				+ interesting.
			
 
				+ The samples with Type 2 diabetes tended to have more variation, averaged
			
 
				+ across all probes, than those with Type 1 diabetes.
			
 
				+ This is consistent with the consensus that Type 2 diabetes and the associated
			
 
				+ metabolic syndrome represent a broad dysregulation of the body's endocrine
			
 
				+ signalling related to metabolism [citation needed].
			
 
				+ This dysregulation could easily manifest as a greater degree of variation
			
 
				+ in the DNA methylation patterns of affected tissues.
			
 
				+ In contrast, Type 1 diabetes has a more specific cause and effect, so a
			
 
				+ less variable methylation signature is expected.
			
 
				+\end_layout
			
 
				+
			
 
				+\begin_layout Standard
			
 
				+This preliminary analysis suggests that some degree of differential methylation
			
 
				+ exists between TX and each of the three types of transplant dysfunction
			
 
				+ studied.
			
 
				+ Hence, it may be feasible to train a classifier to diagnose transplant
			
 
				+ dysfunction from DNA methylation array data.
			
 
				+ However, the major importance of both SVA and sample quality weighting
			
 
				+ for proper modeling of this data poses significant challenges for any attempt
			
 
				+ at machine learning on data of similar quality.
			
 
				+ While both are easily used in a modeling context with full sample information,
			
 
				+ neither of these methods is directly applicable in a machine learning context,
			
 
				+ where the diagnosis is not known ahead of time.
			
 
				+ If a machine learning approach for methylation-based diagnosis is to be
			
 
				+ pursued, it will either require machine-learning-friendly methods to address
			
 
				+ the same systematic trends in the data that SVA and sample quality weighting
			
 
				+ address, or it will require higher quality data with substantially less
			
 
				+ systematic perturbation of the data.
			
 
				 \end_layout
			
 
				 
			
 
				 \begin_layout Chapter