6 years ago · 40b0e7f13e
--- a/ROC-TXvsAR-external-AUC.xlsx
+++ b/ROC-TXvsAR-external-AUC.xlsx
--- a/ROC-TXvsAR-internal-AUC.xlsx
+++ b/ROC-TXvsAR-internal-AUC.xlsx
--- a/graphics/PAM/README.md
+++ b/graphics/PAM/README.md
@@ -0,0 +1,26 @@
 
+(This was written back in 2013, and I can't necessarily vouch for any
+of the claims within it.)
+
+# Questions
+
+* Overarching question: Can we accurately distinguish AR from TX?
+* Can we work well in "clinical" mode, i.e. classifying single samples?
+  * How do we normalize a new sample together with the training set?
+  * How do we avoid recalculating the classifier for each sample?
+* Can we perform well on an external validation set (GEO data)?
+  * Are the same genes predictive in both datasets?
+  * Can a classifier trained on our data perform well on GEO data?
+
+# Experiments
+
+* pam-analysis.R
+    * How important is it to normalize together with the training set? (RMA separate vs. together)
+    * Conclusion: must normalize together. Separate normalization introduced bias
+      toward one class or the other.
+    * Question: how can this be done with a single sample?
+* pam-analysis-norm.R
+    * Can single-channel normalization improve classification results? Yes.
+    * Tried PAM with RMA and two single-channel normalizations.
+    * fRMA improves cross-dataset accuracy from 65% to 71%.
+* limma-analysis-norm.R
+    * What is the source of the variation?
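The pam-analysis.R and pam-analysis-norm.R notes above turn on whether a new array can be normalized without renormalizing the training arrays. The plain-Python sketch below illustrates the issue on toy data. It assumes quantile normalization (the between-array step of RMA) as a stand-in for the full method, and a simplified stand-in for fRMA in which each new array is mapped onto a reference distribution frozen from the training arrays. Background correction, probe-level modeling, and summarization are omitted, ties are broken by position, and all function names are hypothetical.

```python
"""Toy comparison: joint quantile normalization vs. a frozen reference.

Hypothetical illustration only; this is not RMA or fRMA themselves.
"""


def quantile_reference(arrays):
    """Mean of the k-th smallest value across arrays, for each rank k."""
    sorted_arrays = [sorted(a) for a in arrays]
    n = len(sorted_arrays[0])
    return [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]


def normalize_to_reference(array, reference):
    """Replace each value with the reference value at the same rank."""
    # sorted() is stable, so tied values keep their original order.
    order = sorted(range(len(array)), key=lambda i: array[i])
    out = [0.0] * len(array)
    for rank, i in enumerate(order):
        out[i] = reference[rank]
    return out


def quantile_normalize(arrays):
    """Jointly quantile-normalize arrays: all arrays share one reference."""
    reference = quantile_reference(arrays)
    return [normalize_to_reference(a, reference) for a in arrays]


# Two toy "training" arrays and one new "clinical" array (3 probes each).
train = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
new = [9.0, 7.0, 8.0]

# Joint normalization: adding the new array shifts the shared reference,
# so the training arrays' normalized values change as well.
train_alone = quantile_normalize(train)
train_with_new = quantile_normalize(train + [new])[: len(train)]

# Frozen reference: computed once from the training arrays, then reused.
frozen_reference = quantile_reference(train)
new_frozen = normalize_to_reference(new, frozen_reference)

print(train_alone)      # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
print(train_with_new)   # differs from train_alone
print(new_frozen)       # [5.5, 1.5, 3.5]
```

With joint normalization, adding a single array changes every training array's normalized values, so a classifier trained on those values would in principle need retraining for each new sample. With the frozen reference, the training values stay fixed and each new array is normalized on its own, which is the single-channel property discussed in the thesis changes below.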
														
--- a/graphics/PAM/external-roc-frma.pdf
+++ b/graphics/PAM/external-roc-frma.pdf
--- a/graphics/frma-pax-bx/M-BX-violin.pdf
+++ b/graphics/frma-pax-bx/M-BX-violin.pdf
--- a/graphics/frma-pax-bx/M-PAX-violin.pdf
+++ b/graphics/frma-pax-bx/M-PAX-violin.pdf
--- a/graphics/frma-pax-bx/MA-BX-RMA.fRMA.pdf
+++ b/graphics/frma-pax-bx/MA-BX-RMA.fRMA.pdf
--- a/graphics/frma-pax-bx/MA-BX-fRMA.fRMA.pdf
+++ b/graphics/frma-pax-bx/MA-BX-fRMA.fRMA.pdf
--- a/graphics/frma-pax-bx/MA-PAX-RMA.fRMA.pdf
+++ b/graphics/frma-pax-bx/MA-PAX-RMA.fRMA.pdf
--- a/graphics/frma-pax-bx/MA-PAX-fRMA.fRMA.pdf
+++ b/graphics/frma-pax-bx/MA-PAX-fRMA.fRMA.pdf
--- a/refs.bib
+++ b/refs.bib
--- a/thesis.lyx
+++ b/thesis.lyx
@@ -890,7 +890,7 @@ The choice of pre-processing algorithms used in the analysis of an array
 
															 \end_layout
														
 
															 \begin_layout Subsection
														
 
															-Frozen RMA for clinical microarray classifiers
														
 
															+Normalization for clinical microarray classifiers must be single-channel
														
 
															 \end_layout
														
 
															 \begin_layout Subsubsection
														
@@ -941,10 +941,19 @@ exist
 
															  This would ensure that each array's normalization is independent of every
														
 
															  other array, and that arrays normalized separately can still be compared
														
 
															  to each other without bias.
														
 
															+ Such a normalization is commonly referred to as 
														
 
															+\begin_inset Quotes eld
														
 
															+\end_inset
														
 
															+
														
 
															+single-channel normalization
														
 
															+\begin_inset Quotes erd
														
 
															+\end_inset
														
 
															+
														
 
															+.
														
 
															 \end_layout
														
 
															 \begin_layout Subsubsection
														
 
															-Frozen RMA satisfies clinical normalization requirements
														
 
															+Several strategies are available to meet clinical normalization requirements
														
 
															 \end_layout
														
 
															 \begin_layout Standard
														
@@ -985,16 +994,33 @@ One important limitation of fRMA is that it requires a separate reference
 
															  samples on that platform 
														
 
															 \begin_inset CommandInset citation
														
 
															 LatexCommand cite
														
 
															-key "HudsonK.&RemediosC.2010"
														
 
															+key "McCall2011"
														
 
															+literal "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
+Another option is the aptly named Single Channel Array Normalization (SCAN),
														
 
															+ which adapts a normalization method originally designed for tiling arrays
														
 
															+ 
														
 
															+\begin_inset CommandInset citation
														
 
															+LatexCommand cite
														
 
															+key "Piccolo2012"
														
 
															 literal "false"
														
 
															 \end_inset
														
 
															 .
														
 
															+ SCAN is truly single-channel in that it does not require a set of normalization
														
 
+ parameters estimated from an external set of reference samples like fRMA
														
 
															+ does.
														
 
															 \end_layout
														
 
															 \begin_layout Subsection
														
 
															-Adapting voom to model heteroskedasticity in methylation array data
														
 
															+Heteroskedasticity must be accounted for in methylation array data 
														
 
															 \end_layout
														
 
															 \begin_layout Subsubsection
														
@@ -1156,13 +1182,14 @@ Methods
 
															 \end_layout
														
 
															 \begin_layout Subsection
														
 
															-fRMA
														
 
															+Evaluation of classifier performance with different normalization methods
														
 
															 \end_layout
														
 
															 \begin_layout Standard
														
 
															-For testing RMA against fRMA, a data set of 157 hgu133plus2 arrays was used,
														
 
															- consisting of blood samples from kidney transplant patients whose grafts
														
 
															- had been graded as TX, AR, or ADNR via biopsy and histology 
														
 
															+For testing different normalizations, a data set of 157 hgu133plus2 arrays
														
 
															+ was used, consisting of blood samples from kidney transplant patients whose
														
 
															+ grafts had been graded as TX, AR, or ADNR via biopsy and histology (46
														
 
															+ TX, 69 AR, 42 ADNR) 
														
 
															 \begin_inset CommandInset citation
														
 
															 LatexCommand cite
														
 
															 key "Kurian2014"
														
@@ -1171,10 +1198,9 @@ literal "true"
 
															 \end_inset
														
 
															 .
														
 
															- These were split into a training set (23 TX, 35 AR, 21 ADNR) and a validation
														
 
															- set (23 TX, 34 AR, 21 ADNR).
														
 
															- Additionally, an external validation was gathered from public GEO data
														
 
															- (37 TX, 38 AR, no ADNR).
														
 
															+ Additionally, an external validation set of 75 samples was gathered from
														
 
															+ public GEO data (37 TX, 38 AR, no ADNR).
														
 
															+ 
														
 
															 \end_layout
														
 
															 \begin_layout Standard
														
@@ -1192,20 +1218,154 @@ Find appropriate GEO identifiers if possible.
 
															 \end_layout
														
 
															-\begin_layout Itemize
														
 
															-Expression array normalization for detecting acute rejection
														
 
															+\begin_layout Standard
														
 
															+To evaluate the effect of each normalization on classifier performance,
														
 
															+ the same classifier training and validation procedure was used after each
														
 
															+ normalization method.
														
 
															+ The PAM package was used to train a nearest shrunken centroid classifier
														
 
															+ on the training set and select the appropriate threshold for centroid shrinking.
														
 
															+ Then the trained classifier was used to predict the class probabilities
														
 
															+ of each validation sample.
														
 
															+ From these class probabilities, ROC curves and area-under-curve (AUC) values
														
 
															+ were generated 
														
 
															+\begin_inset CommandInset citation
														
 
															+LatexCommand cite
														
 
															+key "Turck2011"
														
 
															+literal "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+.
														
 
															+ Each normalization was tested on two different sets of training and validation
														
 
															+ samples.
														
 
+ For internal validation, the 115 TX and AR arrays in the internal set were
+ split at random into two approximately equal-sized sets, one for training
+ and one for validation, with the TX and AR samples divided as evenly as
+ possible between the two sets.
+ For external validation, the full set of 115 TX and AR samples was used
+ as the training set, and the 75 external TX and AR samples were used as the
+ validation set.
+ Thus, two ROC curves and AUC values were generated for each normalization
+ method: one internal and one external.
														
 
															+ Because the external validation set contains no ADNR samples, only classificati
														
 
															+on of TX and AR samples was considered.
														
 
															+ The ADNR samples were included during normalization but excluded from all
														
 
															+ classifier training and validation.
														
 
															+ This ensures that the performance on internal and external validation sets
														
 
															+ is directly comparable.
														
 
															 \end_layout
														
 
															-\begin_layout Itemize
														
 
															-Use frozen RMA, a single-channel variant of RMA
														
 
															+\begin_layout Standard
														
 
															+\begin_inset Flex TODO Note (inline)
														
 
															+status collapsed
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+Summarize the get.best.threshold algorithm for PAM threshold selection
														
 
															 \end_layout
														
 
															-\begin_layout Itemize
														
 
															-Generate custom fRMA normalization vectors for each tissue (biopsy, blood)
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															 \end_layout
														
 
															-\begin_layout Subsubsection
														
 
															-Methylation arrays
														
 
															+\begin_layout Standard
														
 
															+Six different normalization strategies were evaluated.
														
 
+ First, two well-known non-single-channel normalization methods were considered:
														
 
															+ RMA and dChip 
														
 
															+\begin_inset CommandInset citation
														
 
															+LatexCommand cite
														
 
															+key "Li2001,Irizarry2003a"
														
 
															+literal "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+.
														
 
															+ Since RMA produces expression values on a log2 scale and dChip does not,
														
 
+ the values from dChip were log2-transformed after normalization.
														
 
															+ Next, RMA and dChip followed by Global Rank-invariant Set Normalization
														
 
															+ (GRSN) were tested 
														
 
															+\begin_inset CommandInset citation
														
 
															+LatexCommand cite
														
 
															+key "Pelz2008"
														
 
															+literal "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+.
														
 
															+ Post-processing with GRSN does not turn RMA or dChip into single-channel
														
 
															+ methods, but it may help mitigate batch effects and is therefore useful
														
 
															+ as a benchmark.
														
 
															+ Lastly, the two single-channel normalization methods, fRMA and SCAN, were
														
 
															+ tested 
														
 
															+\begin_inset CommandInset citation
														
 
															+LatexCommand cite
														
 
															+key "McCall2010,Piccolo2012"
														
 
															+literal "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+.
														
 
+ When evaluating internal validation performance, only the 157 internal samples
														
 
															+ were normalized; when evaluating external validation performance, all 157
														
 
															+ internal samples and 75 external samples were normalized together.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
+To demonstrate the problem with separate normalization of training and
														
 
															+ validation data, one additional normalization was performed: the internal
														
 
															+ and external sets were each normalized separately using RMA, and the normalized
														
 
															+ data for each set were combined into a single set with no further attempts
														
 
															+ at normalizing between the two sets.
														
 
+ This represents approximately how RMA would have to be used in a clinical
														
 
															+ setting, where the samples to be classified are not available at the time
														
 
															+ the classifier is trained.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Subsection
														
 
+Generating custom fRMA vectors for the hthgu133pluspm array platform
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+In order to enable fRMA normalization for the hthgu133pluspm array platform,
														
 
															+ custom fRMA normalization vectors were trained using the frmaTools package
														
 
															+ 
														
 
															+\begin_inset CommandInset citation
														
 
															+LatexCommand cite
														
 
															+key "McCall2011"
														
 
															+literal "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+.
														
 
															+ Separate vectors were created for two types of samples: kidney graft biopsy
														
 
															+ samples and blood samples from graft recipients.
														
 
+ For training, 341 kidney biopsy samples from 2 data sets and 965 blood
+ samples from 5 data sets were used as the respective reference sets.
+ Arrays were grouped into batches based on unique combinations of sample
														
 
															+ type (blood or biopsy), diagnosis (TX, AR, etc.), data set, and scan date.
														
 
															+ Thus, each batch represents arrays of the same kind that were run together
														
 
															+ on the same day.
														
 
															+ For estimating the probe inverse variance weights, frmaTools requires equal-siz
														
 
															+ed batches, which means a batch size must be chosen, and then batches smaller
														
 
															+ than that size must be ignored, while batches larger than the chosen size
														
 
															+ must be downsampled.
														
 
+ Because this downsampling is random, the sampling process was repeated
+ 5 times and the resulting normalizations were compared to each other.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+To evaluate the consistency of the generated normalization vectors, the
														
 
															+ 5 fRMA vector sets generated from 5 random batch samplings were each used
														
 
															+ to normalize the same 20 randomly selected samples from each tissue.
														
 
															+ Then the normalized expression values for each probe on each array were
														
 
															+ compared across all normalizations.
														
 
															+ Each fRMA normalization was also compared against the normalized expression
														
 
															+ values obtained by normalizing the same 20 samples with ordinary RMA.
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Subsection
														
 
+Modeling methylation array M-value heteroskedasticity with a modified voom
+ implementation
														
 
															 \end_layout
														
 
															 \begin_layout Itemize
														
@@ -1238,15 +1398,981 @@ Improve subsection titles in this section
 
															 \end_inset
														
 
															-\end_layout
														
 
															-
														
 
															-\begin_layout Subsection
														
 
															-fRMA eliminates unwanted dependence of classifier training on normalization
														
 
															- strategy caused by RMA
														
 
															-\end_layout
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Subsection
														
 
															+fRMA eliminates unwanted dependence of classifier training on normalization
														
 
															+ strategy caused by RMA
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Subsubsection
														
 
															+Separate normalization with RMA introduces unwanted biases in classification
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+\begin_inset Float figure
														
 
															+wide false
														
 
															+sideways false
														
 
															+status collapsed
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset Graphics
														
 
															+	filename graphics/PAM/predplot.pdf
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset Caption Standard
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset CommandInset label
														
 
															+LatexCommand label
														
 
															+name "fig:Classifier-probabilities-RMA"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\series bold
														
 
															+Classifier probabilities on validation samples when normalized with RMA
														
 
															+ together vs.
														
 
															+ separately.
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+To demonstrate the problem with non-single-channel methods, we considered
														
 
+ the task of training a classifier to distinguish TX from AR using the
														
 
															+ samples from the internal set as training data, evaluating performance
														
 
															+ on the external set.
														
 
															+ First, training and evaluation were performed after normalizing all array
														
 
															+ samples together as a single set using RMA, and second, the internal samples
														
 
															+ were normalized separately from the external samples and the training and
														
 
															+ evaluation were repeated.
														
 
															+ For each sample in the validation set, the classifier probabilities from
														
 
															+ both classifiers were plotted against each other (Fig.
														
 
															+ 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:Classifier-probabilities-RMA"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+).
														
 
															+ As expected, separate normalization biases the classifier probabilities,
														
 
															+ resulting in several misclassifications.
														
 
															+ In this case, the bias from separate normalization causes the classifier
														
 
															+ to assign a lower probability of AR to every sample.
														
 
															+ 
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Subsubsection
														
 
+fRMA and SCAN maintain classification performance while eliminating
														
 
															+ dependence on normalization strategy
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+\begin_inset Float figure
														
 
															+wide false
														
 
															+sideways false
														
 
															+status collapsed
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset Graphics
														
 
															+	filename graphics/PAM/ROC-TXvsAR-internal.pdf
														
 
															+	width 100col%
														
 
															+	groupId colwidth
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset Caption Standard
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset CommandInset label
														
 
															+LatexCommand label
														
 
															+name "fig:ROC-PAM-int"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+ROC curves for PAM on internal validation data using different normalization
														
 
															+ strategies
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+\begin_inset Float table
														
 
															+wide false
														
 
															+sideways false
														
 
															+status collapsed
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset Tabular
														
 
															+<lyxtabular version="3" rows="7" columns="4">
														
 
															+<features tabularvalignment="middle">
														
 
															+<column alignment="center" valignment="top">
														
 
															+<column alignment="center" valignment="top">
														
 
															+<column alignment="center" valignment="top">
														
 
															+<column alignment="center" valignment="top">
														
 
															+<row>
														
 
															+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+Normalization
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+Single-channel
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+Internal Validation AUC
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+External Validation AUC
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+</row>
														
 
															+<row>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+RMA
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+No
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+0.852
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+0.713
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+</row>
														
 
															+<row>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+dChip
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+No
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+0.891
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+0.657
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+</row>
														
 
															+<row>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+RMA + GRSN
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+No
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+0.816
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+0.750
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+</row>
														
 
															+<row>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+dChip + GRSN
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+No
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+0.875
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+0.642
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+</row>
														
 
															+<row>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+fRMA
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+Yes
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+0.863
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+0.718
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+</row>
														
 
															+<row>
														
 
															+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+SCAN
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+Yes
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+0.853
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
														
 
															+\begin_inset Text
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+
														
 
															+\family roman
														
 
															+\series medium
														
 
															+\shape up
														
 
															+\size normal
														
 
															+\emph off
														
 
															+\bar no
														
 
															+\strikeout off
														
 
															+\xout off
														
 
															+\uuline off
														
 
															+\uwave off
														
 
															+\noun off
														
 
															+\color none
														
 
															+0.689
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+</cell>
														
 
															+</row>
														
 
															+</lyxtabular>
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset Caption Standard
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset CommandInset label
														
 
															+LatexCommand label
														
 
															+name "tab:AUC-PAM"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\series bold
														
 
															+AUC values for internal and external validation with 6 different normalization
														
 
															+ strategies.
														
 
															+
														
 
															+\series default
														
 
															+ Only fRMA and SCAN are single-channel normalizations.
														
 
															+ The other 4 normalizations are included for comparison.
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
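															+
															+\begin_layout Standard
															+The AUC values in the table above can be read as the probability that a
															+ randomly chosen sample from one class receives a higher classifier score
															+ than a randomly chosen sample from the other class.
															+ As an illustration only, the Python sketch below computes AUC directly
															+ from that rank-based (Mann-Whitney) definition.
															+ It is not the code used to produce the table, and the function name auc_mann_wh
															+itney and the toy scores are hypothetical.
															+\end_layout

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 marks the class treated as positive, 0 the other class.
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of (positive, negative) pairs that are ordered correctly.
    return wins / (len(pos) * len(neg))


# Toy scores, not study data: 3 positives followed by 3 negatives.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
print(auc_mann_whitney(toy_scores, toy_labels))  # 8 of 9 pairs ordered correctly
```

															+\begin_layout Standard
															+The double loop is quadratic in sample count, which is acceptable for
															+ cohorts of this size; a sort-based rank sum gives the same value faster.
															+\end_layout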
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+For internal validation, the 6 methods' AUC values ranged from 0.816 to 0.891,
														
 
															+ as shown in Table 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "tab:AUC-PAM"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+.
														
 
															+ Among the non-single-channel normalizations, dChip outperformed RMA, while
														
 
															+ GRSN reduced the AUC values for both dChip and RMA.
														
 
															+ Both single-channel methods, fRMA and SCAN, slightly outperformed RMA,
														
 
															+ with fRMA ahead of SCAN.
														
 
															+ However, the margin of fRMA over RMA is small (0.863 vs. 0.852).
														
 
															+ Figure 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:ROC-PAM-int"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+ shows that the ROC curves for RMA, dChip, and fRMA look very similar and
														
 
															+ relatively smooth, while both GRSN curves and the curve for SCAN have a
														
 
															+ more jagged appearance.
														
 
															+\end_layout
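															+
															+\begin_layout Standard
															+The internal-validation comparisons above are simple differences between
															+ entries of the internal column of the AUC table.
															+ The following Python sketch recomputes them from the reported values;
															+ only the six AUC values come from the table, and the helper name summarize
															+ is hypothetical.
															+\end_layout

```python
from typing import Dict

# AUC values copied from the internal-validation column of the AUC table.
internal_auc: Dict[str, float] = {
    "RMA": 0.852,
    "dChip": 0.891,
    "RMA + GRSN": 0.816,
    "dChip + GRSN": 0.875,
    "fRMA": 0.863,
    "SCAN": 0.853,
}


def summarize(auc: Dict[str, float]) -> Dict[str, float]:
    """Compute the comparisons discussed in the text from one table column."""
    return {
        "min": min(auc.values()),
        "max": max(auc.values()),
        # Effect of adding GRSN on top of each base normalization.
        "grsn_effect_rma": auc["RMA + GRSN"] - auc["RMA"],
        "grsn_effect_dchip": auc["dChip + GRSN"] - auc["dChip"],
        # Single-channel methods relative to RMA.
        "frma_vs_rma": auc["fRMA"] - auc["RMA"],
        "scan_vs_rma": auc["SCAN"] - auc["RMA"],
    }


for name, value in summarize(internal_auc).items():
    print(f"{name:>18}: {value:.3f}")
```

															+\begin_layout Standard
															+Run as is, it reports a range of 0.816 to 0.891, negative GRSN effects for
															+ both RMA and dChip, and a larger gain over RMA for fRMA than for SCAN.
															+\end_layout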
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+\begin_inset Float figure
														
 
															+wide false
														
 
															+sideways false
														
 
															+status collapsed
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset Graphics
														
 
															+	filename graphics/PAM/ROC-TXvsAR-external.pdf
														
 
															+	width 100col%
														
 
															+	groupId colwidth
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset Caption Standard
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset CommandInset label
														
 
															+LatexCommand label
														
 
															+name "fig:ROC-PAM-ext"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+ROC curves for PAM on external validation data using different normalization
														
 
															+ strategies.
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+For external validation, as expected, all AUC values are lower than those
														
 
															+ from internal validation, ranging from 0.642 to 0.750 (Table 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "tab:AUC-PAM"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+).
														
 
															+ With or without GRSN, RMA outperforms dChip in this more challenging
														
 
															+ test.
														
 
															+ Unlike in the internal validation, GRSN improves classifier performance
														
 
															+ for RMA, although not for dChip.
														
 
															+ Once again, both single-channel methods perform about on par with RMA,
														
 
															+ with fRMA performing slightly better and SCAN performing a bit worse.
														
 
															+ Figure 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:ROC-PAM-ext"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+ shows the ROC curves for the external validation test.
														
 
															+ As expected, none of them are as clean-looking as the internal validation
														
 
															+ ROC curves.
														
 
															+ The curves for RMA, RMA+GRSN, and fRMA all look similar, while the other
														
 
															+ curves look more divergent.
														
 
															+\end_layout
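															+
															+\begin_layout Standard
															+The external-validation comparisons also reduce to differences between
															+ entries of the AUC table, pairing each method's internal and external
															+ AUC.
															+ The Python sketch below is illustrative; the helper grsn_effect is hypothetical.
															+ It shows a positive internal-to-external drop for all six normalizations,
															+ and a GRSN effect on RMA that flips from negative internally to positive
															+ externally, while for dChip it stays negative.
															+\end_layout

```python
# AUC values copied from the AUC table: (internal, external) per normalization.
auc = {
    "RMA": (0.852, 0.713),
    "dChip": (0.891, 0.657),
    "RMA + GRSN": (0.816, 0.750),
    "dChip + GRSN": (0.875, 0.642),
    "fRMA": (0.863, 0.718),
    "SCAN": (0.853, 0.689),
}

# Loss of AUC when moving from internal to external validation.
drop = {name: internal - external for name, (internal, external) in auc.items()}


def grsn_effect(base: str, column: int) -> float:
    """AUC change from adding GRSN to `base`; column 0 = internal, 1 = external."""
    return auc[f"{base} + GRSN"][column] - auc[base][column]


for base in ("RMA", "dChip"):
    print(f"{base}: GRSN effect internal {grsn_effect(base, 0):+.3f}, "
          f"external {grsn_effect(base, 1):+.3f}")
for name, d in sorted(drop.items(), key=lambda kv: kv[1]):
    print(f"{name:>13}: AUC drop {d:.3f}")
```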
														
 
															+
														
 
															+\begin_layout Subsection
														
 
															+fRMA with custom-generated vectors enables normalization of hthgu133pluspm data
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+\begin_inset Float figure
														
 
															+wide false
														
 
															+sideways false
														
 
															+status open
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset Graphics
														
 
															+	filename graphics/frma-pax-bx/batchsize_batches.pdf
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset Caption Standard
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset CommandInset label
														
 
															+LatexCommand label
														
 
															+name "fig:batch-size-batches"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\series bold
														
 
															+Effect of batch size selection on number of batches included in fRMA probe
														
 
															+ weight learning.
														
 
															+ 
														
 
															+\series default
														
 
															+For batch sizes ranging from 3 to 15, the number of batches with at least
														
 
															+ that many samples was plotted for biopsy (BX) and blood (PAX) samples.
														
 
															+ The selected batch size, 5, is marked with a dotted vertical line.
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
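															+
															+\begin_layout Standard
															+The quantity plotted in the batch-count figure is a threshold count: for
															+ each candidate batch size k, the number of batches containing at least
															+ k samples.
															+ A minimal Python sketch follows, assuming a hypothetical list that assigns
															+ each sample to a batch; the function name and toy batch labels are illustrative
															+ and do not come from the analysis.
															+\end_layout

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def batches_with_at_least(
    batch_ids: Iterable[Hashable], sizes: Iterable[int] = range(3, 16)
) -> Dict[int, int]:
    """For each candidate batch size k, count batches holding at least k samples."""
    per_batch = Counter(batch_ids)  # number of samples in each batch
    return {k: sum(1 for n in per_batch.values() if n >= k) for k in sizes}


# Hypothetical batch assignment, one entry per sample (not study data).
toy_batches = ["b1"] * 3 + ["b2"] * 5 + ["b3"] * 8
print(batches_with_at_least(toy_batches))
```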
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+\begin_inset Float figure
														
 
															+wide false
														
 
															+sideways false
														
 
															+status open
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset Graphics
														
 
															+	filename graphics/frma-pax-bx/batchsize_samples.pdf
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset Caption Standard
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset CommandInset label
														
 
															+LatexCommand label
														
 
															+name "fig:batch-size-samples"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\series bold
														
 
															+Effect of batch size selection on number of samples included in fRMA probe
														
 
															+ weight learning.
														
 
															+ 
														
 
															+\series default
														
 
															+For batch sizes ranging from 3 to 15, the number of samples included in
														
 
															+ probe weight training was plotted for biopsy (BX) and blood (PAX) samples.
														
 
															+ The selected batch size, 5, is marked with a dotted vertical line.
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\end_layout
														
 
															+
														
 
															+\begin_layout Standard
														
 
															+Because no pre-built fRMA vectors are available for the non-standard hthgu133pluspm platform, a custom set
														
 
															+ of fRMA vectors was created.
														
 
															+ First, an appropriate batch size was chosen by examining the number of
														
 
															+ batches and number of samples included as a function of batch size (Figures
														
 
															+ 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:batch-size-batches"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+ and 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:batch-size-samples"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+, respectively).
														
 
															+ For a given batch size, all batches with fewer samples than the chosen
														
 
															+ size must be ignored during training, while larger batches must be randomly
														
 
															+ downsampled to the chosen size.
														
 
															+ Hence, the number of samples included for a given batch size equals the
														
 
															+ batch size times the number of batches with at least that many samples.
														
 
															+ From Figure 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:batch-size-samples"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+, it is apparent that a batch size of 8 maximizes the number of samples
														
 
															+ included in training.
														
 
															+ Increasing the batch size beyond this causes too many smaller batches to
														
 
															+ be excluded, reducing the total number of samples for both tissue types.
														
 
															+ However, a batch size of 8 is not necessarily optimal.
														
 
															+ The article introducing frmaTools concluded that it was highly advantageous
														
 
															+ to use a smaller batch size in order to include more batches, even at the
														
 
															+ expense of including fewer total samples in training 
														
 
															+\begin_inset CommandInset citation
														
 
															+LatexCommand cite
														
 
															+key "McCall2011"
														
 
															+literal "false"
														
 
															+
														
 
															+\end_inset
														
 
															-\begin_layout Subsubsection
														
 
															-Separate normalization with RMA introduces unwanted biases in classification
														
 
															+.
														
 
															+ To strike an appropriate balance between more batches and more samples,
														
 
															+ a batch size of 5 was chosen.
														
 
															+ For both blood and biopsy samples, this increased the number of batches
														
 
															+ included by 10, with only a modest reduction in the number of samples compared
														
 
															+ to a batch size of 8.
														
 
															+ With a batch size of 5, 26 batches of biopsy samples and 46 batches of
														
 
															+ blood samples were available.
														
 
															 \end_layout
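The relationship between batch size and training-set size described above can be sketched in a few lines. This is a minimal illustration, not the code used to produce the figures: the function name `batches_and_samples` and the list of batch sizes are hypothetical and do not correspond to the actual biopsy or blood batches.

```python
def batches_and_samples(batch_sizes: list[int], b: int) -> tuple[int, int]:
    """For a candidate fRMA batch size b, return (usable batches, training samples).

    Batches with fewer than b arrays are ignored, and larger batches are
    downsampled to exactly b arrays, so samples = b * (number of batches >= b).
    """
    usable = sum(1 for n in batch_sizes if n >= b)
    return usable, usable * b


# Hypothetical numbers of arrays per processing batch (not the thesis data).
example_batches = [3, 4, 5, 5, 6, 7, 8, 8, 9, 12, 15, 20]

for b in range(3, 16):  # same range of batch sizes as plotted in the figures
    n_batches, n_samples = batches_and_samples(example_batches, b)
    print(f"batch size {b:2d}: {n_batches:2d} batches, {n_samples:3d} samples")
```

With real batch sizes, a loop like this would give the two curves plotted in the batch size figures. The sample curve is not monotonic, which is why choosing a batch size means weighing the number of batches against the number of samples rather than simply taking its maximum.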
														
 
															 \begin_layout Standard
														
@@ -1257,7 +2383,9 @@ status collapsed
 
															 \begin_layout Plain Layout
														
 
															 \begin_inset Graphics
														
 
															-	filename graphics/PAM/predplot.pdf
														
 
															+	filename graphics/frma-pax-bx/M-BX-violin.pdf
														
 
															+	lyxscale 30
														
 
															+	groupId m-violin
														
 
															 \end_inset
														
@@ -1270,15 +2398,19 @@ status collapsed
 
															 \begin_layout Plain Layout
														
 
															 \begin_inset CommandInset label
														
 
															 LatexCommand label
														
 
															-name "fig:Classifier-probabilities-RMA"
														
 
															+name "fig:m-bx-violin"
														
 
															 \end_inset
														
 
															 \series bold
														
 
															-Classifier probabilities on validation samples when normalized with RMA
														
 
															- together vs.
														
 
															- separately.
														
 
															+Violin plot of log ratios between normalizations for 20 biopsy samples.
														
 
															+ 
														
 
															+\series default
														
 
															+Each of 20 randomly selected biopsy samples was normalized with RMA and
														
 
															+ with 5 different sets of fRMA vectors.
														
 
															+ Each row shows the distribution of log ratios between normalized expression
															+ values for one pair of normalizations, aggregated across all 20 arrays.
														
 
															 \end_layout
														
 
															 \end_inset
														
@@ -1292,63 +2424,78 @@ Classifier probabilities on validation samples when normalized with RMA
 
															 \end_layout
														
 
															 \begin_layout Standard
														
 
															-The initial data set for testing fRMA consisted of 157 hgu133plus2 arrays,
														
 
															- split into a training set (23 TX, 35 AR, 21 ADNR) and a validation set
														
 
															- (23 TX, 34 AR, 21 ADNR), along with an external validation set gathered
														
 
															- from public GEO data (37 TX, 38 AR, no ADNR) 
														
 
															-\begin_inset CommandInset citation
														
 
															-LatexCommand cite
														
 
															-key "Kurian2014"
														
 
															-literal "true"
														
 
															-
														
 
															-\end_inset
														
 
															-
														
 
															-.
														
 
															- To demonstrate the problem, we considered the problem of training a classifier
														
 
															- to distinguish TX from AR using the TX and AR samples from the training
														
 
															- set and validation set as training data, evaluating performance on the
														
 
															- external validation set.
														
 
															- First, training and evaluation were performed after normalizing all array
														
 
															- samples together as a single set using RMA, and second, the internal samples
														
 
															- were normalized separately from the external samples and the training and
														
 
															- evaluation were repeated.
														
 
															- For each sample in the validation set, the classifier probabilities from
														
 
															- both classifiers were plotted against each other (Fig.
														
 
															- 
														
 
															+Since fRMA training requires equal-size batches, larger batches are downsampled
														
 
															+ randomly.
														
 
															+ This introduces a nondeterministic step in the generation of normalization
														
 
															+ vectors.
														
 
															+ To show that this randomness does not substantially change the outcome,
														
 
															+ the random downsampling and subsequent vector learning were repeated 5 times,
														
 
															+ with a different random seed each time.
														
 
															+ Twenty samples were selected at random as a test set and normalized with each
														
 
															+ of the 5 sets of fRMA normalization vectors as well as ordinary RMA, and
														
 
															+ the normalized expression values were compared across normalizations.
														
 
															+ Figure 
														
 
															 \begin_inset CommandInset ref
														
 
															 LatexCommand ref
														
 
															-reference "fig:Classifier-probabilities-RMA"
														
 
															+reference "fig:m-bx-violin"
														
 
															 plural "false"
														
 
															 caps "false"
														
 
															 noprefix "false"
														
 
															 \end_inset
														
 
															-).
														
 
															- As expected, separate normalization biases the classifier probabilities,
														
 
															- resulting in several misclassifications.
														
 
															- In this case, the bias from separate normalization causes the classifier
														
 
															- to assign a lower probability of AR to every sample.
														
 
															- Because it is not feasible to normalize all samples together in a clinical
														
 
															- context, this shows that an alternative to RMA is required.
														
 
															-\end_layout
														
 
															-
														
 
															-\begin_layout Subsubsection
														
 
															-fRMA achieves equal classification performance while eliminating dependence
														
 
															- on normalization strategy
														
 
															+ shows a summary of these comparisons for biopsy samples.
														
 
															+ For comparisons of RMA against each of the 5 fRMA normalizations, the distribution of
														
 
															+ log ratios is somewhat wide, indicating that the normalizations disagree
														
 
															+ on the expression values of a fair number of probe sets.
														
 
															+ In contrast, for comparisons of fRMA against fRMA, the vast majority of probe
														
 
															+ sets have very small log ratios, indicating a very high agreement between
														
 
															+ the normalized values generated by the two normalizations.
														
 
															+ This shows that the fRMA normalization's behavior is not very sensitive
														
 
															+ to the random downsampling of larger batches during training.
														
 
															 \end_layout
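The random downsampling step can be sketched as follows. This sketch assumes that batches are represented as a mapping from a batch name to a list of array IDs, which is a hypothetical representation chosen for illustration; `downsample_batches` is likewise a hypothetical helper. Each seed yields a different but reproducible training subset. The subsequent probe weight learning, which frmaTools performs, is not reproduced here.

```python
import random


def downsample_batches(batches: dict[str, list[str]], b: int,
                       seed: int) -> dict[str, list[str]]:
    """Drop batches with fewer than b arrays and randomly keep exactly b arrays
    from each remaining batch. A fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    return {
        name: sorted(rng.sample(arrays, b))
        for name, arrays in sorted(batches.items())  # fixed order of draws
        if len(arrays) >= b
    }


# Hypothetical batches of array IDs; the names are illustrative only.
batches = {
    "batch_A": [f"A{i}" for i in range(9)],
    "batch_B": [f"B{i}" for i in range(5)],
    "batch_C": [f"C{i}" for i in range(3)],  # smaller than b = 5, so dropped
}

# Five repetitions with different seeds, one per set of fRMA vectors.
subsets = [downsample_batches(batches, b=5, seed=s) for s in range(5)]
for s, subset in enumerate(subsets):
    print(s, subset["batch_A"])
```

Only batches larger than the batch size contribute randomness: a batch of exactly 5 arrays is always included in full, so the sets of vectors can differ only through the arrays drawn from larger batches.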
														
 
															 \begin_layout Standard
														
 
															-\begin_inset Flex TODO Note (inline)
														
 
															-status open
														
 
															+\begin_inset Float figure
														
 
															+wide false
														
 
															+sideways false
														
 
															+status collapsed
														
 
															 \begin_layout Plain Layout
														
 
															-Cite ROCR: bioinformatics.oxfordjournals.org/cgi/content/abstract/21/20/3940
														
 
															+\begin_inset Graphics
														
 
															+	filename graphics/frma-pax-bx/MA-BX-RMA.fRMA.pdf
														
 
															+	lyxscale 50
														
 
															+	groupId ma-frma
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															 \end_layout
														
 
															 \begin_layout Plain Layout
														
 
															-Or maybe pROC? https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-21
														
 
															-05-12-77
														
 
															+\begin_inset Caption Standard
														
 
															+
														
 
															+\begin_layout Plain Layout
														
 
															+\begin_inset CommandInset label
														
 
															+LatexCommand label
														
 
															+name "fig:ma-bx-rma-frma"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															+\series bold
														
 
															+Representative MA plot comparing RMA against fRMA for 20 biopsy samples.
														
 
															+ 
														
 
															+\series default
														
 
															+For every probe set in each of the 20 biopsy samples, the average (A) and
															+ log ratio (M) of the RMA- and fRMA-normalized values were computed.
														
 
															+ Density of points is represented by darkness of shading, and individual
														
 
															+ outlier points are plotted.
														
 
															+\end_layout
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+
														
 
															 \end_layout
														
 
															 \end_inset
														
@@ -1360,11 +2507,13 @@ Or maybe pROC? https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471
 
															 \begin_inset Float figure
														
 
															 wide false
														
 
															 sideways false
														
 
															-status open
														
 
															+status collapsed
														
 
															 \begin_layout Plain Layout
														
 
															 \begin_inset Graphics
														
 
															-	filename graphics/PAM/external-roc-frma.pdf
														
 
															+	filename graphics/frma-pax-bx/MA-BX-fRMA.fRMA.pdf
														
 
															+	lyxscale 50
														
 
															+	groupId ma-frma
														
 
															 \end_inset
														
@@ -1377,12 +2526,20 @@ status open
 
															 \begin_layout Plain Layout
														
 
															 \begin_inset CommandInset label
														
 
															 LatexCommand label
														
 
															-name "fig:ROC-curve-PAM"
														
 
															+name "fig:ma-bx-frma-frma"
														
 
															 \end_inset
														
 
															-ROC curve for PAM on external validation data, normalizing with RMA and
														
 
															- fRMA
														
 
															+
														
 
															+\series bold
														
 
															+Representative MA plot comparing different fRMA vectors for 20 biopsy samples.
														
 
															+ 
														
 
															+\series default
														
 
															+For every probe set in each of the 20 biopsy samples, the average (A) and
															+ log ratio (M) were computed between fRMA normalizations using vectors from
															+ two different random batch samplings.
														
 
															+ Density of points is represented by darkness of shading, and individual
														
 
															+ outlier points are plotted.
														
 
															 \end_layout
														
 
															 \end_inset
														
@@ -1395,45 +2552,98 @@ ROC curve for PAM on external validation data, normalizing with RMA and
 
															 \end_layout
														
 
															-\begin_layout Itemize
														
 
															-fRMA eliminates this issue by normalizing each sample independently to the
														
 
															- same quantile distribution and summarizing probes using the same weights.
														
 
															-\end_layout
														
 
															+\begin_layout Standard
														
 
															+Figure 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:ma-bx-rma-frma"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															-\begin_layout Itemize
														
 
															-Classifier performance on validation set is identical for 
														
 
															-\begin_inset Quotes eld
														
 
															 \end_inset
														
 
															-RMA together
														
 
															-\begin_inset Quotes erd
														
 
															+ shows an MA plot of the RMA-normalized values against the fRMA-normalized
														
 
															+ values for the same probe sets and arrays, corresponding to the first row
														
 
															+ of Figure 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:m-bx-violin"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															 \end_inset
														
 
															- and fRMA, so switching to clinically applicable normalization does not
														
 
															- sacrifice accuracy
														
 
															-\end_layout
														
 
															+.
														
 
															+ This MA plot shows that not only is there a wide distribution of M-values,
														
 
															+ but the trend of M-values is dependent on the average normalized intensity.
														
 
															+ This is expected, since the overall trend reflects differences in
														
 
															+ the quantile normalization step.
														
 
															+ When running RMA, only the quantiles for these specific 20 arrays are used,
														
 
															+ while for fRMA the quantile distribution is taken from all arrays used
														
 
															+ in training.
														
 
															+ Figure 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:ma-bx-frma-frma"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															-\begin_layout Standard
														
 
															-\begin_inset Flex TODO Note (inline)
														
 
															-status open
														
 
															+\end_inset
														
 
															-\begin_layout Plain Layout
														
 
															-Check the published paper for any other possibly relevant figures to include
														
 
															- here.
														
 
															-\end_layout
														
 
															+ shows a similar MA plot comparing 2 different fRMA normalizations,
															+ corresponding to the 6th row of Figure 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:m-bx-violin"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															 \end_inset
														
 
															+.
														
 
															+ The MA plot is very tightly centered around zero with no visible trend.
														
 
															+ Figures 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:m-pax-violin"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															-\end_layout
														
 
															+\end_inset
														
 
															-\begin_layout Subsection
														
 
															-fRMA with custom-generated vectors
														
 
															-\end_layout
														
 
															+, 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:MA-PAX-rma-frma"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															-\begin_layout Itemize
														
 
															-Non-standard platform hthgu133pluspm - no pre-built fRMA vectors available,
														
 
															- so custom vectors must be learned from in-house data
														
 
															+\end_inset
														
 
															+
														
 
															+, and 
														
 
															+\begin_inset CommandInset ref
														
 
															+LatexCommand ref
														
 
															+reference "fig:MA-PAX-frma-frma"
														
 
															+plural "false"
														
 
															+caps "false"
														
 
															+noprefix "false"
														
 
															+
														
 
															+\end_inset
														
 
															+
														
 
															+ show the same analyses for the blood samples, again comparing
														
 
															+ the normalized expression values between normalizations for all probe sets
														
 
															+ across 20 randomly selected test arrays.
														
 
															+ Once again, there is a wider distribution of log ratios between RMA-normalized
														
 
															+ values and fRMA-normalized values, and a much tighter distribution when comparing
														
 
															+ different fRMA normalizations to each other, indicating that the fRMA training
														
 
															+ process is robust to random batch downsampling for the blood samples as
														
 
															+ well.
														
 
															 \end_layout
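The source of the intensity-dependent trend can be illustrated with a simplified sketch. It contrasts RMA-style quantile normalization, whose target distribution is the mean of the arrays normalized together, with fRMA-style normalization onto a fixed reference distribution learned in advance. This is a toy under stated assumptions: it uses a hypothetical matrix of simulated log2 values, the helper names are hypothetical, and it omits background correction and probe-level summarization, which the real RMA and fRMA procedures perform.

```python
import numpy as np


def column_ranks(x: np.ndarray) -> np.ndarray:
    """Rank of each value within its column (0 = smallest)."""
    return np.argsort(np.argsort(x, axis=0), axis=0)


def quantile_normalize_within(x: np.ndarray) -> np.ndarray:
    """RMA-style: map every column (array) onto the mean of the sorted columns.
    The target distribution depends on which arrays are normalized together."""
    target = np.sort(x, axis=0).mean(axis=1)
    return target[column_ranks(x)]


def quantile_normalize_to_reference(x: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """fRMA-style: map each column independently onto a fixed reference
    distribution computed in advance (here, a vector with one value per row)."""
    return np.sort(reference)[column_ranks(x)]


def ma_values(x: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """For two log2-scale versions of the same data: M = log ratio, A = average."""
    return x - y, (x + y) / 2


rng = np.random.default_rng(0)
n_probes, n_arrays = 1000, 20
# Simulated log2 intensities with a per-array shift (hypothetical data).
arrays = rng.normal(8, 2, size=(n_probes, n_arrays)) + rng.normal(0, 0.5, size=n_arrays)
# Stand-in for a reference distribution learned from a large training set.
reference = rng.normal(8, 2, size=n_probes)

within = quantile_normalize_within(arrays)
ref = quantile_normalize_to_reference(arrays, reference)

# Reference-based normalization of one array alone matches the result of
# normalizing it alongside the other 19 arrays.
alone = quantile_normalize_to_reference(arrays[:, :1], reference)
assert np.allclose(alone[:, 0], ref[:, 0])

# Within-set normalization of the same array changes when the set changes.
half = quantile_normalize_within(arrays[:, :10])
print("within-set result unchanged:", np.allclose(half[:, 0], within[:, 0]))

m, a = ma_values(within.ravel(), ref.ravel())
print(f"median |M|, within-set vs reference: {np.median(np.abs(m)):.3f}")
```

Because the within-set target is recomputed from whichever arrays are present, its difference from a fixed reference varies with rank, and hence with intensity. This is the kind of A-dependent shift in M that the RMA versus fRMA comparison is attributed to above.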
														
 
															 \begin_layout Standard
														
@@ -1444,7 +2654,9 @@ status collapsed
 
															 \begin_layout Plain Layout
														
 
															 \begin_inset Graphics
														
 
															-	filename graphics/frma-pax-bx/batchsize_batches.pdf
														
 
															+	filename graphics/frma-pax-bx/M-PAX-violin.pdf
														
 
															+	lyxscale 30
														
 
															+	groupId m-violin
														
 
															 \end_inset
														
@@ -1457,12 +2669,19 @@ status collapsed
 
															 \begin_layout Plain Layout
														
 
															 \begin_inset CommandInset label
														
 
															 LatexCommand label
														
 
															-name "fig:batch-size-batches"
														
 
															+name "fig:m-pax-violin"
														
 
															 \end_inset
														
 
															-Effect of batch size selection on number of batches included in fRMA probe
														
 
															- weight learning
														
 
															+
														
 
															+\series bold
														
 
															+Violin plot of log ratios between normalizations for 20 blood samples.
														
 
															+ 
														
 
															+\series default
														
 
															+Each of 20 randomly selected blood samples was normalized with RMA and with
														
 
															+ 5 different sets of fRMA vectors.
														
 
															+ Each row shows the distribution of log ratios between normalized expression
															+ values for one pair of normalizations, aggregated across all 20 arrays.
														
 
															 \end_layout
														
 
															 \end_inset
														
@@ -1483,7 +2702,9 @@ status collapsed
 
															 \begin_layout Plain Layout
														
 
															 \begin_inset Graphics
														
 
															-	filename graphics/frma-pax-bx/batchsize_samples.pdf
														
 
															+	filename graphics/frma-pax-bx/MA-PAX-RMA.fRMA.pdf
														
 
															+	lyxscale 50
														
 
															+	groupId ma-frma
														
 
															 \end_inset
														
@@ -1496,12 +2717,19 @@ status collapsed
 
															 \begin_layout Plain Layout
														
 
															 \begin_inset CommandInset label
														
 
															 LatexCommand label
														
 
															-name "fig:batch-size-samples"
														
 
															+name "fig:MA-PAX-rma-frma"
														
 
															 \end_inset
														
 
															-Effect of batch size selection on number of samples included in fRMA probe
														
 
															- weight learning
														
 
															+
														
 
															+\series bold
														
 
															+Representative MA plot comparing RMA against fRMA for 20 blood samples.
														
 
															+ 
														
 
															+\series default
														
 
															+For every probe set in each of the 20 blood samples, the average (A) and
															+ log ratio (M) of the RMA- and fRMA-normalized values were computed.
														
 
															+ Density of points is represented by darkness of shading, and individual
														
 
															+ outlier points are plotted.
														
 
															 \end_layout
														
 
															 \end_inset
														
@@ -1509,71 +2737,57 @@ Effect of batch size selection on number of samples included in fRMA probe
 
															 \end_layout
														
 
															-\end_inset
														
 
															-
														
 
															+\begin_layout Plain Layout
														
 
															 \end_layout
														
 
															-\begin_layout Itemize
														
 
															-Large body of data available for training fRMA: 341 kidney graft biopsy
														
 
															- samples, 965 blood samples from graft recipients
														
 
															-\end_layout
														
 
															+\end_inset
														
 
															-\begin_deeper
														
 
															-\begin_layout Itemize
														
 
															-But not all samples can be used (see trade-off figure)
														
 
															-\end_layout
														
 
															-\begin_layout Itemize
														
 
															-Figure showing trade-off between more samples per group and fewer groups
														
 
															- with that may samples, to justify choice of number of samples per group
														
 
															 \end_layout
														
 
															-\begin_layout Itemize
														
 
															-pre-generated normalization vectors use ~850 samples
														
 
															-\begin_inset Flex TODO Note (Margin)
														
 
															+\begin_layout Standard
														
 
															+\begin_inset Float figure
														
 
															+wide false
														
 
															+sideways false
														
 
															 status collapsed
														
 
															 \begin_layout Plain Layout
														
 
															-Look up the exact numbers
														
 
-\end_layout
+\begin_inset Graphics
+	filename graphics/frma-pax-bx/MA-PAX-fRMA.fRMA.pdf
+	lyxscale 50
+	groupId ma-frma
 \end_inset
-\begin_inset CommandInset citation
-LatexCommand cite
-key "McCall2010"
-literal "false"
+\end_layout
-\end_inset
+\begin_layout Plain Layout
+\begin_inset Caption Standard
-, but are designed to be general across all tissues.
- The samples we have are suitable for tissue-specific normalization vectors.
-\end_layout
+\begin_layout Plain Layout
+\begin_inset CommandInset label
+LatexCommand label
+name "fig:MA-PAX-frma-frma"
-\end_deeper
-\begin_layout Itemize
-Figure: MA plot, RMA vs fRMA, to show that the normalization is appreciably
- and non-linearly different
-\end_layout
+\end_inset
-\begin_layout Itemize
-Figure MA plot, fRMA vs fRMA with different randomly-chosen sample subsets
- to show consistency
-\end_layout
-\begin_layout Itemize
-custom fRMA normalization improved cross-validated classifier performance
+\series bold
+Representative MA plot comparing different fRMA vectors for 20 blood samples.
+ 
+\series default
+Averages and log ratios were computed for every probe in each of 20 blood
+ samples between fRMA normalizations using vectors from two different batch
+ samplings.
+ Density of points is represented by darkness of shading, and individual
+ outlier points are plotted.
 \end_layout
-\begin_layout Standard
-\begin_inset Flex TODO Note (inline)
-status open
+\end_inset
+
-\begin_layout Plain Layout
-Get a figure from Tom showing classifier performance improvement (compared
- to all-sample RMA, I guess?), if possible
 \end_layout
 \end_inset
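The MA comparison described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the figure: it assumes both fRMA normalizations of a sample are already on the log2 scale, and the function name `ma_values` and the example values are hypothetical. For each probe, A is the mean of the two log2 values and M is their difference (the log2 ratio); points that stay near M = 0 across the whole range of A indicate that the two sets of vectors normalize the sample in essentially the same way.

```python
from typing import List, Tuple


def ma_values(log2_a: List[float], log2_b: List[float]) -> List[Tuple[float, float]]:
    """Return per-probe (A, M) pairs comparing two log2-scale normalizations.

    A is the average log2 value and M is the log2 ratio (a - b), the two axes
    of an MA plot. Both inputs must list the same probes in the same order.
    """
    if len(log2_a) != len(log2_b):
        raise ValueError("both normalizations must cover the same probes")
    return [((a + b) / 2.0, a - b) for a, b in zip(log2_a, log2_b)]


# Hypothetical log2 values for a few probes of one sample, normalized with
# fRMA vectors trained on two different random batch samplings.
sampling_1 = [6.0, 8.5, 10.2, 12.0]
sampling_2 = [6.1, 8.5, 10.0, 12.0]

pairs = ma_values(sampling_1, sampling_2)
for a_val, m_val in pairs:
    print(f"A = {a_val:5.2f}   M = {m_val:+.2f}")

# A simple numeric summary of agreement: the largest absolute log2 ratio.
print("max |M| =", round(max(abs(m) for _, m in pairs), 3))
```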
														
@@ -1617,17 +2831,110 @@ Figure and/or table showing improved p-value historgrams/number of significant
 Discussion
 \end_layout
-\begin_layout Itemize
-fRMA enables classifying new samples without re-normalizing the entire data
- set
+\begin_layout Subsection
+fRMA achieves clinically applicable normalization without sacrificing
+ classification performance
 \end_layout
-\begin_deeper
-\begin_layout Itemize
-Critical for translating a classifier into clinical practice
+\begin_layout Standard
+As shown in Figure 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:Classifier-probabilities-RMA"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+, improper normalization, particularly separate normalization of training
+ and test samples, leads to unwanted biases in classification.
+ In a controlled experimental context, it is always possible to correct
+ this issue by normalizing all experimental samples together.
+ However, in a clinical context, where new samples must be classified one
+ at a time, it is not feasible to normalize all samples together, so a
+ single-channel normalization is required.
+ 
+\end_layout
+
+\begin_layout Standard
+The major concern in using a single-channel normalization is that
+ non-single-channel methods can share information between arrays to improve
+ the normalization, and single-channel methods risk sacrificing the gains
+ in normalization accuracy that come from this information sharing.
+ In the case of RMA, this information sharing is accomplished through the
+ quantile normalization and median polish steps.
+ The need for information sharing in quantile normalization can easily be
+ removed by learning a fixed set of quantiles from external data and normalizing
+ each array to these fixed quantiles, instead of the quantiles of the data
+ itself.
+ As long as the fixed quantiles are reasonable, the result will be similar
+ to that of standard RMA.
+ However, there is no analogous way to eliminate cross-array information
+ sharing in the median polish step, so fRMA replaces this with a weighted
+ average of probes on each array, with the weights learned from external
+ data.
+ This step of fRMA has the greatest potential to diverge from RMA in undesirable
+ ways.
+\end_layout
+
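The two substitutions described above can be illustrated with a simplified Python sketch. It is not the fRMA implementation: it assumes background-corrected log2 probe intensities, a hypothetical reference distribution with one quantile per probe, and hypothetical precomputed probe weights, and it leaves out the other details of the published fRMA model. What it shows is that neither step looks at any other array, so a single new sample can be normalized on its own.

```python
from typing import List, Sequence


def quantile_normalize_to_reference(values: Sequence[float],
                                    reference: Sequence[float]) -> List[float]:
    """Replace each value with the reference quantile of the same rank.

    `reference` stands in for a fixed distribution learned from external
    training data, with one entry per probe. Because it does not depend on
    any other array in the current batch, one array can be normalized alone.
    Ties are broken by position, a simplification of full quantile
    normalization.
    """
    if len(values) != len(reference):
        raise ValueError("reference must have one quantile per probe")
    ref_sorted = sorted(reference)
    # Indices of this array's probes, ordered from lowest to highest intensity.
    order = sorted(range(len(values)), key=lambda i: values[i])
    normalized = [0.0] * len(values)
    for rank, idx in enumerate(order):
        normalized[idx] = ref_sorted[rank]
    return normalized


def weighted_probe_summary(probe_values: Sequence[float],
                           weights: Sequence[float]) -> float:
    """Summarize one probe set on one array as a weighted average of its probes.

    The weights are assumed to be fixed in advance (learned from external
    data), standing in for RMA's median polish, which needs many arrays.
    """
    if len(probe_values) != len(weights):
        raise ValueError("need one weight per probe")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(probe_values, weights)) / total


# Hypothetical fixed quantiles and one new array of four probes.
reference = [2.0, 4.0, 6.0, 8.0]
new_array = [5.5, 1.2, 9.9, 3.3]
normalized = quantile_normalize_to_reference(new_array, reference)
print(normalized)  # [6.0, 2.0, 8.0, 4.0]

# Treat the first three probes as one probe set with hypothetical weights.
print(weighted_probe_summary(normalized[:3], [1.0, 1.0, 2.0]))  # 6.0
```

Because the reference quantiles and the weights are fixed ahead of time, the output for a given array does not change with whichever other arrays happen to be processed alongside it.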
														
 
+\begin_layout Standard
+However, when run on real data, fRMA performed at least as well as RMA in
+ both the internal validation and external validation tests.
+ This shows that fRMA can be used to normalize individual clinical samples
+ in a class prediction context without sacrificing the classifier performance
+ that would be obtained by using the better-established RMA for normalization.
+ The other single-channel normalization method considered, SCAN, showed
+ some loss of AUC in the external validation test.
+ Based on these results, fRMA is the preferred normalization for clinical
+ samples in a class prediction context.
+\end_layout
+
+\begin_layout Subsection
+Robust fRMA vectors can be generated for new array platforms
+\end_layout
+
+\begin_layout Standard
+The published fRMA normalization vectors for the hgu133plus2 platform were
+ generated from a set of about 850 samples 
+\begin_inset Flex TODO Note (Margin)
+status collapsed
+
+\begin_layout Plain Layout
+Look up the exact numbers
+\end_layout
+
+\end_inset
+
+ chosen from a wide range of tissues, which the authors determined was
+ sufficient to generate a robust set of normalization vectors that could
+ be applied across all tissues 
+\begin_inset CommandInset citation
+LatexCommand cite
+key "McCall2010"
+literal "false"
+
+\end_inset
+
+.
+ Since we only had hthgu133pluspm data for two tissues of interest, our needs
+ were more modest.
+ Even using only 130 samples in 26 batches of 5 samples each for kidney
+ biopsies, we were able to train a robust set of fRMA normalization vectors
+ that were not meaningfully affected by the random selection of 5 samples
+ from each batch.
+ As expected, the training process was just as robust for the blood samples
+ with 230 samples in 46 batches of 5 samples each.
+ Because these vectors were each generated using training samples from a
+ single tissue, they are not suitable for general use, unlike the vectors
+ provided with fRMA itself.
+ They are purpose-built for normalizing a specific type of sample on a specific
+ platform.
+\end_layout
+
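The batch subsampling described above (5 randomly chosen samples from each batch) can be sketched as follows. This is a hypothetical Python illustration of the sampling step only, not the fRMA vector-training code: the function name `sample_per_batch`, the sample and batch IDs, and the choice to skip batches with fewer than 5 samples are all assumptions. Running it with two different seeds produces two independent batch samplings of the kind compared in the MA plot; training vectors on each and checking that the resulting normalizations agree is the robustness check the text describes.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def sample_per_batch(sample_batches: Dict[str, str], per_batch: int = 5,
                     seed: Optional[int] = None) -> List[str]:
    """Randomly choose `per_batch` samples from every sufficiently large batch.

    `sample_batches` maps a sample ID to its batch ID. Batches with fewer than
    `per_batch` samples are skipped (an assumption of this sketch). Returns the
    chosen sample IDs in sorted order.
    """
    rng = random.Random(seed)
    by_batch: Dict[str, List[str]] = defaultdict(list)
    for sample, batch in sample_batches.items():
        by_batch[batch].append(sample)
    chosen: List[str] = []
    # Iterate in a fixed order so that the same seed always gives the same result.
    for batch in sorted(by_batch):
        members = sorted(by_batch[batch])
        if len(members) >= per_batch:
            chosen.extend(rng.sample(members, per_batch))
    return sorted(chosen)


# Hypothetical layout: 26 batches of 8 samples each.
batches = {f"s{b:02d}_{i}": f"batch{b:02d}" for b in range(26) for i in range(8)}
first = sample_per_batch(batches, seed=1)
second = sample_per_batch(batches, seed=2)
print(len(first), len(second))  # 130 130
print("samples shared by both samplings:", len(set(first) & set(second)))
```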
														
 
+\begin_layout Subsection
+voom
 \end_layout
-\end_deeper
 \begin_layout Itemize
 Methods like voom designed for RNA-seq can also help with array analysis
 \end_layout
@@ -4031,19 +5338,9 @@ Also look at other types lymphocytes: CD8 T-cells, B-cells, NK cells
 \end_deeper
 \begin_layout Itemize
-Investigate epigenetic regulation of lifespan extension in 
-\emph on
-C.
- elegans
-\end_layout
-
-\begin_deeper
-\begin_layout Itemize
-ChIP-seq of important transcriptional regulators to see how transcriptional
- drift is prevented
+Use CV or bootstrap to better evaluate classifiers
 \end_layout
-\end_deeper
 \begin_layout Standard
 \begin_inset ERT
 status open