The scripts below were used to evaluate the consistency of the fRMA normalization vectors. The training process was repeated with five different random samples of arrays, and a random selection of arrays was then normalized with each of the five trained vector sets as well as with ordinary RMA, so that the results could be compared. This folder contains the results.
There are two pairs of scripts. The first pair, train.R and test.R, handle the tasks of (respectively) generating/training the main fRMA vectors and ensuring that they work by normalizing all the data with them. The second pair, consistency-train.R and consistency-evaluate.R, handle (respectively) training five separate fRMA vector sets and testing their consistency.
train.R: Creating the fRMA vectors
This script reads the sample metadata tables, assembles the full file lists for BX and PAX tissues, and trains a set of fRMA vectors for each tissue. It exports each of these vector sets to an installable R package.
test.R: Testing the fRMA vectors
This script loads all the arrays and normalizes them using the appropriate fRMA vectors generated by train.R. It should be run after installing the packages produced by train.R. Its only purpose is to check that the fRMA vectors work.
consistency-train.R: Train several vector sets for each tissue
This script does essentially the same thing as train.R, except that it runs the training five times, each time on a different subsampling of the arrays, to generate five different fRMA vector sets. It saves them all in an R data file.
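The repeated subsampling can be sketched in base R as follows. This is only an illustration, not the code in consistency-train.R: the sample table, the file names, the batch column, and the choice to draw a fixed number of arrays per batch are all assumptions made here, and the fRMA training step itself is left out.

```r
# Hypothetical sample table: 60 arrays spread over 6 batches (illustration only).
set.seed(1)
sample_table <- data.frame(
    file = sprintf("array_%02d.CEL", 1:60),
    batch = rep(paste0("batch", 1:6), each = 10),
    stringsAsFactors = FALSE
)

# Draw n_per_batch arrays from every batch, without replacement.
# (Sampling per batch is an assumption of this sketch.)
subsample_arrays <- function(tab, n_per_batch) {
    picked <- lapply(split(tab, tab$batch), function(b) {
        b[sample(nrow(b), n_per_batch), , drop = FALSE]
    })
    do.call(rbind, picked)
}

# Five independent subsamplings, one for each vector set to be trained.
subsamples <- lapply(1:5, function(i) {
    subsample_arrays(sample_table, n_per_batch = 5)
})
```

Each element of subsamples would then serve as the file list for one training run, and the five resulting vector sets could be stored together with save().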
consistency-evaluate.R: Verify consistency of fRMA vectors
This script loads the data file from consistency-train.R, then loads 20 random arrays from each tissue and normalizes them with all five fRMA vector sets and also with ordinary RMA. It then produces plots of M vs A for every pair of normalizations. Unlike regular MA plots, these do not plot arrays against each other. Instead, each plot compares an array against itself, normalized using two different methods. So if two normalizations were perfectly consistent, the MA plot would be a flat horizontal line at M=0. The script also produces boxplots and violin plots showing the M distribution for each of the pairwise comparisons.
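The self-comparison can be sketched in base R as follows. This is a minimal illustration, not the code in consistency-evaluate.R: the expression values are simulated, and the names ma_values, expr_a and expr_b are hypothetical. It assumes both inputs are log2 expression values for the same array and the same probesets, in the same order.

```r
# Toy log2 expression values for one array under two normalizations
# (simulated purely for illustration; these are not real results).
set.seed(42)
expr_a <- rnorm(1000, mean = 8, sd = 2)
expr_b <- expr_a + rnorm(1000, mean = 0, sd = 0.1)

# M is the per-probeset log2 difference between the two normalizations;
# A is the per-probeset average of the two log2 values.
ma_values <- function(x, y) {
    stopifnot(length(x) == length(y))
    data.frame(A = (x + y) / 2, M = x - y)
}

ma <- ma_values(expr_a, expr_b)

# Comparing a normalization with itself gives M = 0 for every probeset.
ma_same <- ma_values(expr_a, expr_a)

# Plot only in an interactive session, so the sketch has no side effects
# when run as a script.
if (interactive()) {
    plot(ma$A, ma$M, pch = ".", xlab = "A", ylab = "M")
    abline(h = 0, col = "red")
    boxplot(ma$M, ylab = "M")
}
```

The closer the points lie to the line M=0 and the narrower the boxplot of M, the more consistent the two normalizations are for that array.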