(This was written back in 2013, and I can't necessarily vouch for any of the claims within it.)

# Questions

* Overarching question: Can we accurately distinguish AR from TX?
  * Can we work well in "clinical" mode, i.e. classifying single samples?
    * How to normalize new sample with training set?
    * How to avoid recalculating classifier for each sample?
  * Can we perform well on an external validation set (GEO data)?
    * Are the same genes predictive in both datasets?
    * Can a classifier trained on our data perform well on GEO data?

# Experiments

* pam-analysis.R
  * How important is it to normalize to the training set? (RMA separate vs together)
    * Conclusion: must normalize together. Separate introduced bias toward one class or the other.
    * Question: how to do it with a single sample?
* pam-analysis-norm.R
  * Can single-channel normalization improve classification results? Yes.
    * Try PAM with RMA and two single-channel normalizations
    * fRMA improves cross-dataset accuracy from 65% to 71%.
* limma-analysis-norm.R
  * What is the source of the variation