README.md 1.1 KB

(This was written back in 2013, and I can't necessarily vouch for any of the claims within it.)

Questions

  • Overarching question: Can we accurately distinguish AR from TX?
  • Can we work well in "clinical" mode, i.e. classifying single samples?
    • How to normalize new sample with training set?
    • How to avoid recalculating classifier for each sample?
  • Can we perform well on an external validation set (GEO data)?
    • Are the same genes predictive in both datasets?
    • Can a classifier trained on our data perform well on GEO data?

Experiments

  • pam-analysis.R
    • How important is it to normalize to the training set? (RMA separate vs together)
    • Conclusion: must normalize together. Separate introduced bias toward one class or the other.
    • Question: how to do it with a single sample?
  • pam-analysis-norm.R
    • Can single-channel normalization improve classification results? Yes.
    • Try PAM with RMA and two single-channel normalizations
    • fRMA improves cross-dataset accuracy from 65% to 71%.
  • limma-analysis-norm.R
    • What is the source of the variation