<!-- TODO: Update this -->

This is a series of example plots and tables from a combined
RNA-seq/ChIP-seq study on differences between naive and memory T-cell
activation. You can view the (old and messy) code for these plots
[here][1].

[1]: https://github.com/DarwinAwardWinner/cd4-histone-paper-code

- [`p-value distributions.pdf`](p-value%20distributions.pdf) is a series
  of p-value histograms, one for each of the contrasts tested. A contrast
  with no significant differential expression would exhibit a uniform
  distribution, while differential expression would be reflected by an
  excess of small p-values.
- [`FPKM by Peak Status H3K4.pdf`](FPKM%20by%20Peak%20Status%20H3K4.pdf) shows
  the variation in gene expression based on the presence or absence of
  two histone marks in the gene promoters.
- [`promoter-edger-topgenes3-ql.xlsx`](promoter-edger-topgenes3-ql.xlsx)
  is a spreadsheet of all promoters with differential histone
  modification, based on the ChIP-seq read counts.
- [`Promoter Peak Distance Profile.pdf`](Promoter%20Peak%20Distance%20Profile.pdf)
  shows the distribution of distances from transcription
  start sites to the nearest peak for the three histone modifications
  studied. This was used to determine the "promoter radius" for read
  counting. Notably, the three histone marks do not all have the same
  promoter radius.
- [`rnaseq-MDSPlots.pdf`](rnaseq-MDSPlots.pdf) shows a series of MDS
  plots (similar to PCA plots) before and after correction of a known
  batch effect. Note the implausible zigzag-shaped progression over
  time before correction, compared to the more plausible cyclic time
  progression after.
- [`rnaseq-edgeR-vs-limma.pdf`](rnaseq-edgeR-vs-limma.pdf) and
  [`rnaseq-limma-weighted-vs-uw.pdf`](rnaseq-limma-weighted-vs-uw.pdf)
  show comparisons of p-values for all genes in each contrast of the
  RNA-seq data, comparing edgeR against limma-voom, and limma-voom with
  and without sample quality weights. The final choice of method was
  limma-voom with sample quality weights.
- [`rnaseq-maplots-limma-sampleweights.pdf`](rnaseq-maplots-limma-sampleweights.pdf)
  shows the MA plot for each contrast of the RNA-seq data.
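
To make the reading of the p-value histograms in the list above concrete, here is a minimal simulation sketch. It is not taken from the analysis code: the z-test, the group size, the effect size, and the 80/20 split between unchanged and changed genes are all assumptions chosen for illustration. It shows that tests with no true effect give roughly flat bin counts, while mixing in genes with a real effect piles counts into the lowest bin.

```python
import math
import random


def z_test_pvalue(sample_a, sample_b):
    """Two-sided p-value for a difference in means, assuming unit variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    diff = sum(sample_a) / n_a - sum(sample_b) / n_b
    z = diff / math.sqrt(1 / n_a + 1 / n_b)
    # P(|Z| > |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_pvalues(n_genes, effect, rng, n_per_group=5):
    """Simulate one test per hypothetical gene with the given mean shift."""
    pvals = []
    for _ in range(n_genes):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(effect, 1.0) for _ in range(n_per_group)]
        pvals.append(z_test_pvalue(a, b))
    return pvals


def histogram(pvals, n_bins=10):
    """Count p-values in equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvals:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


rng = random.Random(0)

# Contrast with no differential expression: every gene has effect 0.
null_counts = histogram(simulate_pvalues(2000, 0.0, rng))

# Contrast where an assumed 20% of genes are truly changed.
mixed_pvals = simulate_pvalues(1600, 0.0, rng) + simulate_pvalues(400, 2.0, rng)
mixed_counts = histogram(mixed_pvals)

for label, counts in (("null", null_counts), ("mixed", mixed_counts)):
    print(label)
    for i, count in enumerate(counts):
        print(f"  {i / 10:.1f}-{(i + 1) / 10:.1f} {'#' * (count // 10)}")
```

The text histogram printed at the end is only a stand-in for the plots in the PDF, which were produced by the real pipeline.
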

There are also some plots from an in-progress analysis of the same
data based on sliding windows, rather than just analyzing promoter
regions. You can view the code for generating these plots [here][2],
and you can view some presentation slides based on this analysis
[here][3].

[2]: https://github.com/DarwinAwardWinner/CD4-csaw
[3]: ./ChIP-Seq%20presentation.pdf

- [`CCF-plots.pdf`](CCF-plots.pdf) shows the cross-correlation
  functions of the ChIP-seq data for 3 different histone marks, at
  several different levels of smoothing. This plot is used to
  determine the fragment size. The periodic wave-like pattern also
  indicates that multiple adjacent histones tend to share the same
  histone modification.
- [`CCF-plots-noBL.pdf`](CCF-plots-noBL.pdf) shows the same plots as
  above, but without removing reads in so-called "blacklist" regions
  that typically contain high-coverage artifact signals. The result is
  a much messier plot, with many samples having an artifactual peak at
  the read length (dotted line) rather than the actual width of a
  histone (solid line).
- [`site-profile-plots.pdf`](site-profile-plots.pdf) shows plots of
  the relative coverage depth profiles around local coverage maxima in
  the ChIP-seq data. This plot is used to determine the footprint size
  of the protein being immunoprecipitated. Since this is histone mark
  data, the footprint size should match the size of a nucleosome,
  about 147 bp.
- [`D4659vsD5053_idrplots.pdf`](D4659vsD5053_idrplots.pdf) shows an
  example plot from the
  [Irreproducible Discovery Rate](https://sites.google.com/site/anshulkundaje/projects/idr)
  analysis used to identify biologically reproducible peaks in the
  ChIP-seq data. The plot shows the degree of consistency in the scores
  for overlapping peaks in two biological replicates. Peaks with
  consistently high-ranking scores in both replicates are considered
  reproducible.
- The following reports show QC and exploratory analysis for 3 histone
  marks and RNA-seq:
  [H3K4me3](reports/ChIP-seq/H3K4me3-exploration.html),
  [H3K4me2](reports/ChIP-seq/H3K4me2-exploration.html),
  [H3K27me3](reports/ChIP-seq/H3K27me3-exploration.html),
  [RNA-seq](reports/RNA-seq/salmon_hg38.analysisSet_ensembl.85-exploration.html).
  The purpose of these reports is to ensure that the modelling
  assumptions and strategies are appropriate for the data. Sometimes
  several strategies are tested against each other, and the best
  performer is chosen for the subsequent differential
  expression/modification analysis.
- The following reports show the differential expression/modification
  analyses and p-value histograms for the 3 histone marks and
  RNA-seq:
  [H3K4me3](reports/ChIP-seq/H3K4me3-diffmod.html),
  [H3K4me2](reports/ChIP-seq/H3K4me2-diffmod.html),
  [H3K27me3](reports/ChIP-seq/H3K27me3-diffmod.html),
  [RNA-seq](reports/RNA-seq/salmon_hg38.analysisSet_ensembl.85-diffexp.html).
- The RNA-seq data were processed using 10 different combinations of
  quantification pipeline and transcriptome reference.
  [`rnaseq-compare.html`](reports/RNA-seq/rnaseq-compare.html) shows a
  series of comparisons designed to investigate the differences
  between these pipelines and references.
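
As an illustration of how a cross-correlation function like those in `CCF-plots.pdf` can point to the fragment size, the sketch below correlates forward-strand and reverse-strand read start counts at a range of shifts and reports the shift with the highest correlation. It is a toy example on simulated reads, not the code used for the plots: the fragment length, genome length, site spacing, and read counts are all hypothetical values, and no smoothing or blacklist filtering is applied.

```python
import math
import random

FRAGMENT_LENGTH = 150   # hypothetical "true" fragment size used in the simulation
GENOME_LENGTH = 5_000   # hypothetical toy genome length


def simulate_read_starts(rng, n_reads=1000):
    """Simulate per-position 5' read start counts on each strand.

    Fragments start near regularly spaced sites. A forward read marks the
    left end of its fragment, a reverse read marks the right end.
    """
    fwd = [0] * GENOME_LENGTH
    rev = [0] * GENOME_LENGTH
    sites = range(250, GENOME_LENGTH, 500)
    for _ in range(n_reads):
        start = rng.choice(sites) + round(rng.gauss(0, 5))
        if rng.random() < 0.5:
            fwd[start] += 1
        else:
            rev[start + FRAGMENT_LENGTH - 1] += 1
    return fwd, rev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cross_correlation(fwd, rev, max_shift):
    """Correlate forward counts with reverse counts shifted left by d bases."""
    return [pearson(fwd[:len(fwd) - d], rev[d:]) for d in range(max_shift + 1)]


rng = random.Random(0)
fwd, rev = simulate_read_starts(rng)
ccf = cross_correlation(fwd, rev, max_shift=300)

# The shift with the highest correlation is the fragment size estimate.
estimated_fragment_length = max(range(len(ccf)), key=ccf.__getitem__)
print("estimated fragment length:", estimated_fragment_length)
```

On real data the curve is noisier than in this simulation, which is one reason to inspect it at several smoothing levels and to remove blacklist reads first, as the two CCF plots described above illustrate.
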