This is a series of example plots and tables from a combined
RNA-seq/ChIP-seq study on differences between naive and memory T-cell
activation. You can view the (old and messy) code for these plots
[here][1].

[1]: https://github.com/DarwinAwardWinner/cd4-histone-paper-code

- [`p-value distributions.pdf`](p-value%20distributions.pdf) is a series
  of p-value histograms, one for each of the contrasts tested. A contrast
  with no significant differential expression would exhibit a uniform
  distribution, while differential expression would be reflected by an
  excess of small p-values.
- [`FPKM by Peak Status H3K4.pdf`](FPKM%20by%20Peak%20Status%20H3K4.pdf) shows
  the variation in gene expression based on the presence or absence of
  two histone marks in the gene promoters.
- [`promoter-edger-topgenes3-ql.xlsx`](promoter-edger-topgenes3-ql.xlsx)
  is a spreadsheet of all promoters with differential histone
  modification, based on the ChIP-seq read counts.
- [`Promoter Peak Distance Profile.pdf`](Promoter%20Peak%20Distance%20Profile.pdf)
  shows the distribution of distances from transcription
  start sites to the nearest peak for the three histone modifications
  studied. This was used to determine the "promoter radius" for read
  counting. Notably, the three histone marks do not all have the same
  promoter radius.
- [`rnaseq-MDSPlots.pdf`](rnaseq-MDSPlots.pdf) shows a series of MDS
  plots (similar to PCA plots) before and after correction of a known
  batch effect. Note the implausible zigzag-shaped progression over
  time before correction, compared to the more plausible cyclic time
  progression after.
- [`rnaseq-edgeR-vs-limma.pdf`](rnaseq-edgeR-vs-limma.pdf) and
  [`rnaseq-limma-weighted-vs-uw.pdf`](rnaseq-limma-weighted-vs-uw.pdf)
  show comparisons of p-values for all genes in each contrast of the
  RNA-seq data, comparing edgeR with limma-voom, and limma-voom with
  and without sample quality weights. The final choice of method was
  limma-voom with sample quality weights.
- [`rnaseq-maplots-limma-sampleweights.pdf`](rnaseq-maplots-limma-sampleweights.pdf)
  shows the MA plot for each contrast of the RNA-seq data.
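
To make the reading of the p-value histograms in the first item above concrete, here is a minimal simulation sketch. It is not taken from the linked analysis code: the gene count, group size, effect size, and the simple z-test (standing in for the edgeR and limma tests) are all illustrative assumptions. It only shows that p-values are roughly uniform when there is no true difference and pile up near zero when there is one.

```python
import math
import numpy as np

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_pvalues(n_genes, n_per_group, effect, rng):
    """Simulate one z-test per gene for two groups with unit variance.

    Hypothetical helper for illustration only; `effect` is the true
    difference in means between the two groups.
    """
    a = rng.normal(0.0, 1.0, size=(n_genes, n_per_group))
    b = rng.normal(effect, 1.0, size=(n_genes, n_per_group))
    # With unit variance, the difference of means has standard error sqrt(2 / n).
    z = (b.mean(axis=1) - a.mean(axis=1)) / math.sqrt(2.0 / n_per_group)
    return np.array([two_sided_p(v) for v in z])

rng = np.random.default_rng(0)
n_genes = 10000  # assumed, purely illustrative
null_p = simulate_pvalues(n_genes, 5, effect=0.0, rng=rng)  # no differential expression
alt_p = simulate_pvalues(n_genes, 5, effect=1.5, rng=rng)   # every gene differs

# Ten equal-width histogram bins over [0, 1].
bins = np.linspace(0.0, 1.0, 11)
null_counts, _ = np.histogram(null_p, bins=bins)
alt_counts, _ = np.histogram(alt_p, bins=bins)

print("null contrast:", null_counts)
print("alt contrast: ", alt_counts)
```

In the null case each bin should hold close to a tenth of the genes, while in the alternative case most of the mass falls in the first bin. A real contrast usually looks like a mixture of the two shapes.
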

There are also some plots from an in-progress analysis of the same
data based on sliding windows, rather than just analyzing promoter
regions. You can view the code for generating these plots [here][2],
and you can view some presentation slides based on this analysis
[here][3].

[2]: https://github.com/DarwinAwardWinner/CD4-csaw
[3]: ./ChIP-Seq%20presentation.pdf

- [`CCF-plots.pdf`](CCF-plots.pdf) shows the cross-correlation
  functions of several different histone marks, at several different
  levels of smoothing. This plot is used to determine the fragment
  size. The periodic wave-like pattern also indicates that multiple
  adjacent histones tend to share the same histone modification.
- [`CCF-plots-noBL.pdf`](CCF-plots-noBL.pdf) shows the same plots, but
  without removing reads in so-called "blacklist" regions that
  typically contain high-coverage artifact signals. The result is a
  much messier plot, with many samples having a peak at the read
  length (dotted line) rather than the actual width of a histone
  (solid line).
- [`site-profile-plots.pdf`](site-profile-plots.pdf) shows plots of
  the relative coverage depth profiles around local coverage maxima.
  This plot is used to determine the footprint size of the protein
  being immunoprecipitated. Since this is histone mark data, the
  footprint size should match the size of a nucleosome, about 147 bp.
- [`H3K4me3-window-abundance-vs-peaks.pdf`](H3K4me3-window-abundance-vs-peaks.pdf)
  shows the association between peak overlap status and abundance for
  all windows in the genome. As expected, windows that overlap a
  called peak tend to have a higher abundance than other windows.
- [`H3K4me3 Selected Sample 10KB Bin MA Plots.pdf`](H3K4me3%20Selected%20Sample%2010KB%20Bin%20MA%20Plots.pdf)
  shows selected MA plots between samples, demonstrating the effects
  of several different potential normalization methods.
- [`H3K4me3-normfactors.pdf`](H3K4me3-normfactors.pdf) shows the
  associations between normalization factor and experimental variables
  for several different normalization methods.
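
As a rough illustration of how a cross-correlation function like the ones in `CCF-plots.pdf` can reveal the fragment size, the sketch below builds idealized synthetic data and finds the lag with the highest strand cross-correlation. Everything here is an assumption made for illustration: the toy genome length, the 150 bp fragment length, the noise-free reads, and the hypothetical `cross_correlation` helper. It is not the code from the linked repository.

```python
import numpy as np

def cross_correlation(forward, reverse, max_lag):
    """Pearson correlation between forward-strand counts and reverse-strand
    counts shifted left by each lag from 0 to max_lag (hypothetical helper)."""
    ccf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        f = forward[: len(forward) - lag]
        r = reverse[lag:]
        ccf[lag] = np.corrcoef(f, r)[0, 1]
    return ccf

rng = np.random.default_rng(1)
genome_len = 50000    # assumed toy genome size
fragment_len = 150    # assumed true fragment length
n_fragments = 5000

# Each fragment contributes one forward-strand read at its start and one
# reverse-strand read at the opposite end (idealized, no noise or jitter).
starts = rng.integers(0, genome_len - fragment_len, size=n_fragments)
forward = np.bincount(starts, minlength=genome_len).astype(float)
reverse = np.bincount(starts + fragment_len, minlength=genome_len).astype(float)

ccf = cross_correlation(forward, reverse, max_lag=400)
estimated_fragment_len = int(np.argmax(ccf))
print("estimated fragment length:", estimated_fragment_len)
```

With real data the peak is broader and noisier, and artifacts such as the read-length peak described for `CCF-plots-noBL.pdf` can compete with it, which is why the plots are inspected at several smoothing levels rather than relying on a single maximum.
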