123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101 |
- <!-- TODO: Update this -->
- This is a series of example plots and tables from a combined
- RNA-seq/ChIP-seq study on differences between naive and memory T-cell
- activation. You can view the (old and messy) code for these plots
- [here][1].
- [1]: https://github.com/DarwinAwardWinner/cd4-histone-paper-code
- - [`p-value distributions.pdf`](p-value distributions.pdf) is a series
- of p-value histograms for each of the contrasts tested. A contrast
- with no significant differential expression would exhibit a uniform
- distribution, while differential expression would be reflected by an
- excess of small p-values.
- - [`FPKM by Peak Status H3K4.pdf`](FPKM by Peak Status H3K4.pdf) shows
- the variation in gene expression based on the presence or absence of
- two histone marks in the gene promoters.
- - [`promoter-edger-topgenes3-ql.xlsx`](promoter-edger-topgenes3-ql.xlsx)
- is a spreadsheet of all promoters with differential histone
- modification in their promoters based on the ChIP-seq read counts.
- - [`Promoter Peak Distance Profile.pdf`](Promoter Peak Distance Profile.pdf)
- shows the distribution of distances from transcription
- start sites to the nearest peak for the three histone modifications
- studied. This was used to determine the "promoter radius" for read
- counting. Notably, the three histone marks do not all have the same
- promoter radius.
- - [`rnaseq-MDSPlots.pdf`](rnaseq-MDSPlots.pdf) shows a series of MDS
- plots (similar to PCA plots) before and after correction of a known
- batch effect. Note the implausible zigzag-shaped progression over
- time before correction, compared to the more plausible cyclic time
- progression after.
- - [`rnaseq-edgeR-vs-limma.pdf`](rnaseq-edgeR-vs-limma.pdf) and
- [`rnaseq-limma-weighted-vs-uw.pdf`](rnaseq-limma-weighted-vs-uw.pdf)
- show comparisons of p-values for all genes in each contrast of the
- RNA-seq data, comparing edgeR and limma-voom with/without sample
- quality weights. The final choice of method was limma-voom with
- sample quality weights.
- - [`rnaseq-maplots-limma-sampleweights.pdf`](rnaseq-maplots-limma-sampleweights.pdf)
- shows the MA plot for each contrast of the RNA-seq data
- There are also some plots from an in-progress analysis of the same
- data based on sliding windows, rather than just analyzing promoter
- regions. You can view the code for generating these plots [here][2],
- and you can view some presentation slides based on this analysis
- [here][3].
- [2]: https://github.com/DarwinAwardWinner/CD4-csaw
- [3]: ./ChIP-Seq presentation.pdf
- - [`CCF-plots.pdf`](CCF-plots.pdf) shows the cross-correlation
- functions of the ChIP-Seq data for 3 different histone marks, at
- several different levels of smoothing. This plot is used to
- determine the fragment size. You can also observe from the periodic
- wave-like pattern, indicating that multiple adjacent histones tend
- to share the same histone modification.
- - [`CCF-plots-noBL.pdf`](CCF-plots-noBL.pdf) shows the same plots as
- above, but without removing reads in so-called "blacklist" regions
- that typically contain high-coverage artifact signals. The result is
- a much messier plot, with many samples having an artifactual peak at
- the read length (dotted line) rather than the actual width of a
- histone (solid line).
- - [`site-profile-plots.pdf`](site-profile-plots.pdf) shows plots of
- the relative coverage depth profiles around local coverage maxima in
- the ChIP-Seq data. This plot is used to determine the footprint size
- of the protein being imunoprecipitated. Since this is histone mark
- data, the footprint size should match the size of a nucleosome,
- about 147 bp.
- - [`D4659vsD5053_idrplots.pdf`](D4659vsD5053_idrplots.pdf) shows an
- example plot from
- the
- [Irreproducible Discovery Rate](https://sites.google.com/site/anshulkundaje/projects/idr) analysis
- used to identify biologically reproducible peaks in the ChIP-Seq
- data. The plot shows the degree of consistency in the scores for
- overlapping peaks in two biological replicates. Peaks with
- consistently high-ranking scores in both replicates are considered
- reproducible.
- - The following reports show QC and exploratory analysis for 3 histone
- marks and
- RNA-seq:
- [H3K4me3](reports/ChIP-seq/H3K4me3-exploration.html),
- [H3K4me2](reports/ChIP-seq/H3K4me2-exploration.html),
- [H3K27me3](reports/ChIP-seq/H3K27me3-exploration.html),
- [RNA-seq](reports/RNA-seq/salmon_hg38.analysisSet_ensembl.85-exploration.html).
- The purpose of these reports is to ensure that the modelling
- assumptions and strategies are appropriate for the data. Sometimes
- several strategies are tested against each other, and the best
- performer is chosen for the subsequent differential
- expression/modification analysis.
- - The following reports show the differential expression/modification
- analyses and p-value histograms for the 3 histone marks and
- RNA-seq:
- [H3K4me3](reports/ChIP-seq/H3K4me3-diffmod.html),
- [H3K4me2](reports/ChIP-seq/H3K4me2-diffmod.html),
- [H3K27me3](reports/ChIP-seq/H3K27me3-diffmod.html),
- [RNA-seq](reports/RNA-seq/salmon_hg38.analysisSet_ensembl.85-diffexp.html)
- - The RNA-seq data were processed using 10 different combinations of
- quantification pipeline and transcriptome
- reference.
- [`rnaseq-compare.html`](reports/RNA-seq/rnaseq-compare.html) shows a
- series of comparisons designed to investigate the differences
- between these pipelines and references.
|