Ryan C. Thompson 330f2b34c6 Update CD4 examples 7 years ago

README.mkdn

This is a series of example plots and tables from a combined
RNA-seq/ChIP-seq study on differences between naive and memory T-cell
activation. You can view the (old and messy) code for these plots
[here][1].

[1]: https://github.com/DarwinAwardWinner/cd4-histone-paper-code

- [`p-value distributions.pdf`](p-value distributions.pdf) is a series
of p-value histograms for each of the contrasts tested. A contrast
with no significant differential expression would exhibit a uniform
distribution, while differential expression would be reflected by an
excess of small p-values.
- [`FPKM by Peak Status H3K4.pdf`](FPKM by Peak Status H3K4.pdf) shows
the variation in gene expression based on the presence or absence of
two histone marks in the gene promoters.
- [`promoter-edger-topgenes3-ql.xlsx`](promoter-edger-topgenes3-ql.xlsx)
is a spreadsheet of all promoters with differential histone
modification, based on the ChIP-seq read counts.
- [`Promoter Peak Distance Profile.pdf`](Promoter Peak Distance Profile.pdf)
shows the distribution of distances from transcription
start sites to the nearest peak for the three histone modifications
studied. This was used to determine the "promoter radius" for read
counting. Notably, the three histone marks do not all have the same
promoter radius.
- [`rnaseq-MDSPlots.pdf`](rnaseq-MDSPlots.pdf) shows a series of MDS
plots (similar to PCA plots) before and after correction of a known
batch effect. Note the implausible zigzag-shaped progression over
time before correction, compared to the more plausible cyclic time
progression after.
- [`rnaseq-edgeR-vs-limma.pdf`](rnaseq-edgeR-vs-limma.pdf) and
[`rnaseq-limma-weighted-vs-uw.pdf`](rnaseq-limma-weighted-vs-uw.pdf)
show comparisons of p-values for all genes in each contrast of the
RNA-seq data, comparing edgeR and limma-voom with/without sample
quality weights. The final choice of method was limma-voom with
sample quality weights.
- [`rnaseq-maplots-limma-sampleweights.pdf`](rnaseq-maplots-limma-sampleweights.pdf)
shows the MA plot for each contrast of the RNA-seq data.
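
The histogram shapes described for `p-value distributions.pdf` can be reproduced with a small simulation. The following is a minimal sketch, not the code used to produce these plots: it assumes a simple two-sided z-test on simulated data, and the function names, gene counts, and effect size are hypothetical choices made only for illustration.

```python
import random
from statistics import NormalDist


def simulate_p_values(n_genes, frac_de, effect, rng):
    """Simulate two-sided z-test p-values (toy model, not edgeR/limma).

    The first `frac_de` fraction of genes get a true shift of `effect`;
    the rest are drawn under the null hypothesis.
    """
    std_normal = NormalDist()
    p_values = []
    for i in range(n_genes):
        shift = effect if i < frac_de * n_genes else 0.0
        z = rng.gauss(shift, 1.0)
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    return p_values


def histogram(p_values, n_bins=10):
    """Count p-values in equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# No differential expression: counts should be roughly equal in every bin.
null_counts = histogram(simulate_p_values(10000, 0.0, 3.0, rng))

# 20% of genes truly differ: the first bin (smallest p-values) is inflated.
de_counts = histogram(simulate_p_values(10000, 0.2, 3.0, rng))

print("null:", null_counts)
print("with DE:", de_counts)
```

In the null case the bins come out roughly flat, while the mixed case shows a clear spike in the lowest bin on top of a flat background, which is the pattern to look for in the real histograms.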

There are also some plots from an in-progress analysis of the same
data based on sliding windows, rather than just analyzing promoter
regions. You can view the code for generating these plots [here][2],
and you can view some presentation slides based on this analysis
[here][3].

[2]: https://github.com/DarwinAwardWinner/CD4-csaw
[3]: ./ChIP-Seq presentation.pdf

- [`CCF-plots.pdf`](CCF-plots.pdf) shows the cross-correlation
functions of the ChIP-Seq data for 3 different histone marks, at
several different levels of smoothing. This plot is used to
determine the fragment size. The periodic wave-like pattern also
indicates that multiple adjacent histones tend to share the same
histone modification.
- [`CCF-plots-noBL.pdf`](CCF-plots-noBL.pdf) shows the same plots as
above, but without removing reads in so-called "blacklist" regions
that typically contain high-coverage artifact signals. The result is
a much messier plot, with many samples having an artifactual peak at
the read length (dotted line) rather than the actual width of a
histone (solid line).
- [`site-profile-plots.pdf`](site-profile-plots.pdf) shows plots of
the relative coverage depth profiles around local coverage maxima in
the ChIP-Seq data. This plot is used to determine the footprint size
of the protein being immunoprecipitated. Since this is histone mark
data, the footprint size should match the size of a nucleosome,
about 147 bp.
- [`D4659vsD5053_idrplots.pdf`](D4659vsD5053_idrplots.pdf) shows an
example plot from the
[Irreproducible Discovery Rate](https://sites.google.com/site/anshulkundaje/projects/idr)
analysis used to identify biologically reproducible peaks in the ChIP-Seq
data. The plot shows the degree of consistency in the scores for
overlapping peaks in two biological replicates. Peaks with
consistently high-ranking scores in both replicates are considered
reproducible.
- The following reports show QC and exploratory analysis for 3 histone
marks and RNA-seq:
[H3K4me3](reports/ChIP-seq/H3K4me3-exploration.html),
[H3K4me2](reports/ChIP-seq/H3K4me2-exploration.html),
[H3K27me3](reports/ChIP-seq/H3K27me3-exploration.html),
[RNA-seq](reports/RNA-seq/salmon_hg38.analysisSet_ensembl.85-exploration.html).
The purpose of these reports is to ensure that the modelling
assumptions and strategies are appropriate for the data. Sometimes
several strategies are tested against each other, and the best
performer is chosen for the subsequent differential
expression/modification analysis.
- The following reports show the differential expression/modification
analyses and p-value histograms for the 3 histone marks and
RNA-seq:
[H3K4me3](reports/ChIP-seq/H3K4me3-diffmod.html),
[H3K4me2](reports/ChIP-seq/H3K4me2-diffmod.html),
[H3K27me3](reports/ChIP-seq/H3K27me3-diffmod.html),
[RNA-seq](reports/RNA-seq/salmon_hg38.analysisSet_ensembl.85-diffexp.html).
- The RNA-seq data were processed using 10 different combinations of
quantification pipeline and transcriptome reference.
[`rnaseq-compare.html`](reports/RNA-seq/rnaseq-compare.html) shows a
series of comparisons designed to investigate the differences
between these pipelines and references.
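
To make the idea behind the cross-correlation plots concrete, here is a minimal sketch of how a strand cross-correlation function can reveal the fragment size. It is not the code from the CD4-csaw repository: it uses idealized toy data in which every fragment contributes one forward-strand and one reverse-strand read, and all names and parameters (`strand_cross_correlation`, `genome_length`, `fragment_length`, and so on) are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def strand_cross_correlation(fwd, rev, max_shift):
    """Correlate forward-strand counts with reverse-strand counts
    shifted back by each delay from 0 to max_shift (inclusive)."""
    n = len(fwd)
    return [pearson(fwd[: n - d], rev[d:]) for d in range(max_shift + 1)]


# Toy data (assumption): fixed-length fragments at random positions.
genome_length = 3000
fragment_length = 150
rng = random.Random(1)  # fixed seed so the sketch is repeatable

fwd = [0] * genome_length  # forward-strand read counts per position
rev = [0] * genome_length  # reverse-strand read counts per position
for _ in range(1500):
    start = rng.randrange(genome_length - fragment_length)
    fwd[start] += 1
    # Toy convention: the reverse-strand read is recorded exactly one
    # fragment length downstream of the forward-strand read.
    rev[start + fragment_length] += 1

ccf = strand_cross_correlation(fwd, rev, max_shift=300)

# The delay with the highest correlation is the fragment size estimate.
estimated = max(range(len(ccf)), key=ccf.__getitem__)
print("estimated fragment length:", estimated)
```

Real data are far noisier than this toy example, since fragment lengths vary and each fragment yields a read from only one strand, which is why the actual plots are drawn at several levels of smoothing and can show artifactual peaks such as the one at the read length.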