Finish draft of Ch2 discussion

Ryan C. Thompson 5 years ago
commit
d5c46d72e9
1 file changed, 106 additions, 8 deletions
thesis.lyx

@@ -6219,7 +6219,7 @@ name "fig:rulegraph"
 
 
 \series bold
-Dependency graph of steps in reproducible workflow
+Dependency graph of steps in reproducible workflow.
 \end_layout
 
 \end_inset
@@ -6253,21 +6253,119 @@ end{landscape}
 
 \end_layout
 
-\begin_layout Itemize
-Discuss advantages of developing using a reproducible workflow
+\begin_layout Standard
+The analyses described in this chapter were organized into a reproducible
+ workflow using the Snakemake workflow management system.
+ As shown in Figure 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:rulegraph"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+, the workflow includes many steps with complex dependencies between them.
+ For example, the step that counts the number of ChIP-seq reads in 500
+\begin_inset space ~
+\end_inset
+
+bp windows in each promoter (the starting point for Figures 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:H3K4me2-neighborhood"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+, 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:H3K4me3-neighborhood"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+, and 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:H3K27me3-neighborhood"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+), named 
+\begin_inset Formula $\texttt{chipseq\_count\_tss\_neighborhoods}$
+\end_inset
+
+, depends on the RNA-seq abundance estimates in order to select the most-used
+ TSS for each gene, the aligned ChIP-seq reads, the index for those reads,
+ and the blacklist of regions to be excluded from ChIP-seq analysis.
+ Each step declares its inputs and outputs, and Snakemake uses these to
+ determine the dependencies between steps.
+ Each step is marked as depending on all the steps whose outputs match its
+ inputs, generating the workflow graph in Figure 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:rulegraph"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+, which Snakemake uses to determine the order in which to execute each step
+ so that each step is executed only after all of the steps it depends on
+ have completed, thereby automating the entire workflow from start to finish.
 \end_layout
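
The dependency resolution described in the paragraph above can be illustrated with a small Python sketch. This is not Snakemake's actual implementation (Snakemake also handles wildcards and works backward from requested targets); it only shows the idea of linking each step to the steps whose outputs match its inputs and then ordering them. Only `chipseq_count_tss_neighborhoods` is named in the text; every other rule name and all file names here are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical rule table. Only chipseq_count_tss_neighborhoods appears in the
# thesis text; the other rule names and all file names are assumptions.
RULES = {
    "quantify_rnaseq": {
        "inputs": ["rnaseq.fastq"],
        "outputs": ["rnaseq_abundance.tsv"],
    },
    "align_chipseq": {
        "inputs": ["chipseq.fastq"],
        "outputs": ["chipseq.bam"],
    },
    "index_chipseq": {
        "inputs": ["chipseq.bam"],
        "outputs": ["chipseq.bam.bai"],
    },
    "fetch_blacklist": {
        "inputs": [],
        "outputs": ["blacklist.bed"],
    },
    "chipseq_count_tss_neighborhoods": {
        "inputs": [
            "rnaseq_abundance.tsv",
            "chipseq.bam",
            "chipseq.bam.bai",
            "blacklist.bed",
        ],
        "outputs": ["tss_neighborhood_counts.tsv"],
    },
}


def build_dependencies(rules):
    """Map each rule to the set of rules whose outputs match its inputs."""
    producer = {}
    for name, rule in rules.items():
        for path in rule["outputs"]:
            producer[path] = name
    # Inputs with no producing rule (raw data files) create no dependency.
    return {
        name: {producer[path] for path in rule["inputs"] if path in producer}
        for name, rule in rules.items()
    }


def execution_order(rules):
    """Return rule names ordered so every rule follows its dependencies."""
    return list(TopologicalSorter(build_dependencies(rules)).static_order())
```

Calling `execution_order(RULES)` yields one valid ordering in which `chipseq_count_tss_neighborhoods` comes after all four of the steps it depends on.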
 
-\begin_deeper
-\begin_layout Itemize
-Decision-making based on trying every option and running the workflow downstream
- to see the effects
+\begin_layout Standard
+In addition to simply making it easier to organize the steps in the analysis,
+ structuring the analysis as a workflow allowed for some analysis strategies
+ that would not have been practical otherwise.
+ For example, five different RNA-seq quantification methods were each run
+ with two different reference transcriptome annotations, for a total of 10
+ different quantifications of the same RNA-seq data.
+ These were then compared against each other in the exploratory data analysis
+ step, which showed that the results were not very sensitive to either the
+ choice of quantification method or the choice of annotation.
+ This was possible with a single script for the exploratory data analysis,
+ because Snakemake was able to automate running this script for every
+ combination of method and reference.
+ In a similar manner, two different peak calling methods were tested against
+ each other, and in this case it was determined that SICER was unambiguously
+ superior to MACS for all histone marks studied.
+ By enabling these types of comparisons, structuring the analysis as an
+ automated workflow allowed important analysis decisions to be made in a
+ data-driven way, by running every reasonable option through the downstream
+ steps, seeing the consequences of choosing each option, and deciding
+ accordingly.
 \end_layout
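
The "every combination of method and reference" pattern in the paragraph above can be sketched in plain Python as follows. The text does not name the five quantification methods or the two annotations, so the names below are placeholders, and the output path pattern is likewise an assumption made for illustration. Snakemake provides its own helper for this kind of expansion; the sketch only shows the underlying idea.

```python
from itertools import product

# Placeholder names: the thesis text gives only the counts (5 methods,
# 2 annotations), not the actual method or annotation names.
METHODS = [f"method_{i}" for i in range(1, 6)]
ANNOTATIONS = ["annotation_a", "annotation_b"]


def quantification_targets(methods, annotations,
                           pattern="quant/{method}_{annotation}.tsv"):
    """Build one target path per (method, annotation) combination.

    The path pattern is a hypothetical layout, not the thesis workflow's.
    """
    return [
        pattern.format(method=method, annotation=annotation)
        for method, annotation in product(methods, annotations)
    ]


targets = quantification_targets(METHODS, ANNOTATIONS)
```

Requesting all of these targets from a workflow manager makes it run the same quantification and downstream analysis steps once per combination, which is what lets a single analysis script cover all 10 quantifications.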
 
-\end_deeper
 \begin_layout Subsection
 Data quality issues limit conclusions
 \end_layout
 
+\begin_layout Standard
+\begin_inset Flex TODO Note (inline)
+status open
+
+\begin_layout Plain Layout
+Is this needed?
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
 \begin_layout Chapter
 Improving array-based diagnostics for transplant rejection by optimizing
  data preprocessing