
Add some Ch3 future directions

Ryan C. Thompson 5 years ago
parent
commit
2f8cc6723f
2 files changed with 237 additions and 63 deletions
  1. 45 35
      refs.bib
  2. 192 28
      thesis.lyx

File diff suppressed because it is too large
+ 45 - 35
refs.bib


+ 192 - 28
thesis.lyx

@@ -257,7 +257,7 @@ status open
 
 \begin_layout Plain Layout
 Look into auto-generated nomenclature list: https://wiki.lyx.org/Tips/Nomenclature.
- Otherwise, do a manual pass for all abbreviations.
+ Otherwise, do a manual pass for all abbreviations at the end.
  Do nomenclature/abbreviations independently for each chapter.
 \end_layout
 
@@ -283,6 +283,14 @@ we did X
 \begin_inset Quotes eld
 \end_inset
 
+I did X
+\begin_inset Quotes erd
+\end_inset
+
+ vs 
+\begin_inset Quotes eld
+\end_inset
+
 X was done
 \begin_inset Quotes erd
 \end_inset
@@ -334,6 +342,19 @@ Do not include graphs, charts, tables, or illustrations in your abstract.
 \end_inset
 
 
+\end_layout
+
+\begin_layout Standard
+\begin_inset Flex TODO Note (inline)
+status open
+
+\begin_layout Plain Layout
+Obviously the abstract gets written last.
+\end_layout
+
+\end_inset
+
+
 \end_layout
 
 \begin_layout Chapter
@@ -8629,13 +8650,13 @@ literal "false"
 \begin_inset Float figure
 wide false
 sideways false
-status open
+status collapsed
 
 \begin_layout Plain Layout
 \begin_inset Float figure
 wide false
 sideways false
-status collapsed
+status open
 
 \begin_layout Plain Layout
 \align center
@@ -8726,6 +8747,12 @@ Violin plot of inter-normalization log ratios for blood samples.
 \begin_inset Caption Standard
 
 \begin_layout Plain Layout
+\begin_inset CommandInset label
+LatexCommand label
+name "fig:frma-violin"
+
+\end_inset
+
 
 \series bold
 Violin plot of log ratios between normalizations for 20 biopsy samples.
@@ -11142,6 +11169,115 @@ This preliminary analysis suggests that some degree of differential methylation
  systematic perturbation of the data.
 \end_layout
 
+\begin_layout Section
+Future Directions
+\end_layout
+
+\begin_layout Standard
+\begin_inset Flex TODO Note (inline)
+status open
+
+\begin_layout Plain Layout
+Some work was already being done with the existing fRMA vectors.
+ Do I mention that here?
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Subsection
+Improving fRMA to allow training from batches of unequal size
+\end_layout
+
+\begin_layout Standard
+Because the tools for building fRMA normalization vectors require equal-size
+ batches, many samples must be discarded from the training data.
+ This is undesirable for a few reasons.
+ First, more data is simply better, all other things being equal.
+ In this case, 
+\begin_inset Quotes eld
+\end_inset
+
+better
+\begin_inset Quotes erd
+\end_inset
+
+ means a more precise estimate of normalization parameters.
+ In addition, the samples to be discarded must be chosen arbitrarily, which
+ introduces an unnecessary element of randomness into the estimation process.
+ While the randomness can be made deterministic by setting a consistent
+ random seed, the need for equal size batches also introduces a need for
+ the analyst to decide on the appropriate trade-off between batch size and
+ the number of batches.
+ This introduces an unnecessary and undesirable 
+\begin_inset Quotes eld
+\end_inset
+
+researcher degree of freedom
+\begin_inset Quotes erd
+\end_inset
+
+ into the analysis, since the generated normalization vectors now depend
+ on the choice of batch size based on vague selection criteria and instinct,
+ which can unintentionally introduce bias if the researcher chooses a batch
+ size based on what seems to yield the most favorable downstream results
+  
+\begin_inset CommandInset citation
+LatexCommand cite
+key "Simmons2011"
+literal "false"
+
+\end_inset
+
+.
+\end_layout
+
+\begin_layout Standard
+Fortunately, the requirement for equal-size batches is not inherent to the
+ fRMA algorithm but rather a limitation of the implementation in the frmaTools
+ package.
+ In personal communication, the package's author, Matthew McCall, has indicated
+ that with some work, it should be possible to improve the implementation
+ to work with batches of unequal sizes.
+ The current implementation ignores the batch size when calculating within-batch
+ and between-batch residual variances, since the batch size constant cancels
+ out later in the calculations as long as all batches are of equal size.
+ Hence, the calculations of these parameters would need to be modified to
+ remove this optimization and properly calculate the variances using the
+ full formula.
+ Once this modification is made, a new strategy would need to be developed
+ for assessing the stability of parameter estimates, since the random subsamplin
+g step is eliminated, meaning that different subsamplings can no longer
+ be compared as in Figures 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:frma-violin"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+ and 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "fig:Representative-MA-plots"
+plural "false"
+caps "false"
+noprefix "false"
+
+\end_inset
+
+.
+ Bootstrap resampling is likely a good candidate here: sample many training
+ sets of equal size from the existing training set with replacement, estimate
+ parameters from each resampled training set, and compare the estimated
+ parameters between bootstraps in order to quantify the variability in each
+ parameter's estimation.
+\end_layout
+
 \begin_layout Chapter
 Globin-blocking for more effective blood RNA-seq analysis in primate animal
  model
@@ -13850,8 +13986,8 @@ The high correlation between coverage depth observed between H3K4me2 and
 \emph on
 same
 \emph default
- lysine residue on the histone H3 polypeptide, which makes them mutually
- exclusive with each other on a given H3 subunit.
+ lysine residue on the histone H3 polypeptide, which means that they cannot
+ both be present on the same H3 subunit.
  Thus, the high correlation between them has several potential explanations.
  One possible reason is cell population heterogeneity: perhaps some genomic
  loci are frequently marked with H3K4me2 in some cells, while in other cells
@@ -13859,22 +13995,22 @@ same
  Another possibility is allele-specific modifications: the loci are marked
  in each diploid cell with H3K4me2 on one allele and H3K4me3 on the other
  allele.
- Lastly, since each histone consists of 2 of each subunit, it is possible
- that having one H3K4me2 mark and one H3K4me3 mark on a given histone represents
- a distinct epigenetic state with a different function than either double
- H3K4me2 or double H3K4me3.
+ Lastly, since each histone octamer contains 2 H3 subunits, it is possible
+ that having one H3K4me2 mark and one H3K4me3 mark on a given histone octamer
+ represents a distinct epigenetic state with a different function than either
+ double H3K4me2 or double H3K4me3.
  
 \end_layout
 
 \begin_layout Standard
 These three hypotheses could be disentangled by single-cell ChIP-seq.
  If the correlation between these two histone marks persists even within
- the reads for each individual cell, then population heterogeneity cannot
- explain the correlation.
+ the reads for each individual cell, then cell population heterogeneity
+ cannot explain the correlation.
  Allele-specific modification can be tested for by looking at the correlation
  between read coverage of the two histone marks at heterozygous loci.
- If the correlation between loci is low, then this is consistent with allele-spe
-cific modification.
+ If the correlation between read counts for opposite loci is low, then this
+ is consistent with allele-specific modification.
  Finally if the modifications do not separate by either cell or allele,
  the colocation of these two marks is most likely occurring at the level
  of individual histones, with the heterogeneously modified histone representing
@@ -13906,6 +14042,23 @@ again
  that the two marks are occurring on opposite H3 subunits of the same histones.
 \end_layout
 
+\begin_layout Standard
+\begin_inset Flex TODO Note (inline)
+status open
+
+\begin_layout Plain Layout
+Try to see if double ChIP-seq is actually feasible, and if not, come up
+ with some other idea for directly detecting the mixed mod state.
+ Oh! Actually ChIP-seq isn't required, only double ChIP followed by quantificati
+on.
+ That's one possible angle.
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
 \begin_layout Section*
 Ch3
 \end_layout
@@ -13922,13 +14075,24 @@ fRMAtools could be adapted to not require equal-sized groups
 Ch4
 \end_layout
 
-\begin_layout Itemize
-Look in discussion, I think there's some stuff there already
+\begin_layout Standard
+\begin_inset Flex TODO Note (inline)
+status open
+
+\begin_layout Plain Layout
+I've already done a good bit of work outside just this globin blocking thing,
+ so I'm not sure what to put for future directions.
+ Does it include the other stuff I've done but not published?
+\end_layout
+
+\end_inset
+
+
 \end_layout
 
 \begin_layout Standard
 \begin_inset ERT
-status open
+status collapsed
 
 \begin_layout Plain Layout
 
@@ -13947,6 +14111,18 @@ bibname}{References}
 \end_inset
 
 
+\end_layout
+
+\begin_layout Standard
+\begin_inset CommandInset bibtex
+LatexCommand bibtex
+btprint "btPrintCited"
+bibfiles "code-refs,refs-PROCESSED"
+options "bibtotoc,unsrt"
+
+\end_inset
+
+
 \end_layout
 
 \begin_layout Standard
@@ -13974,18 +14150,6 @@ Check in-text citation format.
 \end_inset
 
 
-\end_layout
-
-\begin_layout Standard
-\begin_inset CommandInset bibtex
-LatexCommand bibtex
-btprint "btPrintCited"
-bibfiles "code-refs,refs-PROCESSED"
-options "bibtotoc,unsrt"
-
-\end_inset
-
-
 \end_layout
 
 \end_body
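
Note on the "full formula" mentioned in the new Future Directions text: the sketch below shows one textbook way to estimate within-batch and between-batch variance when batches differ in size, using the method-of-moments estimator for an unbalanced one-way random-effects model. It is an illustration under stated assumptions, not the frmaTools implementation; the function name `variance_components` and the input layout (one list of residuals per batch) are hypothetical.

```python
from statistics import fmean


def variance_components(batches):
    """Estimate (within-batch, between-batch) variance from unequal batches.

    `batches` is a list of lists of residuals, one inner list per batch.
    This is the standard unbalanced one-way ANOVA estimator, used here
    only as an assumption about what "the full formula" could look like.
    """
    k = len(batches)
    sizes = [len(b) for b in batches]
    n_total = sum(sizes)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 batches and more samples than batches")

    batch_means = [fmean(b) for b in batches]
    grand_mean = sum(sum(b) for b in batches) / n_total

    # Sums of squares within batches and between batch means.
    ss_within = sum((x - m) ** 2 for b, m in zip(batches, batch_means) for x in b)
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, batch_means))

    ms_within = ss_within / (n_total - k)
    ms_between = ss_between / (k - 1)

    # Effective batch size. When every batch has the same size n this
    # reduces to n, which is why the constant can be dropped in that case.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    # Truncate at zero, since the moment estimator can go negative.
    var_between = max(0.0, (ms_between - ms_within) / n0)
    return ms_within, var_between
```

With equal-size batches the effective size `n0` equals the common batch size, so the result matches the balanced calculation; with unequal batches it no longer cancels and has to be carried through explicitly.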

Some files were not shown because too many files changed in this diff
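
Note on the bootstrap idea in the new Future Directions text: a minimal sketch of the procedure described there, assuming the training set is a flat list of samples and that some function estimates a vector of parameters from a training set. The names `bootstrap_parameter_sd` and `estimate_params` are hypothetical, and the sketch ignores batch structure, which a real fRMA training set would need to respect when resampling.

```python
import random
from statistics import fmean, stdev


def bootstrap_parameter_sd(training_set, estimate_params, n_boot=200, seed=0):
    """Return the bootstrap standard deviation of each estimated parameter.

    training_set    -- list of samples (any objects)
    estimate_params -- hypothetical callable mapping a list of samples
                       to a sequence of float parameter estimates
    """
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    n = len(training_set)
    estimates = []
    for _ in range(n_boot):
        # Resample a training set of the same size, with replacement.
        resample = rng.choices(training_set, k=n)
        estimates.append(estimate_params(resample))
    # Transpose so each column holds one parameter across all bootstraps.
    return [stdev(column) for column in zip(*estimates)]


def column_means(samples):
    """Toy estimator: the mean of each column across samples."""
    return [fmean(col) for col in zip(*samples)]


toy_training_set = [(1.0, 10.0), (2.0, 10.0), (3.0, 10.0), (4.0, 10.0)]
toy_sd = bootstrap_parameter_sd(toy_training_set, column_means)
```

In the toy example the second column is constant, so its bootstrap standard deviation is zero, while the first column varies between resamples. A larger spread for a parameter indicates a less stable estimate.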