6 years ago · 34f9118a1f
--- a/thesis.lyx
+++ b/thesis.lyx
@@ -3038,7 +3038,7 @@ s in the linear model in a similar fashion to known batch effects in order
 
				 \end_layout
			
 
				 
			
 
				 \begin_layout Subsection
			
 
				-Benjamini-Hochberg + pval dist
			
 
				+Interpreting p-value distributions and estimating false discovery rates
			
 
				 \end_layout
			
 
				 
			
 
				 \begin_layout Standard
			
@@ -3078,7 +3078,17 @@ significant
 
				 When only a fraction of null hypotheses are true, the p-value distribution
			
 
				  will be a mixture of a uniform component representing the null hypotheses
			
 
				  that are true and a non-uniform component representing the null hypotheses
			
 
				- that are not true.
			
 
				+ that are not true (Figure 
			
 
				+\begin_inset CommandInset ref
			
 
				+LatexCommand ref
			
 
				+reference "fig:Example-pval-hist"
			
 
				+plural "false"
			
 
				+caps "false"
			
 
				+noprefix "false"
			
 
				+
			
 
				+\end_inset
			
 
				+
			
 
				+).
			
 
				  The fraction belonging to the uniform component is referred to as 
			
 
				 \begin_inset Formula $\pi_{0}$
			
 
				 \end_inset
			
@@ -3086,7 +3096,7 @@ When only a fraction of null hypotheses are true, the p-value distribution
 
				 , which ranges from 1 (all null hypotheses true) to 0 (all null hypotheses
			
 
				  false).
			
 
				  Furthermore, the non-uniform component must be biased toward zero, since
			
 
				- any evidence against the null hypothesis must push the p-value for a test
			
 
				+ any evidence against the null hypothesis pushes the p-value for a test
			
 
				  toward zero.
			
 
				  We can exploit this fact to estimate the 
			
 
				 \begin_inset Flex Glossary Term
			
@@ -3137,7 +3147,7 @@ literal "false"
 
				 \begin_inset Formula $\pi_{0}=1$
			
 
				 \end_inset
			
 
				 
			
 
				- unconditionally.
			
 
				+.
			
 
				  Hence it gives an estimated upper bound for the 
			
 
				 \begin_inset Flex Glossary Term
			
 
				 status open
			
@@ -3149,6 +3159,7 @@ FDR
 
				 \end_inset
			
 
				 
			
 
				  at any significance threshold, rather than a point estimate.
			
 
				+ 
			
 
				 \end_layout
			
 
				 
			
 
				 \begin_layout Standard
			
@@ -3226,14 +3237,81 @@ The distribution of p-values from a large number of independent tests (such
 
				 
			
 
				 \end_layout
			
 
				 
			
 
				+\begin_layout Standard
			
 
				+We can also estimate 
			
 
				+\begin_inset Formula $\pi_{0}$
			
 
				+\end_inset
			
 
				+
			
 
				+ for the entire distribution of p-values, which can give an idea of the
			
 
				+ overall signal size in the data without setting any significance threshold
			
 
				+ or making any decisions about which specific null hypotheses to reject.
			
 
				+ As with 
			
 
				+\begin_inset Flex Glossary Term
			
 
				+status open
			
 
				+
			
 
				+\begin_layout Plain Layout
			
 
				+FDR
			
 
				+\end_layout
			
 
				+
			
 
				+\end_inset
			
 
				+
			
 
				+ estimation, there are many methods proposed for estimating 
			
 
				+\begin_inset Formula $\pi_{0}$
			
 
				+\end_inset
			
 
				+
			
 
				+.
			
 
				+ The one used in this work is the Phipson method of averaging local 
			
 
				+\begin_inset Flex Glossary Term
			
 
				+status open
			
 
				+
			
 
				+\begin_layout Plain Layout
			
 
				+FDR
			
 
				+\end_layout
			
 
				+
			
 
				+\end_inset
			
 
				+
			
 
				+ values 
			
 
				+\begin_inset CommandInset citation
			
 
				+LatexCommand cite
			
 
				+key "Phipson2013Thesis"
			
 
				+literal "false"
			
 
				+
			
 
				+\end_inset
			
 
				+
			
 
				+.
			
 
				+ Once 
			
 
				+\begin_inset Formula $\pi_{0}$
			
 
				+\end_inset
			
 
				+
			
 
				+ is estimated, the number of null hypotheses that are false can be estimated
			
 
				+ as 
			
 
				+\begin_inset Formula $(1-\pi_{0})\times N$
			
 
				+\end_inset
			
 
				+
			
 
				+.
			
 
				+\end_layout
			
 
				+
			
 
				 \begin_layout Standard
			
 
				 Conversely, a p-value distribution that is neither uniform nor zero-biased
			
 
				  is evidence of a modeling failure.
			
 
				  Such a distribution would imply that there is less than zero evidence against
			
 
				  the null hypothesis, which is not possible (in a frequentist setting).
			
 
				- The usual cause is a model assumption that is violated by the data, such
			
 
				- as assuming equal variance between groups (homoskedasticity) when the variance
			
 
				- of each group is not equal (heteroskedasticity).
			
 
				+ Attempting to estimate 
			
 
				+\begin_inset Formula $\pi_{0}$
			
 
				+\end_inset
			
 
				+
			
 
				+ from such a distribution would yield an estimate greater than 1, a nonsensical
			
 
				+ result.
			
 
				+ The usual cause of a poorly behaved p-value distribution is a model assumption
			
 
				+ that is violated by the data, such as assuming equal variance between groups
			
 
				+ (homoskedasticity) when the variance of each group is not equal (heteroskedasti
			
 
				+city) or failing to model a strong confounding batch effect.
			
 
				+ In particular, such a p-value distribution is 
			
 
				+\emph on
			
 
				+not 
			
 
				+\emph default
			
 
				+consistent with a simple lack of signal in the data, as this should result
			
 
				+ in a uniform distribution.
			
 
				  Hence, observing such a p-value distribution should prompt a search for
			
 
				  violated model assumptions.
			
 
				 \end_layout
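
Review note on the added paragraphs: the mixture argument (a uniform null component plus a zero-biased non-null component) can be checked numerically. The sketch below is only an illustration under stated assumptions. It swaps in a simpler Storey-style tail estimator for pi0 rather than the Phipson local-FDR averaging that the text cites, and the simulated alternative p-values are drawn from an arbitrary Beta(0.3, 8) distribution chosen purely to be biased toward zero. The function names (`estimate_pi0`, `bh_adjust`, `simulate_pvals`) and all parameter values are hypothetical.

```python
import random


def estimate_pi0(pvals, lam=0.5):
    """Storey-style tail estimate of pi0 (NOT the Phipson method cited in the text).

    Assumes p-values above `lam` come almost entirely from true nulls, which
    are uniform, so their count is rescaled by the width of the tail (1 - lam).
    """
    n_above = sum(1 for p in pvals if p > lam)
    return n_above / ((1.0 - lam) * len(pvals))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def simulate_pvals(n_tests=10000, true_pi0=0.8, seed=1):
    """Mixture of uniform p-values (true nulls) and zero-biased ones (false nulls)."""
    rng = random.Random(seed)
    n_null = round(true_pi0 * n_tests)
    null_p = [rng.random() for _ in range(n_null)]
    # Beta(0.3, 8) is an arbitrary stand-in for a zero-biased component.
    alt_p = [rng.betavariate(0.3, 8.0) for _ in range(n_tests - n_null)]
    return null_p + alt_p


if __name__ == "__main__":
    pvals = simulate_pvals()
    pi0_hat = estimate_pi0(pvals)
    n_false_nulls = (1.0 - pi0_hat) * len(pvals)
    n_bh_hits = sum(1 for q in bh_adjust(pvals) if q <= 0.05)
    print(f"estimated pi0: {pi0_hat:.3f}")
    print(f"estimated number of false nulls: {n_false_nulls:.0f}")
    print(f"BH discoveries at 5% FDR: {n_bh_hits}")
```

With this estimator, a distribution that piles up away from zero gives an estimate above 1. For example, `estimate_pi0([0.1, 0.6, 0.7, 0.9])` evaluates to 1.5, which matches the "nonsensical result" described in the new text. Many practical implementations clip the estimate to 1, which would hide that diagnostic.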