@@ -44,6 +44,7 @@
\use_default_options true
\begin_modules
todonotes
+logicalmkup
\end_modules
\maintain_unincluded_children false
\language english
@@ -261,7 +262,14 @@ Search and replace: naive -> naïve
status open

\begin_layout Plain Layout
-Look into auto-generated nomenclature list: https://wiki.lyx.org/Tips/Nomenclature.
+Look into auto-generated nomenclature list:
+\begin_inset CommandInset href
+LatexCommand href
+target "https://wiki.lyx.org/Tips/Nomenclature"
+
+\end_inset
+
+.
Otherwise, do a manual pass for all abbreviations at the end.
Do nomenclature/abbreviations independently for each chapter.
\end_layout
@@ -442,7 +450,7 @@ My thesis is due Thursday, October 10th, so in order to be useful to me,
I'll need your feedback at least a few days before that, ideally by Monday,
October 7th.
If you have limited time and are unable to get through the whole thesis,
- please focus your effors on Chapters 1 and 2, since those are the roughest
+ please focus your efforts on Chapters 1 and 2, since those are the roughest
and most in need of revision.
Chapter 3 is fairly short and straightforward, and Chapter 4 is an adaptation
of a paper that's already been through a few rounds of revision, so they
@@ -502,13 +510,13 @@ Rejection is the major long-term threat to organ and tissue allografts

\begin_layout Standard
Organ and tissue transplants are a life-saving treatment for people who
- have lost the function of an important organ.
+ have lost the function of an important organ [CITE?].
In some cases, it is possible to transplant a patient's own tissue from
one area of their body to another, referred to as an autograft.
This is common for tissues that are distributed throughout many areas of
the body, such as skin and bone.
However, in cases of organ failure, there is no functional self tissue
- remaining, and a transplant from another person – the donor – is required.
+ remaining, and a transplant from another person – a donor – is required.
This is referred to as an allograft.
\end_layout

@@ -517,8 +525,14 @@ Organ and tissue transplants are a life-saving treatment for people who
status open

\begin_layout Plain Layout
-Possible citation for degree of generic variability: https://www.ncbi.nlm.nih.gov/pu
-bmed/22424236?dopt=Abstract
+Possible citation for degree of genetic variability:
+\begin_inset CommandInset href
+LatexCommand href
+target "https://www.ncbi.nlm.nih.gov/pubmed/22424236?dopt=Abstract"
+
+\end_inset
+
+
\end_layout

\end_inset
@@ -550,12 +564,13 @@ Because an allograft comes from a different person, it is genetically distinct
y identify the graft as foreign tissue and begin attacking it, eventually
resulting in failure and death of the graft, a process referred to as transplan
t rejection.
- Rejection is the most significant challenge to the long-term health of
- an allograft.
+ Rejection is the most significant challenge to the long-term health and
+ survival of an allograft [CITE?].
Like any adaptive immune response, graft rejection generally occurs via
- two broad mechanisms: cellular immunity, in which CD8+ T-cells induce apoptosis
- in the graft cells; and humoral immunity, in which B-cells produce antibodies
- that bind to graft proteins and direct an immune response against the graft.
+ two broad mechanisms: cellular immunity, in which CD8+ T-cells recognizing
+ graft-specific antigens induce apoptosis in the graft cells; and humoral
+ immunity, in which B-cells produce antibodies that bind to graft proteins
+ and direct an immune response against the graft [CITE?].
In either case, rejection shows most of the typical hallmarks of an adaptive
immune response, in particular mediation by CD4+ T-cells and formation
of immune memory.
@@ -566,17 +581,18 @@ Diagnosis and treatment of allograft rejection is a major challenge
\end_layout

\begin_layout Standard
-To prevent rejection, allograft recipients are treated with immune suppression.
+To prevent rejection, allograft recipients are treated with immunosuppressive
+ drugs [CITE?].
The goal is to achieve sufficient suppression of the immune system to prevent
rejection of the graft without compromising the ability of the immune system
to raise a normal response against infection.
As such, a delicate balance must be struck: insufficient immune suppression
- may lead to rejection and ultimately loss of the graft; exceissive suppression
+ may lead to rejection and ultimately loss of the graft; excessive suppression
leaves the patient vulnerable to life-threatening opportunistic infections.
Because every patient is different, immune suppression must be tailored
for each patient.
Furthermore, immune suppression must be tuned over time, as the immune
- system's activity is not static, nor is it held in a steady state.
+ system's activity is not static, nor is it held in a steady state [CITE?].
In order to properly adjust the dosage of immune suppression drugs, it
is necessary to monitor the health of the transplant and increase the dosage
if evidence of rejection is observed.
@@ -585,9 +601,9 @@ To prevent rejection, allograft recipients are treated with immune suppression.
\begin_layout Standard
However, diagnosis of rejection is a significant challenge.
Early diagnosis is essential in order to step up immune suppression before
- the immune system damages the graft beyond recovery.
+ the immune system damages the graft beyond recovery [CITE?].
The current gold standard test for graft rejection is a tissue biopsy,
- examained for visible signs of rejection by a trained histologist.
+ examined for visible signs of rejection by a trained histologist [CITE?].
When a patient shows symptoms of possible rejection, a
\begin_inset Quotes eld
\end_inset
@@ -607,7 +623,7 @@ sub-clinical
\begin_inset Quotes erd
\end_inset

- rejection.
- In light of this, is is now common to perform
+ rejection [CITE?].
+ In light of this, it is now common to perform
\begin_inset Quotes eld
\end_inset
@@ -640,20 +656,20 @@ literal "false"
However, biopsies have a number of downsides that limit their effectiveness
as a diagnostic tool.
First, the need for manual inspection by a histologist means that diagnosis
- is subject to the biases of the particular histologist examining the biopsy.
- In marginal cases two different histologists may give two different diagnoses
+ is subject to the biases of the particular histologist examining the biopsy
+ [CITE?].
+ In marginal cases, two different histologists may give two different diagnoses
to the same biopsy.
Second, a biopsy can only evaluate if rejection is occurring in the section
of the graft from which the tissue was extracted.
- If rejection is only occurring in one section of the graft and the tissue
- is extracted from a different section, it may result in a false negative
- diagnosis.
- Most importantly, however, extraction of tissue from a graft is invasive
- and is treated as an injury by the body, which results in inflammation
- that in turn promotes increased immune system activity.
+ If rejection is localized to one section of the graft and the tissue is
+ extracted from a different section, a false negative diagnosis may result.
+ Most importantly, extraction of tissue from a graft is invasive and is
+ treated as an injury by the body, which results in inflammation that in
+ turn promotes increased immune system activity [CITE?].
Hence, the invasiveness of biopsies severely limits the frequency with
- which the can safely be performed.
- Typically protocol biopsies are not scheduled more than about once per
+ which they can safely be performed.
+ Typically, protocol biopsies are not scheduled more than about once per
month
\begin_inset CommandInset citation
LatexCommand cite
@@ -670,11 +686,11 @@ literal "false"
would make it easier to evaluate when a given test is outside the normal
parameters for that specific patient, rather than relying on normal ranges
for the population as a whole.
- Lastly, more frequent tests would be a boon to the transplant research
- community.
- Beyond simply providing more data, the increased time granularity of the
- tests will enable studying the progression of a rejection event on the
- scale of days to weeks, rather than months.
+ Lastly, the accumulated data from more frequent tests would be a boon to
+ the transplant research community.
+ Beyond simply providing more data overall, the finer time granularity
+ of the tests will enable studying the progression of a rejection event
+ on the scale of days to weeks, rather than months.
\end_layout

\begin_layout Subsubsection
@@ -705,8 +721,8 @@ false positive
immune responses, because antigen-presenting cells usually only express
the proper co-stimulation after detecting evidence of an infection, such
as the presence of common bacterial cell components or inflamed tissue.
- Most effector cells die after the foreign antigen is cleared, but some
- remain and differentiate into memory cells.
+ Most effector cells die after the foreign antigen is cleared, since they
+ are no longer needed, but some remain and differentiate into memory cells.
Like naive cells, memory cells respond to detection of their specific antigen
by differentiating into effector cells, ready to fight an infection.
However, unlike naive cells, memory cells do not require the same degree
@@ -719,10 +735,10 @@ In the context of a pathogenic infection, immune memory is a major advantage,
allowing an organism to rapidly fight off a previously encountered pathogen
much more quickly and effectively than the first time it was encountered.
However, if effector cells that recognize an antigen from an allograft
- are allowed to differentiate into memory cells, suppressing rejection of
+ are allowed to differentiate into memory cells, preventing rejection of
the graft becomes much more difficult.
Many immune suppression drugs work by interfering with the co-stimulation
- that naive cells require in order to mount an immune response.
+ that naive cells require in order to mount an immune response [CITE?].
Since memory cells do not require this co-stimulation, these drugs are
not effective at suppressing an immune response that is mediated by memory
cells.
@@ -737,15 +753,30 @@ In the context of a pathogenic infection, immune memory is a major advantage,
cells have been fairly well characterized, the internal regulatory mechanisms
that allow memory cells to respond more quickly and without co-stimulation
are still poorly understood.
- In order to develop immune suppression that either prevents the formation
- of memory cells or works more effectively against memory cells, the mechanisms
- of immune memory formation and regulation must be better understood.
+ In order to develop methods of immune suppression that either prevent the
+ formation of memory cells or work more effectively against memory cells,
+ the mechanisms of immune memory formation and regulation must be better
+ understood.
\end_layout

\begin_layout Subsection
Overview of bioinformatic analysis methods
\end_layout

+\begin_layout Standard
+\begin_inset Flex TODO Note (inline)
+status open
+
+\begin_layout Plain Layout
+Also cite: R, Bioconductor, snakemake, python, pandas, bedtools, bowtie2,
+ hisat2, STAR, samtools, sra-toolkit, picard tools
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
\begin_layout Standard
The studies presented in this work all involve the analysis of high-throughput
genomic and epigenomic data.
@@ -779,7 +810,15 @@ Linear models are a generalization of the
\begin_inset Formula $t$
\end_inset

--test and ANOVA to arbitrarily complex experimental designs.
+-test and ANOVA to arbitrarily complex experimental designs
+\begin_inset CommandInset citation
+LatexCommand cite
+key "chambers:1992"
+literal "false"
+
+\end_inset
+
+.
In a typical linear model, there is one dependent variable observation
per sample.
For example, in a linear model of height as a function of age and sex,
@@ -811,7 +850,9 @@ Linear models are a generalization of the
\begin_layout Standard
The central challenge when fitting a linear model is to estimate the variance
of the data accurately.
- This quantity is the most difficult to estimate when sample sizes are small.
+ Of all the parameters required to evaluate the statistical significance
+ of an effect, the variance is the most difficult to estimate when sample
+ sizes are small.
A single shared variance could be estimated for all of the features together,
and this estimate would be very stable, in contrast to the individual feature
variance estimates.
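A minimal Python sketch of combining the shared and per-feature estimates is given below. It assumes the moderated variance has the degrees-of-freedom-weighted form used in limma's empirical Bayes step, (d0 * s0^2 + d * s^2) / (d0 + d), where s^2 is a feature's own variance with d residual degrees of freedom. Unlike limma, which estimates the prior degrees of freedom d0 and prior variance s0^2 from the data, this sketch takes `prior_df` as a hand-picked value and, by default, uses a plain average of the feature variances as the shared estimate; the function `moderated_variances` and the example data are hypothetical.

```python
import statistics


def moderated_variances(data, prior_df, prior_var=None):
    """Shrink each feature's sample variance toward a shared prior variance.

    data: one list of replicate measurements per feature (a single group, so
        each feature has len(values) - 1 residual degrees of freedom).
    prior_df: assumed, hand-picked prior degrees of freedom (limma estimates
        this quantity, d0, from the data instead).
    prior_var: assumed prior variance; if None, the plain average of the
        per-feature sample variances is used as the shared estimate.
    """
    sample_vars = [statistics.variance(values) for values in data]
    if prior_var is None:
        prior_var = statistics.fmean(sample_vars)
    moderated = []
    for values, s2 in zip(data, sample_vars):
        df = len(values) - 1
        # Degrees-of-freedom-weighted average of prior and observed variance:
        # the fewer residual df a feature has, the more it leans on the prior.
        moderated.append((prior_df * prior_var + df * s2) / (prior_df + df))
    return moderated


# Hypothetical data: three features, three replicates each.
features = [
    [1.0, 1.1, 0.9],  # low observed variance
    [5.0, 9.0, 1.0],  # high observed variance
    [2.0, 2.0, 2.1],
]
for raw, mod in zip(map(statistics.variance, features),
                    moderated_variances(features, prior_df=4.0)):
    print(f"raw variance {raw:8.4f} -> moderated {mod:8.4f}")
```

Low observed variances are pulled up and high ones pulled down toward the shared value, which is the precision-for-bias trade-off discussed in this section.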
@@ -837,9 +878,9 @@ literal "false"

.
While the individual feature variance estimates are not stable, the common
- variance estiamate for the entire data set is quite stable, so using a
- combination of the two yields a variance estimate for each feature with
- greater precision than the individual feature varaiances.
+ variance estimate for the entire data set is quite stable, so using a combinati
+on of the two yields a variance estimate for each feature with greater precision
+ than the individual feature variances.
The trade-off for this improvement is that squeezing each estimated variance
toward the common value introduces some bias – the variance will be underestima
ted for features with high variance and overestimated for features with
@@ -871,7 +912,7 @@ literal "false"
genes.
While linear models typically assume that all samples have equal variance,
limma is able to relax this assumption by identifying and down-weighting
- samples the diverge more strongly from the lienar model across many features
+ samples that diverge more strongly from the linear model across many features

\begin_inset CommandInset citation
LatexCommand cite
@@ -922,7 +963,7 @@ literal "false"
The Poisson distribution accurately represents the distribution of counts
expected for a given gene abundance, and the gamma distribution is then
used to represent the variation in gene abundance between biological replicates.
- For this reason, the square root of the dispersion paramter of the negative
+ For this reason, the square root of the dispersion parameter of the negative
binomial is sometimes referred to as the biological coefficient of variation,
since it represents the variability that was present in the samples prior
to the Poisson
@@ -967,8 +1008,8 @@ Unlike RNA-seq data, in which gene annotations provide a well-defined set
occur anywhere in the genome.
However, most genome regions will not contain significant ChIP-seq read
coverage, and analyzing every position in the entire genome is statistically
- and computationally infeasible, so it is necesary to identify regions of
- interest inside which ChIP-seq reads will be counted and analyzed.
+ and computationally infeasible, so it is necessary to identify regions
+ of interest inside which ChIP-seq reads will be counted and analyzed.
One option is to define a set of interesting regions
\emph on
a priori
@@ -1008,7 +1049,7 @@ literal "false"

.
In contrast, some proteins, chief among them histones, do not bind only
- at a small number of specific sites, but rather bind potentailly almost
+ at a small number of specific sites, but rather bind potentially almost
everywhere in the entire genome.
When looking at histone marks, adjacent histones tend to be similarly marked,
and a given mark may be present on an arbitrary number of consecutive histones
@@ -1097,7 +1138,7 @@ In addition to other considerations, if called peaks are to be used as regions
are called based on a combination of all ChIP-seq reads from all experimental
conditions, so that the identified peaks are based on the average abundance
across all conditions, which is independent of any differential abundance
- between condtions
+ between conditions
\begin_inset CommandInset citation
LatexCommand cite
key "Lun2015a"
@@ -1109,7 +1150,7 @@ literal "false"
\end_layout

\begin_layout Subsubsection
-Normalization of high-throughput data is non-trivial and application-dependant
+Normalization of high-throughput data is non-trivial and application-dependent
\end_layout

\begin_layout Standard
@@ -1122,7 +1163,7 @@ High-throughput data sets invariable require some kind of normalization

\begin_layout Standard
For Affymetrix expression arrays, the standard normalization algorithm used
- in most analyses is Robust Multichip Average (RMA).
+ in most analyses is Robust Multichip Average (RMA) [CITE].
RMA is designed with the assumption that some fraction of probes on each
array will be artifactual and takes advantage of the fact that each gene
is represented by multiple probes by implementing normalization and summarizati
@@ -1143,7 +1184,9 @@ frozen
\end_inset

, so that each array is effectively normalized against this frozen reference
- set rather than the other arrays in the data set under study.
+ set rather than the other arrays in the data set under study [CITE].
+ Other array normalization methods considered include dChip, GRSN, and SCAN
+ [CITEx3].
\end_layout

\begin_layout Standard
@@ -1159,7 +1202,7 @@ n challenges.
(CPM).
Furthermore, if the abundance of a single gene increases, then in order
for its fraction of the total reads to increase, all other genes' fractions
- must decrease to accomodate it.
+ must decrease to accommodate it.
This effect is known as composition bias, and it is an artifact of the
read sampling process that has nothing to do with the biology of the samples
and must therefore be normalized out.
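The composition effect can be reproduced with a toy calculation. The Python sketch below assumes, purely for illustration, that reads are drawn in exact proportion to each gene's true abundance and that both samples are sequenced to the same depth, so raw counts and CPM coincide. Only `gene_A` changes between the two hypothetical samples, yet every other gene shows an apparent decrease. The gene names, abundances, and the helper `expected_counts` are invented for this example.

```python
import math


def expected_counts(abundance, depth):
    """Expected read counts when `depth` reads are drawn in proportion to abundance."""
    total = sum(abundance.values())
    return {gene: depth * a / total for gene, a in abundance.items()}


# Hypothetical true abundances: only gene_A changes (10-fold up).
before = {"gene_A": 100, "gene_B": 300, "gene_C": 300, "gene_D": 300}
after = {"gene_A": 1000, "gene_B": 300, "gene_C": 300, "gene_D": 300}

depth = 1_000_000  # identical sequencing depth, so counts equal CPM here
c1 = expected_counts(before, depth)
c2 = expected_counts(after, depth)

for gene in before:
    lfc = math.log2(c2[gene] / c1[gene])
    print(f"{gene}: apparent log2 fold change = {lfc:+.2f}")
```

With these numbers, `gene_A` shows an apparent log2 fold change of about +2.40 instead of its true log2(10), about 3.32, and each unchanged gene shows about -0.93, which is -log2(1.9), where 1.9 is the factor by which total abundance grew.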
@@ -1204,7 +1247,7 @@ literal "false"
to implement a normalization as a smooth function of abundance.
However, this strategy makes a much stronger assumption about the data:
that the average log fold change is zero across all abundance levels.
- Hence, the simpler scaling normalziations based on background or signal
+ Hence, the simpler scaling normalizations based on background or signal
regions are generally preferred whenever possible.
\end_layout

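To illustrate why a single per-sample scaling factor can remove composition bias, the sketch below computes one factor from the log-ratios of feature proportions, discarding the most extreme ratios before averaging so that a few genuinely changed features do not drive the estimate. This is a deliberately simplified, unweighted variant of the trimmed mean of M-values (TMM) idea from edgeR, swapped in here only as an example of a scaling normalization; it is not the background- or signal-region method referred to in the text. Like any scaling normalization, it assumes most features are unchanged. The function name `trimmed_scale_factor`, the 30% trim fraction, and the counts are assumptions.

```python
import math


def trimmed_scale_factor(ref, test, trim=0.3):
    """Simplified, unweighted TMM-style scaling factor for `test` relative to `ref`.

    ref, test: read counts for the same features, in the same order.
    Features with a zero count in either sample are skipped, and a fraction
    `trim` of the sorted log-ratios is dropped from each end before averaging.
    """
    n_ref, n_test = sum(ref), sum(test)
    m_values = sorted(
        math.log2((t / n_test) / (r / n_ref))
        for r, t in zip(ref, test)
        if r > 0 and t > 0
    )
    k = int(len(m_values) * trim)
    kept = m_values[k:len(m_values) - k]
    if not kept:
        raise ValueError("no features left after filtering and trimming")
    # Back-transform the mean log-ratio to a multiplicative factor.
    return 2 ** (sum(kept) / len(kept))


# Hypothetical counts: feature 0 is strongly up in `test`; features 1-3 are
# unchanged but look lower because of composition bias.
ref = [100_000, 300_000, 300_000, 300_000]
test = [526_316, 157_895, 157_895, 157_895]

factor = trimmed_scale_factor(ref, test)
n_ref, n_test = sum(ref), sum(test)
for i, (r, t) in enumerate(zip(ref, test)):
    lfc = math.log2((t / n_test / factor) / (r / n_ref))
    print(f"feature {i}: normalized log2 fold change = {lfc:+.2f}")
```

Dividing the test sample's proportions by this factor brings the unchanged features back to a log fold change near zero, while feature 0 recovers roughly its true log2(10), about 3.32. edgeR's actual TMM additionally trims on average abundance and applies precision weights, so this is an illustration of the principle rather than a reimplementation.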
@@ -1439,7 +1482,7 @@ CD4 T-cell
\end_inset

.
- I think there might be a plus sign somwehere in there now? Also, maybe
+ I think there might be a plus sign somewhere in there now? Also, maybe
figure out a reasonable way to abbreviate
\begin_inset Quotes eld
\end_inset
@@ -1485,7 +1528,7 @@ CD4 T-cells are central to all adaptive immune responses, as well as immune
to that infection differentiate into memory CD4 T-cells, which are responsible
for responding to the same pathogen in the future.
Memory CD4 T-cells are functionally distinct, able to respond to an infection
- more quickly and without the co-stimulation requried by naive CD4 T-cells.
+ more quickly and without the co-stimulation required by naive CD4 T-cells.
However, the molecular mechanisms underlying this functional distinction
are not well-understood.
Epigenetic regulation via histone modification is thought to play an important
@@ -1532,17 +1575,17 @@ In order to investigate the relationship between gene expression and these
before and after activation.
Like the original analysis, this analysis looks at the dynamics of these
- marks histone marks and compare them to gene expression dynamics at the
- same time points during activation, as well as comapre them between naive
+ histone marks and compares them to gene expression dynamics at the
+ same time points during activation and also compares them between naive
and memory cells, in hope of discovering evidence of new mechanistic details
in the interplay between them.
- The original analysis of this data treated each gene promoter as a monolithinc
- unit and mostly assumed that ChIP-seq reads or peaks occuring anywhere
+ The original analysis of this data treated each gene promoter as a monolithic
+ unit and mostly assumed that ChIP-seq reads or peaks occurring anywhere
within a promoter were equivalent, regardless of where they occurred relative
to the gene structure.
For an initial analysis of the data, this was a necessary simplifying assumptio
n.
The current analysis aims to relax this assumption, first by directly analyzing
- ChIP-seq peaks for differential modification, and second by taking a mor
+ ChIP-seq peaks for differential modification, and second by taking a more
granular look at the ChIP-seq read coverage within promoter regions to
ask whether the location of histone modifications relative to the gene's
TSS is an important factor, as opposed to simple proximity.
@@ -1739,7 +1782,7 @@ status collapsed
\begin_inset Caption Standard

\begin_layout Plain Layout
-Salomn vs STAR quantification, Ensembl gene annotation
+Salmon vs STAR quantification, Ensembl gene annotation
\end_layout

\end_inset
@@ -2032,7 +2075,7 @@ literal "false"

, ignoring the time point variable due to the confounding with the batch
variable.
- The result is a marked improvement, but the unavoidable counfounding with
+ The result is a marked improvement, but the unavoidable confounding with
time point means that certain real patterns of gene expression will be
indistinguishable from the batch effect and subtracted out as a result.
Specifically, any
@@ -2123,7 +2166,7 @@ literal "false"

.
The resulting analysis gives an accurate assessment of statistical significance
- for all comparisons, which unfortuantely means a loss of statistical power
+ for all comparisons, which unfortunately means a loss of statistical power
for comparisons involving samples in batch 1.
\end_layout

@@ -2137,8 +2180,17 @@ literal "false"

\end_inset

-, converted to normalized logCPM with quality weights using voomWithQualityWeigh
-ts
+, converted to normalized logCPM with quality weights using
+\begin_inset Flex Code
+status open
+
+\begin_layout Plain Layout
+voomWithQualityWeights
+\end_layout
+
+\end_inset
+
+
\begin_inset CommandInset citation
LatexCommand cite
key "Law2013,Liu2015"
@@ -3096,7 +3148,7 @@ noprefix "false"

.
Latent factors 1, 4, and 5 were determined to explain the most variation
- consistently across all data sets (Fgure
+ consistently across all data sets (Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:mofa-varexplained"
@@ -4490,7 +4542,7 @@ status open

\begin_layout Plain Layout
This figure is generated from the old analysis.
- Eiher note that in some way or re-generate it from the new peak calls.
+ Either note that in some way or re-generate it from the new peak calls.
\end_layout

\end_inset
@@ -5283,7 +5335,7 @@ name "fig:RNA-PCA-group"

\end_inset

-RNA-seq PCoA showing principal coordiantes 2 and 3.
+RNA-seq PCoA showing principal coordinates 2 and 3.
\end_layout

\end_inset
@@ -5379,7 +5431,7 @@ noprefix "false"

\end_inset

-), albiet in the 2nd and 3rd principal coordinates, indicating that it is
+), albeit in the 2nd and 3rd principal coordinates, indicating that it is
not the most dominant pattern driving gene expression.
Taken together, the data show that promoter histone methylation for these
3 histone marks and RNA expression for naive and memory cells are most
@@ -6564,7 +6616,7 @@ clusters
are really just sections of a single connected cloud rather than discrete
clusters.
The cloud is approximately ellipsoid-shaped, with each PC being an axis
- of the ellipse, and each cluster consisting of a pyrimidal section of the
+ of the ellipse, and each cluster consisting of a pyramidal section of the
ellipsoid.
\end_layout

@@ -6709,7 +6761,7 @@ one size fits all
data within an experiment may not be appropriate, and a better approach
may be to use a separate promoter radius for each kind of data, with each
radius being derived from the data itself.
- Furthermore, the apparent assymetry of upstream and downstream promoter
+ Furthermore, the apparent asymmetry of upstream and downstream promoter
histone modification with respect to gene expression, seen in Figures
\begin_inset CommandInset ref
LatexCommand ref
@@ -6787,7 +6839,7 @@ noprefix "false"
\end_inset

kb is approximately consistent with the distance from the TSS at which enrichmen
-t of H3K4 methylationis correlates with increased expression, showing that
+t of H3K4 methylation correlates with increased expression, showing that
this radius, which was determined by a simple analysis of measuring the
distance from each TSS to the nearest peak, also has functional significance.
For H3K27me3, the correlation between histone modification near the promoter
@@ -7378,7 +7430,7 @@ effective promoter radius
\begin_inset Quotes erd
\end_inset

- specific to each histone mark based on distince from the TSS within which
+ specific to each histone mark based on distance from the TSS within which
an excess of peaks was called for that mark.
This concept was then used to guide further analyses throughout the study.
However, while the effective promoter radius was useful in those analyses,
@@ -7593,7 +7645,7 @@ To better study the convergence hypothesis, a new experiment should be designed
the same cell cultures could be activated serially multiple times, and
sequenced after each activation cycle right before the next activation.
It is likely that several activations in the same model system will settle
- into a cylical pattern, converging to a consistent
+ into a cyclical pattern, converging to a consistent
\begin_inset Quotes eld
\end_inset

@@ -7715,7 +7767,7 @@ These three hypotheses could be disentangled by single-cell ChIP-seq.
is consistent with allele-specific modification.
Finally if the modifications do not separate by either cell or allele,
the colocation of these two marks is most likely occurring at the level
- of individual histones, with the heterogenously modified histone representing
+ of individual histones, with the heterogeneously modified histone representing
a distinct state.

\end_layout
@@ -7803,7 +7855,7 @@ This section could probably use some citations
\begin_layout Standard
Microarrays, bead arrays, and similar assays produce raw data in the form
of fluorescence intensity measurements, with the each intensity measurement
- proportional to the abundance of some fluorescently-labelled target DNA
+ proportional to the abundance of some fluorescently labelled target DNA
or RNA sequence that base pairs to a specific probe sequence.
However, these measurements for each probe are also affected my many technical
confounding factors, such as the concentration of target material, strength
@@ -7902,7 +7954,7 @@ literal "false"

.
Quantile normalization is performed against a pre-generated set of quantiles
- learned from a collection of 850 publically available arrays sampled from
+ learned from a collection of 850 publicly available arrays sampled from
a wide variety of tissues in the Gene Expression Omnibus (GEO).
Each array's probe intensity distribution is normalized against these pre-gener
ated quantiles.
@@ -7947,7 +7999,7 @@ literal "false"

.
SCAN is truly single-channel in that it does not require a set of normalization
- paramters estimated from an external set of reference samples like fRMA
+ parameters estimated from an external set of reference samples like fRMA
does.
\end_layout

@@ -7960,7 +8012,7 @@ DNA methylation arrays are a relatively new kind of assay that uses microarrays
to measure the degree of methylation on cytosines in specific regions arrayed
across the genome.
First, bisulfite treatment converts all unmethylated cytosines to uracil
- (which then become thymine after amplication) while leaving methylated
+ (which then become thymine after amplification) while leaving methylated
cytosines unaffected.
Then, each target region is interrogated with two probes: one binds to
the original genomic sequence and interrogates the level of methylated
@@ -8048,7 +8100,7 @@ noprefix "false"
\end_inset

).
- This transformation results in values with better statistical perperties:
+ This transformation results in values with better statistical properties:
the unconstrained range is suitable for linear modeling, and the error
distributions are more normal.
Hence, most linear modeling and other statistical testing on methylation
@@ -8171,7 +8223,7 @@ on of TX and AR samples was considered.
The ADNR samples were included during normalization but excluded from all
classifier training and validation.
This ensures that the performance on internal and external validation sets
- is directly comparable, since both are performing the same task: distinguising
+ is directly comparable, since both are performing the same task: distinguishing
TX from AR.
\end_layout

@@ -8226,9 +8278,9 @@ literal "false"
\end_inset

.
- When evaluting internal validation performance, only the 157 internal samples
- were normalized; when evaluating external validation performance, all 157
- internal samples and 75 external samples were normalized together.
+ When evaluating internal validation performance, only the 157 internal
+ samples were normalized; when evaluating external validation performance,
+ all 157 internal samples and 75 external samples were normalized together.
\end_layout

\begin_layout Standard
@@ -8248,8 +8300,17 @@ Generating custom fRMA vectors for hthgu133pluspm array platform

\begin_layout Standard
In order to enable fRMA normalization for the hthgu133pluspm array platform,
- custom fRMA normalization vectors were trained using the frmaTools package
-
+ custom fRMA normalization vectors were trained using the
+\begin_inset Flex Code
+status open
+
+\begin_layout Plain Layout
+frmaTools
+\end_layout
+
+\end_inset
+
+ package
\begin_inset CommandInset citation
LatexCommand cite
key "McCall2011"
@@ -8306,15 +8367,15 @@ Put code on Github and reference it.
To investigate the whether DNA methylation could be used to distinguish
between healthy and dysfunctional transplants, a data set of 78 Illumina
450k methylation arrays from human kidney graft biopsies was analyzed for
- differential metylation between 4 transplant statuses: healthy transplant
+ differential methylation between 4 transplant statuses: healthy transplant
(TX), transplants undergoing acute rejection (AR), acute dysfunction with
- no rejection (ADNR), and chronic allograpft nephropathy (CAN).
+ no rejection (ADNR), and chronic allograft nephropathy (CAN).
The data consisted of 33 TX, 9 AR, 8 ADNR, and 28 CAN samples.
The uneven group sizes are a result of taking the biopsy samples before
the eventual fate of the transplant was known.
Each sample was additionally annotated with a donor ID (anonymized), Sex,
- Age, Ethnicity, Creatinine Level, and Diabetes diagnosois (all samples
- in this data set came from patients with either Type 1 or Type 2 diabetes).
+ Age, Ethnicity, Creatinine Level, and Diabetes diagnosis (all samples in
+ this data set came from patients with either Type 1 or Type 2 diabetes).

\end_layout

@@ -8668,7 +8729,7 @@ literal "false"

\begin_layout Standard
From the M-values, a series of parallel analyses was performed, each adding
- additional steps into the model fit to accomodate a feature of the data
+ additional steps into the model fit to accommodate a feature of the data
(see Table
\begin_inset CommandInset ref
LatexCommand ref
@@ -10029,7 +10090,7 @@ noprefix "false"
Comparing RMA to each of the 5 fRMA normalizations, the distribution of
log ratios is somewhat wide, indicating that the normalizations disagree
on the expression values of a fair number of probe sets.
- In contrast, comparisons of fRMA against fRMA, the vast mojority of probe
+ In contrast, comparisons of fRMA against fRMA, the vast majority of probe
sets have very small log ratios, indicating a very high agreement between
the normalized values generated by the two normalizations.
This shows that the fRMA normalization's behavior is not very sensitive
@@ -10546,8 +10607,8 @@ Mean-variance trend modeling in methylation array data.
\series default
The estimated log2(standard deviation) for each probe is plotted against
the probe's average M-value across all samples as a black point, with some
- transparency to make overplotting more visible, since there are about 450,000
- points.
+ transparency to make over-plotting more visible, since there are about
+ 450,000 points.
Density of points is also indicated by the dark blue contour lines.
The prior variance trend estimated by eBayes is shown in light blue, while
the lowess trend of the points is shown in red.
@@ -10602,7 +10663,7 @@ noprefix "false"
M-values of +4 and -4.
These modes correspond to methylation sites that are nearly 100% methylated
and nearly 100% unmethylated, respectively.
- The strong bomodality indicates that a majority of probes interrogate sites
+ The strong bimodality indicates that a majority of probes interrogate sites
that fall into one of these two categories.
The points in between these modes represent sites that are either partially
methylated in many samples, or are fully methylated in some samples and
@@ -10622,8 +10683,7 @@ noprefix "false"

).
However, the uptick in the center is interesting: it indicates that sites
- that are not constitutitively methylated or unmethylated have a higher
- variance.
+ that are not constitutively methylated or unmethylated have a higher variance.
This could be a genuine biological effect, or it could be spurious noise
that is only observable at sites with varying methylation.
\end_layout
@@ -10708,7 +10768,7 @@ noprefix "false"
This shows that the observations with extreme M-values have been appropriately
down-weighted to account for the fact that the noise in those observations
has been amplified by the non-linear M-value transformation.
- In turn, this gives relatively more weight to observervations in the middle
+ In turn, this gives relatively more weight to observations in the middle
region, which are more likely to correspond to probes measuring interesting
biology (not constitutively methylated or unmethylated).
\end_layout
@@ -12314,7 +12374,7 @@ noprefix "false"
).
In analysis C, the trend is still estimated at the probe level, but instead
of estimating a single variance value shared across all observations for
- a given probe, the voom method computes an initial estiamte of the variance
+ a given probe, the voom method computes an initial estimate of the variance
for each observation individually based on where its model-fitted M-value
falls on the trend line and then assigns inverse-variance weights to model
the difference in variance between observations.
@@ -12367,21 +12427,21 @@ and
\end_layout

\begin_layout Standard
-The significant association of diebetes diagnosis with sample quality is
+The significant association of diabetes diagnosis with sample quality is
interesting.
The samples with Type 2 diabetes tended to have more variation, averaged
across all probes, than those with Type 1 diabetes.
- This is consistent with the consensus that type 2 disbetes and the associated
+ This is consistent with the consensus that type 2 diabetes and the associated
metabolic syndrome represent a broad dysregulation of the body's endocrine
- signalling related to metabolism [citation needed].
+ signaling related to metabolism [citation needed].
This dysregulation could easily manifest as a greater degree of variation
in the DNA methylation patterns of affected tissues.
- In contrast, Type 1 disbetes has a more specific cause and effect, so a
+ In contrast, Type 1 diabetes has a more specific cause and effect, so a
less variable methylation signature is expected.
\end_layout

\begin_layout Standard
-This preliminary anlaysis suggests that some degree of differential methylation
+This preliminary analysis suggests that some degree of differential methylation
exists between TX and each of the three types of transplant disfunction
studied.
Hence, it may be feasible to train a classifier to diagnose transplant
@@ -12451,7 +12511,7 @@ researcher degree of freedom

into the analysis, since the generated normalization vectors now depend
on the choice of batch size based on vague selection criteria and instinct,
- which can unintentionally inproduce bias if the researcher chooses a batch
+ which can unintentionally introduce bias if the researcher chooses a batch
size based on what seems to yield the most favorable downstream results

\begin_inset CommandInset citation
@@ -12581,9 +12641,15 @@ g for gene expression profiling by globin reduction of peripheral blood
status open

\begin_layout Plain Layout
-Chapter author list: https://tex.stackexchange.com/questions/156862/displaying-aut
-hor-for-each-chapter-in-book Every chapter gets an author list, which may
- or may not be part of a citation to a published/preprinted paper.
+Chapter author list:
+\begin_inset CommandInset href
+LatexCommand href
+target "https://tex.stackexchange.com/questions/156862/displaying-author-for-each-chapter-in-book"
+
+\end_inset
+
+ Every chapter gets an author list, which may or may not be part of a citation
+ to a published/preprinted paper.
\end_layout

\end_inset
@@ -12814,9 +12880,26 @@ RNA-seq Library Preparation
\end_layout

\begin_layout Standard
-Sequencing libraries were prepared with 200ng total RNA from each sample.
- Polyadenylated mRNA was selected from 200 ng aliquots of cynomologus blood-deri
-ved total RNA using Ambion Dynabeads Oligo(dT)25 beads (Invitrogen) following
+\begin_inset Flex TODO Note (inline)
+status open
+
+\begin_layout Plain Layout
+Add protected spaces where appropriate to prevent unwanted line breaks.
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+Sequencing libraries were prepared with 200
+\begin_inset space ~
+\end_inset
+
+ng total RNA from each sample.
+ Polyadenylated mRNA was selected from 200 ng aliquots of cynomolgus blood-deriv
+ed total RNA using Ambion Dynabeads Oligo(dT)25 beads (Invitrogen) following
manufacturer’s recommended protocol.
PolyA selected RNA was then combined with 8 pmol of HBA1/2 (site 1), 8
pmol of HBA1/2 (site 2), 12 pmol of HBB (site 1) and 12 pmol of HBB (site
@@ -12901,9 +12984,37 @@ literal "false"

.
Counts of uniquely mapped reads were obtained for every gene in each sample
- with the “featureCounts” function from the Rsubread package, using each
- of the three possibilities for the “strandSpecific” option: sense, antisense,
- and unstranded
+ with the
+\begin_inset Flex Code
+status open
+
+\begin_layout Plain Layout
+featureCounts
+\end_layout
+
+\end_inset
+
+ function from the
+\begin_inset Flex Code
+status open
+
+\begin_layout Plain Layout
+Rsubread
+\end_layout
+
+\end_inset
+
+ package, using each of the three possibilities for the
+\begin_inset Flex Code
+status open
+
+\begin_layout Plain Layout
+strandSpecific
+\end_layout
+
+\end_inset
+
+ option: sense, antisense, and unstranded
\begin_inset CommandInset citation
LatexCommand cite
key "Liao2014"
@@ -12947,8 +13058,17 @@ Normalization and Exploratory Data Analysis
\end_layout

\begin_layout Standard
-Libraries were normalized by computing scaling factors using the edgeR package’s
- Trimmed Mean of M-values method
+Libraries were normalized by computing scaling factors using the
+\begin_inset Flex Code
+status open
+
+\begin_layout Plain Layout
+edgeR
+\end_layout
+
+\end_inset
+
+ package’s Trimmed Mean of M-values method
\begin_inset CommandInset citation
LatexCommand cite
key "Robinson2010"
@@ -12972,8 +13092,28 @@ literal "false"
In order to assess the effect of blocking on reproducibility, Pearson and
Spearman correlation coefficients were computed between the logCPM values
for every pair of libraries within the globin-blocked (GB) and unblocked
- (non-GB) groups, and edgeR's “estimateDisp” function was used to compute
- negative binomial dispersions separately for the two groups
+ (non-GB) groups, and
+\begin_inset Flex Code
+status open
+
+\begin_layout Plain Layout
+edgeR
+\end_layout
+
+\end_inset
+
+'s
+\begin_inset Flex Code
+status open
+
+\begin_layout Plain Layout
+estimateDisp
+\end_layout
+
+\end_inset
+
+ function was used to compute negative binomial dispersions separately for
+ the two groups
\begin_inset CommandInset citation
LatexCommand cite
key "Chen2014"