Use Of Genome To Analyze Physiological Conditions Discussion Help


– 2 DB responses

– 50 words each

– write it in your own words

– read the article again if needed

– this is very simple and you don’t have to go into too much detail. Here is an example of how you should write the responses (DO NOT COPY IT):

example: Hi, This was an amazing article! I was also intrigued by the depth and quality of information that was provided. I agree that the Riskogram algorithm is another fascinating tool, and I am amazed how they can relate every thing to the iPOPs. I wouldn’t want to get blood drawn on a regular basis either! I am also all for supporting research, but this study takes a large toll.

Respond to the two discussion board posts
Minimum 50 words EACH; max 100 words
This should be a simple response in your own words, and it should correlate with your
initial discussion board post
You can refer back to the article in order to make the responses
Initial post that you wrote:
Hi Class!
What surprised me in the article is how advent for genome sequencing plus analysis of the
physiological conditions have proved to be very powerful. This is shown through different
technologies whereby a generation of high superiority genome sequence. The genomic DNA
then becomes exposed to the WGS through the technologies that come from the Genomics
that are complete. The other thing that also surprised me is how exome sequencing is done
by different technologies, for instance, the usage of genotyping arrays plus RNA sequencing.
Personally, I have the curiosity of knowing more in relation to what was found from the study
since some facts and figures were written down. I would also give some of my samples to be
tested but under the condition that the information would be kept private and confidential.
This would help me to understand the study even more. I would also love to know the signs
and symptoms of some diseases and if the diseases had some cure or they could not be
cured. I would also want to know the disease states and the personal variants for the RNA
Respond to these two posts: Minimum 50 words EACH; max 100 words
Post 1: After reading the Snydrome article, I found two aspects personally interesting. The
first interesting find from this article was that the researcher was able to use a single
algorithm, called the RiskOGram, to assess genetic disease risks. This algorithm combined
many alleles that were linked to the disease risk to come up with the specific diseases and
disorders the patient was at high risk for. The patient in this case had a higher risk for
coronary artery disease and a much higher risk of basal cell carcinoma, hypertriglyceridemia,
and type II diabetes. As scary as these risks are, I would rather know so I could prepare myself
and understand the preventative measures that could be taken.
Another interesting point I found was when the researcher observed a certain mutation in the
patient, called TERT, and had knowledge that it was associated with the disease aplastic
anemia. The researcher measured the patient’s telomeres after finding this mutation, but saw
little differences in length than what he expected. He also noted that the patient’s 83-year old
mother also had the TERT mutation but did not express the symptoms of aplastic anemia. I
thought this was interesting because previous research believed that if you had the TERT
mutation, it meant you were going to suffer from the disease at some point, but this finding
does suggest that context and environment play roles in this as well. (Word Count: 233).
Post 2: Hello everyone,
One thing that surprised me about Dr. Michael Snyder’s research was that they were able to
identify sequences not present in the reference sequence of his genome. This was interesting
because they were able to confirm that there are a number of undocumented genetic regions
that exist in the human genome but that they can also be identified using deep sequencing
techniques. Another thing that interested me was that they were able to identify and examine
a number of genes for medical relevance. These included a mutation (E366K) in
the SERPINA1 gene previously known in the subject, a damaging mutation in TERT,
associated with acquired aplastic anemia, and variants associated with
hypertriglyceridemia and diabetes. Lastly, I thought it was
interesting how they were able to associate SNVs with TF binding. They were able to identify
14,922 SNVs that lie within 36 TFs with binding factors, this had not been previously
attempted which was intriguing.
Personal Omics Profiling
Reveals Dynamic Molecular
and Medical Phenotypes
Rui Chen,1,11 George I. Mias,1,11 Jennifer Li-Pook-Than,1,11 Lihua Jiang,1,11 Hugo Y.K. Lam,1,12 Rong Chen,2,12
Elana Miriami,1 Konrad J. Karczewski,1 Manoj Hariharan,1 Frederick E. Dewey,3 Yong Cheng,1 Michael J. Clark,1
Hogune Im,1 Lukas Habegger,6,7 Suganthi Balasubramanian,6,7 Maeve O’Huallachain,1 Joel T. Dudley,2
Sara Hillenmeyer,1 Rajini Haraksingh,1 Donald Sharon,1 Ghia Euskirchen,1 Phil Lacroute,1 Keith Bettinger,1 Alan P. Boyle,1
Maya Kasowski,1 Fabian Grubert,1 Scott Seki,2 Marco Garcia,2 Michelle Whirl-Carrillo,1 Mercedes Gallardo,9,10
Maria A. Blasco,9 Peter L. Greenberg,4 Phyllis Snyder,1 Teri E. Klein,1 Russ B. Altman,1,5 Atul J. Butte,2 Euan A. Ashley,3
Mark Gerstein,6,7,8 Kari C. Nadeau,2 Hua Tang,1 and Michael Snyder1,*
1Department of Genetics, Stanford University School of Medicine
2Division of Systems Medicine and Division of Immunology and Allergy, Department of Pediatrics
3Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine
4Division of Hematology, Department of Medicine
5Department of Bioengineering
Stanford University, Stanford, CA 94305, USA
6Program in Computational Biology and Bioinformatics
7Department of Molecular Biophysics and Biochemistry
8Department of Computer Science
Yale University, New Haven, CT 06520, USA
9Telomeres and Telomerase Group, Molecular Oncology Program, Spanish National Cancer Centre (CNIO), Madrid E-28029, Spain
10Life Length, Madrid E-28003, Spain
11These authors contributed equally to this work
12Present address: Personalis, Palo Alto, CA 94301, USA
DOI 10.1016/j.cell.2012.02.009
Personalized medicine is expected to benefit from
combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative
personal omics profile (iPOP), an analysis that
combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single
individual over a 14 month period. Our iPOP analysis
revealed various medical risks, including type 2
diabetes. It also uncovered extensive, dynamic
changes in diverse molecular components and
biological pathways across healthy and diseased
conditions. Extremely high-coverage genomic
and transcriptomic data, which provide the basis
of our iPOP, revealed extensive heteroallelic
changes during healthy and diseased states and an
unexpected RNA editing mechanism. This study
demonstrates that longitudinal iPOP can be used
to interpret healthy and diseased states by connecting genomic information with additional dynamic
omics activity.
Personalized medicine aims to assess medical risks, monitor,
diagnose and treat patients according to their specific genetic
composition and molecular phenotype. The advent of genome
sequencing and the analysis of physiological states has proven
to be powerful (Cancer Genome Atlas Research Network,
2011). However, its implementation for the analysis of otherwise
healthy individuals for estimation of disease risk and medical
interpretation is less clear. Much of the genome is difficult to
interpret and many complex diseases, such as diabetes, neurological disorders and cancer, likely involve a large number of
different genes and biological pathways (Ashley et al., 2010;
Grayson et al., 2011; Li et al., 2011), as well as environmental
contributors that can be difficult to assess. As such, the combination of genomic information along with a detailed molecular
analysis of samples will be important for predicting, diagnosing
and treating diseases as well as for understanding the onset, progression, and prevalence of disease states (Snyder et al., 2009).
Presently, healthy and diseased states are typically followed
using a limited number of assays that analyze a small number
of markers of distinct types. With the advancement of many
new technologies, it is now possible to analyze upward of 10^5
molecular constituents. For example, DNA microarrays have
allowed the subcategorization of lymphomas and gliomas
Cell 148, 1293–1307, March 16, 2012 ©2012 Elsevier Inc. 1293
(Mischel et al., 2003), and RNA sequencing (RNA-Seq) has
identified breast cancer transcript isoforms (Li et al., 2011; van
der Werf et al., 2007; Wu et al., 2010; Lapuk et al., 2010).
Although transcriptome and RNA splicing profiling are powerful
and convenient, they provide a partial portrait of an organism’s
physiological state. Transcriptomic data, when combined with
genomic, proteomic, and metabolomic data are expected to
provide a much deeper understanding of normal and diseased
states (Snyder et al., 2010). To date, comprehensive integrative
omics profiles have been limited and have not been applied to
the analysis of generally healthy individuals.
To obtain a better understanding of: (1) how to generate an
integrative personal omics profile (iPOP) and examine as many
biological components as possible, (2) how these components
change during healthy and diseased states, and (3) how this
information can be combined with genomic information to
estimate disease risk and gain new insights into diseased states,
we performed extensive omics profiling of blood components
from a generally healthy individual over a 14 month period
(24 months total when including time points with other molecular
analyses). We determined the whole-genome sequence (WGS)
of the subject, and together with transcriptomic, proteomic, metabolomic, and autoantibody profiles, used this information to
generate an iPOP. We analyzed the iPOP of the individual over
the course of healthy states and two viral infections (Figure 1A).
Our results indicate that disease risk can be estimated by
a whole-genome sequence and that, by regularly monitoring health
states with iPOP, disease onset may also be observed. The
wealth of information provided by detailed longitudinal iPOP revealed unexpected molecular complexity, which exhibited
dynamic changes during healthy and diseased states, and
provided insight into multiple biological processes. Detailed
omics profiling coupled with genome sequencing can provide
molecular and physiological information of medical significance.
This approach can be generalized for personalized health monitoring and medicine.
Overview of Personal Omics Profiling
Our overall iPOP strategy was to: (1) determine the genome
sequence at high accuracy and evaluate disease risks, (2)
monitor omics components over time and integrate the relevant
omics information to assess the variation of physiological states,
and (3) examine in detail the expression of personal variants
at the level of RNA and protein to study molecular complexity
and dynamic changes in diseased states.
We performed iPOP on blood components (peripheral blood
mononuclear cells [PBMCs], plasma and sera that are highly
accessible) from a 54-year-old male volunteer over the course
of 14 months (IRB-8629). The samples used for iPOP were taken
over an interval of 401 days (days 0–400). In addition, a complete
medical exam plus laboratory and additional tests were performed before the study officially launched (day −123) and blood
glucose was sampled multiple times after the comprehensive
omics profiling (days 401–602) (Figure 1A). Extensive sampling
was performed during two viral infections that occurred during
this period: a human rhinovirus (HRV) infection beginning on
day 0 and a respiratory syncytial virus (RSV) infection starting
on day 289. A total of 20 time points were extensively analyzed
and a summary of the time course is indicated in Figure 1A.
The different types of analyses performed are summarized in
Figures 1B and 1C. These analyses, performed on PBMCs
and/or serum components, included WGS, complete transcriptome analysis (providing information about the abundance of
alternative spliced isoforms, heteroallelic expression, and RNA
edits, as well as expression of miRNAs at selected time points),
proteomic and metabolomic analyses, and autoantibody
profiles. An integrative analysis of these data highlights dynamic
omics changes and provides rich information about healthy and
diseased phenotypes.
Whole-Genome Sequencing
We first generated a high quality genome sequence of this
individual using a variety of different technologies. Genomic
DNA was subjected to deep WGS using technologies from
Complete Genomics (CG, 35 nt paired end) and Illumina
(100 nt paired end) at 150- and 120-fold total coverage, respectively, exome sequencing using three different technologies to
80- to 100-fold average coverage (see Extended Experimental
Procedures available online) and analysis using genotyping
arrays and RNA sequencing.
The vast majority of genomic sequences (91%) mapped to the
hg19 (GRCh37) reference genome. However, because of the
depth of our sequencing, we were able to identify sequences
not present in the reference sequence. Assembly of the
unmapped Illumina sequencing reads (60,434,531, 9% of the
total) resulted in 1,425 (of 29,751) contigs (spanning 26 Mb) overlapping with RefSeq gene sequences that were not annotated in
the hg19 reference genome. The remaining sequences appeared
unique, including 2,919 exons expressed in the RNA-Seq data
(e.g., Figure S1A). These results confirm that a large number of
undocumented genetic regions exist in individual human
genome sequences and can be identified by very deep
sequencing and de novo assembly (Li et al., 2010).
Our analysis detected many single nucleotide variants (SNVs),
small insertions and deletions (indels) and structural variants
(SVs; large insertions, deletions, and inversions relative to
hg19), (summarized in Table 1 and Experimental Procedures).
134,341 (4.1%) high-confidence SNVs are not present in
dbSNP, indicating that they are very rare or private to the
subject. Only 302 high-confidence indels reside within RefSeq
protein coding exons and exhibit enrichments in multiples of
three nucleotides (p < 0.0001). In addition to indels, 2,566
high-confidence SVs were identified (Experimental Procedures
and Table S1) and 8,646 mobile element insertions were identified (Stewart et al., 2011).
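The enrichment of coding indels in multiples of three nucleotides fits the idea that such indels preserve the reading frame. The following is a minimal Python sketch of how in-frame indels could be tallied. The function names and the example lengths are hypothetical illustrations, not data or code from the study.

```python
def is_in_frame(indel_length: int) -> bool:
    """An indel keeps the reading frame when its length is a positive multiple of three."""
    return indel_length > 0 and indel_length % 3 == 0


def in_frame_fraction(indel_lengths):
    """Fraction of indels whose length is a multiple of three."""
    if not indel_lengths:
        raise ValueError("need at least one indel length")
    return sum(is_in_frame(n) for n in indel_lengths) / len(indel_lengths)


# Hypothetical coding-exon indel lengths in nucleotides (illustrative only).
example_lengths = [3, 1, 6, 2, 9, 3, 4, 12]
fraction = in_frame_fraction(example_lengths)  # 5 of the 8 lengths are in frame
```

Under the simplifying assumption that indel length is unrelated to reading frame, roughly one third of indels would be in frame, so a clearly higher fraction in coding exons would be consistent with selection against frameshifts.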
Analysis of the subject’s mother’s genome by comprehensive
genome sequencing (as above) and imputation allowed a
maternal/paternal chromosomal phasing of 92.5% of the
subject’s SNVs and indels (see Extended Experimental Procedures for details). Of 1,162 compound heterozygous mutations
in genes, 139 contain predicted compound heterozygous
deleterious and/or nonsense mutations. Phasing enabled the
assembly of a personal genome sequence of very high confidence (c.f., Rozowsky et al., 2011).
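Parent-based phasing can be illustrated with a simplified rule: if exactly one of the child's two alleles is present in the mother's genotype, that allele is assigned to the maternal haplotype. The Python sketch below assumes this single-site rule only. It does not model the imputation step described above, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

Genotype = Tuple[str, str]


def phase_with_mother(child: Genotype, mother: Genotype) -> Optional[Genotype]:
    """Return (maternal_allele, paternal_allele) for the child, or None
    when the mother's genotype alone cannot resolve the phase."""
    a, b = child
    if a == b:
        # Homozygous site: phase is trivial if the allele is seen in the mother.
        return (a, b) if a in mother else None
    shared = {a, b} & set(mother)
    if len(shared) != 1:
        # Both alleles match the mother (ambiguous) or neither does (inconsistent).
        return None
    maternal = shared.pop()
    paternal = b if maternal == a else a
    return (maternal, paternal)


def is_compound_heterozygous(parent_of_origin: List[str]) -> bool:
    """True when deleterious variants in one gene sit on both parental haplotypes.

    parent_of_origin holds one 'maternal' or 'paternal' label per variant.
    """
    return "maternal" in parent_of_origin and "paternal" in parent_of_origin
```

For example, `phase_with_mother(("A", "G"), ("A", "A"))` assigns `A` to the maternal haplotype, whereas a site where mother and child are both `A/G` stays unresolved and would need additional information such as imputation.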
Figure 1. Summary of Study
(A) Time course summary. The subject was monitored for a total of 726 days, during which there were two infections (red bar, HRV; green bar, RSV). The black bar
indicates the period when the subject: (1) increased exercise, (2) ingested 81 mg of acetylsalicylic acid and ibuprofen tablets each day (the latter only during the
first 6 weeks of this period), and (3) substantially reduced sugar intake. Blue numbers indicate fasted time points.
(B) iPOP experimental design indicating the tissues and analyses involved in this study.
(C) Circos (Krzywinski et al., 2009) plot summarizing iPOP. From outer to inner rings: chromosome ideogram; genomic data (pale blue ring), structural variants >
50 bp (deletions [blue tiles], duplications [red tiles]), indels (green triangles); transcriptomic data (yellow ring), expression ratio of HRV infection to healthy states;
proteomic data (light purple ring), ratio of protein levels during HRV infection to healthy states; transcriptomic data (yellow ring), differential heteroallelic
expression ratio of alternative allele to reference allele for missense and synonymous variants (purple dots) and candidate RNA missense and synonymous edits
(red triangles, purple dots, orange triangles and green dots, respectively).
See also Figure S1.
WGS-Based Disease Risk Evaluation
We identified variants likely to be associated with increased
susceptibility to disease (Dewey et al., 2011). The list of high
confidence SNVs and indels was analyzed for rare alleles [...]

[Table 1: body not recoverable; surviving table notes follow.]
High confidence values are from variants identified across multiple platforms (Illumina and CG) and/or Exome and RNA-Seq data. Annotations were
based on variant call format (VCF) files for heterozygous calls: 0/1, reference (ref)/alternative (alt); 1/2, alt/alt and homozygous calls; 1/1, alt/alt; 1/,
(alt/alt-incomplete call). Polyphen-2 was used to identify the location of the SNVs.
a 1000G (1000 Genomes Project Consortium, 2010).
(homozygous) (Vaxillaire et al., 2008), and KCNJ11 (homozygous) (Hani et al., 1998) and TCF7 (heterozygous) (Erlich et al.,
Genetic disease risks were also assessed by the RiskOGram
algorithm, which integrates information from multiple alleles
associated with disease risk (Ashley et al., 2010) (Figure 2B).
This analysis revealed a modest elevated risk for coronary artery
disease and significantly elevated risk levels of basal cell carcinoma (Figure 2B), hypertriglyceridemia, and type 2 diabetes
(T2D) (Figures 2B and 2C).
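The step from a pretest probability to a posttest probability using likelihood ratios (LRs) from multiple alleles can be sketched with standard odds updating. This is a minimal illustration that assumes the SNVs are independent. It is not the RiskOGram implementation, and the function name and example numbers are hypothetical.

```python
def posttest_probability(pretest: float, likelihood_ratios) -> float:
    """Update a pretest probability with a sequence of likelihood ratios.

    Converts the probability to odds, multiplies by each LR
    (assuming independence), and converts back to a probability.
    """
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    odds = pretest / (1.0 - pretest)
    for lr in likelihood_ratios:
        if lr <= 0:
            raise ValueError("likelihood ratios must be positive")
        odds *= lr
    return odds / (1.0 + odds)


# Hypothetical example: 20% pretest probability and two risk alleles.
example = posttest_probability(0.20, [2.0, 1.5])  # odds 0.25 -> 0.75, i.e. 3/7
```

An LR above 1 raises the posttest probability and an LR below 1 lowers it, which is why many alleles of small effect can still shift the overall estimate noticeably.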
In addition to coding region variants we also analyzed genomic
variants that may affect regulatory elements (transcription
factors [TF]), which had not been attempted previously (Data
S1). A total of 14,922 (of 234,980) SNVs lie in the motifs of 36
TFs known to be associated with the binding data (see Experimental Procedures), indicating that these are likely having a
direct effect on TF binding. Comparison of SNPs that alter
binding patterns of NFkB and Pol II sites (Kasowski et al.,
2010), also revealed a number of other interesting regulatory
variants, some of which are associated with human disease
(e.g., EDIL) (Sun et al., 2010) (Figure S1B).
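Finding SNVs that lie within TF motifs amounts to an interval lookup. A minimal Python sketch follows, assuming a single chromosome, non-overlapping motif intervals with inclusive coordinates, and hypothetical positions. The actual pipeline is described in the Experimental Procedures and is not reproduced here.

```python
from bisect import bisect_right


def snvs_in_motifs(snv_positions, motif_intervals):
    """Return the SNV positions that fall inside any motif interval.

    motif_intervals: (start, end) pairs, inclusive, assumed non-overlapping.
    """
    intervals = sorted(motif_intervals)
    starts = [start for start, _ in intervals]
    hits = []
    for pos in snv_positions:
        # Index of the last interval whose start is <= pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= intervals[i][1]:
            hits.append(pos)
    return hits


# Hypothetical motif coordinates and SNV positions (illustrative only).
motifs = [(100, 110), (200, 215)]
snvs = [95, 100, 110, 111, 205, 300]
inside = snvs_in_motifs(snvs, motifs)
```

Sorting the intervals once and using binary search keeps each lookup logarithmic in the number of motifs, which matters when hundreds of thousands of SNVs are screened.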
Medical Phenotypes Monitoring
Based on the above analysis of medically relevant variants and
the RiskOGram, we monitored markers associated with high-risk disease phenotypes and performed additional medically
relevant assays.
Monitoring of glucose levels and HbA1c revealed the onset of
T2D as diagnosed by the subject’s physician (day 369, Figures
2A and 2C). The subject lacked many known factors associated
with diabetes (nonsmoker; BMI = 23.9 and 21.7 on day 0 and day
511, respectively) and glucose levels were normal for the first
part of the study. However, glucose levels elevated shortly after
the RSV infection (after day 301) extending for several months
(Figure 2D). High levels of glucose were further confirmed using
glycated HbA1c measurements at two time points (days 329,
369) during this period (6.4% and 6.7%, respectively). After
a dramatic change in diet, exercise and ingestion of low doses
of acetylsalicylic acid a gradual decrease in glucose (to
93 mg/dl at day 602) and HbA1c levels to 4.7% was observed.
Insulin resistance was not evident at day 322. The patient was
negative for anti-GAD and anti-islet antibodies, and insulin levels
correlated well with the fasted and nonfasted states (Figure S2C),
consistent with T2D. These results indicate that a genome
sequence can be used to estimate disease risk in a healthy individual, and by monitoring traits associated with that disease,
disease markers can be detected and the phenotype treated.
The subject contained a TERT mutation previously associated
with aplastic anemia (Yamaguchi et al., 2005). However, measurements of telomere length suggested little or no decrease in
telomere length and modest increase in numbers of cells with
short telomeres relative to age-matched controls (Figures S2A
and S2B). Importantly, the patient and his 83-year-old mother
share the same mutation but neither exhibit symptoms of aplastic anemia, indicating that this mutation does not always result in
disease and is likely context specific in its effects.
Consistent with the elevated hypertriglyceridemia risk, triglycerides were found to be high (321 mg/dl) at the beginning of the
study. These levels were reduced (81–116 mg/dl) after regularly
taking simvastatin (20 mg/day).
We also examined the variants for their potential effects on
drug response (see Extended Experimental Procedures). Among
the alleles of interest (Figure 2A and Table S2B), two genotypes
affecting the LPIN1 and SLC22A1 genes were associated with
favorable (glucose lowering) responses to two diabetic drugs, rosiglitazone and metformin, respectively.

Table 2. Summary of Disease-Related Rare Variants
(Row labels only; values not recoverable.)
Total high confidence rare SNVs
Damaging or possibly damaging
Putative loss-of-function SNVs (a)
Total high confidence rare indels
Coding indels
Frameshift indels
miRNA indels
miRNA target sequence indels
Putative loss-of-function indels
(a) In curated Mendelian disease genes.

We followed the levels of 51 cytokines along with the C-reactive
protein (CRP) using ELISA assays, which revealed strong induction of proinflammatory cytokines and CRP during each infection
(Figures 2E and 2F). We also observed a spike of many cytokines
at day 12 after the RSV infection (day 301 overall). These data
define the physiological states and serve as a valuable reference
for the omic profiles integrated into a longitudinal map of healthy
and diseased states described in the next sections.
We also profiled autoantibodies during the HRV infection.
Plasma and serum samples from the first four time points
(days −123, 0, 4 and 21), along with plasma samples from 34
healthy controls were used to probe a protein microarray containing 9,483 unique human proteins spotted in duplicate. A total
of 884 antigens with increased reactivity (Data S2) in the candidate plasma relative to healthy controls were found (p < 0.01,
Benjamini-Hochberg p < 0.01). Among the potentially interesting
results was high reactivity with DOK6, an insulin receptor binding
protein (NCBI gene database). These results demonstrate that
autoantibodies can be monitored and that information relevant
to disease conditions can be found.
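Screening thousands of antigens requires a multiple-testing correction such as the Benjamini-Hochberg adjustment cited above. The following minimal Python sketch shows the standard step-up adjustment. The function name and example p-values are hypothetical, and it is not the study's analysis code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    The i-th smallest p-value is scaled by m / i, then a running minimum
    from the largest rank downward enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four antigens (illustrative only).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

An antigen is then reported when its adjusted value falls below the chosen threshold, for example 0.01 as in the text.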
Dynamic Omics Analysis: Integrative Omics Profiling
of Molecular Responses
We profiled the levels of transcripts, proteins, and metabolites
across the HRV and RSV infections and healthy states using
a variety of approaches. RNA-Seq of 20 time points generated
over 2.67 billion uniquely mapped 101b paired-end reads
(123 million reads average per time point) and allowed for an
analysis of the molecular complexity of the transcriptome in
normal cells (PBMCs) at an unprecedented level. The relative
levels of 6,280 proteins were also measured at 14 time points
through differential labeling of samples using isobaric tandem
mass tags (TMT), followed by liquid chromatography and mass
spectrometry (LC-MS/MS) (Cox and Mann, 2010; Theodoridis
et al., 2011). A total of 3,731 PBMC proteins could be consistently monitored across most of the 14 time points (see Figure S3A and Data S3). In addition, 6,862 and 4,228 metabolite
peaks were identified for the HRV and RSV infection, and a total
of 1,020 metabolites were tracked for both infections (see Figure S4 and Data S4, [3]). Finally, as described below, we also
analyzed miRNAs during the HRV infection.
This wealth of omics information allowed us to examine
detailed dynamic trends related directly to the physiological
states of the individual and revealed enormous changes in
biological processes that occurred during healthy and diseased
states. For each profile (transcriptome, proteome, metabolome),
we systematically searched for two types of nonrandom
patterns: (1) correlated patterns over time and (2) single unusual
events (i.e., spikes that may occur at any given time point defined
as statistically significantly high or low signal instances compared to what would be expected by chance). To perform this
analysis, we developed a general scheme for integrated analysis
of data (see Figure S5 and Extended Experimental Procedures
for further details). We used a Fourier spectral analysis approach
that both normalizes the various omics data on equal basis for
identifying the common trends and features, and, also accounts
for data set variability, uneven sampling, and data gaps, in order
to detect real-time changes in any kind of omics activity at
the differential time points (see Supplemental Information).
Autocorrelations were calculated to assess nonrandomness
of the time-series (p < 0.05 one-tailed based on simulated
bootstrap nonparametric distribution by sampling with replacement of the original data, n > 100,000), with significant signals
classified as autocorrelated (I). The remaining data was searched
for spike events, which were classified as spike maxima (II)
or spike minima (III) (p < 0.05 one-tailed based on differences
from simulated, n > 100,000 random distribution of the time series). After classification, the data were agglomerated into hierarchical clusters (using correlation distance and average linkage)
of common patterns and biological relevance was assessed
through GO (Ashburner et al., 2000) analysis (Cytoscape [Smoot
et al., 2011], BiNGO [Maere et al., 2005] p < 0.05, Benjamini-Hochberg [Benjamini and Hochberg, 1995] adjusted p < 0.05)
and pathway analysis (Reactome [Croft et al., 2011] functional
interaction [FI], networks including KEGG [Kanehisa and Goto,
2000; Smoot et al., 2011], p < 0.05, FDR < 0.05). The unified
framework approach was implemented on all the different data
sets both individually and in combination, and our results
revealed a number of differential changes that occurred both
during infectious states and the varying glucose states.
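The autocorrelation test described above (a one-tailed p-value from a bootstrap distribution built by sampling the original data with replacement) can be sketched in simplified form. The Python sketch below assumes evenly spaced time points and uses a plain lag-1 autocorrelation. It does not implement the Fourier spectral approach used to handle uneven sampling and data gaps, and all names and defaults are hypothetical.

```python
import random


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of an evenly spaced series (0.0 if constant)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    return num / denom


def bootstrap_p_value(series, n_resamples=2000, seed=0):
    """One-tailed p-value: how often a resample drawn with replacement
    shows autocorrelation at least as large as the observed series."""
    rng = random.Random(seed)
    observed = lag1_autocorrelation(series)
    at_least_as_large = 0
    for _ in range(n_resamples):
        resample = rng.choices(series, k=len(series))  # sampling with replacement
        if lag1_autocorrelation(resample) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (at_least_as_large + 1) / (n_resamples + 1)


# Hypothetical steadily increasing expression profile over 20 time points.
trend = [float(t) for t in range(20)]
p_trend = bootstrap_p_value(trend, n_resamples=500, seed=1)
```

Resampling destroys the time ordering, so a series with a real trend should show much stronger autocorrelation than its resamples and would be classified as autocorrelated (I) at p < 0.05.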
We first analyzed the different individual transcriptome, proteome (serum and PBMC) and metabolome data sets; the
proteome and metabolome results are presented in the Supplemental Information (Figures S3, S4, S6 and Data S3–S6). A total
of 19,714 distinct transcript isoforms (Wang et al., 2008) corresponding to 12,659 genes (Figure S1C) were tracked for the
entire time course, and their dynamic expression response
was classified into either autocorrelated (I) or spike sets, the latter further
subdivided as displaying maxima (II) or minima (III) (Figure 3). The
clustering and enrichment analysis displayed a number of
interesting pathways in each class. In the autocorrelated group
(Figure 3B, [I]; see also Figure S6A and Data S6, [1 and 2]), we
Figure 2. Medical Findings
(A) High interest disease- and drug-related variants in the subject’s genome.
(B) RiskGraph of the top 20 diseases with the highest posttest probabilities. For each disease, the arrow represents the pretest probability according to the
subject’s age, gender, and ethnicity. The line represents the posttest probability after incorporating the subject’s genome sequence. Listed to the right are the
numbers of independent disease-associated SNVs used to calculate the subject’s posttest probability.
(C) RiskOGram of type 2 diabetes. The RiskOGram illustrates how the subject’s posttest probability of T2D was calculated using 28 independent SNVs. The
middle graph displays the posttest probability. The left side shows the associated genes, SNVs, and the subject’s genotypes. The right side shows the likelihood
ratio (LR), number of studies, cohort sizes, and the posttest probability.
(D) Blood glucose trend. Measurements were taken from samples analyzed at either nonfasted or fasted states; the nonfasted states (all but days 186,
322, 329, and 369 and after day 400) were at a fixed time after a constant meal. Data was presented as moving average with a window of 15 days. Red
found two main trends: an upward trend (2,023 genes), following
the onset of the RSV infection, and a similar coincidental
downward trend (2,207 genes). The upward autocorrelated
trend revealed a number of pathways as enriched (p < 0.002,
FDR < 0.05), including protein metabolism and influenza life
cycle. Additionally, the downward autocorrelation cluster
showed a multitude of enriched pathways (p < 0.008, FDR <
0.05), such as TCR signaling in naive CD4+ T cells, lysosome,
B cell signaling, androgen regulation, and of particular interest,
insulin signaling/response pathways. These different pathways,
which are activated as a response to an immune infection, often
share common genes and additionally we observe many genes
hitherto unknown to be involved in these pathways but displaying the same trend. Furthermore, we observed that the downward trend, that began with the onset of the RSV infection and
appeared to accelerate after day 307, coincided with the beginning of the observed elevated glucose levels in the subject.
In the dynamic spike class we again saw patterns that were
concordant with phenotypes (Figure 3B, [II] and [III]; see also
Figure S6A and Data S6, [3–14]). A set of expression spikes
displaying maxima (547 genes), that are common to the onset
of both the RSV and HRV infections are associated with phagosome, immune processes and phagocytosis (p < 1 × 10^−4,
FDR < 6 × 10^−3). Furthermore, a cluster that exhibits an elevated
spike at the onset of the RSV infection involves the major histocompatibility genes (p < 7 × 10^−4, Benjamini-Hochberg adjusted
p < 0.03). A large number of genes with a coexpression pattern
common to both infections in the time course have yet to be
implicated in known pathways and provide possible connections
related to immune response. Finally, our spike class displaying
minima showed a distinct cluster (1,535 genes) singular to day
307 (day 14 of the RSV infection), associated with TCR signaling
again, TGF receptors, and T cell and insulin signaling pathways
(p < 0.02, FDR < 0.03). Overall, the transcriptome analysis
captures the dynamic response of the body responding to infection as also evidenced by our cytokine measurements, and also
can monitor health changes over long periods of time, with
various trends.
To further leverage the transcriptome and genome data, we
performed an integrated analysis of transcriptome, proteomic
and metabolomics data for each time point, observing how this
corresponded to the varying physiological states monitored as
described in the above sections. Because of the availability of
many time points through the course of infection, we examined
in detail the onset of the RSV infection, as well as extended our
complete dynamics omics profile during the times that our
subject began exhibiting high glucose levels. Figure 4 shows
an integrated interpretation of omics data (see also Figure S6B
and Data S7), where all trends are combined for each omics
data set and the common patterns emerge providing complementary information. In addition to the common patterns
observed in our transcriptome analysis, new patterns emerged,
some unique to protein data, some to metabolite, and some
common to all. In particular we found the following interesting
results: for autocorrelated clusters we found the same trends
as observed in the transcriptome, additionally augmented
with concordant protein expressions. Pathways such as the
phagosome, lysosome, protein processing in endoplasmic reticulum, and insulin pathways emerged as significantly enriched
(p < 0.002, FDR < 0.0075), and showed a downward trend postinfection, and further accelerated after 3 weeks following the
initial onset of the RSV infection (this cluster comprised of
1,452 transcriptomic and 69 proteomic components, corresponding to 1,444 genes). The elevated spike class showed a
maxima cluster on day 18 post RSV infection (one time point
after the cytokine maximum), with enrichment in pathways
such as the spliceosome, glucose regulation of insulin secretion,
and various pathways related to a stress response (p < 1 × 10⁻⁴,
FDR < 0.02)—this cluster included 1,956 transcriptomic, 571
proteomic and 23 metabolomic components, corresponding to
2,344 genes. Even though current proteomic information is
more limited than the full transcriptome because it follows fewer
components, as evidenced in Figure 4 (II), several pathways,
including the glucose regulation of insulin secretion pathway,
clearly emerge from the proteomic information and would not
have been observed by only monitoring the transcriptome.
Additionally, in this cluster we find significant GO enrichment in
splicing and metabolic processes (p < 6 × 10⁻⁴⁷, Benjamini-Hochberg adjusted p < 10⁻⁴⁵). Furthermore, inspection of
metabolites reveals 23 that show the same exact trend (i.e.,
spikes at day 18 post RSV infection); at least one, lauric acid,
has been implicated in fatty acid metabolism and insulin regulatory pathways (Kusunoki et al., 2007). Finally, we observe minima
spikes as well, with yet another interesting group on day 18,
which showed downregulation in several pathways (p < 0.003,
FDR < 0.05), such as the formation of platelet plug. This cluster
displayed a high degree of synergy between the various omics
data, comprised of 3,237 transcriptomic and 761 proteomic
components corresponding to 3,400 genes and 83 metabolomic components.
In summary, our integrated approach revealed a clear
systemic response to the RSV infection following its onset and
postinfection response, including a pronounced response
evident at day 18 post RSV infection. A variety of infection/stress
response related pathways were affected along with those
associated to the high glucose levels in the later time points,
including insulin response pathways.
Dynamic Omics Analysis: Extensive Heteroallelic
Variation and RNA Editing
The considerable amount of transcriptome and proteome data
allowed us to analyze and follow changes in allele-specific
and green arrows and bars indicate the times of the HRV and RSV infections, respectively. Black arrows and bars indicate the period with life style
(E) C-reactive protein trend line. Error bars represent standard deviation of three assays.
(F) Serum cytokine profiles. Red box and day number, HRV infection; green box and day number, RSV infection; question mark, elevated cytokine levels indicating an unknown event at day 301. Red is increased cytokine levels.
See also Figure S2.
Cell 148, 1293–1307, March 16, 2012 © 2012 Elsevier Inc.
Figure 3. Transcriptome Time Course Analysis
(A) Summary of approach for identification of differentially expressed components. The various omics sets were processed through a common framework
involving spectral analysis, clustering, and pathway enrichment analysis.
(B) Pattern classification. The different emergent patterns from the analysis of the transcriptome for the entire time course are displayed for the autocorrelation (I),
spike maxima (II), and spike minima (III) classes. For different clusters, examples of gene connections in selected pathways based on Reactome (Croft et al., 2011)
FI (Cytoscape plugin [Smoot et al., 2011]) are shown as networks. Example GO (Ashburner et al., 2000) enrichment analysis results from Cytoscape (Smoot et al.,
2011) BiNGO (Maere et al., 2005) plugin and pathway enrichment results (Reactome FI [Croft et al., 2011]) are included.
See also Figures S5 and S6.
expression (ASE), splicing, and editing at the RNA and protein
levels during healthy and diseased states.
Of the 49,017 genomic variants associated with coding or
UTR regions (Table 1), 12,785 (26%) were expressed in PBMCs
(≥40 read coverage; Table S3). A total of 8,509 of the variants
are heterozygous (1,113 missense) and the remainder (4,686;
684 missense) are homozygous. Eight of the 83 nonsense mutations were expressed indicating that not all nonsense mutations
result in transcript loss.
The numerous heterozygous variants allowed an analysis of
the dynamics of differential ASE (shrunk ratios, Experimental
Procedures; Figures 5A and S7B) in PBMCs during healthy and
diseased states. We found 497 and 1,047 genes that exhibited
differential ASE during HRV and RSV infection, respectively
(posterior probability ≥ 0.75, beta-binomial model; ≥ 40 reads,
≥ 7 time points); many of these are immune response genes,
e.g., PADI4 and PLOD1 (Figure 5B). Among the differential ASE
sites 100 and 218 were specific to HRV and RSV infected states,
respectively (Figures 5C and 5D). Differential ASE genes in the
HRV compared to healthy phase were enriched for those encoding SNARE vesicular transport proteins (DAVID analysis; Benjamini p < 0.05). Summing over all computed ASE alternative to
total ratios revealed that nonreference heteroallelic variants
were expressed at 98% of reference variants. The expression
of over 50 heterozygous variants, including some of the rare/
private SNVs (which form 0.72% of the genomic total), and differentially expressed variants (SVIL and TRIM5), was confirmed by
Sanger cDNA sequencing and/or digital PCR (Hindson et al.,
2011) of cDNA (Figures 5B and S7). Overall, these results
demonstrate that differential ASE is pervasive in humans and is
particularly distinct during healthy and infected states, with
many of these changes residing in immune response genes.
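To illustrate what a shrunk alternative/total ratio can look like, the sketch below pulls a raw allelic ratio toward 0.5 using a symmetric Beta prior, which is the posterior mean under a beta-binomial model. This is an assumption made for illustration only: the actual shrinkage procedure is defined in the Experimental Procedures and is not reproduced here, and the prior pseudo-counts (`prior_alt`, `prior_ref`) and read counts are hypothetical.

```python
def shrunk_alt_ratio(alt_reads, total_reads, prior_alt=10.0, prior_ref=10.0):
    """Posterior-mean alternative/total ratio under a Beta prior.

    prior_alt and prior_ref are hypothetical pseudo-counts; a symmetric
    prior shrinks the estimate toward balanced (0.5) expression.
    """
    if alt_reads < 0 or total_reads < alt_reads:
        raise ValueError("need 0 <= alt_reads <= total_reads")
    return (alt_reads + prior_alt) / (total_reads + prior_alt + prior_ref)


# Hypothetical site at the 40-read coverage threshold: 36 alternative reads.
raw_ratio = 36 / 40                       # 0.9 before shrinkage
low_cov_ratio = shrunk_alt_ratio(36, 40)  # pulled noticeably toward 0.5

# The same raw ratio at much deeper (hypothetical) coverage barely moves.
deep_cov_ratio = shrunk_alt_ratio(900, 1000)
```

The design point is that sites with few reads are shrunk strongly, so apparent allelic imbalance at low coverage is not over-interpreted, whereas well-covered sites keep nearly their raw ratio.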
The depth of our RNA-Seq data enabled us to re-evaluate
the extent of RNA editing (Figure 6 and Data S8 and S11A),
typically an adenosine to inosine (A-to-I) conversion (Li et al.,
2009b) or infrequently cytidine to uridine (C-to-U), in normal
human cells. We found 2,376 high-confidence coding-associated RNA edits, including 795 A-to-I (A-to-G) and 277 C-to-U
deamination-like edits (Figure 6A). A total of 587 edits in 175
genes were predicted to cause amino acid substitutions
(Polyphen-2 [Adzhubei et al., 2010]); the remainder were nonsense (11), synonymous (435), or located in 5′/3′ UTRs (103/
1,240). Ten edited bases causing amino acid substitutions
were validated by Sanger cDNA sequencing and/or digital
droplet PCR, as well as by identification of their peptide counterparts by mass spectrometry (Figure 6B). Interestingly, we identified A-to-G edits (Figure 6B), e.g., IGFBP7, BLCAP, and AZIN1 in
PBMCs that were known to occur in other tissues (Gommans
et al., 2008; Levanon et al., 2005), indicating that the same
RNA can be edited in other cell types. BLCAP exhibited two
edited changes (Figure 6C) with edited/total ratios of 0.12–0.2
and 0.18–0.31, respectively, comparable to the 0.21 ratio previously observed in the brain (Galeano et al., 2010).
Furthermore, we found and validated two missense-causing
edits, U-to-C in SCFD2 and G-to-A in FBXO25 (Figure 6D), indicating an amination-like RNA-editing mechanism, previously not
observed in human cells. Our results reveal that a large number
of edits occur and exhibit dynamic and differential changes in
populations of PBMCs (Figure 6B). The total number of edited
RNAs, while extensive, is significantly lower than that reported
in human lymphoblastoid lines and very different in its distribution (Li et al., 2011). We believe that in addition to tissue-specific
variation, the observed differences are also likely due to overcalling of false-positive SNVs, a problem we corrected with deep
exome sequencing, removal of repeat regions and pseudogenes, and strings of close-proximity variants (Data S11A).
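The edit categories discussed above can be sketched as a simple comparison between the genomic base and the base observed in the RNA. The function below is a minimal illustration, not the calling pipeline used in the study: it assumes both bases are already reported on the transcribed strand, treats T in cDNA reads as U, and omits all of the filtering steps (coverage, repeat and pseudogene removal, clustered variants). The function name and example pairs are hypothetical.

```python
from collections import Counter


def classify_edit(dna_base, rna_base):
    """Label a DNA/RNA base pair with an RNA-editing category."""
    # cDNA sequencing reports U as T, so normalise both to RNA letters.
    dna = dna_base.upper().replace("T", "U")
    rna = rna_base.upper().replace("T", "U")
    if dna == rna:
        return "no change"
    if (dna, rna) == ("A", "G"):
        # Inosine is read as guanosine by reverse transcriptase.
        return "A-to-I (read as A-to-G)"
    if (dna, rna) == ("C", "U"):
        return "C-to-U"
    return f"{dna}-to-{rna} (other)"


# Hypothetical (genomic base, RNA base) observations.
observations = [("A", "G"), ("C", "T"), ("A", "G"), ("T", "C"), ("G", "G")]
edit_counts = Counter(classify_edit(d, r) for d, r in observations)
```

Tallying labels in this way gives the kind of per-type distribution summarized in Figure 6A, with the less common types falling into the "other" group.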
Finally, to determine whether the nonreference allele and edited RNAs serve as templates for protein synthesis, we generated
proteome databases for 4,586 missense SNVs and all 30,385
edits and used them to search our mass spectra from the untargeted protein profiling experiments as well as in a targeted
approach to directly search for 500 edited proteins (see
Extended Experimental Procedures). Peptides for 48 SNVs and
51 edits were identified (FDR < 0.01 and requiring one unique
peptide per protein; Data S9 and S11B). A total of 17/17 selected
SNVs (100%) were validated by Sanger sequencing. Seven
peptides derived from the SNV and six peptides derived from
edited transcripts were unique to a single protein in the IPI database (Kersey et al., 2004) and classified as high confidence.
These results indicate that a large fraction of personal variants
are expressed as transcripts and a number of these are also
translated as proteins.
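The idea of searching mass spectra against variant-containing sequences can be sketched as follows: apply the amino acid substitution to the protein, digest it in silico with trypsin, and keep the peptide that carries the changed residue. This is a simplified illustration under stated assumptions (cleavage after K or R except before P, no missed cleavages); the protein sequence, positions, and function names are hypothetical and do not describe the database construction actually used.

```python
def tryptic_peptides(protein):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def variant_peptide(protein, position, ref_aa, alt_aa):
    """Return the tryptic peptide covering a substitution (1-based position)."""
    if protein[position - 1] != ref_aa:
        raise ValueError("reference residue does not match the protein")
    mutated = protein[:position - 1] + alt_aa + protein[position:]
    offset = 0
    # Digest the mutated sequence so that gained or lost K/R sites are honoured.
    for peptide in tryptic_peptides(mutated):
        if offset <= position - 1 < offset + len(peptide):
            return peptide
        offset += len(peptide)
    raise ValueError("position outside the protein")


# Hypothetical protein and missense change (Y5C), for illustration only.
toy_protein = "MKTAYIAKQRPLSVEK"
reference_digest = tryptic_peptides(toy_protein)
variant_entry = variant_peptide(toy_protein, 5, "Y", "C")
```

Only the variant-bearing peptide distinguishes the nonreference allele or edited transcript from the reference protein, which is why a match to such a peptide is taken as evidence of translation.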
miRNA Variant Analysis
In addition to the omics profiling above, we identified 619–681
known miRNAs from PBMCs per time point (>10 reads, days
4, 21, 116, 185, and 186), 106 of which showed dynamic
changes (e.g., Figures S2D and S2E). Examination of miRNA
editing revealed 50 edited miRNAs (C-to-U or A-to-I) with stringent criteria (edited reads > 5% of total reads or > 399 modified
reads) indicating that at least 4% of expressed miRNAs are
potentially edited. Eighteen miRNAs contain edits located within
the functionally critical ‘‘seed sequences,’’ potentially affecting
their mRNA targets. Interestingly, expression of SNV-containing
miRNAs was generally higher compared to SNV-free miRNA
(Figures 6E and 6F). In addition to edits, analysis of the SNVs
located in miRNAs revealed that most (25 of 31) SNV-containing
miRNAs were not expressed. These miRNAs were among those
discovered in cancer cell lines (Jima et al., 2010) and may not
normally be highly expressed in PBMCs from healthy
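The stringent miRNA editing criteria quoted above (edited reads > 5% of total reads, or > 399 modified reads) amount to a simple two-branch filter. The sketch below encodes only those two thresholds; the function name and the read counts are hypothetical, and any additional filters applied in the study are not represented.

```python
def passes_edit_filter(edited_reads, total_reads,
                       min_fraction=0.05, min_edited_reads=399):
    """True if a candidate miRNA edit meets either stringency threshold."""
    if total_reads <= 0:
        return False
    fraction_ok = edited_reads / total_reads > min_fraction
    count_ok = edited_reads > min_edited_reads
    return fraction_ok or count_ok


# Hypothetical (edited reads, total reads) pairs, for illustration only.
candidates = {
    "low_abundance_edit": (6, 100),        # 6% of reads edited
    "borderline_fraction": (5, 100),       # exactly 5%, not above it
    "high_abundance_edit": (400, 100000),  # tiny fraction, many reads
}
retained = [name for name, (edited, total) in candidates.items()
            if passes_edit_filter(edited, total)]
```

The second branch keeps edits in very abundant miRNAs, where the edited fraction is small but the absolute number of supporting reads is large.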
To our knowledge, our study is the first to perform extensive
personal iPOP of an individual through healthy and diseased
states. It revealed extensive complex and dynamic changes in
the omics profiles, especially in the transcriptomes, between
healthy states and viral infections, and between nondiabetic
and diabetic states. iPOP provides a multidimensional view of
medical states, including healthy states, response to viral infection, recovery, and T2D onset. Our study indicates that disease
risk can be assessed from a genome sequence and illustrates
how traits associated with disease can be monitored to identify
varying physiological stages. We show that large numbers of
molecular components are present in blood samples and can
Figure 5. Heteroallelic Expression Study of PBMCs
(A) Frequency of allele-specific expression (ASE) based on shrunk alternative/total ratios of RNA-Seq data. A total of 143 positions fall outside the three standard
deviations (σ) range (see Figure S7B; 0.66), suggesting that certain heterozygous alleles (DNA level) are preferentially expressed in PBMCs. Standard
deviations (σ) are denoted with dotted lines and the average ratio overlapping across all time points is 0.49.
(B) Digital droplet PCR validation of two heteroallelic expressed genes PADI4 and PLOD (relative to alternative allele).
(C) Heat map of the HRV infection time course (seven time points) showing differential ASE during HRV infection day 0 (red arrow) relative to average shrunk ratios
of healthy states (days 116–255).
(D) Heat map of the RSV infection time course (13 time points) showing differential ASE specific to RSV infection day 289 (red arrow) relative to average shrunk
ratios of healthy states (days 311–400); the onset of high glucose on day 307 is also shown (red arrow). Heat map ratios are relative to the alternative allele (alternative/
total, posterior probability >0.75). Example of enriched KEGG pathway gene cluster (Huang et al., 2009; Benjamini p < 0.05) shown below Figure 5C.
See also Figure S7 and Data S11A.
be measured (>3 billion measurements taken over 20 time
points). For the transcriptome many of these arise from differential splicing, ASE, and editing events. By observing dynamic
molecular changes that correspond to physiological states,
this proof-of-principle study offers a pilot implementation of
personalized medicine. The information obtained may greatly
help in the design and application of personalized health monitoring, diagnosis, prognosis, and treatment.
We speculate that differential expression of ASE/edits
may be important in monitoring and assessing diseased
states. In this respect the genes/proteins in which one isoform
is abundant in one condition (e.g., diseased or healthy state)
whereas another is abundant in another (e.g., diseased state)
may provide unique physiological advantages to the individual in distinct environmental conditions. Because multiple
genes in our study that exhibit ASE and editing changes are
Figure 4. Integrated Omics Analysis
For days 186–400, the different emergent patterns from an integrated analysis of the transcriptome, proteome, and metabolome data are displayed for the
autocorrelation (I), spike maxima (II), and spike minima (III) classes. For different clusters, examples of gene connections in selected pathways based on
Reactome (Croft et al., 2011) and FI Cytoscape (Smoot et al., 2011) plugin are shown as networks, with constituents marked as assessed from proteome data,
transcriptome data or both. Example GO (Ashburner et al., 2000) enrichment analysis results from Cytoscape (Smoot et al., 2011) BiNGO (Maere et al., 2005)
plugin and pathway enrichment results (Reactome FI [Croft et al., 2011]) are included.
See also Figures S4–S6.
Figure 6. RNA Editing and miRNA Expression of PBMCs
(A) Distribution of candidate RNA editing types in missense (red) and synonymous and UTRs (blue), based on seven or more time points (total 20 time points).
(B) Selected summary of known and novel RNA edits expressed in PBMCs. RNA edits were validated by digital PCR (green) and proteomic mass spectrometry
(C) Detail of two missense-causing edit sites in BLCAP. Selected data from RNA-Seq at day 4 and day 255 (top left), Sanger sequencing of day 255 cDNA (bottom
left), and digital PCR (right panel) are shown.
(D) Digital droplet PCR analysis of novel edit sites in SCFD2 (left) and FBXO25 (right) genes show no variants in DNA, whereas in RNA, editing is evident (top left
(E and F) Expression of SNV-containing and SNV-free miRNA, respectively, for days 4, 21, 116, 185, and 186. Red lines, mean; error bars, standard error of the mean.
Genome browsers, chromatograms, and digital PCR data were analyzed with software from DNAnexus, Inc., Chromas 2.33, and QuantaLife, respectively.
See also Figure S7 and Data S8 and S11A and S11B.
involved in immune function, we speculate that these components are particularly valuable for mediating immune responses to environmental conditions such as exposure to
pathogens. Likewise miRNA SNVs and edits, which also
undergo differential expression, may confer unique biological

Although we analyzed a single individual, insights were gained
by integrating the multiple omics profiles associated with distinct physiological states. Through examination of molecular
patterns, clear signatures of dynamic biological processes
were evident, including immune responses during infection,
and alterations in the insulin signaling response after the RSV infection.
Indeed, careful monitoring of omics changes across multiple
time points for the same individual revealed detailed responses,
which might not have been evident had the analyses been performed on groups due to interindividual variability. Hence, we
expect that our longitudinal personalized profiling approach
provides valuable information on an individual basis.
We focused on a generally healthy subject who exhibited no
apparent disease symptoms. This is a critical aspect of personalized medicine, which is to perform iPOP and evaluate the
importance and changes of all the profiles in ordinary individuals.
These results have important implications and suggest new
paradigm shifts: first, genome sequencing can be used to direct
the monitoring of specific diseases (in this study, aplastic anemia
and diabetes) and second, by following large numbers of molecules a more comprehensive view of disease states can be
analyzed to follow physiological states.
Our study revealed that many distinct molecular events and
pathways are activated both through viral infection and the onset
of diabetes. Indeed, the monitoring of large numbers of different
components revealed a steady decrease of insulin-related
responses that are associated with diabetes-insulin response
pathways occurring from the early healthy state to a high glucose
state. Although many of the activated and repressed pathways
could be detected through transcript profiling, some were detected only with the proteomics data and some with the combined set of data. In addition a large number of connections
with diabetes and insulin signaling using metabolites, miRNAs,
and autoantibodies were observed. One particularly interesting
response detected with the proteomics data was the onset of
the elevated glucose response that was tightly associated with
the RSV infection and a particular subclinical response at day
12/18 postinfection. It is tempting to speculate that the RSV infection and/or the associated event at day 12/18 triggered the onset
of high glucose/T2D. Although viral infections have been associated with T1D (van der Werf et al., 2007), we are unaware of viral
infection associated with T2D. Inflammation and activated innate
immunity have been associated with T2D (Pickup, 2004), and we
speculate that perhaps RSV triggered aberrant glucose metabolism through activation of a viral inflammation response in
conjunction with a predisposition toward T2D. Although this
cannot be proven with the analyses from a single individual, this
study nonetheless serves as proof-of-principle that iPOP can
be performed and provide valuable information. Because diabetes is a complex disease there may be many ways to acquire
a high glucose phenotype; longitudinal iPOP analysis of a large
number of individuals may be extremely valuable for dissecting
the disease and its various subtypes, as well as providing insight into the molecular mechanism of its onset.
Finally, we believe that the wealth of data generated from this
study will serve as a valuable resource to the community in the
developing field of personalized medicine. A large database
with the complete time-dynamic profiles for more individuals
that acquire infections and other types of diseases will be
extremely valuable in the early diagnostics, monitoring and treatment of diseased states.
The subject and mother in this study were recruited under the IRB protocol
IRB-8629 at Stanford University. Full methods and associated references
can be found in the Extended Experimental Procedures section.
WGS was performed at Complete Genomics and Illumina. High-confidence
SNVs were mostly correct as evidenced by: (1) Illumina Omni1-Quad genotyping arrays (99.3% sensitivity), (2) a Ti/Tv ratio of 2.14 as expected (1000
Genomes Project Consortium, 2010), (3) Illumina capture and DNA sequencing
(92.7% accuracy), and (4) Sanger sequencing of 36 randomly selected SNVs
(36/36 validated, Table S1). In contrast, the low confidence SNVs had a Ti/
Tv of only 1.46 and an accuracy of 63.8% (19 of 33 confirmed by Sanger
sequencing, Table S1A). Similarly, the majority of the 216,776 high-confidence
indels are likely to be correct as (1) Sanger sequencing validated 14 of 15 (93%)
tested indels and (2) exome-sequencing validated most indels (4,706, 82%);
meanwhile the 806,125 low confidence indels had a low validation rate
(5,225, 0.65%). SVs were called using: (1) paired-end mapping (Chen et al.,
2009) (2) read depth (Abyzov et al., 2011), (3) split reads (Ye et al., 2009),
and (4) junction mapping (Lam et al., 2010) to the breakpoint junction database
from the 1000 Genomes Project (Mills et al., 2011). A total of 2,566 were found by two different
methods or platforms (CG or Illumina) and were called high confidence; >90%
of these were in the database of genome variants.
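The transition/transversion (Ti/Tv) ratio used above as a quality indicator for SNV calls can be computed directly from reference/alternative base pairs. The following is a minimal sketch; the function names and the example calls are hypothetical, and real call sets would be read from variant files rather than a literal list.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snvs):
    """Ti/Tv ratio for an iterable of (ref, alt) single-base changes."""
    transitions = transversions = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt:
            continue  # not a variant
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


# Hypothetical calls: three transitions and two transversions.
toy_calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]
toy_ratio = ti_tv_ratio(toy_calls)
```

Because random errors produce transversions twice as often as transitions, a call set contaminated with false positives drifts toward a lower ratio, consistent with the lower value reported for the low-confidence SNVs.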
Strand-specific RNA-Seq libraries were prepared as described previously
(Parkhomchuk et al., 2009) and sequenced on 1–3 lanes of Illumina’s HiSeq
2000 instrument. The TopHat package (Trapnell et al., 2009) was used to align
the reads to the hg19 reference genome, followed by Cufflinks for transcript
assembly and RNA expression analysis (Trapnell et al., 2010). The Samtools
package (Li et al., 2009a) was used to identify variants including single nucleotide variants (SNV) and Indels. Small RNAs were prepared from PBMCs for
the first five time points; sequencing was performed according to Illumina’s
Small RNA v1.5 Sample Preparation Guide.
The Luminex 51-plex Human Cytokines assay was performed at the
Stanford Human Immune Monitoring Center. For mass spectrometry, proteins
were prepared from PBMC cell lysates, labeled at lysines using the TMT
isobaric tags by Pierce, and digested with trypsin and analyzed using reverse
phase LC coupled to a Thermo Scientific (LTQ)-Orbitrap Velos instrument. In
order to profile serum, 14 major glycoproteins were first removed using the
Agilent Human 14 Multiple Affinity Removal System (MARS) column in order
to analyze the less abundant constituents. Metabolites were extracted by
four times serum volume of equal mixture of methanol, acetonitrile, and
acetone and separated using our Agilent 1260 liquid chromatography. Hydrophobic molecules were profiled using reversed phase UPLC followed by
APCI-MS and hydrophilic molecules were analyzed using HILIC UPLC followed
by ESI-MS in either the positive or negative mode.
For the integrated analysis, per omics set, for each time-series curve the
Lomb-Scargle transformation (Hocke and Kämpfer, 2009; Lomb, 1976; Scargle, 1982, 1989) for unevenly sampled gapped time-series data was implemented (Ahdesmäki et al., 2007; Glynn et al., 2006; Van Dongen et al., 1999;
Yang et al., 2011; Zhao et al., 2008). This allowed us to obtain a periodogram,
which was used to calculate autocorrelations and then reconstruct the time series with even sampling, allowing standard time-series analysis and performing data clustering, while taking the time intervals into account (see
Extended Experimental Procedures).
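As an illustration of the periodogram step only, the sketch below evaluates the classical Lomb-Scargle power (Lomb, 1976; Scargle, 1982) for an unevenly sampled series using the standard library. It is a simplified sketch under stated assumptions, not the implementation used in the study: the autocorrelation calculation, even-sampling reconstruction, and clustering are not shown, and the sampling days, the 30-day test signal, and the candidate periods are hypothetical.

```python
import math


def lomb_scargle_power(times, values, angular_freq):
    """Classical (unnormalised) Lomb-Scargle power at one angular frequency."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    # Time offset tau makes the sine and cosine terms orthogonal.
    sin2 = sum(math.sin(2 * angular_freq * t) for t in times)
    cos2 = sum(math.cos(2 * angular_freq * t) for t in times)
    tau = math.atan2(sin2, cos2) / (2 * angular_freq)
    cos_terms = [math.cos(angular_freq * (t - tau)) for t in times]
    sin_terms = [math.sin(angular_freq * (t - tau)) for t in times]
    y_cos = sum(y * c for y, c in zip(centered, cos_terms))
    y_sin = sum(y * s for y, s in zip(centered, sin_terms))
    cos_sq = sum(c * c for c in cos_terms)
    sin_sq = sum(s * s for s in sin_terms)
    power = 0.0
    if cos_sq > 0:
        power += y_cos * y_cos / cos_sq
    if sin_sq > 0:
        power += y_sin * y_sin / sin_sq
    return 0.5 * power


# Hypothetical unevenly spaced sampling days and a noiseless 30-day signal.
times = [0, 3, 4, 9, 13, 14, 21, 25, 30, 33, 38,
         41, 47, 52, 55, 60, 66, 68, 75, 80, 84, 90]
values = [math.sin(2 * math.pi * t / 30.0) for t in times]

candidate_periods = [10, 15, 20, 30, 45, 60, 90]  # days
powers = {p: lomb_scargle_power(times, values, 2 * math.pi / p)
          for p in candidate_periods}
best_period = max(powers, key=powers.get)
```

Because the power is computed from the actual sampling times, no interpolation onto a regular grid is needed before the spectrum is estimated, which is the property that makes the approach suitable for gapped time courses.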
Autoantibodyome profiling was performed using the Invitrogen ProtoArray
Protein Microarray v5.0 according to the manufacturer’s instructions.
The SRA accession number for the WGS sequence reported in this paper is
SRP008054.4. The GEO accession number for the RNA-Seq and miRNA-Seq data reported in this paper is GSE33029. See Extended Experimental Procedures for data dissemination details.
reactions, pathways and biological processes. Nucleic Acids Res. 39 (Database issue), D691–D697.
Supplemental Information includes Extended Experimental Procedures, seven
figures, four tables, and eleven data files and can be found with this article
online at doi:10.1016/j.cell.2012.02.009.
Dewey, F.E., Chen, R., Cordero, S.P., Ormond, K.E., Caleshu, C., Karczewski,
K.J., Whirl-Carrillo, M., Wheeler, M.T., Dudley, J.T., Byrnes, J.K., et al. (2011).
Phased whole-genome genetic risk in a family quartet using a major allele
reference sequence. PLoS Genet. 7, e1002280.
Erlich, H.A., Valdes, A.M., Julier, C., Mirel, D., and Noble, J.A.; Type I Diabetes
Genetics Consortium. (2009). Evidence for association of the TCF7 locus with
type I diabetes. Genes Immun. 10 (Suppl 1), S54–S59.
M.S. is funded by grants from Stanford University and the NIH. M.G. is funded
by grants from the NIH. G.I.M. is funded by NIH training grant. K.J.K., J.T.D.,
and S.H. are supported by the NIH/NLM training grant T15-LM007033.
T.E.K. and R.B.A are funded by NIH/NIGMS R24-GM61374. M.A.B.’s laboratory is funded by the Spanish Ministry of Science and Innovation Projects
SAF2008-05384 and CSD2007-00017, European Union FP7 Projects 2007A-201630 (GENICA) and 2007-A-200950 (TELOMARKER), European
Research Council Advanced Grant GA232854, the Körber Foundation, the
Fundación Marcelino Botı́n, and Fundación Lilly (España). F.E.D. was supported by NIH/NHLBI training grant T32 HL094274. E.A.A. was supported by
NIH/NHLBI KO8 HL083914, NIH New Investigator DP2 Award OD004613,
and a grant from the Breetwor Family Foundation. We dedicate this manuscript
to Dr. Tara A. Gianoulis, an enthusiastic advocate for genomic science. R.B.A.,
E.A.A., A.B., and M.S. serve as founders and consultants for Personalis. R.B.A.
is a consultant to 23andMe. M.S. is a member of the scientific advisory board
of GenapSys and a consultant for Illumina. M.A.B. acts as consultant and holds
stock in Life Length.
Received: October 11, 2011
Revised: January 27, 2012
Accepted: February 4, 2012
Published: March 15, 2012
1000 Genomes Project Consortium. (2010). A map of human genome variation
from population-scale sequencing. Nature 467, 1061–1073.
Abyzov, A., Urban, A.E., Snyder, M., and Gerstein, M. (2011). CNVnator: an
approach to discover, genotype, and characterize typical and atypical CNVs
from family and population genome sequencing. Genome Res. 21, 974–984.
Adzhubei, I.A., Schmidt, S., Peshkin, L., Ramensky, V.E., Gerasimova, A.,
Bork, P., Kondrashov, A.S., and Sunyaev, S.R. (2010). A method and server
for predicting damaging missense mutations. Nat. Methods 7, 248–249.
Ahdesmäki, M., Lähdesmäki, H., Gracey, A., Shmulevich, L., and Yli-Harja, O.
(2007). Robust regression for periodicity detection in non-uniformly sampled
time-course gene expression data. BMC Bioinformatics 8, 233.
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M.,
Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000). Gene ontology:
tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet.
25, 25–29.
Ashley, E.A., Butte, A.J., Wheeler, M.T., Chen, R., Klein, T.E., Dewey, F.E.,
Dudley, J.T., Ormond, K.E., Pavlovic, A., Morgan, A.A., et al. (2010). Clinical
assessment incorporating a personal genome. Lancet 375, 1525–1535.
Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate:
a practical and powerful approach to multiple testing. Roy. Statist. Soc. Ser.
B 57, 289–300.
Cancer Genome Atlas Research Network. (2011). Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–615.
Chen, K., Wallis, J.W., McLellan, M.D., Larson, D.E., Kalicki, J.M., Pohl, C.S.,
McGrath, S.D., Wendl, M.C., Zhang, Q., Locke, D.P., et al. (2009). BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat. Methods 6, 677–681.
Galeano, F., Leroy, A., Rossetti, C., Gromova, I., Gautier, P., Keegan, L.P.,
Massimi, L., Di Rocco, C., O’Connell, M.A., and Gallo, A. (2010). Human
BLCAP transcript: new editing events in normal and cancerous tissues. Int.
J. Cancer 127, 127–137.
Glynn, E.F., Chen, J., and Mushegian, A.R. (2006). Detecting periodic patterns
in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics 22, 310–316.
Gommans, W.M., Tatalias, N.E., Sie, C.P., Dupuis, D., Vendetti, N., Smith, L.,
Kaushal, R., and Maas, S. (2008). Screening of human SNP database identifies
recoding sites of A-to-I RNA editing. RNA 14, 2074–2085.
Grayson, B.L., Wang, L., and Aune, T.M. (2011). Peripheral blood gene expression profiles in metabolic syndrome, coronary artery disease and type 2 diabetes. Genes Immun. 12, 341–351.
Hani, E.H., Boutin, P., Durand, E., Inoue, H., Permutt, M.A., Velho, G., and
Froguel, P. (1998). Missense mutations in the pancreatic islet beta cell inwardly
rectifying K+ channel gene (KIR6.2/BIR): a meta-analysis suggests a role in the
polygenic basis of Type II diabetes mellitus in Caucasians. Diabetologia 41,
Hindson, B.J., Ness, K.D., Masquelier, D.A., Belgrader, P., Heredia, N.J.,
Makarewicz, A.J., Bright, I.J., Lucero, M.Y., Hiddessen, A.L., Legler, T.C.,
et al. (2011). High-throughput droplet digital PCR system for absolute quantitation of DNA copy number. Anal. Chem. 83, 8604–8610.
Hocke, K., and Kämpfer, N. (2009). Gap filling and noise reduction of unevenly
sampled data by means of the Lomb-Scargle periodogram. Atmos. Chem.
Phys. 9, 4197–4206.
Huang, W., Sherman, B.T., and Lempicki, R.A. (2009). Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large
gene lists. Nucleic Acids Res. 37, 1–13.
Jima, D.D., Zhang, J., Jacobs, C., Richards, K.L., Dunphy, C.H., Choi, W.W.,
Au, W.Y., Srivastava, G., Czader, M.B., Rizzieri, D.A., et al; Hematologic Malignancies Research Consortium. (2010). Deep sequencing of the small RNA
transcriptome of normal and malignant human B cells identifies hundreds of
novel microRNAs. Blood 116, e118–e127.
Kanehisa, M., and Goto, S. (2000). KEGG: Kyoto encyclopedia of genes and
genomes. Nucleic Acids Res. 28, 27–30.
Kasowski, M., Grubert, F., Heffelfinger, C., Hariharan, M., Asabere, A.,
Waszak, S.M., Habegger, L., Rozowsky, J., Shi, M., Urban, A.E., et al.
(2010). Variation in transcription factor binding among humans. Science 328,
Kersey, P.J., Duarte, J., Williams, A., Karavidopoulou, Y., Birney, E., and
Apweiler, R. (2004). The International Protein Index: an integrated database
for proteomics experiments. Proteomics 4, 1985–1988.
Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D.,
Jones, S.J., and Marra, M.A. (2009). Circos: an information aesthetic for
comparative genomics. Genome Res. 19, 1639–1645.
Cox, J., and Mann, M. (2010). Quantitative, high-resolution proteomics for
data-driven systems biology. Annu. Rev. Biochem. 80, 273–299.
Kusunoki, M., Tsutsumi, K., Nakayama, M., Kurokawa, T., Nakamura, T.,
Ogawa, H., Fukuzawa, Y., Morishita, M., Koide, T., and Miyata, T. (2007).
Relationship between serum concentrations of saturated fatty acids and
unsaturated fatty acids and the homeostasis model insulin resistance index
in Japanese patients with type 2 diabetes mellitus. J. Med. Invest. 54,
Croft, D., O’Kelly, G., Wu, G., Haw, R., Gillespie, M., Matthews, L., Caudy, M.,
Garapati, P., Gopinath, G., Jassal, B., et al. (2011). Reactome: a database of
reactions, pathways and biological processes. Nucleic Acids Res. 39 (Database issue), D691–D697.
Lam, H.Y., Mu, X.J., Stütz, A.M., Tanzer, A., Cayting, P.D., Snyder, M., Kim,
P.M., Korbel, J.O., and Gerstein, M.B. (2010). Nucleotide-resolution analysis
of structural variants using BreakSeq and a breakpoint library. Nat. Biotechnol.
28, 47–55.
Lapuk, A., Marr, H., Jakkula, L., Pedro, H., Bhattacharya, S., Purdom, E., Hu,
Z., Simpson, K., Pachter, L., Durinck, S., et al. (2010). Exon-level microarray
analyses identify alternative splicing programs in breast cancer. Mol. Cancer
Res. 8, 961–974.
Levanon, E.Y., Hallegger, M., Kinar, Y., Shemesh, R., Djinovic-Carugo, K.,
Rechavi, G., Jantsch, M.F., and Eisenberg, E. (2005). Evolutionarily conserved
human targets of adenosine to inosine RNA editing. Nucleic Acids Res. 33,
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G.,
Abecasis, G., and Durbin, R.; 1000 Genome Project Data Processing
Subgroup. (2009a). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079.
Li, J.B., Levanon, E.Y., Yoon, J.K., Aach, J., Xie, B., Leproust, E., Zhang, K.,
Gao, Y., and Church, G.M. (2009b). Genome-wide identification of human
RNA editing sites by parallel DNA capturing and sequencing. Science 324,
Li, M., Wang, I.X., Li, Y., Bruzel, A., Richards, A.L., Toung, J.M., and Cheung,
V.G. (2011). Widespread RNA and DNA sequence differences in the human
transcriptome. Science 333, 53–58.
Li, R., Li, Y., Zheng, H., Luo, R., Zhu, H., Li, Q., Qian, W., Ren, Y., Tian, G., Li, J.,
et al. (2010). Building the sequence map of the human pan-genome. Nat. Biotechnol. 28, 57–63.
Lomb, N. (1976). Least-squares frequency analysis of unequally spaced data.
Astrophys. Space Sci. 39, 447–462.
Maere, S., Heymans, K., and Kuiper, M. (2005). BiNGO: a Cytoscape plugin to
assess overrepresentation of gene ontology categories in biological networks.
Bioinformatics 21, 3448–3449.
Mills, R.E., Walter, K., Stewart, C., Handsaker, R.E., Chen, K., Alkan, C.,
Abyzov, A., Yoon, S.C., Ye, K., Cheetham, R.K., et al.; 1000 Genomes Project.
(2011). Mapping copy number variation by population-scale genome
sequencing. Nature 470, 59–65.
Mischel, P.S., Shai, R., Shi, T., Horvath, S., Lu, K.V., Choe, G., Seligson, D.,
Kremen, T.J., Palotie, A., Liau, L.M., et al. (2003). Identification of molecular
subtypes of glioblastoma by gene expression profiling. Oncogene 22, 2361–
Parkhomchuk, D., Borodina, T., Amstislavskiy, V., Banaru, M., Hallen, L.,
Krobitsch, S., Lehrach, H., and Soldatov, A. (2009). Transcriptome analysis
by strand-specific sequencing of complementary DNA. Nucleic Acids Res.
37, e123.
Pickup, J.C. (2004). Inflammation and activated innate immunity in the
pathogenesis of type 2 diabetes. Diabetes Care 27, 813–823.
Rozowsky, J., Abyzov, A., Wang, J., Alves, P., Raha, D., Harmanci, A., Leng, J.,
Bjornson, R., Kong, Y., Kitabayashi, N., et al. (2011). AlleleSeq: analysis of
allele-specific expression and binding in a network framework. Mol. Syst.
Biol. 7, 522.
Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L., and Ideker, T. (2011).
Cytoscape 2.8: new features for data integration and network visualization.
Bioinformatics 27, 431–432.
Snyder, M., Weissman, S., and Gerstein, M. (2009). Personal phenotypes to go
with personal genomes. Mol. Syst. Biol. 5, 273.
Snyder, M., Du, J., and Gerstein, M. (2010). Personal genome sequencing:
current approaches and challenges. Genes Dev. 24, 423–431.
Stewart, C., Kural, D., Strömberg, M.P., Walker, J.A., Konkel, M.K., Stütz,
A.M., Urban, A.E., Grubert, F., Lam, H.Y., Lee, W.P., et al.; 1000 Genomes
Project. (2011). A comprehensive map of mobile element insertion polymorphisms in humans. PLoS Genet. 7, e1002236.
Sun, J.C., Liang, X.T., Pan, K., Wang, H., Zhao, J.J., Li, J.J., Ma, H.Q., Chen,
Y.B., and Xia, J.C. (2010). High expression level of EDIL3 in HCC predicts poor
prognosis of HCC patients. World J. Gastroenterol. 16, 4611–4615.
Theodoridis, G., Gika, H.G., and Wilson, I.D. (2011). Mass spectrometry-based
holistic analytical approaches for metabolite profiling in systems biology
studies. Mass Spectrom. Rev. 30, 884–906.
Trapnell, C., Pachter, L., and Salzberg, S.L. (2009). TopHat: discovering splice
junctions with RNA-Seq. Bioinformatics 25, 1105–1111.
Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren,
M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly
and quantification by RNA-Seq reveals unannotated transcripts and isoform
switching during cell differentiation. Nat. Biotechnol. 28, 511–515.
van der Werf, N., Kroese, F.G., Rozing, J., and Hillebrands, J.L. (2007). Viral
infections as potential triggers of type 1 diabetes. Diabetes Metab. Res.
Rev. 23, 169–183.
Van Dongen, H.P., Olofsen, E., VanHartevelt, J.H., and Kruyt, E.W. (1999).
A procedure of multiple period searching in unequally spaced time-series
with the Lomb-Scargle method. Biol. Rhythm Res. 30, 149–177.
Vaxillaire, M., Cavalcanti-Proença, C., Dechaume, A., Tichet, J., Marre, M.,
Balkau, B., and Froguel, P.; DESIR Study Group. (2008). The common
P446L polymorphism in GCKR inversely modulates fasting glucose and
triglyceride levels and reduces type 2 diabetes risk in the DESIR prospective
general French population. Diabetes 57, 2253–2257.
Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C.,
Kingsmore, S.F., Schroth, G.P., and Burge, C.B. (2008). Alternative isoform
regulation in human tissue transcriptomes. Nature 456, 470–476.
Wu, J.Q., Habegger, L., Noisa, P., Szekely, A., Qiu, C., Hutchison, S., Raha, D.,
Egholm, M., Lin, H., Weissman, S., et al. (2010). Dynamic transcriptomes
during neural differentiation of human embryonic stem cells revealed by short,
long, and paired-end sequencing. Proc. Natl. Acad. Sci. USA 107, 5254–5259.
Yamaguchi, H., Calado, R.T., Ly, H., Kajigaya, S., Baerlocher, G.M., Chanock,
S.J., Lansdorp, P.M., and Young, N.S. (2005). Mutations in TERT, the gene for
telomerase reverse transcriptase, in aplastic anemia. N. Engl. J. Med. 352,
Yang, R., Zhang, C., and Su, Z. (2011). LSPR: an integrated periodicity detection algorithm for unevenly sampled temporal microarray data. Bioinformatics
27, 1023–1025.
Scargle, J.D. (1982). Studies in astronomical time series analysis. II-Statistical
aspects of spectral analysis of unevenly spaced data. Astrophys. J. 263,
Ye, K., Schulz, M.H., Long, Q., Apweiler, R., and Ning, Z. (2009). Pindel:
a pattern growth approach to detect break points of large deletions and
medium sized insertions from paired-end short reads. Bioinformatics 25,
Scargle, J.D. (1989). Studies in astronomical time series analysis. III-Fourier
transforms, autocorrelation functions, and cross-correlation functions of
unevenly spaced data. Astrophys. J. 343, 874–887.
Zhao, W., Agyepong, K., Serpedin, E., and Dougherty, E.R. (2008). Detecting
periodic genes from irregularly sampled gene expressions: a comparison
study. EURASIP J. Bioinform. Syst. Biol. 2008, 769293.
Cell 148, 1293–1307, March 16, 2012 ©2012 Elsevier Inc.
