Publications by authors named "Larsson Omberg"

37 Publications

Crowdsourcing digital health measures to predict Parkinson's disease severity: the Parkinson's Disease Digital Biomarker DREAM Challenge.

NPJ Digit Med 2021 Mar 19;4(1):53. Epub 2021 Mar 19.

Institute for Computing and Information Sciences, Radboud University, Nijmegen, The Netherlands.

Consumer wearables and sensors are a rich source of data about patients' daily disease and symptom burden, particularly in the case of movement disorders like Parkinson's disease (PD). However, interpreting these complex data into so-called digital biomarkers requires complicated analytical approaches, and validating these biomarkers requires sufficient data and unbiased evaluation methods. Here we describe the use of crowdsourcing to specifically evaluate and benchmark features derived from accelerometer and gyroscope data in two different datasets to predict the presence of PD and severity of three PD symptoms: tremor, dyskinesia, and bradykinesia. Forty teams from around the world submitted features, and achieved drastically improved predictive performance for PD status (best AUROC = 0.87), as well as tremor- (best AUPR = 0.75), dyskinesia- (best AUPR = 0.48) and bradykinesia-severity (best AUPR = 0.95).
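The scores quoted above are AUROC and AUPR values. As a minimal sketch of what these two metrics measure, the Python below computes AUROC by pairwise comparison and uses average precision as one common estimator of AUPR. This is not the challenge's scoring code; the function names and the toy labels and scores are assumptions made for illustration.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


# Hypothetical toy data: 1 = condition present, 0 = absent.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

AUROC is insensitive to class balance, whereas precision-based scores depend on the fraction of positives, so AUPR values for different symptoms are not directly comparable unless their class balance is similar.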
DOI: http://dx.doi.org/10.1038/s41746-021-00414-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7979931
March 2021

Limb and trunk accelerometer data collected with wearable sensors from subjects with Parkinson's disease.

Sci Data 2021 02 5;8(1):47. Epub 2021 Feb 5.

Department of Physical Medicine and Rehabilitation, Harvard Medical School, Spaulding Rehabilitation Hospital, Boston, Massachusetts, USA.

Parkinson's disease (PD) is a neurodegenerative disorder characterized by motor and non-motor symptoms. Dyskinesia and motor fluctuations are complications of PD medications. An objective measure of on/off time with/without dyskinesia has been sought for some time because it would facilitate the titration of medications. The objective of the dataset herein presented is to assess if wearable sensor data can be used to generate accurate estimates of limb-specific symptom severity. Nineteen subjects with PD experiencing motor fluctuations were asked to wear a total of five wearable sensors on both forearms and shanks, as well as on the lower back. Accelerometer data was collected for four days, including two laboratory visits lasting 3 to 4 hours each while the remainder of the time was spent at home and in the community. During the laboratory visits, subjects performed a battery of motor tasks while clinicians rated limb-specific symptom severity. At home, subjects were instructed to use a smartphone app that guided the periodic performance of a set of motor tasks.
DOI: http://dx.doi.org/10.1038/s41597-021-00831-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7864964
February 2021

Accelerometer data collected with a minimum set of wearable sensors from subjects with Parkinson's disease.

Sci Data 2021 02 5;8(1):48. Epub 2021 Feb 5.

Department of Physical Medicine and Rehabilitation, Harvard Medical School, Spaulding Rehabilitation Hospital, Boston, Massachusetts, USA.

Parkinson's disease (PD) is a neurodegenerative disorder associated with motor and non-motor symptoms. Current treatments primarily focus on managing motor symptom severity such as tremor, bradykinesia, and rigidity. However, as the disease progresses, treatment side-effects can emerge such as on/off periods and dyskinesia. The objective of the Levodopa Response Study was to identify whether wearable sensor data can be used to objectively quantify symptom severity in individuals with PD exhibiting motor fluctuations. Thirty-one subjects with PD were recruited from 2 sites to participate in a 4-day study. Data was collected using 2 wrist-worn accelerometers and a waist-worn smartphone. During Days 1 and 4, a portion of the data was collected in the laboratory while subjects performed a battery of motor tasks as clinicians rated symptom severity. The remainder of the recordings were performed in the home and community settings. To our knowledge, this is the first dataset collected using wearable accelerometers with specific focus on individuals with PD experiencing motor fluctuations that is made available via an open data repository.
DOI: http://dx.doi.org/10.1038/s41597-021-00830-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7865022
February 2021

Design of a virtual longitudinal observational study in Parkinson's disease (AT-HOME PD).

Ann Clin Transl Neurol 2021 02 22;8(2):308-320. Epub 2020 Dec 22.

Department of Neurology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA.

Objective: The expanding power and accessibility of personal technology provide an opportunity to reduce burdens and costs of traditional clinical site-centric therapeutic trials in Parkinson's disease and generate novel insights. The value of this approach has never been more evident than during the current COVID-19 pandemic. We sought to (1) establish and implement the infrastructure for longitudinal, virtual follow-up of clinical trial participants, (2) compare changes in smartphone-based assessments, online patient-reported outcomes, and remote expert assessments, and (3) explore novel digital markers of Parkinson's disease disability and progression.

Methods: Participants from two recently completed phase III clinical trials of inosine and isradipine enrolled in Assessing Tele-Health Outcomes in Multiyear Extensions of Parkinson's Disease trials (AT-HOME PD), a two-year virtual cohort study. After providing electronic informed consent, individuals complete annual video visits with a movement disorder specialist, smartphone-based assessments of motor function and socialization, and patient-reported outcomes online.

Results: From the two clinical trials, 226 individuals from 42 states in the United States and Canada enrolled. Of these, 181 (80%) have successfully downloaded the study's smartphone application and 161 (71%) have completed patient-reported outcomes on the online platform.

Interpretation: It is feasible to conduct a large-scale, international virtual observational study following the completion of participation in brick-and-mortar clinical trials in Parkinson's disease. This study, which brings research to participants, will compare established clinical endpoints with novel digital biomarkers and thereby inform the longitudinal follow-up of clinical trial participants and design of future clinical trials.
DOI: http://dx.doi.org/10.1002/acn3.51236
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7886038
February 2021

Evaluating the Utility of Smartphone-Based Sensor Assessments in Persons With Multiple Sclerosis in the Real-World Using an App (elevateMS): Observational, Prospective Pilot Digital Health Study.

JMIR Mhealth Uhealth 2020 10 27;8(10):e22108. Epub 2020 Oct 27.

Sage Bionetworks, Seattle, WA, United States.

Background: Multiple sclerosis (MS) is a chronic neurodegenerative disease. Current monitoring practices predominantly rely on brief and infrequent assessments, which may not be representative of the real-world patient experience. Smartphone technology provides an opportunity to assess people's daily-lived experience of MS on a frequent, regular basis outside of episodic clinical evaluations.

Objective: The objectives of this study were to evaluate the feasibility and utility of capturing real-world MS-related health data remotely using a smartphone app, "elevateMS," and to investigate the associations between self-reported MS severity and sensor-based active functional test measurements, as well as the impact of local weather conditions on disease burden.

Methods: This was a 12-week, observational, digital health study involving 3 cohorts: self-referred participants who reported an MS diagnosis, clinic-referred participants with neurologist-confirmed MS, and participants without MS (controls). Participants downloaded the elevateMS app and completed baseline assessments, including self-reported physical ability (Patient-Determined Disease Steps [PDDS]), as well as longitudinal assessments of quality of life (Quality of Life in Neurological Disorders [Neuro-QoL] Cognitive, Upper Extremity, and Lower Extremity Function) and daily health (MS symptoms, triggers, health, mobility, pain). Participants also completed functional tests (finger-tapping, walk and balance, voice-based Digit Symbol Substitution Test [DSST], and finger-to-nose) as an independent assessment of MS-related cognition and motor activity. Local weather data were collected each time participants completed an active task. Associations between self-reported baseline/longitudinal assessments, functional tests, and weather were evaluated using linear (for cross-sectional data) and mixed-effects (for longitudinal data) regression models.

Results: A total of 660 individuals enrolled in the study; 31 withdrew, 495 had MS (n=359 self-referred, n=136 clinic-referred), and 134 were controls. Participation was higher in clinic-referred than in self-referred participants (median retention: 25.5 vs 7.0 days). The top 5 most common MS symptoms, reported at least once by participants with MS, were fatigue (310/495, 62.6%), weakness (222/495, 44.8%), memory/attention issues (209/495, 42.2%), and difficulty walking (205/495, 41.4%), and the most common triggers were high ambient temperature (259/495, 52.3%), stress (250/495, 50.5%), and late bedtime (221/495, 44.6%). Baseline PDDS was significantly associated with functional test performance in participants with MS (mixed model-based estimate of most significant feature across functional tests [β]: finger-tapping: β=-43.64, P<.001; DSST: β=-5.47, P=.005; walk and balance: β=-.39, P=.001; finger-to-nose: β=.01, P=.01). Longitudinal Neuro-QoL scores were also significantly associated with functional tests (finger-tapping with Upper Extremity Function: β=.40, P<.001; walk and balance with Lower Extremity Function: β=-99.18, P=.02; DSST with Cognitive Function: β=1.60, P=.03). Finally, local temperature was significantly associated with participants' test performance (finger-tapping: β=-.14, P<.001; DSST: β=-.06, P=.009; finger-to-nose: β=-53.88, P<.001).

Conclusions: The elevateMS study app captured the real-world experience of MS, characterized some MS symptoms, and assessed the impact of environmental factors on symptom severity. Our study provides further evidence that supports smartphone app use to monitor MS with both active assessments and patient-reported measures of disease burden. App-based tracking may provide unique and timely real-world data for clinicians and patients, resulting in improved disease insights and management.
DOI: http://dx.doi.org/10.2196/22108
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7655470
October 2020

The AD Knowledge Portal: A Repository for Multi-Omic Data on Alzheimer's Disease and Aging.

Curr Protoc Hum Genet 2020 12;108(1):e105

Sage Bionetworks, Seattle, Washington.

The AD Knowledge Portal (adknowledgeportal.org) is a public data repository that shares data and other resources generated by multiple collaborative research programs focused on aging, dementia, and Alzheimer's disease (AD). In this article, we highlight how to use the Portal to discover and download genomic variant and transcriptomic data from the same individuals. First, we show how to use the web interface to browse and search for data of interest using relevant file annotations. We demonstrate how to learn more about the context surrounding the data, including diagnostic criteria and methodological details about sample preparation and data analysis. We present two primary ways to download data: using a web interface, and using a programmatic method that provides access using the command line. Finally, we show how to merge separate sources of metadata into a comprehensive file that contains factors and covariates necessary in downstream analyses. © 2020 The Authors.

Basic Protocol 1: Find and download files associated with a selected study
Basic Protocol 2: Download files in bulk using the command line client
Basic Protocol 3: Working with file annotations and metadata
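The metadata merge described in this abstract can be pictured as a chain of key-based joins. The sketch below is a minimal illustration only, assuming three already-loaded tables (individual, biospecimen, and assay metadata) linked by hypothetical `individualID` and `specimenID` columns; the column names, the function name, and the sample rows are assumptions and are not taken from the article's protocols.

```python
def merge_metadata(individuals, biospecimens, assays):
    """Join assay rows to their specimen and individual rows (inner join on both keys)."""
    by_individual = {row["individualID"]: row for row in individuals}
    by_specimen = {row["specimenID"]: row for row in biospecimens}
    merged = []
    for assay in assays:
        specimen = by_specimen.get(assay["specimenID"])
        if specimen is None:
            continue  # assay row with no matching specimen record
        person = by_individual.get(specimen["individualID"])
        if person is None:
            continue  # specimen with no matching individual record
        # Later dictionaries win on key clashes, so assay values take precedence.
        merged.append({**person, **specimen, **assay})
    return merged


# Hypothetical rows, invented for illustration only.
individuals = [{"individualID": "I1", "sex": "female"}, {"individualID": "I2", "sex": "male"}]
biospecimens = [
    {"specimenID": "S1", "individualID": "I1", "tissue": "cortex"},
    {"specimenID": "S2", "individualID": "I2", "tissue": "cerebellum"},
]
assays = [
    {"specimenID": "S1", "assay": "rnaSeq"},
    {"specimenID": "S9", "assay": "rnaSeq"},
]
covariates = merge_metadata(individuals, biospecimens, assays)
```

An inner join silently drops unmatched rows, so comparing the number of input and output rows is a cheap check before using the merged file in downstream analyses.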
DOI: http://dx.doi.org/10.1002/cphg.105
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7587039
December 2020

Large eQTL meta-analysis reveals differing patterns between cerebral cortical and cerebellar brain regions.

Sci Data 2020 10 12;7(1):340. Epub 2020 Oct 12.

Sage Bionetworks, Seattle, WA, 98121, USA.

The availability of high-quality RNA-sequencing and genotyping data of post-mortem brain collections from consortia such as CommonMind Consortium (CMC) and the Accelerating Medicines Partnership for Alzheimer's Disease (AMP-AD) Consortium enables the generation of a large-scale brain cis-eQTL meta-analysis. Here we generate cerebral cortical eQTL from 1433 samples available from four cohorts (identifying >4.1 million significant eQTL for >18,000 genes), as well as cerebellar eQTL from 261 samples (identifying 874,836 significant eQTL for >10,000 genes). We find substantially improved power in the meta-analysis over individual cohort analyses, particularly in comparison to the Genotype-Tissue Expression (GTEx) Project eQTL. Additionally, we observed differences in eQTL patterns between cerebral and cerebellar brain regions. We provide these brain eQTL as a resource for use by the research community. As a proof of principle for their utility, we apply a colocalization analysis to identify genes underlying the GWAS association peaks for schizophrenia and identify a potentially novel gene colocalization with lncRNA RP11-677M14.2 (posterior probability of colocalization 0.975).
DOI: http://dx.doi.org/10.1038/s41597-020-00642-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7550587
October 2020

Meta-Analysis of the Alzheimer's Disease Human Brain Transcriptome and Functional Dissection in Mouse Models.

Cell Rep 2020 07;32(2):107908

Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA.

We present a consensus atlas of the human brain transcriptome in Alzheimer's disease (AD), based on meta-analysis of differential gene expression in 2,114 postmortem samples. We discover 30 brain coexpression modules from seven regions as the major source of AD transcriptional perturbations. We next examine overlap with 251 brain differentially expressed gene sets from mouse models of AD and other neurodegenerative disorders. Human-mouse overlaps highlight responses to amyloid versus tau pathology and reveal age- and sex-dependent expression signatures for disease progression. Human coexpression modules enriched for neuronal and/or microglial genes broadly overlap with mouse models of AD, Huntington's disease, amyotrophic lateral sclerosis, and aging. Other human coexpression modules, including those implicated in proteostasis, are not activated in AD models but rather following other, unexpected genetic manipulations. Our results comprise a cross-species resource, highlighting transcriptional networks altered by human brain pathophysiology and identifying correspondences with mouse models for AD preclinical studies.
DOI: http://dx.doi.org/10.1016/j.celrep.2020.107908
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7428328
July 2020

Effects of mood and aging on keystroke dynamics metadata and their diurnal patterns in a large open-science sample: A BiAffect iOS study.

J Am Med Inform Assoc 2020 07;27(7):1007-1018

Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, USA.

Objective: Ubiquitous technologies can be leveraged to construct ecologically relevant metrics that complement traditional psychological assessments. This study aims to determine the feasibility of smartphone-derived real-world keyboard metadata to serve as digital biomarkers of mood.

Materials and Methods: BiAffect, a real-world observation study based on a freely available iPhone app, allowed the unobtrusive collection of typing metadata through a custom virtual keyboard that replaces the default keyboard. User demographics and self-reports for depression severity (Patient Health Questionnaire-8) were also collected. Using >14 million keypresses from 250 users who reported demographic information and a subset of 147 users who additionally completed at least 1 Patient Health Questionnaire, we employed hierarchical growth curve mixed-effects models to capture the effects of mood, demographics, and time of day on keyboard metadata.

Results: We analyzed 86,541 typing sessions associated with a total of 543 Patient Health Questionnaires. Results showed that more severe depression relates to more variable typing speed (P < .001), shorter session duration (P < .001), and lower accuracy (P < .05). Additionally, typing speed and variability exhibit a diurnal pattern, being fastest and least variable at midday. Older users exhibit slower and more variable typing, as well as more pronounced slowing in the evening. The effects of aging and time of day did not impact the relationship of mood to typing variables and were recapitulated in the 250-user group.

Conclusions: Keystroke dynamics, unobtrusively collected in the real world, are significantly associated with mood despite diurnal patterns and effects of age, and thus could serve as a foundation for constructing digital biomarkers.
DOI: http://dx.doi.org/10.1093/jamia/ocaa057
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7647317
July 2020

Deep Phenotyping of Parkinson's Disease.

J Parkinsons Dis 2020;10(3):855-873

Department of Computer Science, University of Rochester, Rochester, NY, USA.

Phenotype is the set of observable traits of an organism or condition. While advances in genetics, imaging, and molecular biology have improved our understanding of the underlying biology of Parkinson's disease (PD), clinical phenotyping of PD still relies primarily on history and physical examination. These subjective, episodic, categorical assessments are valuable for diagnosis and care but have left gaps in our understanding of the PD phenotype. Sensors can provide objective, continuous, real-world data about the PD clinical phenotype, increase our knowledge of its pathology, enhance evaluation of therapies, and ultimately, improve patient care. In this paper, we explore the concept of deep phenotyping (the comprehensive assessment of a condition using multiple clinical, biological, genetic, imaging, and sensor-based tools) for PD. We discuss the rationale for, outline current approaches to, identify benefits and limitations of, and consider future directions for deep clinical phenotyping.
DOI: http://dx.doi.org/10.3233/JPD-202006
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7458535
January 2020

Indicators of retention in remote digital health studies: a cross-study evaluation of 100,000 participants.

NPJ Digit Med 2020 17;3:21. Epub 2020 Feb 17.

Sage Bionetworks, Seattle, WA, USA.

Digital technologies such as smartphones are transforming the way scientists conduct biomedical research. Several remotely conducted studies have recruited thousands of participants over a span of a few months, allowing researchers to collect real-world data at scale and at a fraction of the cost of traditional research. Unfortunately, remote studies have been hampered by substantial participant attrition, calling into question the representativeness of the collected data, including generalizability of outcomes. We report the findings regarding recruitment and retention from eight remote digital health studies conducted between 2014 and 2019 that provided individual-level study-app usage data from more than 100,000 participants completing nearly 3.5 million remote health evaluations over cumulative participation of 850,000 days. Median participant retention across eight studies varied widely from 2 to 26 days (median across all studies = 5.5 days). Survival analysis revealed several factors significantly associated with increase in participant retention time, including (i) referral by a clinician to the study (increase of 40 days in median retention time); (ii) compensation for participation (increase of 22 days, 1 study); (iii) having the clinical condition of interest in the study (increase of 7 days compared with controls); and (iv) older age (increase of 4 days). Additionally, four distinct patterns of daily app usage behavior were identified by unsupervised clustering, which were also associated with participant demographics. Most studies were not able to recruit a sample that was representative of the race/ethnicity or geographical diversity of the US. Together these findings can help inform recruitment and retention strategies to enable equitable participation of populations in future digital health research.
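Median retention time in a survival analysis is typically read off an estimated survival curve. The abstract does not say which estimator the authors used; the sketch below assumes the standard Kaplan-Meier estimator and uses invented durations, where `observed` is 1 if the participant is known to have stopped and 0 if follow-up simply ended (censoring). All names and data are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with at least one dropout."""
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)  # still enrolled just before t
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def median_retention(curve):
    """First time at which estimated survival falls to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical retention data in days; 0 in `observed` marks a censored participant.
days = [1, 2, 2, 4, 7, 10, 15]
observed = [1, 1, 1, 1, 0, 1, 0]
curve = kaplan_meier(days, observed)
median_days = median_retention(curve)
```

Treating censored participants as dropouts would bias the median downward, which is why the estimator only counts them in the at-risk set.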
DOI: http://dx.doi.org/10.1038/s41746-020-0224-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7026051
February 2020

Data Science Approaches for Effective Use of Mobile Device-Based Collection of Real-World Data.

Clin Pharmacol Ther 2020 04 9;107(4):719-721. Epub 2020 Feb 9.

Sage Bionetworks, Seattle, Washington, USA.

DOI: http://dx.doi.org/10.1002/cpt.1781
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7158202
April 2020

Detecting the impact of subject characteristics on machine learning-based diagnostic applications.

NPJ Digit Med 2019 11;2:99. Epub 2019 Oct 11.

Sage Bionetworks, Seattle, USA.

Collection of high-dimensional, longitudinal digital health data has the potential to support a wide variety of research and clinical applications including diagnostics and longitudinal health tracking. Algorithms that process these data and inform digital diagnostics are typically developed using training and test sets generated from multiple repeated measures collected across a set of individuals. However, the inclusion of repeated measurements is not always appropriately taken into account in the analytical evaluations of predictive performance. The assignment of repeated measurements from each individual to both the training and the test sets ("record-wise" data split) is a common practice and can lead to massive underestimation of the prediction error due to the presence of "identity confounding." In essence, these models learn to identify subjects, in addition to diagnostic signal. Here, we present a method that can be used to effectively calculate the amount of identity confounding learned by classifiers developed using a record-wise data split. By applying this method to several real datasets, we demonstrate that identity confounding is a serious issue in digital health studies and that record-wise data splits for machine learning-based applications need to be avoided.
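The alternative to a record-wise split is to assign whole individuals to either the training or the test set. The sketch below illustrates such a subject-wise split only; it is not the paper's method for quantifying identity confounding. The record layout (a `subject` key per record), the function names, and the toy data are assumptions.

```python
import random


def subject_wise_split(records, test_fraction=0.25, seed=0):
    """Split records so that every subject lands entirely in train or entirely in test."""
    subjects = sorted({r["subject"] for r in records})
    rng = random.Random(seed)  # local generator keeps the split reproducible
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [r for r in records if r["subject"] not in test_subjects]
    test = [r for r in records if r["subject"] in test_subjects]
    return train, test


def shared_subjects(train, test):
    """Subjects present on both sides; a non-empty result means identity leakage."""
    return {r["subject"] for r in train} & {r["subject"] for r in test}


# Hypothetical data: 8 subjects with 5 repeated measurements each.
records = [{"subject": f"s{i}", "visit": v} for i in range(8) for v in range(5)]
train, test = subject_wise_split(records)
```

Checking that `shared_subjects(train, test)` is empty before fitting a model is a simple guard against the leakage the abstract describes.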
DOI: http://dx.doi.org/10.1038/s41746-019-0178-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6789029
October 2019

Identifying and ranking potential driver genes of Alzheimer's disease using multiview evidence aggregation.

Bioinformatics 2019 07;35(14):i568-i576

Sage Bionetworks, Seattle, WA, USA.

Motivation: Late onset Alzheimer's disease is currently a disease with no known effective treatment options. To better understand the disease, new multi-omic datasets have recently been generated with the goal of identifying molecular causes of disease. However, most analytic studies using these datasets focus on uni-modal analysis of the data. Here, we propose a data-driven approach to integrate multiple data types and analytic outcomes to aggregate evidence supporting the hypothesis that a gene is a genetic driver of the disease. The main algorithmic contributions of our article are: (i) a general machine learning framework to learn the key characteristics of a few known driver genes from multiple feature sets and identify other potential driver genes which have similar feature representations, and (ii) a flexible ranking scheme with the ability to integrate external validation in the form of Genome Wide Association Study summary statistics. While we currently focus on demonstrating the effectiveness of the approach using different analytic outcomes from RNA-Seq studies, this method is easily generalizable to other data modalities and analysis types.

Results: We demonstrate the utility of our machine learning algorithm on two benchmark multiview datasets by significantly outperforming the baseline approaches in predicting missing labels. We then use the algorithm to predict and rank potential drivers of Alzheimer's. We show that our ranked genes show a significant enrichment for single nucleotide polymorphisms associated with Alzheimer's and are enriched in pathways that have been previously associated with the disease.

Availability and Implementation: Source code and links to all feature sets are available at https://github.com/Sage-Bionetworks/EvidenceAggregatedDriverRanking.
DOI: http://dx.doi.org/10.1093/bioinformatics/btz365
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612835
July 2019

An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics.

Cell 2018 04;173(2):400-416.e11

Chan Soon-Shiong Institute of Molecular Medicine at Windber, Windber, PA 15963, USA.

For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale.
DOI: http://dx.doi.org/10.1016/j.cell.2018.02.052
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6066282
April 2018

Machine Learning Identifies Stemness Features Associated with Oncogenic Dedifferentiation.

Cell 2018 04;173(2):338-354.e15

Poznań University of Medical Sciences, 61701 Poznań, Poland; Greater Poland Cancer Center, 61866 Poznań, Poland; International Institute for Molecular Oncology, 60203 Poznań, Poland.

Cancer progression involves the gradual loss of a differentiated phenotype and acquisition of progenitor and stem-cell-like features. Here, we provide novel stemness indices for assessing the degree of oncogenic dedifferentiation. We used an innovative one-class logistic regression (OCLR) machine-learning algorithm to extract transcriptomic and epigenetic feature sets derived from non-transformed pluripotent stem cells and their differentiated progeny. Using OCLR, we were able to identify previously undiscovered biological mechanisms associated with the dedifferentiated oncogenic state. Analyses of the tumor microenvironment revealed unanticipated correlation of cancer stemness with immune checkpoint expression and infiltrating immune cells. We found that the dedifferentiated oncogenic phenotype was generally most prominent in metastatic tumors. Application of our stemness indices to single-cell data revealed patterns of intra-tumor molecular heterogeneity. Finally, the indices allowed for the identification of novel targets and possible targeted therapies aimed at tumor differentiation.
DOI: http://dx.doi.org/10.1016/j.cell.2018.03.034
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5902191
April 2018

Unsupervised Analysis of Transcriptomics in Bacterial Sepsis Across Multiple Datasets Reveals Three Robust Clusters.

Crit Care Med 2018 06;46(6):915-925

Stanford Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA.

Objectives: To find and validate generalizable sepsis subtypes using data-driven clustering.

Design: We used advanced informatics techniques to pool data from 14 bacterial sepsis transcriptomic datasets from eight different countries (n = 700).

Setting: Retrospective analysis.

Subjects: Persons admitted to the hospital with bacterial sepsis.

Interventions: None.

Measurements and Main Results: A unified clustering analysis across 14 discovery datasets revealed three subtypes, which, based on functional analysis, we termed "Inflammopathic, Adaptive, and Coagulopathic." We then validated these subtypes in nine independent datasets from five different countries (n = 600). In both discovery and validation data, the Adaptive subtype is associated with a lower clinical severity and lower mortality rate, and the Coagulopathic subtype is associated with higher mortality and clinical coagulopathy. Further, these clusters are statistically associated with clusters derived by others in independent single sepsis cohorts.

Conclusions: The three sepsis subtypes may represent a unifying framework for understanding the molecular heterogeneity of the sepsis syndrome. Further study could potentially enable a precision medicine approach of matching novel immunomodulatory therapies with septic patients most likely to benefit.
DOI: http://dx.doi.org/10.1097/CCM.0000000000003084
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5953807
June 2018

A community approach to mortality prediction in sepsis via gene expression analysis.

Nat Commun 2018 02 15;9(1):694. Epub 2018 Feb 15.

Department of Pharmacology, University of South Alabama, Mobile, AL, 36688, USA.

Improved risk stratification and prognosis prediction in sepsis is a critical unmet need. Clinical severity scores and available assays such as blood lactate reflect global illness severity with suboptimal performance, and do not specifically reveal the underlying dysregulation of sepsis. Here, we present prognostic models for 30-day mortality generated independently by three scientific groups by using 12 discovery cohorts containing transcriptomic data collected from primarily community-onset sepsis patients. Predictive performance is validated in five cohorts of community-onset sepsis patients in which the models show summary AUROCs ranging from 0.765 to 0.89. Similar performance is observed in four cohorts of hospital-acquired sepsis. Combining the new gene-expression-based prognostic models with prior clinical severity scores leads to significant improvement in prediction of 30-day mortality as measured via AUROC and net reclassification improvement index. These models provide an opportunity to develop molecular bedside tests that may improve risk stratification and mortality prediction in patients with sepsis.

Source
DOI: http://dx.doi.org/10.1038/s41467-018-03078-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5814463
February 2018

Molecular, phenotypic, and sample-associated data to describe pluripotent stem cell lines and derivatives.

Sci Data 2017 03 28;4:170030. Epub 2017 Mar 28.

Sage Bionetworks, Seattle, Washington 98109, USA.

The use of induced pluripotent stem cells (iPSC) derived from independent patients and sources holds considerable promise to improve the understanding of development and disease. However, optimized use of iPSC depends on our ability to develop methods to efficiently qualify cell lines and protocols, monitor genetic stability, and evaluate self-renewal and differentiation potential. To accomplish these goals, 57 stem cell lines from 10 laboratories were differentiated to 7 different states, resulting in 248 analyzed samples. Cell lines were differentiated and characterized at a central laboratory using standardized cell culture methodologies, protocols, and metadata descriptors. Stem cell and derived differentiated lines were characterized using RNA-seq, miRNA-seq, copy number arrays, DNA methylation arrays, flow cytometry, and molecular histology. All materials, including raw data, metadata, analysis and processing code, and methodological and provenance documentation are publicly available for re-use and interactive exploration at https://www.synapse.org/pcbc. The goal is to provide data that can improve our ability to robustly and reproducibly use human pluripotent stem cells to understand development and disease.

Source
DOI: http://dx.doi.org/10.1038/sdata.2017.30
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5369318
March 2017

Spatiotemporal Reconstruction of the Human Blastocyst by Single-Cell Gene-Expression Analysis Informs Induction of Naive Pluripotency.

Dev Cell 2016 07;38(1):100-15

Department of Obstetrics and Gynecology, Stanford University, Stanford, CA 94305, USA; Institute for Stem Cell Biology & Regenerative Medicine, Stanford University, Stanford, CA 94305, USA.

Human preimplantation embryo development involves complex cellular and molecular events that lead to the establishment of three cell lineages in the blastocyst: trophectoderm, primitive endoderm, and epiblast. Owing to limited resources of biological specimens, our understanding of how the earliest lineage commitments are regulated remains narrow. Here, we examined gene expression in 241 individual cells from early and late human blastocysts to delineate dynamic gene-expression changes. We distinguished all three lineages and further developed a 3D model of the inner cell mass and trophectoderm in which individual cells were mapped into distinct expression domains. We identified in silico precursors of the epiblast and primitive endoderm lineages and revealed a role for MCRS1, TET1, and THAP11 in epiblast formation and their ability to induce naive pluripotency in vitro. Our results highlight the potential of single-cell gene-expression analysis in human preimplantation development to instruct human stem cell biology.

Source
DOI: http://dx.doi.org/10.1016/j.devcel.2016.06.014
July 2016

Integrated Genomic Analysis of Diverse Induced Pluripotent Stem Cells from the Progenitor Cell Biology Consortium.

Stem Cell Reports 2016 07 9;7(1):110-25. Epub 2016 Jun 9.

Division of Experimental Hematology and Cancer Biology, Cincinnati Children's Hospital, Cincinnati, OH 45229, USA; Hoxworth Blood Center, University of Cincinnati, Cincinnati, OH 45229, USA.

The rigorous characterization of distinct induced pluripotent stem cells (iPSC) derived from multiple reprogramming technologies, somatic sources, and donors is required to understand potential sources of variability and downstream potential. To achieve this goal, the Progenitor Cell Biology Consortium performed comprehensive experimental and genomic analyses of 58 iPSC from ten laboratories generated using a variety of reprogramming genes, vectors, and cells. Associated global molecular characterization studies identified functionally informative correlations in gene expression, DNA methylation, and/or copy-number variation among key developmental and oncogenic regulators as a result of donor, sex, line stability, reprogramming technology, and cell of origin. Furthermore, X-chromosome inactivation in PSC produced highly correlated differences in teratoma-lineage staining and regulator expression upon differentiation. All experimental results, and raw, processed, and metadata from these analyses, including powerful tools, are interactively accessible from a new online portal at https://www.synapse.org to serve as a reusable resource for the stem cell community.

Source
DOI: http://dx.doi.org/10.1016/j.stemcr.2016.05.006
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4944587
July 2016

Analysis of 589,306 genomes identifies individuals resilient to severe Mendelian childhood diseases.

Nat Biotechnol 2016 05 11;34(5):531-8. Epub 2016 Apr 11.

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

Genetic studies of human disease have traditionally focused on the detection of disease-causing mutations in afflicted individuals. Here we describe a complementary approach that seeks to identify healthy individuals resilient to highly penetrant forms of genetic childhood disorders. A comprehensive screen of 874 genes in 589,306 genomes led to the identification of 13 adults harboring mutations for 8 severe Mendelian conditions, with no reported clinical manifestation of the indicated disease. Our findings demonstrate the promise of broadening genetic studies to systematically search for well individuals who are buffering the effects of rare, highly penetrant, deleterious mutations. They also indicate that incomplete penetrance for Mendelian diseases is likely more common than previously believed. The identification of resilient individuals may provide a first step toward uncovering protective genetic variants that could help elucidate the mechanisms of Mendelian diseases and new therapeutic strategies.

Source
DOI: http://dx.doi.org/10.1038/nbt.3514
May 2016

PERSONALIZED HYPOTHESIS TESTS FOR DETECTING MEDICATION RESPONSE IN PARKINSON DISEASE PATIENTS USING iPHONE SENSOR DATA.

Pac Symp Biocomput 2016 ;21:273-84

Sage Bionetworks, 1100 Fairview Avenue North, Seattle, Washington 98109, USA.

We propose hypothesis tests for detecting dopaminergic medication response in Parkinson disease patients, using longitudinal sensor data collected by smartphones. The processed data is composed of multiple features extracted from active tapping tasks performed by the participant on a daily basis, before and after medication, over several months. Each extracted feature corresponds to a time series of measurements annotated according to whether the measurement was taken before or after the patient has taken his/her medication. Even though the data is longitudinal in nature, we show that simple hypothesis tests for detecting medication response, which ignore the serial correlation structure of the data, are still statistically valid, showing type I error rates at the nominal level. We propose two distinct personalized testing approaches. In the first, we combine multiple feature-specific tests into a single union-intersection test. In the second, we construct personalized classifiers of the before/after medication labels using all the extracted features of a given participant, and test the null hypothesis that the area under the receiver operating characteristic curve of the classifier is equal to 1/2. We compare the statistical power of the personalized classifier tests and personalized union-intersection tests in a simulation study, and illustrate the performance of the proposed tests using data from the mPower Parkinson's disease study, recently launched as part of Apple's ResearchKit mobile platform. Our results suggest that the personalized tests, which ignore the longitudinal aspect of the data, can perform well in real data analyses, suggesting they might be used as a sound baseline approach against which more sophisticated methods can be compared.
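
The second approach in the abstract above tests whether a classifier's AUROC differs from 1/2. The following is a minimal sketch of that idea and not the authors' procedure: it swaps in a generic label-permutation test, treats the measurements as exchangeable (in the spirit of ignoring serial correlation), and uses a single made-up score per measurement in place of a trained classifier. The function names, the tapping values, and the permutation count are all assumptions introduced for illustration.

```python
import random


def auroc(scores, labels):
    """AUROC as the fraction of (after, before) pairs ranked correctly.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    tied pairs count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_pvalue(scores, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null hypothesis AUROC = 1/2.

    Assumption: labels are exchangeable under the null, so shuffling them
    gives draws from the null distribution of the AUROC.
    """
    rng = random.Random(seed)
    observed = auroc(scores, labels)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(auroc(scores, shuffled) - 0.5) >= abs(observed - 0.5):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical tapping counts for one participant (illustrative values only).
before_medication = [41, 38, 44, 40, 37, 42, 39, 43]
after_medication = [47, 45, 50, 44, 49, 46, 48, 51]
scores = before_medication + after_medication
labels = [0] * len(before_medication) + [1] * len(after_medication)

auc, p_value = permutation_pvalue(scores, labels)
print(f"AUROC = {auc:.3f}, permutation p-value = {p_value:.4f}")
```

A permutation test is used here only because it needs no distributional assumptions and runs on the standard library; the abstract does not say which test statistic or null distribution the authors used.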

Source
October 2016

Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin.

Cell 2014 Aug 7;158(4):929-944. Epub 2014 Aug 7.

Department of Medicine, University of California San Francisco, 450 3rd St, San Francisco, CA, 94148, USA.

Recent genomic analyses of pathologically defined tumor types identify "within-a-tissue" disease subtypes. However, the extent to which genomic signatures are shared across tissues is still unclear. We performed an integrative analysis using five genome-wide platforms and one proteomic platform on 3,527 specimens from 12 cancer types, revealing a unified classification into 11 major subtypes. Five subtypes were nearly identical to their tissue-of-origin counterparts, but several distinct cancer types were found to converge into common subtypes. Lung squamous, head and neck, and a subset of bladder cancers coalesced into one subtype typified by TP53 alterations, TP63 amplifications, and high expression of immune and proliferation pathway genes. Of note, bladder cancers split into three pan-cancer subtypes. The multiplatform classification, while correlated with tissue-of-origin, provides independent information for predicting clinical outcomes. All data sets are available for data-mining from a unified resource to support further biological discoveries and insights into novel therapeutic strategies.

Source
DOI: http://dx.doi.org/10.1016/j.cell.2014.06.049
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4152462
August 2014

Assessing the clinical utility of cancer genomic and proteomic data across tumor types.

Nat Biotechnol 2014 Jul 22;32(7):644-52. Epub 2014 Jun 22.

[1] Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, USA. [2] Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA. [3].

Molecular profiling of tumors promises to advance the clinical management of cancer, but the benefits of integrating molecular data with traditional clinical variables have not been systematically studied. Here we retrospectively predict patient survival using diverse molecular data (somatic copy-number alteration, DNA methylation and mRNA, microRNA and protein expression) from 953 samples of four cancer types from The Cancer Genome Atlas project. We find that incorporating molecular data with clinical variables yields statistically significantly improved predictions (FDR < 0.05) for three cancers but those quantitative gains were limited (2.2-23.9%). Additional analyses revealed little predictive power across tumor types except for one case. In clinically relevant genes, we identified 10,281 somatic alterations across 12 cancer types in 2,928 of 3,277 patients (89.4%), many of which would not be revealed in single-tumor analyses. Our study provides a starting point and resources, including an open-access model evaluation platform, for building reliable prognostic and therapeutic strategies that incorporate molecular data.

Source
DOI: http://dx.doi.org/10.1038/nbt.2940
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4102885
July 2014

Enabling transparent and collaborative computational analysis of 12 tumor types within The Cancer Genome Atlas.

Nat Genet 2013 Oct;45(10):1121-6

[1] Sage Bionetworks, Seattle, Washington, USA. [2].

The Cancer Genome Atlas Pan-Cancer Analysis Working Group collaborated on the Synapse software platform to share and evolve data, results and methodologies while performing integrative analysis of molecular profiling data from 12 tumor types. The group's work serves as a pilot case study that provides (i) a template for future large collaborative studies; (ii) a system to support collaborative projects; and (iii) a public resource of highly curated data, results and automated systems for the evaluation of community-developed models.

Source
DOI: http://dx.doi.org/10.1038/ng.2761
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3950337
October 2013

PCAdmix: principal components-based assignment of ancestry along each chromosome in individuals with admixed ancestry from two or more populations.

Hum Biol 2012 Aug;84(4):343-64

Department of Biostatistics and Computational Biology, Cornell University, Ithaca, NY, USA.

Identifying ancestry along each chromosome in admixed individuals provides a wealth of information for understanding the population genetic history of admixture events and is valuable for admixture mapping and identifying recent targets of selection. We present PCAdmix (available at https://sites.google.com/site/pcadmix/home), a Principal Components-based algorithm for determining ancestry along each chromosome from a high-density, genome-wide set of phased single-nucleotide polymorphism (SNP) genotypes of admixed individuals. We compare our method to HAPMIX on simulated data from two ancestral populations, and we find high concordance between the methods. Our method also has better accuracy than LAMP when applied to three-population admixture, a situation as yet unaddressed by HAPMIX. Finally, we apply our method to a data set of four Latino populations with European, African, and Native American ancestry. We find evidence of assortative mating in each of the four populations, and we identify regions of shared ancestry that may be recent targets of selection and could serve as candidate regions for admixture-based association mapping.

Source
DOI: http://dx.doi.org/10.3378/027.084.0401
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3740525
August 2012

Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation.

Am J Hum Genet 2012 Oct;91(4):660-71

Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA.

Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas: 70% of the European ancestry in today's African Americans dates back to European gene flow happening only 7-8 generations ago.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2012.08.025
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3484644
October 2012

Inferring genome-wide patterns of admixture in Qataris using fifty-five ancestral populations.

BMC Genet 2012 Jun 26;13:49. Epub 2012 Jun 26.

Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA.

Background: Populations of the Arabian Peninsula have a complex genetic structure that reflects waves of migrations including the earliest human migrations from Africa and eastern Asia, migrations along ancient civilization trading routes and colonization history of recent centuries.

Results: Here, we present a study of genome-wide admixture in this region, using 156 genotyped individuals from Qatar, a country located at the crossroads of these migration patterns. Since haplotypes of these individuals could have originated from many different populations across the world, we have developed a machine learning method "SupportMix" to infer loci-specific genomic ancestry when simultaneously analyzing many possible ancestral populations. Simulations show that SupportMix is not only more accurate than other popular admixture discovery tools but is the first admixture inference method that can efficiently scale for simultaneous analysis of 50-100 putative ancestral populations while being independent of prior demographic information.

Conclusions: By simultaneously using the 55 world populations from the Human Genome Diversity Panel, SupportMix was able to extract the fine-scale ancestry of the Qatar population, providing many new observations concerning the ancestry of the region. For example, as well as recapitulating the three major sub-populations in Qatar, composed of mainly Arabic, Persian, and African ancestry, SupportMix additionally identifies the specific ancestry of the Persian group to populations sampled in Greater Persia rather than from China and the ancestry of the African group to sub-Saharan origin and not Southern African Bantu origin as previously thought.

Source
DOI: http://dx.doi.org/10.1186/1471-2156-13-49
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3512499
June 2012