Publications by authors named "Tianyuan Lu"

16 Publications

Block coordinate descent algorithm improves variable selection and estimation in error-in-variables regression.

Genet Epidemiol 2021 Sep 1. Epub 2021 Sep 1.

Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada.

Medical research increasingly includes high-dimensional regression modeling with a need for error-in-variables methods. The Convex Conditioned Lasso (CoCoLasso) utilizes a reformulated Lasso objective function and an error-corrected cross-validation to enable error-in-variables regression, but requires heavy computations. Here, we develop a Block coordinate Descent Convex Conditioned Lasso (BDCoCoLasso) algorithm for modeling high-dimensional data that are only partially corrupted by measurement error. This algorithm separately optimizes the estimation of the uncorrupted and corrupted features in an iterative manner to reduce computational cost, with a specially calibrated formulation of cross-validation error. Through simulations, we show that the BDCoCoLasso algorithm successfully copes with much larger feature sets than CoCoLasso, and as expected, outperforms the naïve Lasso with enhanced estimation accuracy and consistency, as the intensity and complexity of measurement errors increase. Also, a new smoothly clipped absolute deviation penalization option is added that may be appropriate for some data sets. We apply the BDCoCoLasso algorithm to data selected from the UK Biobank. We develop and showcase the utility of covariate-adjusted genetic risk scores for body mass index, bone mineral density, and lifespan. We demonstrate that by leveraging more information than the naïve Lasso in partially corrupted data, the BDCoCoLasso may achieve higher prediction accuracy. These innovations, together with an R package, BDCoCoLasso, make error-in-variables adjustments more accessible for high-dimensional data sets. We posit the BDCoCoLasso algorithm has the potential to be widely applied in various fields, including genomics-facilitated personalized medicine research.
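
The block-wise idea described above can be illustrated with a short sketch. This is not the BDCoCoLasso package code. It runs cyclic coordinate descent on the covariance form of the Lasso objective, 0.5 * b'Σb - ρ'b + λ||b||_1, sweeping one block of features (for example, the uncorrupted ones) while the other block is held fixed. It assumes the caller already supplies a positive semidefinite Σ. The measurement-error correction, the SCAD option and the calibrated cross-validation are not shown, and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def block_cd_lasso(sigma, rho, lam, blocks, n_iter=500, tol=1e-8):
    """Minimise 0.5 * b' sigma b - rho' b + lam * ||b||_1 one block at a time.

    sigma:  (p, p) positive semidefinite matrix, e.g. X'X / n
    rho:    (p,) vector, e.g. X'y / n
    blocks: list of index arrays, e.g. [uncorrupted_idx, corrupted_idx]
    """
    beta = np.zeros(len(rho))
    for _ in range(n_iter):
        max_change = 0.0
        for block in blocks:            # update one block while the others stay fixed
            for j in block:             # cyclic coordinate descent inside the block
                # rho_j minus the fitted contribution of every other coordinate
                partial = rho[j] - sigma[j] @ beta + sigma[j, j] * beta[j]
                new_value = soft_threshold(partial, lam) / sigma[j, j]
                max_change = max(max_change, abs(new_value - beta[j]))
                beta[j] = new_value
        if max_change < tol:            # stop once a full sweep barely moves beta
            break
    return beta

# Hypothetical example: features 0-2 "uncorrupted", features 3-5 "corrupted".
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
y = X @ np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.0]) + 0.1 * rng.standard_normal(200)
sigma, rho = X.T @ X / 200, X.T @ y / 200
beta = block_cd_lasso(sigma, rho, lam=0.05, blocks=[np.arange(0, 3), np.arange(3, 6)])
print(np.round(beta, 2))
```

With lam = 0 and a positive definite Σ, the sweeps reduce to Gauss-Seidel iterations for the least-squares normal equations, which is a quick sanity check of the update.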

Source
DOI: http://dx.doi.org/10.1002/gepi.22430
September 2021

Detecting cord blood cell type-specific epigenetic associations with gestational diabetes mellitus and early childhood growth.

Clin Epigenetics 2021 Jun 26;13(1):131. Epub 2021 Jun 26.

Lady Davis Institute for Medical Research, Jewish General Hospital, 3755 Chemin de La Côte-Sainte-Catherine, Montréal, QC, H3T 1E2, Canada.

Background: Epigenome-wide association studies (EWAS) have provided opportunities to understand the role of epigenetic mechanisms in development and pathophysiology of many chronic diseases. However, an important limitation of conventional EWAS is that profiles of epigenetic variability are often obtained in samples of mixed cell types. Here, we aim to assess whether changes in cord blood DNA methylation (DNAm) associated with gestational diabetes mellitus (GDM) exposure and early childhood growth markers occur in a cell type-specific manner.

Results: We analyzed 275 cord blood samples collected at delivery from a prospective pre-birth cohort, with genome-wide DNAm profiled by the Illumina MethylationEPIC array. We estimated the proportions of seven common cell types in each sample using a cord blood-specific DNAm reference panel. Leveraging a recently developed approach named CellDMC, we performed cell type-specific EWAS to identify CpG loci significantly associated with GDM or with 3-year-old body mass index (BMI) z-score. A total of 1410 CpG loci displayed significant cell type-specific differences in methylation level between 23 GDM cases and 252 controls at a false discovery rate < 0.05. Gene Ontology enrichment analysis indicated that LDL transport emerged from CpGs identified specifically in the B-cell DNAm analyses, and the mitogen-activated protein kinase pathway emerged from CpGs identified specifically in the natural killer cell DNAm analyses. In addition, we identified four and six loci associated with 3-year-old BMI z-score that were specific to CD8+ T-cells and monocytes, respectively. By performing genome-wide permutation tests, we validated that most of our detected signals had low false positive rates.

Conclusion: Compared to conventional EWAS adjusting for the effects of cell type heterogeneity, the proposed approach based on cell type-specific EWAS could provide additional biologically meaningful associations between CpG methylation and prenatal maternal GDM or 3-year-old BMI. With careful validation, these findings may provide new insights into the pathogenesis, programming, and consequences of related childhood metabolic dysregulation. Therefore, we propose that cell type-specific analyses are worth cautious exploration.
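
CellDMC is named above but not described. As we understand the general idea, cell type-specific association can be tested by adding interaction terms between the exposure and each estimated cell-type proportion. The sketch below fits such an interaction model for a single CpG with ordinary least squares in NumPy. It is an illustration under that assumption, not the CellDMC implementation, and it omits the per-term significance tests and FDR control; the function name and simulated data are hypothetical.

```python
import numpy as np

def cell_type_interaction_fit(meth, fractions, phenotype):
    """Least-squares fit of meth ~ sum_k w_k * (alpha_k + beta_k * phenotype).

    meth:      (n,) methylation beta values at one CpG
    fractions: (n, K) estimated cell-type proportions (rows sum to about 1)
    phenotype: (n,) exposure coded 0/1 (e.g. control/GDM case)
    Returns (alpha, beta); beta[k] estimates the exposure effect in cell type k.
    """
    # Proportions sum to one, so the K baseline columns act as the intercept.
    design = np.hstack([fractions, fractions * phenotype[:, None]])
    coef, *_ = np.linalg.lstsq(design, meth, rcond=None)
    k = fractions.shape[1]
    return coef[:k], coef[k:]

# Simulated data: the exposure shifts methylation only in the second cell type.
rng = np.random.default_rng(42)
n = 300
fractions = rng.dirichlet(np.ones(3), size=n)
phenotype = rng.integers(0, 2, size=n).astype(float)
meth = (fractions @ np.array([0.5, 0.4, 0.6])
        + 0.2 * fractions[:, 1] * phenotype
        + 0.01 * rng.standard_normal(n))
alpha, beta = cell_type_interaction_fit(meth, fractions, phenotype)
print(np.round(beta, 3))
```

Because the exposure enters only through products with the proportions, an effect confined to one cell type shows up in that cell type's interaction coefficient rather than being diluted across the bulk sample.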

Source
DOI: http://dx.doi.org/10.1186/s13148-021-01114-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8236204
June 2021

A Polygenic Risk Score to Predict Future Adult Short Stature Among Children.

J Clin Endocrinol Metab 2021 Jun;106(7):1918-1928

Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, Canada.

Context: Adult height is highly heritable, yet no genetic predictor has demonstrated clinical utility compared to mid-parental height.

Objective: To develop a polygenic risk score for adult height and evaluate its clinical utility.

Design: A polygenic risk score was constructed based on meta-analysis of genomewide association studies and evaluated on the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort.

Subjects: Participants included 442 599 genotyped White British individuals in the UK Biobank and 941 genotyped child-parent trios of European ancestry in the ALSPAC cohort.

Interventions: None.

Main Outcome Measures: Standing height was measured using a stadiometer; standing height 2 SDs below the sex-specific population average was considered short stature.

Results: Combined with sex, a polygenic risk score captured 71.1% of the total variance in adult height in the UK Biobank. In the ALSPAC cohort, the polygenic risk score identified children who developed short stature in adulthood with an area under the receiver operating characteristic curve (AUROC) of 0.84, which is close to that of mid-parental height. Combining this polygenic risk score with mid-parental height, or with the height of only one parent, could improve the AUROC to at most 0.90. The polygenic risk score could also substitute for mid-parental height in age-specific Khamis-Roche height predictors and achieve equally strong discriminative power in identifying children with short stature in adulthood.

Conclusions: A polygenic risk score could be considered as an alternative or adjunct to mid-parental height to improve screening for children at risk of developing short stature in adulthood in European ancestry populations.
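
A polygenic risk score of this kind is commonly a weighted sum of allele dosages, and the AUROC can be computed as the probability that a randomly chosen case ranks above a randomly chosen non-case. The sketch below shows both calculations. The genotypes, weights and labels are made up, and how the study derived its weights is not described here.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2) for each individual."""
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)

def auroc(scores, labels):
    """P(random case scores higher than random control); ties count one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    cases, controls = scores[labels], scores[~labels]
    greater = (cases[:, None] > controls[None, :]).sum()
    ties = (cases[:, None] == controls[None, :]).sum()
    return (greater + 0.5 * ties) / (len(cases) * len(controls))

# Hypothetical genotypes (rows = children, columns = variants) and height weights.
dosages = [[0, 1, 2], [2, 2, 0], [1, 0, 1], [2, 1, 1]]
weights = [0.10, 0.20, -0.05]
prs = polygenic_score(dosages, weights)
short_adult = [1, 0, 1, 0]
# A lower height score means higher risk of short stature, so rank by -prs.
print(auroc(-prs, short_adult))
```

The pairwise comparison is quadratic in sample size, which is fine for illustration; a rank-based (Mann-Whitney) formula gives the same value more efficiently for large cohorts.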

Source
DOI: http://dx.doi.org/10.1210/clinem/dgab215
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8266463
June 2021

Evolutionary Contribution of Duplicated Genes to Genome Evolution in the Ginseng Species Complex.

Genome Biol Evol 2021 05;13(5)

Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, School of Life Sciences, Fudan University, Shanghai, China.

Genes duplicated by whole genome duplication (WGD) and small-scale duplication (SSD) have played important roles in the adaptive evolution of all flowering plants. However, it remains underinvestigated how the distinct modes of duplication and their contending evolutionary patterns have shaped the genomes and epigenomes of extant plant species. In this study, we investigated the contribution of WGD- and SSD-derived duplicate genes to the genome evolution of one diploid and three closely related allotetraploid Panax species based on genome, methylome, and proteome data sets. Our genome-wide comparative analyses revealed that although the species of the ginseng complex diverged only recently, they have evolved distinct overall patterns of nucleotide variation, cytosine methylation, and protein-level expression. In particular, genetic and epigenetic asymmetries observed in the recent WGD-derived genes are largely consistent across the ginseng species complex. In addition, our results revealed that gene duplicates generated by ancient WGD and SSD mechanisms exhibited distinct evolutionary patterns. We found that the ancient WGD-derived genes (i.e., ancient collinear genes) are genetically more conserved and hypomethylated at cytosine sites. In contrast, some of the SSD-derived genes (i.e., dispersed duplicated genes) showed hypermethylation and high variance in nucleotide variation patterns. Functional enrichment analyses of the duplicated genes indicated that adaptation-related traits (i.e., photosynthesis) created during the distant ancient WGDs were further strengthened by both the more recent WGD and SSD. Together, our findings suggest that different types of duplicated genes may have played distinct but relaying evolutionary roles in the polyploidization and speciation processes in the ginseng species complex.

Source
DOI: http://dx.doi.org/10.1093/gbe/evab051
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8103499
May 2021

Improved prediction of fracture risk leveraging a genome-wide polygenic risk score.

Genome Med 2021 02 3;13(1):16. Epub 2021 Feb 3.

Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital, Room H-413, 3755 Chemin de la Côte-Sainte-Catherine, Montreal, Quebec, H3T 1E2, Canada.

Background: Accurately quantifying the risk of osteoporotic fracture is important for directing appropriate clinical interventions. While skeletal measures such as heel quantitative speed of sound (SOS) and dual-energy X-ray absorptiometry bone mineral density are able to predict the risk of osteoporotic fracture, the utility of such measurements is subject to the availability of equipment and human resources. Using data from 341,449 individuals of white British ancestry, we previously developed a genome-wide polygenic risk score (PRS), called gSOS, that captured 25.0% of the total variance in SOS. Here, we test whether gSOS can improve fracture risk prediction.

Methods: We examined the predictive power of gSOS in five genome-wide genotyped cohorts, including 90,172 individuals of European ancestry and 25,034 individuals of Asian ancestry. We calculated gSOS for each individual and tested for the association between gSOS and incident major osteoporotic fracture and hip fracture. We tested whether adding gSOS to the risk prediction models had added value over models using other commonly used clinical risk factors.

Results: A standard deviation decrease in gSOS was associated with an increased odds of incident major osteoporotic fracture in populations of European ancestry, with odds ratios ranging from 1.35 to 1.46 in four cohorts. It was also associated with a 1.26-fold (95% confidence interval (CI) 1.13-1.41) increased odds of incident major osteoporotic fracture in the Asian population. We demonstrated that gSOS was more predictive of incident major osteoporotic fracture (area under the receiver operating characteristic curve (AUROC) = 0.734; 95% CI 0.727-0.740) and incident hip fracture (AUROC = 0.798; 95% CI 0.791-0.805) than most traditional clinical risk factors, including prior fracture, use of corticosteroids, rheumatoid arthritis, and smoking. We also showed that adding gSOS to the Fracture Risk Assessment Tool (FRAX) could refine the risk prediction with a positive net reclassification index ranging from 0.024 to 0.072.

Conclusions: We generated and validated a PRS for SOS which was associated with the risk of fracture. This score was more strongly associated with the risk of fracture than many clinical risk factors and provided an improvement in risk prediction. gSOS should be explored as a tool to improve risk stratification to identify individuals at high risk of fracture.
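
The net reclassification index quoted above summarizes how adding a score moves individuals between risk categories. The sketch below implements the common two-component categorical NRI. The abstract does not say which NRI variant or which risk thresholds were used, so the categories here are hypothetical.

```python
import numpy as np

def net_reclassification_index(old_cat, new_cat, event):
    """Categorical NRI comparing risk categories from an old and a new model.

    old_cat, new_cat: integer risk categories (higher = higher predicted risk)
    event:            1 if the outcome (e.g. incident fracture) occurred, else 0
    """
    old_cat, new_cat = np.asarray(old_cat), np.asarray(new_cat)
    event = np.asarray(event, dtype=bool)
    up, down = new_cat > old_cat, new_cat < old_cat
    # Events should move up a category; non-events should move down.
    nri_events = up[event].mean() - down[event].mean()
    nri_nonevents = down[~event].mean() - up[~event].mean()
    return nri_events + nri_nonevents

old = [0, 0, 1, 1, 1, 0, 0]    # hypothetical categories from a baseline tool (0 = low, 1 = high)
new = [1, 0, 1, 0, 1, 0, 0]    # hypothetical categories after adding a genetic score
event = [1, 1, 1, 0, 0, 0, 0]
nri = net_reclassification_index(old, new, event)
print(round(nri, 3))
```

A positive NRI means the new model, on balance, moves events up and non-events down; it does not by itself say how well calibrated either model is.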

Source
DOI: http://dx.doi.org/10.1186/s13073-021-00838-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7860212
February 2021

H3K27M in Gliomas Causes a One-Step Decrease in H3K27 Methylation and Reduced Spreading within the Constraints of H3K36 Methylation.

Cell Rep 2020 11;33(7):108390

Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada; McGill Genome Centre, Montreal, QC H3A 0G1, Canada. Electronic address:

The discovery of H3K27M mutations in pediatric gliomas marked a new chapter in cancer epigenomics. Numerous studies have investigated the effect of this mutation on H3K27 trimethylation, but only recently have we started to realize its additional effects on the epigenome. Here, we use isogenic glioma H3K27M cell lines to investigate H3K27 methylation and its interaction with H3K36 and H3K9 modifications. We describe a "step down" effect of H3K27M on the distribution of H3K27 methylation: me3 is reduced to me2 and me2 is reduced to me1, whereas H3K36me2/3 delineates the boundaries for the spread of H3K27me marks. We also observe a replacement of H3K27me2/3 silencing by H3K9me3. Using a computational simulation, we explain our observations by reduced effectiveness of PRC2 and constraints imposed on the deposition of H3K27me by antagonistic H3K36 modifications. Our work further elucidates the effects of H3K27M in gliomas as well as the general principles of H3K27 methylation deposition.

Source
DOI: http://dx.doi.org/10.1016/j.celrep.2020.108390
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703850
November 2020

Investigating transcriptome-wide sex dimorphism by multi-level analysis of single-cell RNA sequencing data in ten mouse cell types.

Biol Sex Differ 2020 11 5;11(1):61. Epub 2020 Nov 5.

Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, 4072, Australia.

Background: It is a long-established fact that sex is an important factor influencing the transcriptional regulatory processes of an organism. However, understanding of sex-based differences in gene expression has been limited because existing studies typically sequence and analyze bulk tissue from female or male individuals. Such analyses average cell-specific gene expression levels, so cell-to-cell variation can easily be concealed. We therefore sought to use data generated by the rapidly developing single-cell RNA sequencing (scRNA-seq) technology to explore sex dimorphism and its functional consequences at the single-cell level.

Methods: Our study included scRNA-seq data of ten well-defined cell types from the brain and heart of female and male young adult mice in the publicly available tissue atlas dataset, Tabula Muris. We combined standard differential expression analysis with the identification of differential distributions in single cell transcriptomes to test for sex-based gene expression differences in each cell type. The marker genes that had sex-specific inter-cellular changes in gene expression formed the basis for further characterization of the cellular functions that were differentially regulated between the female and male cells. We also inferred activities of transcription factor-driven gene regulatory networks by leveraging knowledge of multidimensional protein-to-genome and protein-to-protein interactions and analyzed pathways that were potential modulators of sex differentiation and dimorphism.

Results: For each cell type in this study, we identified marker genes with significantly different mean expression levels or inter-cellular distribution characteristics between female and male cells. These marker genes were enriched in pathways closely related to the biological functions of each cell type. We also identified cell subtypes that possibly carry out distinct biological functions and that differed between female and male cells. Additionally, we found that while genes under differential transcriptional regulation exhibited strong cell type specificity, six core transcription factor families responsible for most sex-dimorphic transcriptional regulation activities, namely ASCL2, EGR, GABPA, KLF/SP, RXRα, and ZF, were conserved across the cell types.

Conclusions: We explored novel gene expression-based biomarkers, functional cell group compositions, and transcriptional regulatory networks associated with sex dimorphism with a novel computational pipeline. Our findings indicated that sex dimorphism might be widespread across the transcriptomes of cell types, cell type-specific, and impactful for regulating cellular activities.
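
The Methods combine mean-based differential expression with tests for differences in the whole distribution. As an illustrative assumption (the abstract does not name the tests used in the pipeline), the sketch below applies SciPy's two-sample Kolmogorov-Smirnov test alongside a Welch t-test to simulated expression values for one gene whose spread, but not mean, differs between female and male cells.

```python
import numpy as np
from scipy import stats

# Hypothetical expression of one gene: same mean in both sexes, different spread.
rng = np.random.default_rng(1)
female = rng.normal(loc=1.0, scale=0.3, size=1000)
male = rng.normal(loc=1.0, scale=0.9, size=1000)

mean_test = stats.ttest_ind(female, male, equal_var=False)   # Welch t-test on means
dist_test = stats.ks_2samp(female, male)                     # compares whole distributions
print(f"Welch t-test p = {mean_test.pvalue:.3g}")
print(f"KS test p = {dist_test.pvalue:.3g}")
```

A test on means has no power against a pure change in spread, whereas a distributional test does; real scRNA-seq counts are also zero-inflated, which a dedicated method would need to model.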

Source
DOI: http://dx.doi.org/10.1186/s13293-020-00335-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7643324
November 2020

Individuals with common diseases but with a low polygenic risk score could be prioritized for rare variant screening.

Genet Med 2021 03 28;23(3):508-515. Epub 2020 Oct 28.

Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada.

Purpose: Identifying rare genetic causes of common diseases can improve diagnostic and treatment strategies, but incurs high costs. We tested whether individuals with common disease and low polygenic risk score (PRS) for that disease generated from less expensive genome-wide genotyping data are more likely to carry rare pathogenic variants.

Methods: We identified patients with one of five common complex diseases among 44,550 individuals who underwent exome sequencing in the UK Biobank. We derived PRS for these five diseases, and identified pathogenic rare variant heterozygotes. We tested whether individuals with disease and low PRS were more likely to carry rare pathogenic variants.

Results: While rare pathogenic variants conferred, at most, 5.18-fold (95% confidence interval [CI]: 2.32-10.13) increased odds of disease, a standard deviation increase in PRS, at most, increased the odds of disease by 5.25-fold (95% CI: 5.06-5.45). Among diseased patients, a standard deviation decrease in the PRS was associated with, at most, 2.82-fold (95% CI: 1.14-7.46) increased odds of identifying rare variant heterozygotes.

Conclusion: Rare pathogenic variants were more prevalent among affected patients with a low PRS. Therefore, prioritizing individuals for sequencing who have disease but low PRS may increase the yield of sequencing studies to identify rare variant heterozygotes.

Source
DOI: http://dx.doi.org/10.1038/s41436-020-01007-7
March 2021

Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models.

PLoS Genet 2020 05 4;16(5):e1008766. Epub 2020 May 4.

Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, Québec, Canada.

Complex traits are known to be influenced by a combination of environmental factors and rare and common genetic variants. However, detection of such multivariate associations can be compromised by low statistical power and confounding by population structure. Linear mixed effects models (LMM) can account for correlations due to relatedness but have not been applicable in high-dimensional (HD) settings where the number of fixed effect predictors greatly exceeds the number of samples. False positives or false negatives can result from two-stage approaches, where the residuals estimated from a null model adjusted for the subjects' relationship structure are subsequently used as the response in a standard penalized regression model. To overcome these challenges, we develop a general penalized LMM with a single random effect called ggmix for simultaneous SNP selection and adjustment for population structure in high dimensional prediction models. We develop a blockwise coordinate descent algorithm with automatic tuning parameter selection which is highly scalable, computationally efficient and has theoretical guarantees of convergence. Through simulations and three real data examples, we show that ggmix leads to more parsimonious models with better prediction accuracy compared to the two-stage approach or principal component adjustment. Our method performs well even in the presence of highly correlated markers, and when the causal SNPs are included in the kinship matrix. ggmix can be used to construct polygenic risk scores and select instrumental variables in Mendelian randomization studies. Our algorithms are implemented in an R package available on CRAN (https://cran.r-project.org/package=ggmix).
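
For a fixed variance ratio, a linear mixed model with a single random effect can be turned into an ordinary regression with independent errors by rotating the data with the eigenvectors of the kinship matrix. The sketch below shows that transformation for cov(y) proportional to η K + (1 - η) I. We assume, rather than quote, that ggmix relies on a similar spectral decomposition; the penalized fit and the estimation of η are not shown, and the function name and simulated data are hypothetical.

```python
import numpy as np

def rotate_for_lmm(y, X, kinship, eta):
    """Whiten y and X when cov(y) is proportional to eta * K + (1 - eta) * I.

    After this transform the errors are independent with equal variance, so a
    standard penalised least-squares solver can be applied for this fixed eta.
    """
    d, U = np.linalg.eigh(kinship)                  # K = U diag(d) U'
    weights = 1.0 / np.sqrt(eta * d + (1.0 - eta))  # inverse SD of each rotated row
    y_tilde = weights * (U.T @ y)
    X_tilde = weights[:, None] * (U.T @ X)
    return y_tilde, X_tilde

# Hypothetical standardised genotypes used to build a kinship matrix.
rng = np.random.default_rng(7)
G = rng.standard_normal((50, 500))
kinship = G @ G.T / G.shape[1]
X = rng.standard_normal((50, 10))
y = rng.standard_normal(50)
y_t, X_t = rotate_for_lmm(y, X, kinship, eta=0.3)
```

The eigendecomposition is computed once per kinship matrix, so trying many values of η, or many penalty values, only rescales the rotated rows.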

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1008766
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7224575
May 2020

Polygenic risk for coronary heart disease acts through atherosclerosis in type 2 diabetes.

Cardiovasc Diabetol 2020 01 30;19(1):12. Epub 2020 Jan 30.

Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada.

Background: Type 2 diabetes increases the risk of coronary heart disease (CHD), yet the mechanisms involved remain poorly described. Polygenic risk scores (PRS) provide an opportunity to understand risk factors since they reflect etiologic pathways from the entire genome. We therefore tested whether a PRS for CHD influenced risk of CHD in individuals with type 2 diabetes and which risk factors were associated with this PRS.

Methods: We tested the association of a CHD PRS with CHD and its traditional clinical risk factors amongst individuals with type 2 diabetes in UK Biobank (N = 21,102). We next tested the association of the CHD PRS with atherosclerotic burden in a cohort of 352 genome-wide genotyped participants with type 2 diabetes who had undergone coronary angiograms.

Results: In the UK Biobank we found that the CHD PRS was strongly associated with CHD amongst individuals with type 2 diabetes (OR per standard deviation increase = 1.50; p = 1.5 × 10). But this CHD PRS was, at best, only weakly associated with traditional clinical risk factors, such as hypertension, hyperlipidemia, glycemic control, obesity and smoking. Conversely, in the angiographic cohort, the CHD PRS was strongly associated with multivessel stenosis (OR = 1.65; p = 4.9 × 10) and increased number of major stenotic lesions (OR = 1.35; p = 9.4 × 10).

Conclusions: Polygenic predisposition to CHD is strongly associated with atherosclerotic burden in individuals with type 2 diabetes and this effect is largely independent of traditional clinical risk factors. This suggests that genetic risk for CHD acts through atherosclerosis with little effect on most traditional risk factors, providing the opportunity to explore new biological pathways.

Source
DOI: http://dx.doi.org/10.1186/s12933-020-0988-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6993460
January 2020

Whole-genome bisulfite sequencing in systemic sclerosis provides novel targets to understand disease pathogenesis.

BMC Med Genomics 2019 10 24;12(1):144. Epub 2019 Oct 24.

Lady Davis Institute for Medical Research, Jewish General Hospital, 3755 Côte Sainte-Catherine Road, Montreal, H3T 1E2, Canada.

Background: Systemic sclerosis (SSc) is a rare autoimmune connective tissue disease whose pathogenesis remains incompletely understood. Increasing evidence suggests that both genetic susceptibilities and changes in DNA methylation influence pivotal biological pathways and thereby contribute to the disease. The role of DNA methylation in SSc has not been fully elucidated, because existing investigations of DNA methylation predominantly focused on CpG sites within restricted genic regions and were performed on samples containing mixed cell types.

Methods: We performed whole-genome bisulfite sequencing on purified CD4+ T lymphocytes from nine SSc patients and nine controls in a pilot study, and then profiled genome-wide cytosine methylation as well as genetic variations. We adopted robust statistical methods to identify differentially methylated genomic regions (DMRs). We then examined pathway enrichment associated with genes located in these DMRs. We also tested whether changes in CpG methylation were associated with adjacent genetic variation.

Results: We profiled DNA methylation at more than three million CpG dinucleotides genome-wide. We identified 599 DMRs associated with 340 genes, among which 54 genes exhibited further associations with adjacent genetic variation. We also found these genes were associated with pathways and functions that are known to be abnormal in SSc, including Wnt/β-catenin signaling pathway, skin lesion formation and progression, and angiogenesis.

Conclusion: The CD4+ T cell DNA cytosine methylation landscape in SSc involves crucial genes in disease pathogenesis. Some of the methylation patterns are also associated with genetic variation. These findings provide essential foundations for future studies of epigenetic regulation and genome-epigenome interaction in SSc.

Source
DOI: http://dx.doi.org/10.1186/s12920-019-0602-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6813992
October 2019

Rapid Divergence Followed by Adaptation to Contrasting Ecological Niches of Two Closely Related Columbine Species Aquilegia japonica and A. oxysepala.

Genome Biol Evol 2019 03;11(3):919-930

Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, School of Life Sciences, Fudan University, Shanghai, China.

Elucidating the mechanisms underlying the genetic divergence between closely related species is crucial to understanding the origin and evolution of biodiversity. The genus Aquilegia L. has undergone rapid adaptive radiation, generating about 70 well-recognized species that are specialized to distinct habitats and pollinators. In this study, to address the evolutionary mechanisms that drive this genetic divergence, we analyzed the whole genomes of two ecologically isolated Aquilegia species, A. oxysepala and A. japonica, as well as their putative hybrid. Our comparative genomic analyses reveal that although the two species diverged only recently and experienced recurrent gene flow, a high level of genetic divergence is observed in their nuclear genomes. In particular, candidate genomic regions that show signatures of selection differ dramatically between the two species. Given that the splitting time of the two species broadly matches the decrease in effective population sizes, we propose that allopatric isolation, together with natural selection, preceded the interspecific gene flow in the process of speciation. The observed high genetic divergence is likely an outcome of the combined effects of natural selection, genetic drift and divergent sorting of ancestral polymorphisms. Our study provides a genome-wide view of how genetic divergence has evolved between closely related species.

Source
DOI: http://dx.doi.org/10.1093/gbe/evz038
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6433176
March 2019

Indirect effect inference and application to GAW20 data.

BMC Genet 2018 09 17;19(Suppl 1):67. Epub 2018 Sep 17.

State Key Laboratory of Genetic Engineering, Institute of Biostatistics, School of Life Sciences, Fudan University, 2005 Songhu Road, Shanghai, 200438, China.

Background: Association studies using a single type of omics data have been successful in identifying disease-associated genetic markers, but the underlying mechanisms are unaddressed. To provide a possible explanation of how these genetic factors affect the disease phenotype, integration of multiple omics data is needed.

Results: We propose a novel method, LIPID (likelihood inference proposal for indirect estimation), that uses both single nucleotide polymorphism (SNP) and DNA methylation data jointly to analyze the association between a trait and SNPs. The total effect of SNPs is decomposed into direct and indirect effects, where the indirect effects are the focus of our investigation. Simulation studies show that LIPID performs better in various scenarios than existing methods. Application to the GAW20 data also leads to encouraging results, as the genes identified appear to be biologically relevant to the phenotype studied.

Conclusions: The proposed LIPID method is shown to be meritorious in extensive simulations and in real-data analyses.
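
LIPID itself is likelihood-based and is not reproduced here. To make the direct/indirect split concrete, the sketch below uses the simpler product-of-coefficients decomposition for one SNP, one methylation site and a trait, on simulated data. In ordinary least squares with intercepts, the total SNP coefficient equals the direct effect plus the product of the SNP-to-methylation and methylation-to-trait coefficients. The function names are hypothetical.

```python
import numpy as np

def ols(y, *covariates):
    """OLS coefficients of y on the covariates, intercept first."""
    design = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def decompose_effect(snp, methylation, trait):
    """Split the SNP-trait association into direct and methylation-mediated parts."""
    a = ols(methylation, snp)[1]                   # SNP -> methylation
    _, direct, b = ols(trait, snp, methylation)    # SNP -> trait (adjusted), methylation -> trait
    total = ols(trait, snp)[1]                     # SNP -> trait (unadjusted)
    return {"total": total, "direct": direct, "indirect": a * b}

# Simulated data: true direct effect 0.2, true indirect effect 0.5 * 0.8 = 0.4.
rng = np.random.default_rng(3)
n = 5000
snp = rng.binomial(2, 0.3, size=n).astype(float)
methylation = 0.5 * snp + rng.standard_normal(n)
trait = 0.2 * snp + 0.8 * methylation + rng.standard_normal(n)
effects = decompose_effect(snp, methylation, trait)
print({k: round(float(v), 3) for k, v in effects.items()})
```

This decomposition assumes linear effects and no unmeasured confounding of the methylation-trait relationship; it is a reference point for what "indirect effect" means, not a substitute for the proposed method.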

Source
DOI: http://dx.doi.org/10.1186/s12863-018-0638-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6157197
September 2018

Urine Proteome Profiling Predicts Lung Cancer from Control Cases and Other Tumors.

EBioMedicine 2018 Apr 17;30:120-128. Epub 2018 Mar 17.

State Key Laboratory of Proteomics, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing Proteome Research Center, Beijing 102206, China; Joint Center for Translational Medicine, Tianjin Baodi Hospital, Tianjin 301800, China; Alkek Center for Molecular Discovery, Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA. Electronic address:

Development of noninvasive, reliable biomarkers for lung cancer diagnosis has many clinical benefits, given that most lung cancer patients are diagnosed at a late stage. For this purpose, we conducted proteomic analyses of 231 human urine samples, collected from multiple medical centers, from healthy individuals (n=33) and from patients with benign pulmonary diseases (n=40), lung cancer (n=33), bladder cancer (n=17), cervical cancer (n=25), colorectal cancer (n=22), esophageal cancer (n=14), and gastric cancer (n=47). Using random forest modeling, we nominated a list of urine proteins that could separate lung cancers from other cases. With a feature selection algorithm, we selected a panel of five urinary biomarkers (FTL: Ferritin light chain; MAPK1IP1L: Mitogen-Activated Protein Kinase 1 Interacting Protein 1 Like; FGB: Fibrinogen Beta Chain; RAB33B: RAB33B, Member RAS Oncogene Family; RAB15: RAB15, Member RAS Oncogene Family) and established a combinatorial model that correctly classified the majority of lung cancer cases in both the training set (n=46) and the test sets (n=14-47 per set), with an AUC ranging from 0.8747 to 0.9853. The five-biomarker combination not only discriminates lung cancer patients from control groups but also differentiates lung cancer from other common tumors. The biomarker panel and the predictive model, if validated with more samples in a multi-center setting, may be used as an auxiliary diagnostic tool alongside imaging technology for lung cancer detection.

Source
DOI: http://dx.doi.org/10.1016/j.ebiom.2018.03.009
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5952250
April 2018

Proof-of-Concept Workflow for Establishing Reference Intervals of Human Urine Proteome for Monitoring Physiological and Pathological Changes.

EBioMedicine 2017 Apr 22;18:300-310. Epub 2017 Mar 22.

State Key Laboratory of Proteomics, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing Proteome Research Center, Beijing 102206, China; Joint Center for Translational Medicine, Tianjin Baodi Hospital, Tianjin 301800, China; Alkek Center for Molecular Discovery, Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA. Electronic address:

Urine, as a truly non-invasive sampling source, holds great potential for biomarker discovery. While approximately 2000 proteins can be detected by mass spectrometry in urine from healthy people, the amounts of these proteins vary considerably. A systematic evaluation of a large number of samples is needed to determine the range of these variations. Current biomarker studies often measure a limited number of urine samples in the discovery phase, which makes it difficult to determine whether proteins differentially expressed between control and disease groups represent actual differences or merely physiological variation among individuals, and leads to failures in the validation phase as sample numbers increase. Here, we report a streamlined workflow capable of measuring 8 urine proteomes per day at a coverage of >1500 proteins. With this workflow, we evaluated variations in 497 urine proteomes from 167 healthy donors, establishing reference intervals (RIs) that covered urine protein variations. We demonstrated that RIs could be used to monitor physiological changes by detecting transient outlier proteins. Furthermore, we provided an RI-based algorithm for biomarker discovery and validation to screen for diseases such as cancer. This study provides a proof-of-principle workflow for the use of the urine proteome in health monitoring and disease screening.
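
One common nonparametric convention for a reference interval is the central 95% of values observed in healthy individuals. The sketch below computes such an interval for one protein and flags a new measurement falling outside it. The abstract does not specify how the study defined its RIs or handled undetected proteins, so both choices here are assumptions, and the function names are hypothetical.

```python
import numpy as np

def reference_interval(values, lower_pct=2.5, upper_pct=97.5):
    """Nonparametric reference interval from healthy-donor measurements of one protein."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]          # drop samples where the protein was not detected
    return float(np.percentile(values, lower_pct)), float(np.percentile(values, upper_pct))

def is_outlier(value, interval):
    """True if a new measurement falls outside the reference interval."""
    low, high = interval
    return bool(value < low or value > high)

# Hypothetical abundances of one protein across healthy donors (one not detected).
healthy = np.append(np.arange(1.0, 102.0), np.nan)
ri = reference_interval(healthy)
print(ri, is_outlier(100.0, ri), is_outlier(50.0, ri))
```

Repeating this per protein, and per individual over time, is one way to turn the "transient outlier proteins" idea into a concrete screening rule.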

Source
DOI: http://dx.doi.org/10.1016/j.ebiom.2017.03.028
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5405183
April 2017

A fast workflow for identification and quantification of proteomes.

Mol Cell Proteomics 2013 Aug 13;12(8):2370-80. Epub 2013 May 13.

State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing 102206, China.

Current in-depth proteomics uses long chromatography gradients to access more peptides for protein identification, covering as many as 8000 mammalian gene products in 3 days of mass spectrometer running time. Here we report a fast sequencing (Fast-seq) workflow that uses dual reverse-phase high performance liquid chromatography-mass spectrometry (HPLC-MS) with a short gradient to achieve the same proteome coverage in 0.5 day. We adapted this workflow into a quantitative version (Fast quantification, Fast-quan) that is compatible with large-scale protein quantification. We subjected two identical samples to the Fast-quan workflow, which allowed us to systematically evaluate the parameters that affect the sensitivity and accuracy of the workflow. Using statistical significance testing, we uncovered a substantial number of falsely quantified differential proteins and estimated the correlation between the false quantification rate and the parameters applied in label-free quantification. We optimized parameter settings that may substantially minimize the rate of falsely quantified differential proteins, and further applied them to a real biological process. With improved efficiency and throughput, we expect that the Fast-seq/Fast-quan workflow, which allows pairwise comparison of two proteomes in 1 day, may make MS available to the masses and impact biomedical research in a positive way.

Source
DOI: http://dx.doi.org/10.1074/mcp.O112.025023
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3734592
August 2013