Publications by authors named "Ewan Birney"

202 Publications

Highly accurate protein structure prediction for the human proteome.

Nature 2021 Aug 22;596(7873):590-596. Epub 2021 Jul 22.

European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.

Source
DOI: http://dx.doi.org/10.1038/s41586-021-03828-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8387240
August 2021

The International Human Genome Project.

Authors:
Ewan Birney

Hum Mol Genet 2021 Jul 15. Epub 2021 Jul 15.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK.

The human genome project was conceived and executed as an international project, due to both pragmatic and principled reasons. This internationality has served the project well, with the resulting human genome being freely available for all researchers in all countries. Over time the reference human genome will likely have to evolve to a graph genome, and tap into more diverse sequences worldwide. A similar international mindset underpins data analysis for the interpretation of the human genome from basic to clinical research.

Source
DOI: http://dx.doi.org/10.1093/hmg/ddab198
July 2021

Genetic variation affects morphological retinal phenotypes extracted from UK Biobank optical coherence tomography images.

PLoS Genet 2021 05 12;17(5):e1009497. Epub 2021 May 12.

School of Life Course Sciences, Section of Ophthalmology, King's College London, London, United Kingdom.

Optical Coherence Tomography (OCT) enables non-invasive imaging of the retina and is used to diagnose and manage ophthalmic diseases including glaucoma. We present the first large-scale genome-wide association study of inner retinal morphology using phenotypes derived from OCT images of 31,434 UK Biobank participants. We identify 46 loci associated with thickness of the retinal nerve fibre layer or ganglion cell inner plexiform layer. Only one of these loci has been associated with glaucoma, and despite its clear role as a biomarker for the disease, Mendelian randomisation does not support inner retinal thickness being on the same genetic causal pathway as glaucoma. We extracted overall retinal thickness at the fovea, representative of foveal hypoplasia, with which three of the 46 SNPs were associated. We additionally associate these three loci with visual acuity. In contrast to the Mendelian causes of severe foveal hypoplasia, our results suggest a spectrum of foveal hypoplasia, in part genetically determined, with consequences on visual function.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1009497
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8143408
May 2021

Genome-wide meta-analysis identifies 127 open-angle glaucoma loci with consistent effect across ancestries.

Nat Commun 2021 02 24;12(1):1258. Epub 2021 Feb 24.

Faculty of Medicine, University of Southampton, Southampton, UK.

Primary open-angle glaucoma (POAG) is a heritable common cause of blindness worldwide. To identify risk loci, we conduct a large multi-ethnic meta-analysis of genome-wide association studies on a total of 34,179 cases and 349,321 controls, identifying 44 previously unreported risk loci and confirming 83 loci that were previously known. The majority of loci have broadly consistent effects across European, Asian and African ancestries. Cross-ancestry data improve fine-mapping of causal variants for several loci. Integration of multiple lines of genetic evidence supports the functional relevance of the identified POAG risk loci and highlights potential contributions of several genes to POAG pathogenesis, including SVEP1, RERE, VCAM1, ZNF638, CLIC5, SLC2A12, YAP1, MXRA5, and SMAD6. Several drug compounds targeting POAG risk genes may be potential glaucoma therapeutic candidates.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-20851-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7904932
February 2021

The European Bioinformatics Institute: empowering cooperation in response to a global health crisis.

Nucleic Acids Res 2021 01;49(D1):D29-D37

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The European Bioinformatics Institute (EMBL-EBI; https://www.ebi.ac.uk/) provides freely available data and bioinformatics services to the scientific community, alongside its research activity and training provision. The 2020 COVID-19 pandemic has brought to the forefront a need for the scientific community to work even more cooperatively to effectively tackle a global health crisis. EMBL-EBI has been able to build on its position to contribute to the fight against COVID-19 in a number of ways. Firstly, EMBL-EBI has used its infrastructure, expertise and network of international collaborations to help build the European COVID-19 Data Platform (https://www.covid19dataportal.org/), which brings together COVID-19 biomolecular data and connects it to researchers, clinicians and public health professionals. By September 2020, the COVID-19 Data Platform had integrated in excess of 170 000 COVID-19 biomolecular data and literature records, collected through a number of EMBL-EBI resources. Secondly, EMBL-EBI has strived to continue its support of the life science communities through the crisis, with updated training provision and improved service provision throughout its resources. The COVID-19 pandemic has highlighted the importance of EMBL-EBI's core principles, including international cooperation, resource sharing and central data brokering, and has further empowered scientific cooperation.

Source
DOI: http://dx.doi.org/10.1093/nar/gkaa1077
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778996
January 2021

Genetic and functional insights into the fractal structure of the heart.

Nature 2020 08 19;584(7822):589-594. Epub 2020 Aug 19.

MRC London Institute of Medical Sciences, Imperial College London, London, UK.

The inner surfaces of the human heart are covered by a complex network of muscular strands that is thought to be a remnant of embryonic development. The function of these trabeculae in adults and their genetic architecture are unknown. Here we performed a genome-wide association study to investigate image-derived phenotypes of trabeculae using the fractal analysis of trabecular morphology in 18,096 participants of the UK Biobank. We identified 16 significant loci that contain genes associated with haemodynamic phenotypes and regulation of cytoskeletal arborization. Using biomechanical simulations and observational data from human participants, we demonstrate that trabecular morphology is an important determinant of cardiac performance. Through genetic association studies with cardiac disease phenotypes and Mendelian randomization, we find a causal relationship between trabecular morphology and risk of cardiovascular disease. These findings suggest a previously unknown role for myocardial trabeculae in the function of the adult heart, identify conserved pathways that regulate structural complexity and reveal the influence of the myocardial trabeculae on susceptibility to cardiovascular disease.

Source
DOI: http://dx.doi.org/10.1038/s41586-020-2635-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7116759
August 2020

The European Bioinformatics Institute in 2020: building a global infrastructure of interconnected data resources for the life sciences.

Nucleic Acids Res 2020 01;48(D1):D17-D23

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Data resources at the European Bioinformatics Institute (EMBL-EBI, https://www.ebi.ac.uk/) archive, organize and provide added-value analysis of research data produced around the world. This year's update for EMBL-EBI focuses on data exchanges among resources, both within the institute and with a wider global infrastructure. Within EMBL-EBI, data resources exchange data through a rich network of data flows mediated by automated systems. This network ensures that users are served with as much information as possible from any search and any starting point within EMBL-EBI's websites. EMBL-EBI data resources also exchange data with hundreds of other data resources worldwide and collectively are a key component of a global infrastructure of interconnected life sciences data resources. We also describe the BioImage Archive, a deposition database for raw images derived from primary research that will supply data for future knowledgebases that will add value through curation of primary image data. We also report a new release of the PRIDE database with an improved technical infrastructure, a new API, a new webpage, and improved data exchange with UniProt and Expression Atlas. Training is a core mission of EMBL-EBI and in 2018 our training team served more users, both in-person and through web-based programmes, than ever before.

Source
DOI: http://dx.doi.org/10.1093/nar/gkz1033
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6943058
January 2020

Comparison of Associations with Different Macular Inner Retinal Thickness Parameters in a Large Cohort: The UK Biobank.

Ophthalmology 2020 01 21;127(1):62-71. Epub 2019 Aug 21.

NIHR Biomedical Research Centre, Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, United Kingdom; Discipline of Clinical Ophthalmology and Eye Health, University of Sydney Medical School, Sydney, Australia.

Purpose: To describe and compare associations with macular retinal nerve fiber layer (mRNFL), ganglion cell complex (GCC), and ganglion cell-inner plexiform layer (GCIPL) thicknesses in a large cohort.

Design: Cross-sectional study.

Participants: We included 42 044 participants in the UK Biobank. The mean age was 56 years.

Methods: Spectral-domain OCT macular images were segmented and analyzed. Corneal-compensated intraocular pressure (IOPcc) was measured with the Ocular Response Analyzer (Reichert, Corp., Buffalo, NY). Multivariable linear regression was used to examine associations with mean mRNFL, GCC, and GCIPL thicknesses. Factors examined were age, sex, ethnicity, height, body mass index (BMI), smoking status, alcohol intake, Townsend deprivation index, education level, diabetes status, spherical equivalent, and IOPcc.

Main Outcome Measures: Thicknesses of mRNFL, GCC, and GCIPL.

Results: We identified several novel independent associations with thinner inner retinal thickness. Thinner inner retina was associated with alcohol intake (most significant for GCIPL: -0.46 μm for daily or almost daily intake compared with special occasion only or never [95% confidence interval (CI), -0.61 to -0.30]; P = 1.1×10), greater social deprivation (most significant for GCIPL: -0.28 μm for most deprived quartile compared with least deprived quartile [95% CI, -0.42 to -0.14]; P = 6.6×10), lower educational attainment (most significant for mRNFL: -0.36 μm for less than O level compared with degree level [95% CI, -0.45 to -0.26]; P = 2.3×10), and nonwhite ethnicity (most significant for mRNFL comparing blacks with whites: -1.65 μm [95% CI, -1.86 to -1.43]; P = 2.4×10). Corneal-compensated intraocular pressure was associated most significantly with GCIPL (-0.04 μm/mmHg [95% CI, -0.05 to -0.03]; P = 4.0×10) and was not associated significantly with mRNFL (0.00 μm/mmHg [95% CI, -0.01 to 0.01]; P = 0.77). The variables examined explained a greater proportion of the variance of GCIPL (11%) than GCC (6%) or mRNFL (7%).

Conclusions: The novel associations we identified may be important to consider when using inner retinal parameters as a diagnostic tool. Associations generally were strongest with GCIPL, particularly for IOP. This suggests that GCIPL may be the superior inner retinal biomarker for macular pathophysiologic processes and especially for glaucoma.

Source
DOI: http://dx.doi.org/10.1016/j.ophtha.2019.08.015
January 2020

Leveraging European infrastructures to access 1 million human genomes by 2022.

Nat Rev Genet 2019 11 27;20(11):693-701. Epub 2019 Aug 27.

Global Alliance for Genomics and Health, Toronto, Ontario, Canada.

Human genomics is undergoing a step change from being a predominantly research-driven activity to one driven through health care as many countries in Europe now have nascent precision medicine programmes. To maximize the value of the genomic data generated, these data will need to be shared between institutions and across countries. In recognition of this challenge, 21 European countries recently signed a declaration to transnationally share data on at least 1 million human genomes by 2022. In this Roadmap, we identify the challenges of data sharing across borders and demonstrate that European research infrastructures are well-positioned to support the rapid implementation of widespread genomic data access.

Source
DOI: http://dx.doi.org/10.1038/s41576-019-0156-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7115898
November 2019

Integrative analysis of gene expression, DNA methylation, physiological traits, and genetic variation in human skeletal muscle.

Proc Natl Acad Sci U S A 2019 05 10;116(22):10883-10888. Epub 2019 May 10.

Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892.

We integrate comeasured gene expression and DNA methylation (DNAme) in 265 human skeletal muscle biopsies from the FUSION study with >7 million genetic variants and eight physiological traits: height, waist, weight, waist-hip ratio, body mass index, fasting serum insulin, fasting plasma glucose, and type 2 diabetes. We find hundreds of genes and DNAme sites associated with fasting insulin, waist, and body mass index, as well as thousands of DNAme sites associated with gene expression (eQTM). We find that controlling for heterogeneity in tissue/muscle fiber type reduces the number of physiological trait associations, and that long-range eQTMs (>1 Mb) are reduced when controlling for tissue/muscle fiber type or latent factors. We map genetic regulators (quantitative trait loci; QTLs) of expression (eQTLs) and DNAme (mQTLs). Using Mendelian randomization (MR) and mediation techniques, we leverage these genetic maps to predict 213 causal relationships between expression and DNAme, approximately two-thirds of which predict methylation to causally influence expression. We use MR to integrate FUSION mQTLs, FUSION eQTLs, and GTEx eQTLs for 48 tissues with genetic associations for 534 diseases and quantitative traits. We identify hundreds of genes and thousands of DNAme sites that may drive the reported disease/quantitative trait genetic associations. We identify 300 gene expression MR associations that are present in both FUSION and GTEx skeletal muscle and that show stronger evidence of MR association in skeletal muscle than other tissues, which may partially reflect differences in power across tissues. As one example, we find that increased muscle expression may decrease lean tissue mass.

Source
DOI: http://dx.doi.org/10.1073/pnas.1814263116
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6561151
May 2019

The Convergence of Research and Clinical Genomics.

Authors:
Ewan Birney

Am J Hum Genet 2019 05;104(5):781-783

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2019.04.003
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6507036
May 2019

Identifying Extrinsic versus Intrinsic Drivers of Variation in Cell Behavior in Human iPSC Lines from Healthy Donors.

Cell Rep 2019 02;26(8):2078-2087.e3

Centre for Stem Cells and Regenerative Medicine, King's College London, Floor 28, Tower Wing, Guy's Hospital, Great Maze Pond, London SE1 9RT, UK.

Large cohorts of human induced pluripotent stem cells (iPSCs) from healthy donors are a potentially powerful tool for investigating the relationship between genetic variants and cellular behavior. Here, we integrate high content imaging of cell shape, proliferation, and other phenotypes with gene expression and DNA sequence datasets from over 100 human iPSC lines. By applying a dimensionality reduction approach, Probabilistic Estimation of Expression Residuals (PEER), we extracted factors that captured the effects of intrinsic (genetic concordance between different cell lines from the same donor) and extrinsic (cell responses to different fibronectin concentrations) conditions. We identify genes that correlate in expression with intrinsic and extrinsic PEER factors and associate outlier cell behavior with genes containing rare deleterious non-synonymous SNVs. Our study, thus, establishes a strategy for examining the genetic basis of inter-individual variability in cell behavior.

Source
DOI: http://dx.doi.org/10.1016/j.celrep.2019.01.094
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6381787
February 2019

GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals.

Nat Genet 2019 02 28;51(2):343-353. Epub 2019 Jan 28.

Human Genetics, Wellcome Sanger Institute, Hinxton, UK.

Loci discovered by genome-wide association studies predominantly map outside protein-coding genes. The interpretation of the functional consequences of non-coding variants can be greatly enhanced by catalogs of regulatory genomic regions in cell lines and primary tissues. However, robust and readily applicable methods are still lacking by which to systematically evaluate the contribution of these regions to genetic variation implicated in diseases or quantitative traits. Here we propose a novel approach that leverages genome-wide association studies' findings with regulatory or functional annotations to classify features relevant to a phenotype of interest. Within our framework, we account for major sources of confounding not offered by current methods. We further assess enrichment of genome-wide association studies for 19 traits within Encyclopedia of DNA Elements- and Roadmap-derived regulatory regions. We characterize unique enrichment patterns for traits and annotations driving novel biological insights. The method is implemented in standalone software and an R package, to facilitate its application by the research community.

Source
DOI: http://dx.doi.org/10.1038/s41588-018-0322-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6908448
February 2019

Author Correction: Landscape of somatic mutations in 560 breast cancer whole-genome sequences.

Nature 2019 02;566(7742):E1

Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

In the Methods section of this Article, 'greater than' should have been 'less than' in the sentence 'Putative regions of clustered rearrangements were identified as having an average inter-rearrangement distance that was at least 10 times greater than the whole-genome average for the individual sample.' The Article has not been corrected.

Source
DOI: http://dx.doi.org/10.1038/s41586-019-0883-2
February 2019

Integrating Genomics into Healthcare: A Global Responsibility.

Am J Hum Genet 2019 01;104(1):13-20

Australian Genomics Health Alliance, Melbourne VIC 3052, Australia; Murdoch Children's Research Institute, Melbourne VIC 3052, Australia; Department of Paediatrics, University of Melbourne, Melbourne VIC 3052, Australia; Global Alliance for Genomics and Health, 661 University Avenue, Suite 510, Toronto, ON M5G 0A3, Canada.

Genomic sequencing is rapidly transitioning into clinical practice, and implementation into healthcare systems has been supported by substantial government investment, totaling over US$4 billion, in at least 14 countries. These national genomic-medicine initiatives are driving transformative change under real-life conditions while simultaneously addressing barriers to implementation and gathering evidence for wider adoption. We review the diversity of approaches and current progress made by national genomic-medicine initiatives in the UK, France, Australia, and US and provide a roadmap for sharing strategies, standards, and data internationally to accelerate implementation.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2018.11.014
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323624
January 2019

The European Bioinformatics Institute in 2018: tools, infrastructure and training.

Nucleic Acids Res 2019 01;47(D1):D15-D22

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The European Bioinformatics Institute (https://www.ebi.ac.uk/) archives, curates and analyses life sciences data produced by researchers throughout the world, and makes these data available for re-use globally (https://www.ebi.ac.uk/). Data volumes continue to grow exponentially: total raw storage capacity now exceeds 160 petabytes, and we manage these increasing data flows while maintaining the quality of our services. This year we have improved the efficiency of our computational infrastructure and doubled the bandwidth of our connection to the worldwide web. We report two new data resources, the Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/), which is a component of the Expression Atlas; and the PDBe-Knowledgebase (https://www.ebi.ac.uk/pdbe/pdbe-kb), which collates functional annotations and predictions for structure data in the Protein Data Bank. Additionally, Europe PMC (http://europepmc.org/) has added preprint abstracts to its search results, supplementing results from peer-reviewed publications. EMBL-EBI maintains over 150 analytical bioinformatics tools that complement our data resources. We make these tools available for users through a web interface as well as programmatically using application programming interfaces, whilst ensuring the latest versions are available for our users. Our training team, with support from all of our staff, continued to provide on-site, off-site and web-based training opportunities for thousands of researchers worldwide this year.

Source
DOI: http://dx.doi.org/10.1093/nar/gky1124
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323906
January 2019

A call for public archives for biological image data.

Nat Methods 2018 11;15(11):849-854

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK.

Source
DOI: http://dx.doi.org/10.1038/s41592-018-0195-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6884425
November 2018

A roadmap for restoring trust in Big Data.

Lancet Oncol 2018 08;19(8):1014-1015

European Alliance for Personalised Medicine, Brussels, Belgium; Gustave Roussy Cancer Campus Grand Paris, Villejuif, France.

Source
DOI: http://dx.doi.org/10.1016/S1470-2045(18)30425-X
August 2018

The human leukemia virus HTLV-1 alters the structure and transcription of host chromatin in cis.

Elife 2018 06 26;7. Epub 2018 Jun 26.

Division of Infectious Diseases, Imperial College London, London, United Kingdom.

Chromatin looping controls gene expression by regulating promoter-enhancer contacts, the spread of epigenetic modifications, and the segregation of the genome into transcriptionally active and inactive compartments. We studied the impact on the structure and expression of host chromatin by the human retrovirus HTLV-1. We show that HTLV-1 disrupts host chromatin structure by forming loops between the provirus and the host genome; certain loops depend on the critical chromatin architectural protein CTCF, which we recently discovered binds to the HTLV-1 provirus. We show that the provirus causes two distinct patterns of abnormal transcription of the host genome in cis: bidirectional transcription in the host genome immediately flanking the provirus, and clone-specific transcription in cis at non-contiguous loci up to >300 kb from the integration site. We conclude that HTLV-1 causes insertional mutagenesis up to the megabase range in the host genome in >10 persistently-maintained HTLV-1 T-cell clones in vivo.

Source
DOI: http://dx.doi.org/10.7554/eLife.36245
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6019074
June 2018

htsget: a protocol for securely streaming genomic data.

Bioinformatics 2019 01;35(1):119-121

European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK.

Summary: Standardized interfaces for efficiently accessing high-throughput sequencing data are a fundamental requirement for large-scale genomic data sharing. We have developed htsget, a protocol for secure, efficient and reliable access to sequencing read and variation data. We demonstrate four independent client and server implementations, and the results of a comprehensive interoperability demonstration.

Availability And Implementation: http://samtools.github.io/hts-specs/htsget.html.

Supplementary Information: Supplementary data are available at Bioinformatics online.
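
As a rough illustration of the ticket-based flow that the htsget abstract above describes, the Python sketch below builds a reads request URL and decodes an inline data block from a ticket, without making any network call. The host name, sample identifier and sample ticket are hypothetical. The `format`, `referenceName`, `start` and `end` query parameters and the `htsget`/`urls` ticket fields are taken from the public htsget specification as understood here, so check the specification before relying on them.

```python
import base64
import json
from urllib.parse import urlencode


def build_reads_url(base_url, read_id, reference_name=None, start=None, end=None, fmt="BAM"):
    """Build an htsget reads request URL (no request is sent)."""
    params = {"format": fmt}
    if reference_name is not None:
        params["referenceName"] = reference_name
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    return f"{base_url.rstrip('/')}/reads/{read_id}?{urlencode(params)}"


def parse_ticket(ticket_json):
    """Return (format, list of URL entries) from a ticket JSON string."""
    ticket = json.loads(ticket_json)["htsget"]
    return ticket["format"], ticket["urls"]


def inline_block(url_entry):
    """Decode a base64 data: URI block; return None for any other URL."""
    url = url_entry["url"]
    header, _, payload = url.partition(",")
    if url.startswith("data:") and header.endswith(";base64"):
        return base64.b64decode(payload)
    return None


# Hypothetical ticket: one inline block plus one remote block with a byte range.
SAMPLE_TICKET = json.dumps({
    "htsget": {
        "format": "BAM",
        "urls": [
            {"url": "data:application/vnd.ga4gh.bam;base64,QkFNAQ=="},
            {"url": "https://data.example.org/sample/body",
             "headers": {"Range": "bytes=65536-1003750"}},
        ],
    }
})

if __name__ == "__main__":
    print(build_reads_url("https://htsget.example.org", "NA12878", "chr1", 0, 1000))
    fmt, urls = parse_ticket(SAMPLE_TICKET)
    print(fmt, [inline_block(u) for u in urls])
```

A real client would fetch each remote URL with the listed headers and concatenate all blocks in order to obtain the requested file.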

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty492
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6298043
January 2019

Interactions between genetic variation and cellular environment in skeletal muscle gene expression.

PLoS One 2018 04 16;13(4):e0195788. Epub 2018 Apr 16.

National Human Genome Research Institute, National Institutes of Health, Bethesda, United States of America.

From whole organisms to individual cells, responses to environmental conditions are influenced by genetic makeup, where the effect of genetic variation on a trait depends on the environmental context. RNA-sequencing quantifies gene expression as a molecular trait, and is capable of capturing both genetic and environmental effects. In this study, we explore opportunities of using allele-specific expression (ASE) to discover cis-acting genotype-environment interactions (GxE), that is, genetic effects on gene expression that depend on an environmental condition. Treating 17 common, clinical traits as approximations of the cellular environment of 267 skeletal muscle biopsies, we identify 10 candidate environmental response expression quantitative trait loci (reQTLs) across 6 traits (12 unique gene-environment trait pairs; 10% FDR per trait) including sex, systolic blood pressure, and low-density lipoprotein cholesterol. Although using ASE is in principle a promising approach to detect GxE effects, replication of such signals can be challenging as validation requires harmonization of environmental traits across cohorts and a sufficient sampling of heterozygotes for a transcribed SNP. Comprehensive discovery and replication will require large human transcriptome datasets, or the integration of multiple transcribed SNPs, coupled with standardized clinical phenotyping.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0195788
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5901994
July 2018

PhenotypeSimulator: A comprehensive framework for simulating multi-trait, multi-locus genotype to phenotype relationships.

Bioinformatics 2018 09;34(17):2951-2956

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

Motivation: Simulation is a critical part of method development and assessment. With the increasing sophistication of multi-trait and multi-locus genetic analysis techniques, it is important that the community has flexible simulation tools to challenge and explore the properties of these methods.

Results: We have developed PhenotypeSimulator, a comprehensive phenotype simulation scheme that can model multiple traits with multiple underlying genetic loci as well as complex covariate and observational noise structure. This package has been designed to work with many common genetic tools both for input and output. We describe the underlying components of this simulation tool and illustrate its use on an example dataset.

Availability And Implementation: PhenotypeSimulator is available as a well-documented R/CRAN package and the code is available on GitHub: https://github.com/HannahVMeyer/PhenotypeSimulator.

Supplementary Information: Supplementary data are available at Bioinformatics online.
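
To make the idea of multi-trait, multi-locus simulation in the abstract above concrete, here is a minimal Python sketch of a toy additive model: each trait is a genetic component, scaled to a chosen variance fraction `h2`, plus independent Gaussian noise. It does not reproduce PhenotypeSimulator's API or model (the package itself is written in R). The function names, parameters and defaults are all hypothetical, and covariates and correlated noise are left out.

```python
import random
import statistics


def _rescale(values, target_var):
    """Center values and scale them to the requested population variance."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    scale = (target_var ** 0.5) / sd if sd > 0 else 0.0
    return [(v - mean) * scale for v in values]


def simulate_phenotypes(n_samples, n_snps, n_traits, h2, maf=0.3, seed=0):
    """Return (genotypes, phenotypes) for a toy additive genetic model.

    genotypes:  n_samples x n_snps allele counts (0, 1 or 2)
    phenotypes: n_samples x n_traits values with variance close to 1
    """
    rng = random.Random(seed)
    # Each genotype is the sum of two independent alleles with frequency maf.
    genotypes = [
        [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps)]
        for _ in range(n_samples)
    ]
    phenotypes = [[0.0] * n_traits for _ in range(n_samples)]
    for t in range(n_traits):
        # Trait-specific SNP effect sizes drawn from a standard normal.
        effects = [rng.gauss(0, 1) for _ in range(n_snps)]
        genetic = [sum(g * b for g, b in zip(row, effects)) for row in genotypes]
        noise = [rng.gauss(0, 1) for _ in range(n_samples)]
        genetic = _rescale(genetic, h2)      # genetic variance fraction
        noise = _rescale(noise, 1.0 - h2)    # residual variance fraction
        for i in range(n_samples):
            phenotypes[i][t] = genetic[i] + noise[i]
    return genotypes, phenotypes


if __name__ == "__main__":
    genos, phenos = simulate_phenotypes(n_samples=200, n_snps=5, n_traits=2, h2=0.4)
    print(len(genos), len(phenos[0]))
```

Fixing the seed makes runs repeatable, which helps when the simulated data are used to benchmark an analysis method.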

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty197
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129313
September 2018

ChromoTrace: Computational reconstruction of 3D chromosome configurations for super-resolution microscopy.

PLoS Comput Biol 2018 03 9;14(3):e1006002. Epub 2018 Mar 9.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom.

The 3D structure of chromatin plays a key role in genome function, including gene expression, DNA replication, chromosome segregation, and DNA repair. Furthermore the location of genomic loci within the nucleus, especially relative to each other and nuclear structures such as the nuclear envelope and nuclear bodies strongly correlates with aspects of function such as gene expression. Therefore, determining the 3D position of the 6 billion DNA base pairs in each of the 23 chromosomes inside the nucleus of a human cell is a central challenge of biology. Recent advances of super-resolution microscopy in principle enable the mapping of specific molecular features with nanometer precision inside cells. Combined with highly specific, sensitive and multiplexed fluorescence labeling of DNA sequences this opens up the possibility of mapping the 3D path of the genome sequence in situ. Here we develop computational methodologies to reconstruct the sequence configuration of all human chromosomes in the nucleus from a super-resolution image of a set of fluorescent in situ probes hybridized to the genome in a cell. To test our approach, we develop a method for the simulation of DNA in an idealized human nucleus. Our reconstruction method, ChromoTrace, uses suffix trees to assign a known linear ordering of in situ probes on the genome to an unknown set of 3D in-situ probe positions in the nucleus from super-resolved images using the known genomic probe spacing as a set of physical distance constraints between probes. We find that ChromoTrace can assign the 3D positions of the majority of loci with high accuracy and reasonable sensitivity to specific genome sequences. By simulating appropriate spatial resolution, label multiplexing and noise scenarios we assess our algorithms performance. Our study shows that it is feasible to achieve genome-wide reconstruction of the 3D DNA path based on super-resolution microscopy images.

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1006002
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5862484
March 2018

The Human Cell Atlas.

Elife 2017 Dec 5;6. Epub 2017 Dec 5.

National Institute of Biomedical Genomics, Kalyani, India.

The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body. The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles (such as gene expression profiles) and to connect this information with classical cellular descriptions (such as location and morphology). An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas, including a commitment to open data, code, and community.

Source
DOI: http://dx.doi.org/10.7554/eLife.27041
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5762154
December 2017

The European Bioinformatics Institute in 2017: data coordination and integration.

Nucleic Acids Res 2018 Jan;46(D1):D21-D29

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The European Bioinformatics Institute (EMBL-EBI) supports life-science research throughout the world by providing open data, open-source software and analytical tools, and technical infrastructure (https://www.ebi.ac.uk). We accommodate an increasingly diverse range of data types and integrate them, so that biologists in all disciplines can explore life in ever-increasing detail. We maintain over 40 data resources, many of which are run collaboratively with partners in 16 countries (https://www.ebi.ac.uk/services). Submissions continue to increase exponentially: our data storage has doubled in less than two years to 120 petabytes. Recent advances in cellular imaging and single-cell sequencing techniques are generating a vast amount of high-dimensional data, bringing to light new cell types and new perspectives on anatomy. Accordingly, one of our main focus areas is integrating high-quality information from bioimaging, biobanking and other types of molecular data. This is reflected in our deep involvement in Open Targets, stewarding of plant phenotyping standards (MIAPPE) and partnership in the Human Cell Atlas data coordination platform, as well as the 2017 launch of the Omics Discovery Index. This update gives a bird's-eye view of EMBL-EBI's approach to data integration and service development as genomics begins to enter the clinic.

Source
DOI: http://dx.doi.org/10.1093/nar/gkx1154
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753251
January 2018

MinION Analysis and Reference Consortium: Phase 2 data release and analysis of R9.0 chemistry.

F1000Res 2017 May 31;6:760. Epub 2017 May 31.

University of California at Santa Cruz, Santa Cruz, CA, USA.

Background: Long-read sequencing is rapidly evolving and reshaping the suite of opportunities for genomic analysis. For the MinION in particular, as both the platform and chemistry develop, the user community requires reference data to set performance expectations and maximally exploit third-generation sequencing. We performed an analysis of MinION data derived from whole genome sequencing of Escherichia coli K-12 using the R9.0 chemistry, comparing the results with the older R7.3 chemistry.

Methods: We computed the error-rate estimates for insertions, deletions, and mismatches in MinION reads.

Results: Run-time characteristics of the flow cell and run scripts for R9.0 were similar to those observed for R7.3 chemistry, but with an 8-fold increase in bases per second (from 30 bps in R7.3 and SQK-MAP005 library preparation, to 250 bps in R9.0) processed by individual nanopores, and less drop-off in yield over time. The 2-dimensional ("2D") N50 read length was unchanged from the prior chemistry. Using the proportion of alignable reads as a measure of base-call accuracy, 99.9% of "pass" template reads from 1-dimensional ("1D") experiments were mappable and ~97% from 2D experiments. The median identity of reads was ~89% for 1D and ~94% for 2D experiments. The total error rate (miscall + insertion + deletion) decreased for 2D "pass" reads from 9.1% in R7.3 to 7.5% in R9.0 and for template "pass" reads from 26.7% in R7.3 to 14.5% in R9.0.

Conclusions: These Phase 2 MinION experiments serve as a baseline by providing estimates for read quality, throughput, and mappability. The datasets further enable the development of bioinformatic tools tailored to the new R9.0 chemistry and the design of novel biological applications for this technology.

Abbreviations: K: thousand, Kb: kilobase (one thousand base pairs), M: million, Mb: megabase (one million base pairs), Gb: gigabase (one billion base pairs).
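
The total error rate quoted in the Results is the sum of miscall, insertion and deletion rates. A minimal sketch of that arithmetic follows, assuming each rate is taken over the number of alignment columns; the abstract does not state the denominator, so this definition and the names `AlignmentCounts` and `error_rates` are illustrative assumptions, not the consortium's pipeline.

```python
from dataclasses import dataclass


@dataclass
class AlignmentCounts:
    """Per-read tallies from a read-to-reference alignment (hypothetical container)."""
    matches: int
    mismatches: int
    insertions: int
    deletions: int

    @property
    def columns(self) -> int:
        # Assumed denominator: every aligned column, including gap columns.
        return self.matches + self.mismatches + self.insertions + self.deletions


def error_rates(counts: AlignmentCounts) -> dict:
    """Return mismatch, insertion, deletion and total error rates plus identity."""
    n = counts.columns
    if n == 0:
        raise ValueError("alignment has no columns")
    errors = counts.mismatches + counts.insertions + counts.deletions
    return {
        "mismatch": counts.mismatches / n,
        "insertion": counts.insertions / n,
        "deletion": counts.deletions / n,
        "total": errors / n,  # miscall + insertion + deletion
        "identity": counts.matches / n,
    }
```

With this definition, identity and total error rate sum to one for a single read. The abstract reports a median identity across reads, so its identity and error figures need not add up in the same way.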

Source
DOI: http://dx.doi.org/10.12688/f1000research.11354.1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5538040
May 2017