Publications by authors named "Maria Keays"

16 Publications

A user guide for the online exploration and visualization of PCAWG data.

Nat Commun 2020 07 7;11(1):3400. Epub 2020 Jul 7.

European Molecular Biology Laboratory, 69117, Heidelberg, Germany.

The Pan-Cancer Analysis of Whole Genomes (PCAWG) project generated a vast amount of whole-genome cancer sequencing resource data. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we provide a user's guide to the five publicly available online data exploration and visualization tools introduced in the PCAWG marker paper. These tools are ICGC Data Portal, UCSC Xena, Chromothripsis Explorer, Expression Atlas, and PCAWG-Scout. We detail use cases and analyses for each tool, show how they incorporate outside resources from the larger genomics ecosystem, and demonstrate how the tools can be used together to understand the biology of cancers more deeply. Together, the tools enable researchers to query the complex genomic PCAWG data dynamically and integrate external information, enabling and enhancing interpretation.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-16785-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7340791
July 2020

Expression Atlas: gene and protein expression across multiple studies and organisms.

Nucleic Acids Res 2018 01;46(D1):D246-D251

European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK.

Expression Atlas (http://www.ebi.ac.uk/gxa) is an added value database that provides information about gene and protein expression in different species and contexts, such as tissue, developmental stage, disease or cell type. The available public and controlled access data sets from different sources are curated and re-analysed using standardized, open source pipelines and made available for queries, download and visualization. As of August 2017, Expression Atlas holds data from 3,126 studies across 33 different species, including 731 from plants. Data from large-scale RNA sequencing studies including Blueprint, PCAWG, ENCODE, GTEx and HipSci can be visualized next to each other. In Expression Atlas, users can query genes or gene-sets of interest and explore their expression across or within species, tissues, developmental stages in a constitutive or differential context, representing the effects of diseases, conditions or experimental interventions. All processed data matrices are available for direct download in tab-delimited format or as R-data. In addition to the web interface, data sets can now be searched and downloaded through the Expression Atlas R package. Novel features and visualizations include the on-the-fly analysis of gene set overlaps and the option to view gene co-expression in experiments investigating constitutive gene expression across tissues or other conditions.
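The tab-delimited matrices mentioned in this abstract can be read with standard tooling. The following is a minimal sketch only: the column layout (a "Gene ID" column, a "Gene Name" column, then one column per condition), the comment-line prefix, and all values are assumptions made for illustration, and real Expression Atlas downloads may be laid out differently.

```python
import csv
import io

# Hypothetical file content: identifiers, column names and values are invented.
EXAMPLE_TSV = (
    "# comment line\n"
    "Gene ID\tGene Name\tliver\tlung\n"
    "ENSG_A\tGENE_A\t12.5\t0.0\n"
    "ENSG_B\tGENE_B\t\t3.2\n"
)


def read_expression_matrix(handle):
    """Return {gene_id: {condition: value}}, skipping comment lines and empty cells."""
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    matrix = {}
    for row in reader:
        gene_id = row.pop("Gene ID")       # assumed identifier column
        row.pop("Gene Name", None)         # assumed annotation column, not a condition
        matrix[gene_id] = {cond: float(v) for cond, v in row.items() if v}
    return matrix


matrix = read_expression_matrix(io.StringIO(EXAMPLE_TSV))
```

Empty cells are dropped rather than read as zero, so a missing measurement is not confused with zero expression.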

Source
DOI: http://dx.doi.org/10.1093/nar/gkx1158
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753389
January 2018

Gramene 2018: unifying comparative genomics and pathway resources for plant research.

Nucleic Acids Res 2018 01;46(D1):D1181-D1189

Department of Biochemistry & Molecular Pharmacology, NYU School of Medicine, New York, NY 10016, USA.

Gramene (http://www.gramene.org) is a knowledgebase for comparative functional analysis in major crops and model plant species. The current release, #54, includes over 1.7 million genes from 44 reference genomes, most of which were organized into 62,367 gene families through orthologous and paralogous gene classification, whole-genome alignments, and synteny. Additional gene annotations include ontology-based protein structure and function; genetic, epigenetic, and phenotypic diversity; and pathway associations. Gramene's Plant Reactome provides a knowledgebase of cellular-level plant pathway networks. Specifically, it uses curated rice reference pathways to derive pathway projections for an additional 66 species based on gene orthology, and facilitates display of gene expression, gene-gene interactions, and user-defined omics data in the context of these pathways. As a community portal, Gramene integrates best-of-class software and infrastructure components including the Ensembl genome browser, Reactome pathway browser, and Expression Atlas widgets, and undergoes periodic data and software upgrades. Via powerful, intuitive search interfaces, users can easily query across various portals and interactively analyze search results by clicking on diverse features such as genomic context, highly augmented gene trees, gene expression anatomograms, associated pathways, and external informatics resources. All data in Gramene are accessible through both visual and programmatic interfaces.

Source
DOI: http://dx.doi.org/10.1093/nar/gkx1111
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753211
January 2018

Gramene Database: Navigating Plant Comparative Genomics Resources.

Curr Plant Biol 2016 Nov;7-8:10-15

Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR, USA.

Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, systems biology, and metabolic engineering. It exploits phylogenetic relationships to enrich the annotation of genomic data and provides tools to perform powerful comparative analyses across a wide spectrum of plant species. It consists of an integrated portal for querying, visualizing and analyzing data for 44 plant reference genomes, genetic variation data sets for 12 species, expression data for 16 species, curated rice pathways and orthology-based pathway projections for 66 plant species including various crops. Here we briefly describe the functions and uses of the Gramene database.

Source
DOI: http://dx.doi.org/10.1016/j.cpb.2016.12.005
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5509230
November 2016

The RNASeq-er API--a gateway to systematically updated analysis of public RNA-seq data.

Bioinformatics 2017 Jul;33(14):2218-2220

Functional Genomics Group, European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK.

Motivation: The exponential growth of publicly available RNA-sequencing (RNA-Seq) data poses an increasing challenge to researchers wishing to discover, analyse and store such data, particularly those based in institutions with limited computational resources. EMBL-EBI is in an ideal position to address these challenges and to allow the scientific community easy access to not just raw, but also processed RNA-Seq data. We present a Web service to access the results of a systematically and continually updated standardized alignment as well as gene and exon expression quantification of all public bulk (and in the near future also single-cell) RNA-Seq runs in 264 species in European Nucleotide Archive, using Representational State Transfer.

Results: The RNASeq-er API (Application Programming Interface) enables ontology-powered search for and retrieval of CRAM, bigwig and bedGraph files, gene and exon expression quantification matrices (Fragments Per Kilobase Of Exon Per Million Fragments Mapped, Transcripts Per Million, raw counts) as well as sample attributes annotated with ontology terms. To date over 270 000 RNA-Seq runs in nearly 10 000 studies (1 PB of raw FASTQ data) in 264 species in ENA have been processed and made available via the API.

Availability And Implementation: The RNASeq-er API can be accessed at http://www.ebi.ac.uk/fg/rnaseq/api. The commands used to analyse the data are available in supplementary materials and at https://github.com/nunofonseca/irap/wiki/iRAP-single-library.

Contact: rnaseq@ebi.ac.uk; rpetry@ebi.ac.uk.

Supplementary Information: Supplementary data are available at Bioinformatics online.
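The Results section names three quantification units: raw counts, Fragments Per Kilobase Of Exon Per Million Fragments Mapped (FPKM) and Transcripts Per Million (TPM). As an illustration of how the two normalized units relate to raw counts, here is a small sketch using their textbook definitions. It is not the pipeline behind the API, and the gene names, counts and lengths are invented.

```python
def fpkm(counts, lengths_bp):
    """FPKM = count * 1e9 / (feature length in bp * total mapped fragments)."""
    total = sum(counts.values())
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in counts.items()}


def tpm(counts, lengths_bp):
    """TPM = length-normalized rate, rescaled so that all values sum to one million."""
    rates = {g: c / lengths_bp[g] for g, c in counts.items()}
    scale = sum(rates.values())
    return {g: r / scale * 1e6 for g, r in rates.items()}


# Hypothetical toy data: two genes with the same coverage per base.
counts = {"geneA": 100, "geneB": 300}
lengths_bp = {"geneA": 1000, "geneB": 3000}

fpkm_values = fpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
```

Because both toy genes have the same number of fragments per base, they receive equal FPKM and equal TPM values even though their raw counts differ threefold; TPM values always sum to one million within a sample, which FPKM values do not.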

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btx143
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870697
July 2017

Open Targets: a platform for therapeutic target identification and validation.

Nucleic Acids Res 2017 01 29;45(D1):D985-D994. Epub 2016 Nov 29.

Open Targets, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

We have designed and developed a data integration and visualization platform that provides evidence about the association of known and potential drug targets with diseases. The platform is designed to support identification and prioritization of biological targets for follow-up. Each drug target is linked to a disease using integrated genome-wide data from a broad range of data sources. The platform provides either a target-centric workflow to identify diseases that may be associated with a specific target, or a disease-centric workflow to identify targets that may be associated with a specific disease. Users can easily transition between these target- and disease-centric workflows. The Open Targets Validation Platform is accessible at https://www.targetvalidation.org.
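The target-centric and disease-centric workflows described here can be pictured as two views over one set of target-disease associations. The sketch below is a toy data structure written for illustration, not the platform's implementation or API; the identifiers and scores are invented.

```python
from collections import defaultdict


def build_indexes(associations):
    """Index (target, disease, score) triples in both directions."""
    by_target = defaultdict(dict)
    by_disease = defaultdict(dict)
    for target, disease, score in associations:
        by_target[target][disease] = score
        by_disease[disease][target] = score
    return by_target, by_disease


def ranked(index, key):
    """Return the partners of `key`, strongest association first."""
    return sorted(index.get(key, {}).items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical associations: names and scores are placeholders.
associations = [
    ("TARGET_1", "disease_x", 0.9),
    ("TARGET_1", "disease_y", 0.4),
    ("TARGET_2", "disease_x", 0.7),
]
by_target, by_disease = build_indexes(associations)

diseases_for_target = ranked(by_target, "TARGET_1")    # target-centric view
targets_for_disease = ranked(by_disease, "disease_x")  # disease-centric view
```

Keeping both indexes over the same triples is what makes it cheap to move from a target to its diseases and then from one of those diseases back to other targets.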

Source
DOI: http://dx.doi.org/10.1093/nar/gkw1055
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210543
January 2017

Plant Reactome: a resource for plant pathways and comparative analysis.

Nucleic Acids Res 2017 01 30;45(D1):D1029-D1039. Epub 2016 Oct 30.

2082 Cordley Hall, Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR 97331, USA.

Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX.
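The idea of projecting curated reference pathways onto other species through gene homology can be sketched as a simple lookup. This is a simplified illustration under the assumption of a plain gene-to-homolog mapping; the abstract does not specify the actual projection procedure, and all identifiers below are invented.

```python
def project_pathways(reference_pathways, homologs):
    """Map each reference pathway's genes to their homologs in a target species.

    reference_pathways: {pathway_name: set of reference-species gene IDs}
    homologs: {reference gene ID: set of target-species gene IDs}
    Pathways with no homologous gene in the target species are left out.
    """
    projected = {}
    for pathway, ref_genes in reference_pathways.items():
        target_genes = set()
        for gene in ref_genes:
            target_genes.update(homologs.get(gene, ()))
        if target_genes:
            projected[pathway] = target_genes
    return projected


# Hypothetical inputs: pathway names and gene IDs are placeholders.
reference_pathways = {
    "pathway_1": {"ref_gene_1", "ref_gene_2"},
    "pathway_2": {"ref_gene_3"},
}
homologs = {
    "ref_gene_1": {"tgt_gene_a"},
    "ref_gene_2": {"tgt_gene_b", "tgt_gene_c"},
}

projected = project_pathways(reference_pathways, homologs)
```

In this toy input, pathway_2 is not projected because its only reference gene has no homolog in the target species.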

Source
DOI: http://dx.doi.org/10.1093/nar/gkw932
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210633
January 2017

Mitochondrial Complex I Is a Global Regulator of Secondary Metabolism, Virulence and Azole Sensitivity in Fungi.

PLoS One 2016 20;11(7):e0158724. Epub 2016 Jul 20.

Manchester Fungal Infection Group, Institute of Inflammation and Repair, Faculty of Medicine and Human Sciences, University of Manchester, 2.24 Core technology Building, Grafton St., Manchester, M13 9NT, United Kingdom.

Recent estimates of the global burden of fungal disease suggest that its incidence has been drastically underestimated and that mortality may rival that of malaria or tuberculosis. Azoles are the principal class of antifungal drug and the only available oral treatment for fungal disease. The recent occurrence of and increase in azole resistance is a major concern worldwide. Known azole resistance mechanisms include over-expression of efflux pumps and mutation of the gene encoding the target protein cyp51a; however, for one of the most important fungal pathogens of humans, Aspergillus fumigatus, much of the observed azole resistance does not appear to involve such mechanisms. Here we present evidence that azole resistance in A. fumigatus can arise through mutation of components of mitochondrial complex I. Deletion mutants of the 29.9 KD subunit of this complex are azole resistant, less virulent and exhibit dysregulation of secondary metabolite gene clusters in a manner analogous to deletion mutants of the secondary metabolism regulator, LaeA. Additionally, we observe that a mutation leading to an E180D amino acid change in the 29.9 KD subunit is strongly associated with clinical azole-resistant A. fumigatus isolates. Evidence presented in this paper suggests that complex I may play a role in the hypoxic response and that one possible mechanism for cell death during azole treatment is a dysfunctional hypoxic response that may be restored by dysregulation of complex I. Both deletion of the 29.9 KD subunit of complex I and azole treatment alone profoundly change expression of gene clusters involved in secondary metabolism and immunotoxin production, raising potential concerns about long-term azole therapy.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0158724
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4954691
July 2016

Gramene 2016: comparative plant genomics and pathway resources.

Nucleic Acids Res 2016 Jan 8;44(D1):D1133-40. Epub 2015 Nov 8.

EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK.

Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ~200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials.

Source
DOI: http://dx.doi.org/10.1093/nar/gkv1179
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702844
January 2016

Expression Atlas update--an integrated database of gene and protein expression in humans, animals and plants.

Nucleic Acids Res 2016 Jan 19;44(D1):D746-52. Epub 2015 Oct 19.

European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK.

Expression Atlas (http://www.ebi.ac.uk/gxa) provides information about gene and protein expression in animal and plant samples of different cell types, organism parts, developmental stages, diseases and other conditions. It consists of selected microarray and RNA-sequencing studies from ArrayExpress, which have been manually curated, annotated with ontology terms, checked for high quality and processed using standardised analysis methods. Since the last update, Atlas has grown seven-fold (1572 studies as of August 2015), and incorporates baseline expression profiles of tissues from Human Protein Atlas, GTEx and FANTOM5, and of cancer cell lines from ENCODE, CCLE and Genentech projects. Plant studies constitute a quarter of Atlas data. For genes of interest, the user can view baseline expression in tissues, and differential expression for biologically meaningful pairwise comparisons, estimated using consistent methodology across all of Atlas. Our first proteomics study in human tissues is now displayed alongside transcriptomics data in the same tissues. Novel analyses and visualisations include: 'enrichment' in each differential comparison of GO terms, Reactome, Plant Reactome pathways and InterPro domains; hierarchical clustering (by baseline expression) of most variable genes and experimental conditions; and, for a given gene-condition, distribution of baseline expression across biological replicates.

Source
DOI: http://dx.doi.org/10.1093/nar/gkv1045
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702781
January 2016

ArrayExpress update--simplifying data submissions.

Nucleic Acids Res 2015 Jan 31;43(Database issue):D1113-6. Epub 2014 Oct 31.

European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK.

The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is an international functional genomics database at the European Bioinformatics Institute (EMBL-EBI) recommended by most journals as a repository for data supporting peer-reviewed publications. It contains data from over 7000 public sequencing and 42,000 array-based studies comprising over 1.5 million assays in total. The proportion of sequencing-based submissions has grown significantly over the last few years and has doubled in the last 18 months, whilst the rate of microarray submissions is growing slightly. All data in ArrayExpress are available in the MAGE-TAB format, which allows robust linking to data analysis and visualization tools and standardized analysis. The main development over the last two years has been the release of a new data submission tool Annotare, which has reduced the average submission time almost 3-fold. In the near future, Annotare will become the only submission route into ArrayExpress, alongside MAGE-TAB format-based pipelines. ArrayExpress is a stable and highly accessed resource. Our future tasks include automation of data flows and further integration with other EMBL-EBI resources for the representation of multi-omics data.

Source
DOI: http://dx.doi.org/10.1093/nar/gku1057
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383899
January 2015

Expression Atlas update--a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments.

Nucleic Acids Res 2014 Jan 4;42(Database issue):D926-32. Epub 2013 Dec 4.

European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, CB10 1SD, UK.

Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of 'baseline' expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful 'contrasts', i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets as well as genes, and a more powerful search interface designed to maximize the biological value provided to the user.
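A 'contrast' in the sense used here compares two sets of biological replicates. As a rough illustration of the simplest quantity such a comparison yields, the sketch below computes a log2 fold change between replicate means. It is a toy calculation with invented values and an assumed pseudocount, not the standardized statistical analysis the abstract refers to.

```python
import math
from statistics import mean


def log2_fold_change(test_replicates, reference_replicates, pseudocount=1.0):
    """log2 ratio of mean expression (test vs reference).

    The pseudocount is an assumption of this sketch; it avoids division by zero
    when a gene is not detected in the reference group.
    """
    test_mean = mean(test_replicates) + pseudocount
    reference_mean = mean(reference_replicates) + pseudocount
    return math.log2(test_mean / reference_mean)


# Hypothetical expression values for one gene, three replicates per group.
lfc = log2_fold_change([7.0, 7.0, 7.0], [1.0, 1.0, 1.0])
```

A positive value indicates higher expression in the test group; a real differential analysis would also model replicate variance and correct for multiple testing.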

Source
DOI: http://dx.doi.org/10.1093/nar/gkt1270
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964963
January 2014

ArrayExpress update--trends in database growth and links to data analysis tools.

Nucleic Acids Res 2013 Jan 27;41(Database issue):D987-90. Epub 2012 Nov 27.

Functional Genomics Team, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.

The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years and has reached, in 2012, 15% of all new data. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects, for microarray data, and binary alignment format files, for sequencing data, have been generated for a significant proportion of ArrayExpress data.

Source
DOI: http://dx.doi.org/10.1093/nar/gks1174
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531147
January 2013

Signatures of selection and sex-specific expression variation of a novel duplicate during the evolution of the Drosophila desaturase gene family.

Mol Ecol 2011 Sep 29;20(17):3617-30. Epub 2011 Jul 29.

Centre for Evolution, Genes and Genomics, School of Biology, University of St. Andrews, St. Andrews, Fife, UK.

The tempo and mode of evolution of loci with a large effect on adaptation and reproductive isolation will influence the rate of evolutionary divergence and speciation. Desaturase loci are involved in key biochemical changes in long-chain fatty acids. In insects, these have been shown to influence adaptation to starvation or desiccation resistance and in some cases act as important pheromones. The desaturase gene family of Drosophila is known to have evolved by gene duplication and diversification, and at least one locus shows rapid evolution of sex-specific expression variation. Here, we examine the evolution of the gene family in species representing the Drosophila phylogeny. We find that the family includes more loci than have been previously described. Most are represented as single-copy loci, but we also find additional examples of duplications in loci which influence pheromone blends. Most loci show patterns of variation associated with purifying selection, but there are strong signatures of diversifying selection in new duplicates. In the case of a new duplicate of desat1 in the obscura group species, we show that strong selection on the coding sequence is associated with the evolution of sex-specific expression variation. It seems likely that both sexual selection and ecological adaptation have influenced the evolution of this gene family in Drosophila.

Source
DOI: http://dx.doi.org/10.1111/j.1365-294X.2011.05208.x
September 2011