Publications by authors named "Andrew I Su"

113 Publications

A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses.

BMC Biol 2021 01 22;19(1):12. Epub 2021 Jan 22.

Department of Agrotechnology and Food Sciences, Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands.

Background: Pandemics, even more than other medical problems, require swift integration of knowledge. When caused by a new virus, understanding the underlying biology may help finding solutions. In a setting where there are a large number of loosely related projects and initiatives, we need common ground, also known as a "commons." Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions.

Results: As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. How this model can be used to make data between various resources interoperable is demonstrated by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications or bots were written for regular updates of these sources in Wikidata and added to a platform for automatically running these updates.

Conclusions: Although this workflow is developed and applied in the context of the COVID-19 pandemic, to demonstrate its broader applicability it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC43).

Source
DOI: http://dx.doi.org/10.1186/s12915-020-00940-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820539
January 2021

Multi-Omics Database Analysis of Aminoacyl-tRNA Synthetases in Cancer.

Genes (Basel) 2020 Nov 22;11(11). Epub 2020 Nov 22.

Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA.

Aminoacyl-tRNA synthetases (aaRSs) are key enzymes in the mRNA translation machinery, yet they possess numerous non-canonical functions developed during the evolution of complex organisms. The aaRSs and aaRS-interacting multi-functional proteins (AIMPs) are continually being implicated in tumorigenesis, but these connections are often limited in scope, focusing on specific aaRSs in distinct cancer subtypes. Here, we analyze publicly available genomic and transcriptomic data on human cytoplasmic and mitochondrial aaRSs across many cancer types. As high-throughput technologies have improved exponentially, large-scale projects have systematically quantified genetic alteration and expression from thousands of cancer patient samples. One such project is the Cancer Genome Atlas (TCGA), which processed over 20,000 primary cancer and matched normal samples from 33 cancer types. The wealth of knowledge provided from this undertaking has streamlined the identification of cancer drivers and suppressors. We examined aaRS expression data produced by the TCGA project and combined this with patient survival data to recognize trends in aaRSs' impact on cancer both molecularly and prognostically. We further compared these trends to an established tumor suppressor and a proto-oncogene. We observed apparent upregulation of many tRNA synthetase genes with aggressive cancer types, yet, at the individual gene level, some aaRSs resemble a tumor suppressor while others show similarities to an oncogene. This study provides an unbiased, overarching perspective on the relationship of aaRSs with cancers and identifies certain aaRS family members as promising therapeutic targets or potential leads for developing biological therapy for cancer.

Source
DOI: http://dx.doi.org/10.3390/genes11111384
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7700366
November 2020

Mohawk is a transcription factor that promotes meniscus cell phenotype and tissue repair and reduces osteoarthritis severity.

Sci Transl Med 2020 10;12(567)

Department of Molecular Medicine, Scripps Research, La Jolla, CA 92037, USA.

Meniscus tears are common knee injuries and a major osteoarthritis (OA) risk factor. Knowledge gaps that limit the development of therapies for meniscus injury and degeneration concern transcription factors that control the meniscus cell phenotype. Analysis of RNA sequencing data from 37 human tissues in the Genotype-Tissue Expression database and RNA sequencing data from meniscus and articular cartilage showed that transcription factor Mohawk (MKX) is highly enriched in meniscus. In human meniscus cells, MKX regulates the expression of meniscus marker genes, OA-related genes, and other transcription factors, including Scleraxis (SCX), SRY Box 5 (SOX5), and Runt domain-related transcription factor 2 (RUNX2). In mesenchymal stem cells (MSCs), the combination of adenoviral MKX (Ad-MKX) and transforming growth factor-β3 (TGF-β3) induced a meniscus cell phenotype. When Ad-MKX-transduced MSCs were seeded on TGF-β3-conjugated decellularized meniscus scaffold (DMS) and inserted into experimental tears in meniscus explants, they increased glycosaminoglycan content, extracellular matrix interconnectivity, cell infiltration into the DMS, and improved biomechanical properties. Ad-MKX injection into mouse knee joints with experimental OA induced by surgical destabilization of the meniscus suppressed meniscus and cartilage damage, reducing OA severity. Ad-MKX injection into human OA meniscus tissue explants corrected pathogenic gene expression. These results identify MKX as a previously unidentified key transcription factor that regulates the meniscus cell phenotype. The combination of Ad-MKX with TGF-β3 is effective for differentiation of MSCs to a meniscus cell phenotype and useful for meniscus repair. MKX is a promising therapeutic target for meniscus tissue engineering, repair, and prevention of OA.

Source
DOI: http://dx.doi.org/10.1126/scitranslmed.aan7967
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7955769
October 2020

Discovery of SARS-CoV-2 antiviral drugs through large-scale compound repurposing.

Nature 2020 10 24;586(7827):113-119. Epub 2020 Jul 24.

Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

The emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in 2019 has triggered an ongoing global pandemic of the severe pneumonia-like disease coronavirus disease 2019 (COVID-19). The development of a vaccine is likely to take at least 12-18 months, and the typical timeline for approval of a new antiviral therapeutic agent can exceed 10 years. Thus, repurposing of known drugs could substantially accelerate the deployment of new therapies for COVID-19. Here we profiled a library of drugs encompassing approximately 12,000 clinical-stage or Food and Drug Administration (FDA)-approved small molecules to identify candidate therapeutic drugs for COVID-19. We report the identification of 100 molecules that inhibit viral replication of SARS-CoV-2, including 21 drugs that exhibit dose-response relationships. Of these, thirteen were found to harbour effective concentrations commensurate with probable achievable therapeutic doses in patients, including the PIKfyve kinase inhibitor apilimod and the cysteine protease inhibitors MDL-28170, Z LVG CHN2, VBY-825 and ONO 5334. Notably, MDL-28170, ONO 5334 and apilimod were found to antagonize viral replication in human pneumocyte-like cells derived from induced pluripotent stem cells, and apilimod also demonstrated antiviral efficacy in a primary human lung explant model. Since most of the molecules identified in this study have already advanced into the clinic, their known pharmacological and human safety profiles will enable accelerated preclinical and clinical evaluation of these drugs for the treatment of COVID-19.

Source
DOI: http://dx.doi.org/10.1038/s41586-020-2577-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7603405
October 2020

Structured reviews for data and knowledge-driven research.

Database (Oxford) 2020 01;2020

Department of Integrative Structural and Computational Biology, Scripps Research, 10550 N Torrey Pines Rd. La Jolla, CA 92037, USA.

Hypothesis generation is a critical step in research and a cornerstone in the rare disease field. Research is most efficient when those hypotheses are based on the entirety of knowledge known to date. Systematic review articles are commonly used in biomedicine to summarize existing knowledge and contextualize experimental data. But the information contained within review articles is typically only expressed as free-text, which is difficult to use computationally. Researchers struggle to navigate, collect and remix prior knowledge as it is scattered in several silos without seamless integration and access. This lack of a structured information framework hinders research by both experimental and computational scientists. To better organize knowledge and data, we built a structured review article that is specifically focused on NGLY1 Deficiency, an ultra-rare genetic disease first reported in 2012. We represented this structured review as a knowledge graph and then stored this knowledge graph in a Neo4j database to simplify dissemination, querying and visualization of the network. Relative to free-text, this structured review better promotes the principles of findability, accessibility, interoperability and reusability (FAIR). In collaboration with domain experts in NGLY1 Deficiency, we demonstrate how this resource can improve the efficiency and comprehensiveness of hypothesis generation. We also developed a read-write interface that allows domain experts to contribute FAIR structured knowledge to this community resource. In contrast to traditional free-text review articles, this structured review exists as a living knowledge graph that is curated by humans and accessible to computational analyses. Finally, we have generalized this workflow into modular and repurposable components that can be applied to other domain areas. This NGLY1 Deficiency-focused network is publicly available at http://ngly1graph.org/.

Availability And Implementation: Database URL: http://ngly1graph.org/. Network data files are at: https://github.com/SuLab/ngly1-graph and source code at: https://github.com/SuLab/bioknowledge-reviewer.

Contact: asu@scripps.edu.

Source
DOI: http://dx.doi.org/10.1093/database/baaa015
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7153956
January 2020

Wikidata as a knowledge graph for the life sciences.

Elife 2020 03 17;9. Epub 2020 Mar 17.

Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States.

Wikidata is a community-maintained knowledge base that has been assembled from repositories in the fields of genomics, proteomics, genetic variants, pathways, chemical compounds, and diseases, and that adheres to the FAIR principles of findability, accessibility, interoperability and reusability. Here we describe the breadth and depth of the biomedical knowledge contained within Wikidata, and discuss the open-source tools we have built to add information to Wikidata and to synchronize it with source databases. We also demonstrate several use cases for Wikidata, including the crowdsourced curation of biomedical ontologies, phenotype-based diagnosis of disease, and drug repurposing.

Source
DOI: http://dx.doi.org/10.7554/eLife.52614
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7077981
March 2020

Functional Annotation of the Transcriptome of the Pig, Sus scrofa, Based Upon Network Analysis of an RNAseq Transcriptional Atlas.

Front Genet 2019 14;10:1355. Epub 2020 Feb 14.

Mater Research Institute-University of Queensland, Translational Research Institute, Woolloongabba, QLD, Australia.

The domestic pig (Sus scrofa) is both an economically important livestock species and a model for biomedical research. Two highly contiguous pig reference genomes have recently been released. To support functional annotation of the pig genomes and comparative analysis with large human transcriptomic data sets, we aimed to create a pig gene expression atlas. To achieve this objective, we extended a previous approach developed for the chicken. We downloaded RNAseq data sets from public repositories, down-sampled to a common depth, and quantified expression against a reference transcriptome using the mRNA quantitation tool, Kallisto. We then used the network analysis tool Graphia to identify clusters of transcripts that were coexpressed across the merged data set. Consistent with the principle of guilt-by-association, we identified coexpression clusters that were highly tissue or cell-type restricted and contained transcription factors that have previously been implicated in lineage determination. Other clusters were enriched for transcripts associated with biological processes, such as the cell cycle and oxidative phosphorylation. The same approach was used to identify coexpression clusters within RNAseq data from multiple individual liver and brain samples, highlighting cell type, process, and region-specific gene expression. Evidence of conserved expression can add confidence to assignment of orthology between pig and human genes. Many transcripts currently identified as novel genes with ENSSSCG or LOC IDs were found to be coexpressed with annotated neighbouring transcripts in the same orientation, indicating they may be products of the same transcriptional unit. The meta-analytic approach to utilising public RNAseq data is extendable to include new data sets and new species and provides a framework to support the Functional Annotation of Animal Genomes (FAANG) initiative.
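
The abstract says the RNAseq data sets were down-sampled to a common depth but does not say how. As a rough illustration only, the sketch below randomly subsamples read identifiers without replacement; the function name `downsample_reads`, the toy libraries, and the choice of the smallest library as the common depth are all assumptions, not the authors' procedure.

```python
import random


def downsample_reads(reads, depth, seed=0):
    """Randomly keep `depth` reads without replacement; keep all if there are fewer."""
    if len(reads) <= depth:
        return list(reads)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(reads, depth)


# Toy libraries of different sizes; read IDs are placeholders.
libraries = {
    "sample_1": [f"read_{i}" for i in range(1000)],
    "sample_2": [f"read_{i}" for i in range(250)],
    "sample_3": [f"read_{i}" for i in range(400)],
}

# Assumption: the common depth is the size of the smallest library.
common_depth = min(len(reads) for reads in libraries.values())
downsampled = {
    name: downsample_reads(reads, common_depth) for name, reads in libraries.items()
}
print({name: len(reads) for name, reads in downsampled.items()})
```

After this step every library contributes the same number of reads, so expression estimates are not dominated by the most deeply sequenced samples.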

Source
DOI: http://dx.doi.org/10.3389/fgene.2019.01355
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7034361
February 2020

Time-resolved evaluation of compound repositioning predictions on a text-mined knowledge network.

BMC Bioinformatics 2019 Dec 11;20(1):653. Epub 2019 Dec 11.

The Scripps Research Institute, 10550 N Torrey Pines Rd, La Jolla, CA, 92037, USA.

Background: Computational compound repositioning has the potential for identifying new uses for existing drugs, and new algorithms and data source aggregation strategies provide ever-improving results via in silico metrics. However, even with these advances, the number of compounds successfully repositioned via computational screening remains low. New strategies for algorithm evaluation that more accurately reflect the repositioning potential of a compound could provide a better target for future optimizations.

Results: Using a text-mined database, we applied a previously described network-based computational repositioning algorithm, yielding strong results via cross-validation, averaging 0.95 AUROC on test-set indications. However, to better approximate a real-world scenario, we built a time-resolved evaluation framework. At various time points, we built networks corresponding to prior knowledge for use as a training set, and then predicted on a test set composed of indications that were subsequently described. This framework showed a marked reduction in performance, peaking in performance metrics with the 1985 network at an AUROC of 0.797. Examining performance reductions due to removal of specific types of relationships highlighted the importance of drug-drug and disease-disease similarity metrics. Using data from future timepoints, we demonstrate that further acquisition of these kinds of data may help improve computational results.

Conclusions: Evaluating a repositioning algorithm using indications unknown to the input network better tunes its ability to find emerging drug indications, rather than finding those which have been randomly withheld. Focusing efforts on improving algorithmic performance in a time-resolved paradigm may further improve computational repositioning predictions.
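
The time-resolved split described in the Results can be sketched as follows, assuming each known indication carries the year it was first described. The record layout, the function name `time_resolved_split`, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple


class Indication(NamedTuple):
    compound: str
    disease: str
    year: int  # year the indication was first described (assumed field)


def time_resolved_split(indications, cutoff_year):
    """Split indications into prior knowledge and later-described ones.

    Indications described up to and including cutoff_year form the
    training set; those described afterwards form the test set.
    """
    train = [i for i in indications if i.year <= cutoff_year]
    test = [i for i in indications if i.year > cutoff_year]
    return train, test


# Toy, made-up records for illustration only.
records = [
    Indication("compound_a", "disease_x", 1975),
    Indication("compound_b", "disease_y", 1984),
    Indication("compound_a", "disease_z", 1992),
    Indication("compound_c", "disease_x", 2005),
]

train, test = time_resolved_split(records, cutoff_year=1985)
print(len(train), len(test))  # 2 2
```

Unlike random cross-validation, no test indication here predates the training data, so the model is scored only on knowledge that emerged after the cutoff.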

Source
DOI: http://dx.doi.org/10.1186/s12859-019-3297-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907279
December 2019

Advancing computational biology and bioinformatics research through open innovation competitions.

PLoS One 2019 27;14(9):e0222165. Epub 2019 Sep 27.

Laboratory for Innovation Science at Harvard, Harvard University, Cambridge, MA, United States of America.

Open data science and algorithm development competitions offer a unique avenue for rapid discovery of better computational strategies. We highlight three examples in computational biology and bioinformatics research in which the use of competitions has yielded significant performance gains over established algorithms. These include algorithms for antibody clustering, imputing gene expression data, and querying the Connectivity Map (CMap). Performance gains are evaluated quantitatively using realistic, albeit sanitized, data sets. The solutions produced through these competitions are then examined with respect to their utility and the prospects for implementation in the field. We present the decision process and competition design considerations that led to these successful outcomes as a model for researchers who want to use competitions and non-domain crowds as collaborators to further their research.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0222165
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6764653
March 2020

Applying citizen science to gene, drug and disease relationship extraction from biomedical abstracts.

Bioinformatics 2020 02;36(4):1226-1233

Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA.

Motivation: Biomedical literature is growing at a rate that outpaces our ability to harness the knowledge contained therein. To mine valuable inferences from the large volume of literature, many researchers use information extraction algorithms to harvest information in biomedical texts. Information extraction is usually accomplished via a combination of manual expert curation and computational methods. Advances in computational methods usually depend on the time-consuming generation of gold standards by a limited number of expert curators. Citizen science is public participation in scientific research. We previously found that citizen scientists are willing and capable of performing named entity recognition of disease mentions in biomedical abstracts, but did not know if this was true with relationship extraction (RE).

Results: In this article, we introduce the Relationship Extraction Module of the web-based application Mark2Cure (M2C) and demonstrate that citizen scientists can perform RE. We confirm the importance of accurate named entity recognition on user performance of RE and identify design issues that impacted data quality. We find that the data generated by citizen scientists can be used to identify relationship types not currently available in the M2C Relationship Extraction Module. We compare the citizen science-generated data with algorithm-mined data and identify ways in which the two approaches may complement one another. We also discuss opportunities for future improvement of this system, as well as the potential synergies between citizen science, manual biocuration and natural language processing.

Availability And Implementation: Mark2Cure platform: https://mark2cure.org; Mark2Cure source code: https://github.com/sulab/mark2cure; and data and analysis code for this article: https://github.com/gtsueng/M2C_rel_nb.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btz678
February 2020

Aligning Needs: Integrating Citizen Science Efforts into Schools Through Service Requirements.

Hum Comput (Fairfax) 2019 ;6(1):56-82

The Scripps Research Institute.

Citizen science is the participation in scientific research by members of the public, and it is an increasingly valuable tool for both scientists and educators. For researchers, citizen science is a means of more quickly investigating questions which would otherwise be time-consuming and costly to study. For educators, citizen science offers a means to engage students in actual research and improve learning outcomes. Since most citizen science projects are usually designed with research goals in mind, many lack the necessary educator materials for successful integration in a formal science education (FSE) setting. In an ideal world, researchers and educators would build the necessary materials together; however, many researchers lack the time, resources, and networks to create these materials early on in the life of a citizen science project. For resource-poor projects, we propose an intermediate entry point for recruiting from the educational setting: community service or service learning requirements (CSSLRs). Many schools require students to participate in community service or service learning activities in order to graduate. When implemented well, CSSLRs provide students with growth and development opportunities outside the classroom while contributing to the community and other worthwhile causes. However, CSSLRs take time, resources, and effort to implement well. Just as citizen science projects need to establish relationships to transition well into formal science education, schools need to cultivate relationships with community service organizations. Students and educators at schools with CSSLRs where implementation is still a work in progress may be left with a burdensome requirement and inadequate support. 
With the help of a volunteer fulfilling a CSSLR, we investigated the number of students impacted by CSSLRs set at different levels of government and explored the qualifications needed for citizen science projects to fulfill CSSLRs by examining the explicitly-stated justifications for having CSSLRs, surveying how CSSLRs are verified, and using these qualifications to demonstrate how an online citizen science project, Mark2Cure, could use this information to meet the needs of students fulfilling CSSLRs.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6667230
January 2019

The Healthy Pregnancy Research Program: transforming pregnancy research through a ResearchKit app.

NPJ Digit Med 2018 5;1:45. Epub 2018 Sep 5.

Digital Medicine Division, Scripps Research Translational Institute, La Jolla, CA USA.

Although maternal morbidity and mortality in the US is among the worst of developed countries, pregnant women have been under-represented in research studies, resulting in deficiencies in evidence-based guidance for treatment. There are over two billion smartphone users worldwide, enabling researchers to easily and cheaply conduct extremely large-scale research studies through smartphone apps, especially among pregnant women in whom app use is exceptionally high, predominantly as an information conduit. We developed the first pregnancy research app that is embedded within an existing, popular pregnancy app for self-management and education of expectant mothers. Through the large-scale and simplified collection of survey and sensor generated data via the app, we aim to improve our understanding of factors that promote a healthy pregnancy for both the mother and developing fetus. From the launch of this cohort study on 16 March 2017 through 17 December 2017, we have enrolled 2058 pregnant women from all 50 states. Our study population is diverse geographically and demographically, and fairly representative of US population averages. We have collected 14,045 individual surveys and 107,102 total daily measurements of sleep, activity, blood pressure, and heart rate during this time. On average, women stayed engaged in the study for 59 days and 45 percent who reached their due date filled out the final outcome survey. During the first 9 months, we demonstrated the potential for a smartphone-based research platform to capture an ever-expanding array of longitudinal, objective, and subjective participant-generated data from a continuously growing and diverse population of pregnant women.

Source
DOI: http://dx.doi.org/10.1038/s41746-018-0052-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6550256
September 2018

ChlamBase: a curated model organism database for the Chlamydia research community.

Database (Oxford) 2019 01;2019

Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA.

The accelerating growth of genomic and proteomic information for Chlamydia species, coupled with unique biological aspects of these pathogens, necessitates bioinformatic tools and features that are not provided by major public databases. To meet these growing needs, we developed ChlamBase, a model organism database for Chlamydia that is built upon the WikiGenomes application framework, and Wikidata, a community-curated database. ChlamBase was designed to serve as a central access point for genomic and proteomic information for the Chlamydia research community. ChlamBase integrates information from numerous external databases, as well as important data extracted from the literature that are otherwise not available in structured formats that are easy to use. In addition, a key feature of ChlamBase is that it empowers users in the field to contribute new annotations and data as the field advances with continued discoveries. ChlamBase is freely and publicly available at chlambase.org.

Source
DOI: http://dx.doi.org/10.1093/database/baz041
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6463448
January 2019

exRNA Atlas Analysis Reveals Distinct Extracellular RNA Cargo Types and Their Carriers Present across Human Biofluids.

Cell 2019 04;177(2):463-477.e15

Department of Urology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Sema4, Stamford, CT 06902, USA.

To develop a map of cell-cell communication mediated by extracellular RNA (exRNA), the NIH Extracellular RNA Communication Consortium created the exRNA Atlas resource (https://exrna-atlas.org). The Atlas version 4P1 hosts 5,309 exRNA-seq and exRNA qPCR profiles from 19 studies and a suite of analysis and visualization tools. To analyze variation between profiles, we apply computational deconvolution. The analysis leads to a model with six exRNA cargo types (CT1, CT2, CT3A, CT3B, CT3C, CT4), each detectable in multiple biofluids (serum, plasma, CSF, saliva, urine). Five of the cargo types associate with known vesicular and non-vesicular (lipoprotein and ribonucleoprotein) exRNA carriers. To validate utility of this model, we re-analyze an exercise response study by deconvolution to identify physiologically relevant response pathways that were not detected previously. To enable wide application of this model, as part of the exRNA Atlas resource, we provide tools for deconvolution and analysis of user-provided case-control studies.

Source
DOI: http://dx.doi.org/10.1016/j.cell.2019.02.018
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6616370
April 2019

The ReFRAME library as a comprehensive drug repurposing library and its application to the treatment of cryptosporidiosis.

Proc Natl Acad Sci U S A 2018 10 3;115(42):10750-10755. Epub 2018 Oct 3.

California Institute for Biomedical Research, La Jolla, CA 92037.

The chemical diversity and known safety profiles of drugs previously tested in humans make them a valuable set of compounds to explore potential therapeutic utility in indications outside those originally targeted, especially neglected tropical diseases. This practice of "drug repurposing" has become commonplace in academic and other nonprofit drug-discovery efforts, with the appeal that significantly less time and resources are required to advance a candidate into the clinic. Here, we report a comprehensive open-access, drug repositioning screening set of 12,000 compounds (termed ReFRAME; Repurposing, Focused Rescue, and Accelerated Medchem) that was assembled by combining three widely used commercial drug competitive intelligence databases (Clarivate Integrity, GVK Excelra GoStar, and Citeline Pharmaprojects), together with extensive patent mining of small molecules that have been dosed in humans. To date, 12,000 compounds (∼80% of compounds identified from data mining) have been purchased or synthesized and subsequently plated for screening. To exemplify its utility, this collection was screened against Cryptosporidium spp., a major cause of childhood diarrhea in the developing world, and two active compounds previously tested in humans for other therapeutic indications were identified. Both compounds, VB-201 and a structurally related analog of ASP-7962, were subsequently shown to be efficacious in animal models of infection at clinically relevant doses, based on available human doses. In addition, an open-access data portal (https://reframedb.org) has been developed to share ReFRAME screen hits to encourage additional follow-up and maximize the impact of the ReFRAME screening collection.

Source
DOI: http://dx.doi.org/10.1073/pnas.1810137115
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6196526
October 2018

Triflic Acid Treatment Enables LC-MS/MS Analysis of Insoluble Bacterial Biomass.

J Proteome Res 2018 09 8;17(9):2978-2986. Epub 2018 Aug 8.

Department of Molecular Medicine and Department of Integrative Structural and Computational Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States.

The lysis and extraction of soluble bacterial proteins from cells is a common practice for proteomics analyses, but insoluble bacterial biomasses are often left behind. Here, we show that with triflic acid treatment, the insoluble bacterial biomass of both Gram-positive and Gram-negative bacteria can be rendered soluble. We use LC-MS/MS shotgun proteomics to show that bacterial proteins in the soluble and insoluble postlysis fractions differ significantly. Additionally, in the case of Gram-negative Pseudomonas aeruginosa, triflic acid treatment enables the enrichment of cell-envelope-associated proteins. Finally, we apply triflic acid to a human microbiome sample to show that this treatment is robust and enables the identification of a new, complementary subset of proteins from a complex microbial mixture.

Source
DOI: http://dx.doi.org/10.1021/acs.jproteome.8b00166
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7108958
September 2018

Correcting the F508del-CFTR variant by modulating eukaryotic translation initiation factor 3-mediated translation initiation.

J Biol Chem 2018 08 13;293(35):13477-13495. Epub 2018 Jul 13.

From the Departments of Molecular Medicine and

Inherited and somatic rare diseases result from >200,000 genetic variants leading to loss- or gain-of-toxic function, often caused by protein misfolding. Many of these misfolded variants fail to properly interact with other proteins. Understanding the link between factors mediating the transcription, translation, and protein folding of these disease-associated variants remains a major challenge in cell biology. Herein, we utilized the cystic fibrosis transmembrane conductance regulator (CFTR) protein as a model and performed a proteomics-based high-throughput screen (HTS) to identify pathways and components affecting the folding and function of the most common cystic fibrosis-associated mutation, the F508del variant of CFTR. Using a shortest-path algorithm we developed, we mapped HTS hits to the CFTR interactome to provide functional context to the targets and identified the eukaryotic translation initiation factor 3a (eIF3a) as a central hub for the biogenesis of CFTR. Of note, siRNA-mediated silencing of eIF3a reduced the polysome-to-monosome ratio in F508del-expressing cells, which, in turn, decreased the translation of CFTR variants, leading to increased CFTR stability, trafficking, and function at the cell surface. This finding suggested that eIF3a is involved in mediating the impact of genetic variations in CFTR on the folding of this protein. We posit that the number of ribosomes on a CFTR mRNA transcript is inversely correlated with the stability of the translated polypeptide. Polysome-based translation challenges the capacity of the proteostasis environment to balance message fidelity with protein folding, leading to disease. We suggest that this deficit can be corrected through control of translation initiation.
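The idea of mapping screen hits onto an interactome by shortest paths can be illustrated with a minimal sketch. This is a generic breadth-first search on an unweighted graph, not the algorithm the authors developed; the toy network and the node names (`HIT_A`, `HUB`, `TARGET`, and so on) are hypothetical placeholders rather than real interaction data.

```python
from collections import Counter, deque


def shortest_path(graph, start, goal):
    """Breadth-first search for one shortest path in an unweighted graph."""
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk back through the parent links to rebuild the path.
                path = [neighbor]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None  # goal not reachable from start


def intermediate_counts(graph, hits, target):
    """Count how often each node lies strictly between a hit and the target."""
    counts = Counter()
    for hit in hits:
        path = shortest_path(graph, hit, target)
        if path:
            counts.update(path[1:-1])
    return counts


# Hypothetical toy interactome (undirected, stored as adjacency lists).
TOY_NETWORK = {
    "HIT_A": ["HUB"],
    "HIT_B": ["HUB"],
    "HIT_C": ["OTHER"],
    "HUB": ["HIT_A", "HIT_B", "TARGET"],
    "OTHER": ["HIT_C", "TARGET"],
    "TARGET": ["HUB", "OTHER"],
}

hub_counts = intermediate_counts(TOY_NETWORK, ["HIT_A", "HIT_B", "HIT_C"], "TARGET")
```

In this toy network, the node that sits on the most hit-to-target paths is the candidate hub, which is the role the abstract assigns to eIF3a in the CFTR interactome.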

Source
DOI: http://dx.doi.org/10.1074/jbc.RA118.003192
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6120211
August 2018

A genome-wide survey of mutations in the Jurkat cell line.

BMC Genomics 2018 May 8;19(1):334. Epub 2018 May 8.

Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California, 92037, USA.

Background: The Jurkat cell line has an extensive history as a model of T cell signaling. But at the turn of the 21st century, some expression irregularities were observed, raising doubts about how closely the cell line paralleled normal human T cells. While numerous expression deficiencies have been described in Jurkat, genetic explanations have only been provided for a handful of defects.

Results: Here, we report a comprehensive catalog of genomic variation in the Jurkat cell line based on whole-genome sequencing. With this list of all detectable, non-reference sequences, we prioritize potentially damaging mutations by mining public databases for functional effects. We confirm documented mutations in Jurkat and propose links from detrimental gene variants to observed expression abnormalities in the cell line.

Conclusions: The Jurkat cell line harbors many mutations that are associated with cancer and contribute to Jurkat's unique characteristics. Genes with damaging mutations in the Jurkat cell line are involved in T-cell receptor signaling (PTEN, INPP5D, CTLA4, and SYK), maintenance of genome stability (TP53, BAX, and MSH2), and O-linked glycosylation (C1GALT1C1). This work ties together decades of molecular experiments and serves as a resource that will streamline both the interpretation of past research and the design of future Jurkat studies.

Source
DOI: http://dx.doi.org/10.1186/s12864-018-4718-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5941560
May 2018

Common PIEZO1 Allele in African Populations Causes RBC Dehydration and Attenuates Plasmodium Infection.

Cell 2018 04 22;173(2):443-455.e12. Epub 2018 Mar 22.

Howard Hughes Medical Institute, Department of Neuroscience, Dorris Neuroscience Center, The Scripps Research Institute, La Jolla, CA 92037, USA.

Hereditary xerocytosis is thought to be a rare genetic condition characterized by red blood cell (RBC) dehydration with mild hemolysis. RBC dehydration is linked to reduced Plasmodium infection in vitro; however, the role of RBC dehydration in protection against malaria in vivo is unknown. Most cases of hereditary xerocytosis are associated with gain-of-function mutations in PIEZO1, a mechanically activated ion channel. We engineered a mouse model of hereditary xerocytosis and show that Plasmodium infection fails to cause experimental cerebral malaria in these mice due to the action of Piezo1 in RBCs and in T cells. Remarkably, we identified a novel human gain-of-function PIEZO1 allele, E756del, present in a third of the African population. RBCs from individuals carrying this allele are dehydrated and display reduced Plasmodium infection in vitro. The existence of a gain-of-function PIEZO1 at such high frequencies is surprising and suggests an association with malaria resistance.

Source
DOI: http://dx.doi.org/10.1016/j.cell.2018.02.047
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5889333
April 2018

Exploring applications of crowdsourcing to cryo-EM.

J Struct Biol 2018 07 24;203(1):37-45. Epub 2018 Feb 24.

Integrative Structural and Computational Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037 USA.

Extraction of particles from cryo-electron microscopy (cryo-EM) micrographs is a crucial step in processing single-particle datasets. Although algorithms have been developed for automatic particle picking, these algorithms generally rely on two-dimensional templates for particle identification, which may exhibit biases that can propagate artifacts through the reconstruction pipeline. Manual picking is viewed as a gold-standard solution for particle selection, but it is too time-consuming to perform on data sets of thousands of images. In recent years, crowdsourcing has proven effective at leveraging the open web to manually curate datasets. In particular, citizen science projects such as Galaxy Zoo have shown the power of appealing to users' scientific interests to process enormous amounts of data. To this end, we explored the possible applications of crowdsourcing in cryo-EM particle picking, presenting a variety of novel experiments including the production of a fully annotated particle set from untrained citizen scientists. We show the possibilities and limitations of crowdsourcing particle selection tasks, and explore further options for crowdsourcing cryo-EM data processing.

Source
DOI: http://dx.doi.org/10.1016/j.jsb.2018.02.006
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6086358
July 2018

FoxO transcription factors modulate autophagy and proteoglycan 4 in cartilage homeostasis and osteoarthritis.

Sci Transl Med 2018 02;10(428)

Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA.

Aging is a main risk factor for osteoarthritis (OA). FoxO transcription factors protect against cellular and organismal aging, and FoxO expression in cartilage is reduced with aging and in OA. To investigate the role of FoxO in cartilage, Col2Cre-FoxO1, 3, and 4 single knockout (KO) and triple KO mice (Col2Cre-TKO) were analyzed. Articular cartilage in Col2Cre-TKO and Col2Cre-FoxO1 KO mice was thicker than in control mice at 1 or 2 months of age. This was associated with increased proliferation of chondrocytes of Col2Cre-TKO mice in vivo and in vitro. OA-like changes developed in cartilage, synovium, and subchondral bone between 4 and 6 months of age in Col2Cre-TKO and Col2Cre-FoxO1 KO mice. Col2Cre-FoxO3 and FoxO4 KO mice showed no cartilage abnormalities until 18 months of age when Col2Cre-FoxO3 KO mice had more severe OA than control mice. Autophagy and antioxidant defense genes were reduced in Col2Cre-TKO mice. Deletion of FoxO1/3/4 in mature mice using Aggrecan(Acan)-CreERT2 (AcanCreERT-TKO) also led to spontaneous cartilage degradation and increased OA severity in a surgical model or treadmill running. The superficial zone of knee articular cartilage of Col2Cre-TKO and AcanCreERT-TKO mice exhibited reduced cell density and markedly decreased proteoglycan 4 (Prg4). In vitro, ectopic FoxO1 expression increased Prg4 and synergized with transforming growth factor-β stimulation. In OA chondrocytes, overexpression of FoxO1 reduced inflammatory mediators and cartilage-degrading enzymes, increased protective genes, and antagonized interleukin-1β effects. Our observations suggest that FoxO transcription factors play a key role in postnatal cartilage development, maturation, and homeostasis and protect against OA-associated cartilage damage.

Source
DOI: http://dx.doi.org/10.1126/scitranslmed.aan0746
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6204214
February 2018

Cross-linking BioThings APIs through JSON-LD to facilitate knowledge exploration.

BMC Bioinformatics 2018 02 1;19(1):30. Epub 2018 Feb 1.

Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA.

Background: Application Programming Interfaces (APIs) are now widely used to distribute biological data, and many popular biological APIs developed by different research teams have adopted JavaScript Object Notation (JSON) as their primary data format. While usage of a common data format offers significant advantages, that alone is not sufficient for rich integrative queries across APIs.

Results: Here, we have implemented JSON for Linking Data (JSON-LD) technology on the BioThings APIs that we have developed: MyGene.info, MyVariant.info, and MyChem.info. JSON-LD provides a standard way to add semantic context to the existing JSON data structure, for the purpose of enhancing the interoperability between APIs. We demonstrated several use cases that were facilitated by semantic annotations using JSON-LD, including simpler and more precise query capabilities as well as API cross-linking.

Conclusions: We believe that this pattern offers a generalizable solution for interoperability of APIs in the life sciences.
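As a rough illustration of how a JSON-LD context can make two APIs linkable, the sketch below attaches an `@context` to plain JSON records and resolves annotated fields to shared identifiers. It is a simplified stand-in, not a JSON-LD processor and not the BioThings implementation; the field names (`entrezgene`, `gene_id`), the record contents, and the helper functions are assumptions made for the example.

```python
# Hypothetical contexts: two APIs name the same kind of identifier differently,
# but both map their own key to the same IRI namespace.
GENE_API_CONTEXT = {"entrezgene": "http://identifiers.org/ncbigene/"}
VARIANT_API_CONTEXT = {"gene_id": "http://identifiers.org/ncbigene/"}


def annotate(doc, context):
    """Return a copy of a plain JSON record with a JSON-LD style @context."""
    return {"@context": context, **doc}


def field_iris(doc):
    """Resolve every context-annotated field of a record to a full IRI."""
    context = doc.get("@context", {})
    return {key: context[key] + str(doc[key]) for key in context if key in doc}


def shared_links(doc_a, doc_b):
    """IRIs that appear in both records, i.e. candidate cross-links."""
    return set(field_iris(doc_a).values()) & set(field_iris(doc_b).values())


# Hypothetical records, as two different APIs might return them.
gene_record = annotate({"symbol": "CDK2", "entrezgene": 1017}, GENE_API_CONTEXT)
variant_record = annotate({"variant": "example-variant", "gene_id": 1017}, VARIANT_API_CONTEXT)

links = shared_links(gene_record, variant_record)
```

The point of the design is that neither API has to rename its keys: the context carries the semantics, so records with differently named fields still resolve to the same identifier.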

Source
DOI: http://dx.doi.org/10.1186/s12859-018-2041-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5796402
February 2018

Metaproteomics of Colonic Microbiota Unveils Discrete Protein Functions among Colitic Mice and Control Groups.

Proteomics 2018 02 2;18(3-4). Epub 2018 Feb 2.

Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, USA.

Metaproteomics can greatly assist established high-throughput sequencing methodologies to provide systems biological insights into the alterations of microbial protein functionalities correlated with disease-associated dysbiosis of the intestinal microbiota. Here, the authors utilize the well-characterized murine T cell transfer model of colitis to find specific changes within the intestinal luminal proteome associated with inflammation. MS proteomic analysis of colonic samples permitted the identification of ≈10 000-12 000 unique peptides that corresponded to 5610 protein clusters identified across three groups, including the colitic Rag1-/- T cell recipients, isogenic Rag1-/- controls, and wild-type mice. The authors demonstrate that the colitic mice exhibited a significant increase in Proteobacteria and Verrucomicrobia and show that such alterations in the microbial communities contributed to the enrichment of specific proteins with transcription and translation gene ontology terms. In combination with 16S sequencing, the authors' metaproteomics-based microbiome studies provide a foundation for assessing alterations in intestinal luminal protein functionalities in a robust and well-characterized mouse model of colitis, and set the stage for future studies to further explore the functional mechanisms of altered protein functionalities associated with dysbiosis and inflammation.

Source
DOI: http://dx.doi.org/10.1002/pmic.201700391
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5921860
February 2018

Gene Profiling and T Cell Receptor Sequencing from Antigen-Specific CD4 T Cells.

Methods Mol Biol 2018 ;1712:217-238

Department of Immunology and Microbiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037, USA.

The paucity of pathogenic T cells in circulating blood limits the information delivered by bulk analysis. Toward diagnosis and monitoring of treatments of autoimmune diseases, we have devised single-cell analysis approaches capable of identifying and characterizing rare circulating CD4 T cells.

Source
DOI: http://dx.doi.org/10.1007/978-1-4939-7514-3_14
July 2018

Academics can help shape Wikipedia.

Science 2017 Aug 10;357(6351):557-558. Epub 2017 Aug 10.

The Scripps Research Institute, La Jolla, CA 92037, USA.

Source
DOI: http://dx.doi.org/10.1126/science.aao0462
August 2017

WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata.

Database (Oxford) 2017 01;2017(1)

Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, 92037 USA.

With the advancement of genome-sequencing technologies, new genomes are being sequenced daily. Although these sequences are deposited in publicly available data warehouses, their functional and genomic annotations (beyond genes which are predicted automatically) mostly reside in the text of primary publications. Professional curators are hard at work extracting those annotations from the literature for the most studied organisms and depositing them in structured databases. However, the resources don't exist to fund the comprehensive curation of the thousands of newly sequenced organisms in this manner. Here, we describe WikiGenomes (wikigenomes.org), a web application that facilitates the consumption and curation of genomic data by the entire scientific community. WikiGenomes is based on Wikidata, an openly editable knowledge graph with the goal of aggregating published knowledge into a free and open database. WikiGenomes empowers the individual genomic researcher to contribute their expertise to the curation effort and integrates the knowledge into Wikidata, enabling it to be accessed by anyone without restriction.

Database URL: www.wikigenomes.org.

Source
DOI: http://dx.doi.org/10.1093/database/bax025
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467579
January 2017

Quantitative Metaproteomics and Activity-Based Probe Enrichment Reveals Significant Alterations in Protein Expression from a Mouse Model of Inflammatory Bowel Disease.

J Proteome Res 2017 02 23;16(2):1014-1026. Epub 2017 Jan 23.

Department of Molecular and Experimental Medicine, Department of Integrative Structural and Computational Biology, and Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States.

Tandem mass spectrometry-based shotgun proteomics of distal gut microbiomes is exceedingly difficult due to the inherent complexity and taxonomic diversity of the samples. We introduce two new methodologies to improve metaproteomic studies of microbiome samples. These methods include the stable isotope labeling in mammals to permit protein quantitation across two mouse cohorts as well as the application of activity-based probes to enrich and analyze both host and microbial proteins with specific functionalities. We used these technologies to study the microbiota from the adoptive T cell transfer mouse model of inflammatory bowel disease (IBD) and compare these samples to an isogenic control, thereby limiting genetic and environmental variables that influence microbiome composition. The data generated highlight quantitative alterations in both host and microbial proteins due to intestinal inflammation and corroborate the observed phylogenetic changes in bacteria that accompany IBD in humans and mouse models. The combination of isotope labeling with shotgun proteomics resulted in the total identification of 4434 protein clusters expressed in the microbial proteomic environment, 276 of which demonstrated differential abundance between control and IBD mice. Notably, application of a novel cysteine-reactive probe uncovered several microbial proteases and hydrolases overrepresented in the IBD mice. Implementation of these methods demonstrated that substantial insights into the identity and dysregulation of host and microbial proteins altered in IBD can be accomplished and can be used in the interrogation of other microbiome-related diseases.

Source
DOI: http://dx.doi.org/10.1021/acs.jproteome.6b00938
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5441882
February 2017

Data-Driven Approach To Determine Popular Proteins for Targeted Proteomics Translation of Six Organ Systems.

J Proteome Res 2016 11 19;15(11):4126-4134. Epub 2016 Jul 19.

Advanced Clinical Biosystems Research Institute, Department of Medicine and The Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States.

Amidst the proteomes of human tissues lie subsets of proteins that are closely involved in conserved pathophysiological processes. Much of biomedical research concerns interrogating disease signature proteins and defining their roles in disease mechanisms. With advances in proteomics technologies, it is now feasible to develop targeted proteomics assays that can accurately quantify protein abundance as well as their post-translational modifications; however, with a rapidly accumulating number of studies implicating proteins in diseases, current resources are insufficient to target every protein without judiciously prioritizing the proteins with high significance and impact for assay development. We describe here a data science method to prioritize and expedite assay development on high-impact proteins across research fields by leveraging the biomedical literature record to rank and normalize proteins that are popularly and preferentially published by biomedical researchers. We demonstrate this method by finding priority proteins across six major physiological systems (cardiovascular, cerebral, hepatic, renal, pulmonary, and intestinal). The described method is data-driven and builds upon the collective knowledge of previous publications referenced on PubMed to lend objectivity to target selection. The method and resulting popular protein lists may also be useful for exploring biological processes associated with various physiological systems and research topics, in addition to benefiting ongoing efforts to facilitate the broad translation of proteomics technologies.
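One way to picture ranking proteins that are both popularly and preferentially published in a field is the sketch below. The scoring rule is an assumption chosen for illustration (publication count within the topic, weighted by the share of the protein's publications that fall in that topic) and is not the formula from the paper; the protein names and counts are made up.

```python
def rank_proteins(counts):
    """Rank proteins for one topic, highest score first.

    counts maps protein -> (publications_in_topic, publications_overall).
    Assumed score: in-topic count multiplied by the in-topic share, so a
    protein must be both frequently and preferentially published in the topic.
    """
    scores = {}
    for protein, (in_topic, overall) in counts.items():
        if overall == 0:
            continue  # no literature record, nothing to rank
        scores[protein] = in_topic * (in_topic / overall)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical publication counts for a single organ system.
HYPOTHETICAL_COUNTS = {
    "PROTEIN_1": (50, 1000),  # popular overall, but rarely in this topic
    "PROTEIN_2": (30, 40),    # well studied and mostly within this topic
    "PROTEIN_3": (5, 5),      # exclusive to this topic, but few papers
}

ranking = rank_proteins(HYPOTHETICAL_COUNTS)
```

With these made-up counts, the protein that is moderately popular but strongly topic-specific ranks above the one that is heavily published elsewhere, which is the kind of normalization the abstract describes.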

Source
DOI: http://dx.doi.org/10.1021/acs.jproteome.6b00095
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5120959
November 2016