64 results match your criteria processing bionlp

LitCovid-AGAC: cellular and molecular level annotation data set based on COVID-19.

Genomics Inform 2021 Sep 30;19(3):e23. Epub 2021 Sep 30.

Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, 430070 Wuhan, China.

Currently, the coronavirus disease 2019 (COVID-19) literature is increasing dramatically, and the growing volume of text makes it possible to perform large-scale text mining and knowledge discovery. Curation of these texts has therefore become a crucial issue for the biomedical natural language processing (BioNLP) community, so as to retrieve the important information about the mechanism of COVID-19. PubAnnotation is an aligned annotation system which provides an efficient platform for biological curators to upload their annotations or merge other external annotations.

September 2021

Continual Knowledge Infusion into Pre-trained Biomedical Language Models.

Bioinformatics 2021 Sep 23. Epub 2021 Sep 23.

Department of Computer Science, University of Virginia, Charlottesville, VA, 22903, USA.

Motivation: Biomedical language models produce meaningful concept representations that are useful for a variety of biomedical natural language processing (bioNLP) applications such as named entity recognition, relationship extraction, and question answering. Recent research trends have shown that the contextualized language models (e.g. ...

September 2021

Improved biomedical word embeddings in the transformer era.

J Biomed Inform 2021 08 18;120:103867. Epub 2021 Jul 18.

Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, United States of America; Department of Computer Science, University of Kentucky, United States of America.

Background: Recent natural language processing (NLP) research is dominated by neural network methods that employ word embeddings as basic building blocks. Pre-training with neural methods that capture local and global distributional properties (e.g. ...

Biomedical and clinical English model packages for the Stanza Python NLP library.

J Am Med Inform Assoc 2021 Aug;28(9):1892-1899

Department of Radiology, Stanford University, Stanford, California, USA.

Objective: The study sought to develop and evaluate neural natural language processing (NLP) packages for the syntactic analysis and named entity recognition of biomedical and clinical English text.

Materials and Methods: We implement and train biomedical and clinical English NLP pipelines by extending the widely used Stanza library originally designed for general NLP tasks. Our models are trained with a mix of public datasets such as the CRAFT treebank as well as with a private corpus of radiology reports annotated with 5 radiology-domain entities.

Evaluating sentence representations for biomedical text: Methods and experimental results.

J Biomed Inform 2020 04 6;104:103396. Epub 2020 Mar 6.

Department of Information and Computing Sciences, Utrecht University, 3584 CC Utrecht, the Netherlands.

Text representations are one of the main inputs to various Natural Language Processing (NLP) methods. Given the fast developmental pace of new sentence embedding methods, we argue that there is a need for a unified methodology to assess these different techniques in the biomedical domain. This work introduces a comprehensive evaluation of novel methods across ten medical classification tasks.

Unsupervised inference of implicit biomedical events using context triggers.

BMC Bioinformatics 2020 Jan 28;21(1):29. Epub 2020 Jan 28.

School of Computing, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, Republic of Korea.

Background: Event extraction from the biomedical literature is one of the most actively researched areas in biomedical text mining and natural language processing. However, most approaches have focused on events within single sentence boundaries, and have thus paid much less attention to events spanning multiple sentences. The Bacteria-Biotope event (BB-event) subtask presented in BioNLP Shared Task 2016 is one such example; a significant number of relations between bacteria and biotopes span more than one sentence, but existing systems have treated them as false negatives because the labeled data is not large enough to model a complex reasoning process using supervised learning frameworks.

January 2020

OryzaGP: rice gene and protein dataset for named-entity recognition.

Genomics Inform 2019 Jun 26;17(2):e17. Epub 2019 Jun 26.

Database Center for Life Science (DBCLS), Chiba 277-0871, Japan.

Text mining has become an important research method in biology; its original purpose was to extract biological entities, such as genes, proteins and phenotypic traits, in order to extend knowledge from scientific papers. However, few thorough studies on text mining and application development for plant molecular biology data have been performed, especially for rice, resulting in a lack of datasets available to solve named-entity recognition tasks for this species. Since few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature.

BioWordVec, improving biomedical word embeddings with subword information and MeSH.

Sci Data 2019 May 10;6(1):52. Epub 2019 May 10.

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, 20894, USA.

Distributed word representations have become an essential foundation for biomedical natural language processing (BioNLP), text mining and information retrieval. Word embeddings are traditionally computed at the word level from a large corpus of unlabeled text, ignoring the information present in the internal structure of words or any information available in domain-specific structured resources such as ontologies. However, such information holds potential for greatly improving the quality of the word representation, as suggested in some recent studies in the general domain.

Linking entities through an ontology using word embeddings and syntactic re-ranking.

BMC Bioinformatics 2019 Mar 27;20(1):156. Epub 2019 Mar 27.

Department of Computer Engineering, Boğaziçi University, İstanbul, 34342, Turkey.

Background: Although there is an enormous number of textual resources in the biomedical domain, manually curated resources currently cover only a small part of the existing knowledge. The vast majority of this information is in unstructured form and contains nonstandard naming conventions. The task of named entity recognition, which is the identification of entity names in text, is not adequate without a standardization step.

PMC text mining subset in BioC: about three million full-text articles and growing.

Bioinformatics 2019 09;35(18):3533-3535

National Center for Biotechnology Information (NCBI), U.S. National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA.

Motivation: Interest in text mining full-text biomedical research articles is growing. To facilitate automated processing of nearly 3 million full-text articles (in PubMed Central® Open Access and Author Manuscript subsets) and to improve interoperability, we convert these articles to BioC, a community-driven simple data structure in either XML or JavaScript Object Notation format for conveniently sharing text and annotations.

Results: The resultant articles can be downloaded via both File Transfer Protocol for bulk access and a Web API for updates or a more focused collection.
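
To make the BioC structure mentioned in this abstract more concrete, the following is a minimal sketch of walking a BioC-style JSON collection in Python. The sample document, its identifier `PMC0000000`, the annotation types and the helper `iter_annotations` are all hypothetical, and the field names (`documents`, `passages`, `annotations`, `locations`, `infons`) are assumed from the commonly documented BioC JSON layout; they should be checked against the actual files before use.

```python
import json

# Hypothetical, hand-written sample in a BioC-style JSON layout (assumed field names).
SAMPLE = """
{
  "source": "example",
  "documents": [
    {
      "id": "PMC0000000",
      "passages": [
        {
          "offset": 100,
          "text": "BRCA1 mutations increase cancer risk.",
          "annotations": [
            {"id": "1", "infons": {"type": "Gene"},
             "text": "BRCA1", "locations": [{"offset": 100, "length": 5}]},
            {"id": "2", "infons": {"type": "Disease"},
             "text": "cancer", "locations": [{"offset": 125, "length": 6}]}
          ]
        }
      ]
    }
  ]
}
"""


def iter_annotations(collection):
    """Yield (document id, annotation type, covered text) for every annotation location.

    Annotation offsets are assumed to be document-level, so the passage
    offset is subtracted before slicing the passage text.
    """
    for doc in collection.get("documents", []):
        for passage in doc.get("passages", []):
            base = passage.get("offset", 0)
            text = passage.get("text", "")
            for ann in passage.get("annotations", []):
                for loc in ann.get("locations", []):
                    start = loc["offset"] - base
                    end = start + loc["length"]
                    yield doc["id"], ann.get("infons", {}).get("type"), text[start:end]


if __name__ == "__main__":
    for doc_id, ann_type, covered in iter_annotations(json.loads(SAMPLE)):
        print(doc_id, ann_type, covered)
```

Slicing the passage text by offset, rather than trusting the annotation's own `text` field, is one way to check that text and annotations stay aligned.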

September 2019

LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools.

J Cheminform 2019 Jan 10;11(1). Epub 2019 Jan 10.

Text Technology Lab, Goethe-University Frankfurt, Robert-Mayer-Straße 10, 60325, Frankfurt am Main, Germany.

Background: Chemical and biomedical named entity recognition (NER) is an essential preprocessing task in natural language processing. The identification and extraction of named entities from scientific articles is also attracting increasing interest in many scientific disciplines. Locating chemical named entities in the literature is an essential step in chemical text mining pipelines for identifying chemical mentions, their properties, and relations as discussed in the literature.

January 2019

BioCreative VI Precision Medicine Track system performance is constrained by entity recognition and variations in corpus characteristics.

Database (Oxford) 2018 01 1;2018. Epub 2018 Jan 1.

School of Computing and Information Systems, The University of Melbourne, Parkville VIC Australia.

Precision medicine aims to provide personalized treatments based on individual patient profiles. One critical step towards precision medicine is leveraging knowledge derived from biomedical publications, a tremendous literature resource presenting the latest scientific discoveries on genes, mutations and diseases. Biomedical natural language processing (BioNLP) plays a vital role in supporting automation of this process.

January 2018

Biomedical event extraction based on GRU integrating attention mechanism.

BMC Bioinformatics 2018 Aug 13;19(Suppl 9):285. Epub 2018 Aug 13.

School of Computer Science and Technology, Dalian University of Technology, Dalian, China.

Background: Biomedical event extraction is a crucial task in biomedical text mining. As the primary forum for international evaluation of different biomedical event extraction technologies, the BioNLP Shared Task represents a trend in biomedical text mining toward fine-grained information extraction (IE). The fourth edition of the BioNLP Shared Task in 2016 (BioNLP-ST'16) proposed three tasks, among which the Bacteria Biotope event extraction (BB) task had already been put forward in earlier BioNLP-ST editions.

A novel framework for biomedical entity sense induction.

J Biomed Inform 2018 08 20;84:31-41. Epub 2018 Jun 20.

Irstea, TETIS, Montpellier, France.

Background: Rapid advancements in biomedical research have increased the number of relevant electronic documents published online, ranging from scholarly articles to news, blogs, and user-generated social media content. Nevertheless, much of this information is poorly organized, making it difficult to navigate. Emerging technologies such as ontologies and knowledge bases (KBs) could help organize and track the information associated with biomedical research developments.

Scholarly Information Extraction Is Going to Make a Quantum Leap with PubMed Central (PMC).

Stud Health Technol Inform 2017 ;245:521-525

Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena 07743, Germany.

With the increasing availability of complete full texts (journal articles), rather than their surrogates (titles, abstracts), as resources for text analytics, entirely new opportunities arise for information extraction and text mining from scholarly publications. Yet, we gathered evidence that a range of problems are encountered for full-text processing when biomedical text analytics simply reuse existing NLP pipelines which were developed on the basis of abstracts (rather than full texts). We conducted experiments with four different relation extraction engines all of which were top performers in previous BioNLP Event Extraction Challenges.

PubRunner: A light-weight framework for updating text mining results.

F1000Res 2017 2;6:612. Epub 2017 May 2.

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily and text mining tools are only as good as the corpus from which they work.

Gold-standard ontology-based anatomical annotation in the CRAFT Corpus.

Database (Oxford) 2017 Jan;2017

School of Medicine, Department of Pharmacology, University of Colorado Anschutz Medical Campus, 12801 E. 17th Ave., P.O. Box 6511, MS 8303, Aurora, CO 80045-0511, USA.

Gold-standard annotated corpora have become important resources for the training and testing of natural-language-processing (NLP) systems designed to support biocuration efforts, and ontologies are increasingly used to facilitate curational consistency and semantic integration across disparate resources. Bringing together the respective power of these, the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of full-length, open-access biomedical journal articles with extensive manually created syntactic, formatting and semantic markup, was previously created and released. This initial public release has already been used in multiple projects to drive development of systems focused on a variety of biocuration, search, visualization, and semantic and syntactic NLP tasks.

January 2017

Filtering large-scale event collections using a combination of supervised and unsupervised learning for event trigger classification.

J Biomed Semantics 2016 11;7:27. Epub 2016 May 11.

Department of Information Technology, University of Turku, Turku, Finland.

Background: Biomedical event extraction is one of the key tasks in biomedical text mining, supporting various applications such as database curation and hypothesis generation. Several systems, some of which have been applied at a large scale, have been introduced to solve this task. Past studies have shown that the identification of the phrases describing biological processes, also known as trigger detection, is a crucial part of event extraction, and notable overall performance gains can be obtained by solely focusing on this sub-task.

November 2017

Active learning for ontological event extraction incorporating named entity recognition and unknown word handling.

J Biomed Semantics 2016 27;7:22. Epub 2016 Apr 27.

School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798 Singapore.

Background: Biomedical text mining may target various kinds of valuable information embedded in the literature, but a critical obstacle to extending the mining targets is the cost of manually constructing labeled data, which are required for state-of-the-art supervised learning systems. Active learning chooses the most informative documents for supervised learning in order to reduce the amount of manual annotation required. Previous work on active learning, however, focused on the tasks of entity recognition and protein-protein interactions, but not on event extraction tasks for multiple event types.
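
As an illustration of the general idea of choosing the most informative documents, the following is a minimal sketch of entropy-based uncertainty sampling in Python. It is one common, generic active learning strategy and is not claimed to be the selection criterion used in this paper; the function names, the toy document identifiers and the toy probabilities are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted label distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_informative(pool, predict_proba, k):
    """Return the k unlabeled documents whose predictions are most uncertain.

    pool          -- iterable of unlabeled documents
    predict_proba -- callable mapping a document to a list of class probabilities
                     (a stand-in for whatever model is currently trained)
    k             -- number of documents to send for manual annotation
    """
    scored = [(entropy(predict_proba(doc)), doc) for doc in pool]
    # Sort by uncertainty only, highest first; ties keep their pool order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


if __name__ == "__main__":
    # Toy model output: hypothetical class probabilities per document.
    toy_probs = {
        "doc_a": [0.5, 0.5],    # maximally uncertain
        "doc_b": [0.9, 0.1],
        "doc_c": [0.99, 0.01],  # nearly certain
    }
    chosen = select_most_informative(toy_probs.keys(), toy_probs.get, k=2)
    print(chosen)
```

In a real loop, the selected documents would be annotated, added to the training set, and the model retrained before the next selection round.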

October 2016

BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations.

Database (Oxford) 2016 13;2016. Epub 2016 Apr 13.

Department of Computer Science and Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841 Korea.

Comprehensive knowledge of genomic variants in a biological context is key for precision medicine. As next-generation sequencing technologies improve, the amount of literature containing genomic variant data, such as new functions or related phenotypes, rapidly increases. Because numerous articles are published every day, it is almost impossible to manually curate all the variant information from the literature.

January 2017

Optimizing graph-based patterns to extract biomedical events from the literature.

BMC Bioinformatics 2015 30;16 Suppl 16:S2. Epub 2015 Oct 30.

IN BIONLP-ST 2013: We participated in the BioNLP 2013 shared tasks on event extraction. Our extraction method is based on the search for an approximate subgraph isomorphism between key context dependencies of events and graphs of input sentences. Our system was able to address both the GENIA (GE) task focusing on 13 molecular biology related event types and the Cancer Genetics (CG) task targeting a challenging group of 40 cancer biology related event types with varying arguments concerning 18 kinds of biological entities.

Sieve-based relation extraction of gene regulatory networks from biological literature.

BMC Bioinformatics 2015 30;16 Suppl 16:S1. Epub 2015 Oct 30.

Background: Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions and results of related experiments.

RLIMS-P 2.0: A Generalizable Rule-Based Information Extraction System for Literature Mining of Protein Phosphorylation Information.

IEEE/ACM Trans Comput Biol Bioinform 2015 Jan-Feb;12(1):17-29

We introduce RLIMS-P version 2.0, an enhanced rule-based information extraction (IE) system for mining kinase, substrate, and phosphorylation site information from scientific literature. Consisting of natural language processing and IE modules, the system has integrated several new features, including the capability of processing full-text articles and generalizability towards different post-translational modifications (PTMs).

Extending the evaluation of Genia Event task toward knowledge base construction and comparison to Gene Regulation Ontology task.

BMC Bioinformatics 2015 13;16 Suppl 10:S3. Epub 2015 Jul 13.

Background: The third edition of the BioNLP Shared Task was held with the grand theme "knowledge base construction (KB)". The Genia Event (GE) task was re-designed and implemented in light of this theme. For its final report, the participating systems were evaluated from a perspective of annotation.

February 2016

Overview of the Cancer Genetics and Pathway Curation tasks of BioNLP Shared Task 2013.

BMC Bioinformatics 2015 13;16 Suppl 10:S2. Epub 2015 Jul 13.

Background: Since their introduction in 2009, the BioNLP Shared Task events have been instrumental in advancing the development of methods and resources for the automatic extraction of information from the biomedical literature. In this paper, we present the Cancer Genetics (CG) and Pathway Curation (PC) tasks, two event extraction tasks introduced in the BioNLP Shared Task 2013. The CG task focuses on cancer, emphasizing the extraction of physiological and pathological processes at various levels of biological organization, and the PC task targets reactions relevant to the development of biomolecular pathway models, defining its extraction targets on the basis of established pathway representations and ontologies.

February 2016

Overview of the gene regulation network and the bacteria biotope tasks in BioNLP'13 shared task.

BMC Bioinformatics 2015 13;16 Suppl 10:S1. Epub 2015 Jul 13.

Background: We present the two Bacteria Track tasks of BioNLP 2013 Shared Task (ST): Gene Regulation Network (GRN) and Bacteria Biotope (BB). These tasks were previously introduced in the 2011 BioNLP-ST Bacteria Track as Bacteria Gene Interaction (BI) and Bacteria Biotope (BB). The Bacteria Track was motivated by a need to develop specific BioNLP tools for fine-grained event extraction in bacteria biology.

February 2016

Extracting biomedical events from pairs of text entities.

BMC Bioinformatics 2015 13;16 Suppl 10:S8. Epub 2015 Jul 13.

Background: Huge amounts of electronic biomedical documents, such as molecular biology reports or genomic papers, are generated daily. Nowadays, these documents are mainly available in the form of unstructured free text, which requires heavy processing before it can be registered in organized databases. This organization is instrumental for information retrieval, making it possible to answer the advanced queries of researchers and practitioners in biology, medicine, and related fields.

February 2016

Adaptable, high recall, event extraction system with minimal configuration.

BMC Bioinformatics 2015 13;16 Suppl 10:S7. Epub 2015 Jul 13.

Background: Biomedical event extraction has been a major focus of biomedical natural language processing (BioNLP) research since the first BioNLP shared task was held in 2009. Accordingly, a large number of event extraction systems have been developed. Most such systems, however, have been developed for specific tasks and/or incorporate task-specific settings, making their application to new corpora and tasks problematic without modification of the systems themselves.

February 2016

The contribution of co-reference resolution to supervised relation detection between bacteria and biotopes entities.

BMC Bioinformatics 2015 13;16 Suppl 10:S6. Epub 2015 Jul 13.

Background: The acquisition of knowledge about relations between bacteria and their locations (habitats and geographical locations) in short texts about bacteria, as defined in the BioNLP-ST 2013 Bacteria Biotope task, depends on the detection of co-reference links between mentions of entities of each of these three types. To our knowledge, no participant in this task has investigated this aspect of the situation. The present work specifically addresses issues raised by this situation: (i) how to detect these co-reference links and associated co-reference chains; (ii) how to use them to prepare positive and negative examples to train a supervised system for the detection of relations between entity mentions; (iii) what context around which entity mentions contributes to relation detection when co-reference chains are provided.

February 2016

Detection and categorization of bacteria habitats using shallow linguistic analysis.

BMC Bioinformatics 2015 13;16 Suppl 10:S5. Epub 2015 Jul 13.

Background: Information regarding bacteria biotopes is important for several research areas including health sciences, microbiology, and food processing and preservation. One of the challenges for scientists in these domains is the huge amount of information buried in the text of electronic resources. Developing methods to automatically extract bacteria habitat relations from the text of these electronic resources is crucial for facilitating research in these areas.

February 2016