Publications by authors named "Zhiyong Lu"

208 Publications

Facile decoration of two-dimensional Ti3C2Tx nanoplates with CuS nanoparticles via a facile in-situ synthesis strategy at room temperature for superhigh specific capacitance of supercapacitors.

Nanotechnology 2021 Oct 19. Epub 2021 Oct 19.

Hohai University, Jiangning Campus, Hohai University, No. 8 Focheng West Road, Jiangning District, Nanjing City, Jiangsu Province, Nanjing, 210098, CHINA.

Although supercapacitors have attracted increasing attention owing to their fast charging speed and high power density, their wide application is still limited by their low energy density. In this study, a new CuS-nanoparticle-decorated Ti3C2Tx electrode material is fabricated via a facile in-situ synthesis strategy at room temperature. CuS nanoparticles, generated from the in-situ reaction of Cu(NO3)2·3H2O with Na2S, are anchored between the Ti3C2Tx interlayers through electrostatic interaction. This structure is found to not only reduce the surface oxidation of Ti3C2Tx but also prevent the accumulation of CuS nanoparticles through the template effect of the Ti3C2Tx nanoplates. As a result, the CuS/Ti3C2Tx nanohybrid delivers a maximum specific capacitance of 911 F g⁻¹ at 1 A g⁻¹. A symmetric supercapacitor (SSC) fabricated using the CuS/Ti3C2Tx nanohybrid as the electrode material exhibits an energy density of 43.56 W h kg⁻¹ at a power density of 475 W kg⁻¹. Consequently, this work provides a new perspective on microstructural design for the preparation of electrode materials with superhigh specific capacitance through an easy and low-cost in-situ reaction method at room temperature.

Source
DOI: http://dx.doi.org/10.1088/1361-6528/ac30f2
October 2021

Double-Walled [email protected] Multicomponent Senary Metal-Organic Polyhedral Framework and Its Isoreticular Evolution.

J Am Chem Soc 2021 Oct 19. Epub 2021 Oct 19.

Department of Chemistry, Northwestern University, 2145 Sheridan Road, Evanston, Illinois 60208, United States.

Metal-organic polyhedral frameworks are attractive for gas storage and separation because their large voids have windows that can serve as traps for guest molecules. Introducing multivariant/multicomponent functionalities into them is one way to improve performance for specific targets. The high compatibility of organic linkers can generate multivariant MOFs, but so far the diversity of secondary building units (SBUs) in a single metal-organic framework is still limited (no more than two in most cases). Here we report a new double-walled [email protected] metal-organic polyhedral framework () with five types of topologically distinct SBUs and its isoreticular evolution to the [email protected] counterpart (). Both MOFs are the first to be constructed with such high numbers of topologically distinct SBUs as well as topologically distinct nodes, and their formation and evolution provide new insight into the controllability of SBUs.

Source
DOI: http://dx.doi.org/10.1021/jacs.1c08286
October 2021

Editor's introduction to the special section on the 7th Biomedical Linked Annotation Hackathon (BLAH7).

Genomics Inform 2021 Sep 30;19(3):e20. Epub 2021 Sep 30.

Center for Convergence Research of Advanced Technologies, Ewha Womans University, Seoul 03760, Korea.

Source
DOI: http://dx.doi.org/10.5808/gi.19.3.e1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510870
September 2021

Woolly hair nevus caused by somatic mutation and Costello syndrome caused by germline mutation in HRAS: Consider parental mosaicism in prenatal counseling.

J Dermatol 2021 Oct 2. Epub 2021 Oct 2.

Department of Dermatology, Xinhua Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China.

Germline mutations in HRAS cause Costello syndrome (CS), while mosaic mutations in HRAS show a variability of phenotypes, ranging from mild features such as keratinocytic epidermal nevus (KEN), sebaceous nevus (SN), woolly hair nevus (WHN) with KEN, to severe manifestations of CS with cutis laxa. We report two individuals. The first was a 2-year-old boy with woolly hair nevus (WHN) without any other cutaneous involvement, in whom somatic HRAS mutation (c.34G>A; p.Gly12Ser) was identified in his affected scalp and hair follicle specimens. This is the first reported WHN type 1 (no cutaneous involvement) patient caused by somatic HRAS mutation. The other individual was a 12-year-old girl with CS caused by germline HRAS mutation (c.34G>A), that manifested with coarse face, palmoplantar keratoderma, deep palmar and plantar creases, hyperpigmented patches, asymmetry and deformity of lower limbs, atopic dermatitis, as well as mental retardation. Of note, a linear hyperpigmented plaque was observed in her father's lumbosacral region. Although the father refused to provide semen and skin tissue for further examination, this reminds us of possible mosaicism in parents of individuals with germline de novo HRAS mutation and underlines the importance of parental evaluation for prenatal counseling.

Source
DOI: http://dx.doi.org/10.1111/1346-8138.16177
October 2021

Photoinduced Charge Transfer with a Small Driving Force Facilitated by Exciplex-like Complex Formation in Metal-Organic Frameworks.

J Am Chem Soc 2021 Sep 9;143(37):15286-15297. Epub 2021 Sep 9.

School of Chemical and Biomolecular Science, Southern Illinois University, 1245 Lincoln Drive, Carbondale, Illinois 62901, United States.

Photoinduced charge transfer (PCT) is a key step in the light-harvesting (LH) process producing the redox equivalents for energy conversion. However, like traditional macromolecular donor-acceptor assemblies, most MOF-derived LH systems are designed with a large Δ to drive PCT. To emulate the functionality of the reaction center of the natural LH complex that drives PCT within a pair of identical chromophores producing charge carriers with maximum potentials, we prepared two electronically diverse carboxy-terminated zinc porphyrins, BFBP(Zn)-COOH and TFP(Zn)-COOH, and installed them into the hexagonal pores of NU-1000 via solvent-assisted ligand incorporation (SALI), resulting in BFBP(Zn)@NU-1000 and TFP(Zn)@NU-1000 compositions. Varying the number of trifluoromethyl groups at the porphyrin core, we tuned the ground-state redox potentials of the porphyrins within ca. 0.1 V relative to that of NU-1000, defining a small Δ for PCT. For BFBP(Zn)@NU-1000, the relative ground- and excited-state redox potentials of the components facilitate an energy transfer (EnT) from NU-1000* to BFBP(Zn), forming BFBP(Zn)*, which entails a long-lived charge-separated complex formed through an exciplex-like [BFBP(Zn)*-TBAPy] intermediate. Various time-resolved spectroscopic data suggest that EnT from NU-1000* may not involve a fast Förster-like resonance energy transfer (FRET) but rather proceeds through the slow formation of a [NU-1000*-BFBP(Zn)] intermediate. In contrast, TFP(Zn)@NU-1000 displays an efficient EnT from NU-1000* to [TFP(Zn)-TBAPy], a complex that forms in the ground state through electronic interaction and thereafter shows the excited-state feature of [TFP(Zn)-TBAPy]*. The results will help to develop synthetic LHC systems that can produce long-lived photogenerated charge carriers with high potentials, i.e., high open-circuit voltage in photoelectrochemical setups.

Source
DOI: http://dx.doi.org/10.1021/jacs.1c06629
September 2021

Artificial Intelligence in Action: Addressing the COVID-19 Pandemic with Natural Language Processing.

Annu Rev Biomed Data Sci 2021 Jul 14;4:313-339. Epub 2021 May 14.

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.

The COVID-19 (coronavirus disease 2019) pandemic has had a significant impact on society, both because of the serious health effects of COVID-19 and because of public health measures implemented to slow its spread. Many of these difficulties are fundamentally information needs; attempts to address these needs have caused an information overload for both researchers and the public. Natural language processing (NLP), the branch of artificial intelligence that interprets human language, can be applied to address many of the information needs made urgent by the COVID-19 pandemic. This review surveys approximately 150 NLP studies and more than 50 systems and datasets addressing the COVID-19 pandemic. We detail work on four core NLP tasks: information retrieval, named entity recognition, literature-based discovery, and question answering. We also describe work that directly addresses aspects of the pandemic through four additional tasks: topic modeling, sentiment and emotion analysis, caseload forecasting, and misinformation detection. We conclude by discussing observable trends and remaining challenges.

Source
DOI: http://dx.doi.org/10.1146/annurev-biodatasci-021821-061045
July 2021

COVID-19-CT-CXR: A Freely Accessible and Weakly Labeled Chest X-Ray and CT Image Collection on COVID-19 From Biomedical Literature.

IEEE Trans Big Data 2021 Mar 4;7(1):3-12. Epub 2020 Nov 4.

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894 USA.

The latest threat to global health is the COVID-19 outbreak. Although there exist large datasets of chest X-rays (CXR) and computed tomography (CT) scans, few COVID-19 image collections are currently available due to patient privacy. At the same time, there is a rapid growth of COVID-19-relevant articles in the biomedical literature, including those that report findings on radiographs. Here, we present COVID-19-CT-CXR, a public database of COVID-19 CXR and CT images, which are automatically extracted from COVID-19-relevant articles from the PubMed Central Open Access (PMC-OA) Subset. We extracted figures, associated captions, and relevant figure descriptions in the article and separated compound figures into subfigures. Because a large portion of figures in COVID-19 articles are not CXR or CT, we designed a deep-learning model to distinguish them from other figure types and to classify them accordingly. The final database includes 1,327 CT and 263 CXR images (as of May 9, 2020) with their relevant text. To demonstrate the utility of COVID-19-CT-CXR, we conducted four case studies. (1) We show that COVID-19-CT-CXR, when used as additional training data, is able to contribute to improved deep-learning (DL) performance for the classification of COVID-19 and non-COVID-19 CT. (2) We collected CT images of influenza, another common infectious respiratory illness that may present similarly to COVID-19, and fine-tuned a baseline deep neural network to distinguish a diagnosis of COVID-19, influenza, or normal or other types of diseases on CT. (3) We fine-tuned an unsupervised one-class classifier from non-COVID-19 CXR and performed anomaly detection to detect COVID-19 CXR. (4) From text-mined captions and figure descriptions, we compared 15 clinical symptoms and 20 clinical findings of COVID-19 versus those of influenza to demonstrate the disease differences in the scientific publications. 
Our database is unique, as the figures are retrieved along with relevant text with fine-grained descriptions, and it can be extended easily in the future. We believe that our work is complementary to existing resources and hope that it will contribute to medical image analysis of the COVID-19 pandemic. The dataset, code, and DL models are publicly available at https://github.com/ncbi-nlp/COVID-19-CT-CXR.

Source
DOI: http://dx.doi.org/10.1109/tbdata.2020.3035935
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8117951
March 2021

LitSuggest: a web-based system for literature recommendation and curation using machine learning.

Nucleic Acids Res 2021 Jul;49(W1):W352-W358

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA.

Searching and reading relevant literature is a routine practice in biomedical research. However, it is challenging for a user to design optimal search queries using all the keywords related to a given topic. As such, existing search systems such as PubMed often return suboptimal results. Several computational methods have been proposed as an effective alternative to keyword-based query methods for literature recommendation. However, those methods require specialized knowledge in machine learning and natural language processing, which can make them difficult for biologists to utilize. In this paper, we propose LitSuggest, a web server that provides an all-in-one literature recommendation and curation service to help biomedical researchers stay up to date with scientific literature. LitSuggest combines advanced machine learning techniques for suggesting relevant PubMed articles with high accuracy. In addition to innovative text-processing methods, LitSuggest offers multiple advantages over existing tools. First, LitSuggest allows users to curate, organize, and download classification results in a single interface. Second, users can easily fine-tune LitSuggest results by updating the training corpus. Third, results can be readily shared, enabling collaborative analysis and curation of scientific literature. Finally, LitSuggest provides an automated personalized weekly digest of newly published articles for each user's project. LitSuggest is publicly available at https://www.ncbi.nlm.nih.gov/research/litsuggest.

Source
DOI: http://dx.doi.org/10.1093/nar/gkab326
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8262723
July 2021

Ammonia Capture within Zirconium Metal-Organic Frameworks: Reversible and Irreversible Uptake.

ACS Appl Mater Interfaces 2021 May 22;13(17):20081-20093. Epub 2021 Apr 22.

Department of Chemistry, Northwestern University, 2145 Sheridan Road, Evanston, Illinois 60208, United States.

Ammonia uptake by high-capacity and high-porosity sorbents is a promising approach to its storage and release, capture and mitigation, and chemical separation. Here, we examined the ammonia sorption behavior of several versions of an archetypal zirconium-based metal-organic framework (MOF) material, NU-1000, a meso- and microporous crystalline compound having the empirical formula (1,3,6,8-tetrakis(p-benzoate)pyrene) Zr(μ-O)(μ-OH)(HO)(OH) with linkers and nodes arranged to satisfy a topology. Depending on the thermal treatment protocol used prior to sorption measurements, ammonia can physisorb to NU-1000 via hydrogen-bonding and London-dispersion interactions and chemisorb via Brønsted acid-base reactions with node-integrated proton donors (μ-hydroxos) and node-ligated proton donors (terminal hydroxos), via simple coordination at open Zr(IV) sites, or via dissociative coordination to Zr(IV) as NH and protonation of a node-based μ-oxo. Ammonia adsorption occurs via both reversible and irreversible processes. The latter are of particular interest for protection and mitigation. Notably, the unexpected dissociative adsorption occurs only with nodes that have been fully dehydrated and irreversibly structurally distorted via thermal pre-treatment, a finding that is supported by density functional theory calculations. Differentiating and ranking the relative importance of the many modes of adsorption was facilitated, in part, by the availability of variants of NU-1000 that replace the majority of terminal aqua and hydroxo ligands with nonstructural formate ligands, auxiliary ditopic linkers, or both. The study provides insights into the chemical basis for both reversible and irreversible uptake of ammonia by Zr-MOFs and related compounds. The unexpectedly rich variety of sorption motifs suggests criteria for designing or choosing MOFs that are optimal for specific ammonia-centric applications.

Source
DOI: http://dx.doi.org/10.1021/acsami.1c02370
May 2021

NLM-Gene, a richly annotated gold standard dataset for gene entities that addresses ambiguity and multi-species gene recognition.

J Biomed Inform 2021 Jun 9;118:103779. Epub 2021 Apr 9.

National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

The automatic recognition of gene names and their corresponding database identifiers in biomedical text is an important first step for many downstream text-mining applications. While current methods for tagging gene entities have been developed for biomedical literature, their performance on species other than human is substantially lower due to the lack of annotation data. We therefore present the NLM-Gene corpus, a high-quality manually annotated corpus for genes developed at the US National Library of Medicine (NLM), covering ambiguous gene names, with an average of 29 gene mentions (10 unique identifiers) per document, and a broader representation of different species (including Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, etc.) when compared to previous gene annotation corpora. NLM-Gene consists of 550 PubMed abstracts from 156 biomedical journals, doubly annotated by six experienced NLM indexers, randomly paired for each document to control for bias. The annotators worked in three annotation rounds until they reached complete agreement. This gold-standard corpus can serve as a benchmark to develop and test new gene text-mining algorithms. Using this new resource, we have developed a new gene-finding algorithm based on deep learning that improves on both the precision and recall of existing tools. The NLM-Gene annotated corpus is freely available at ftp://ftp.ncbi.nlm.nih.gov/pub/lu/NLMGene. We have also applied this tool to the entire PubMed/PMC, with the results freely accessible through our web-based tool PubTator (www.ncbi.nlm.nih.gov/research/pubtator).

Source
DOI: http://dx.doi.org/10.1016/j.jbi.2021.103779
June 2021

Multimodal, multitask, multiattention (M3) deep learning detection of reticular pseudodrusen: Toward automated and accessible classification of age-related macular degeneration.

J Am Med Inform Assoc 2021 Jun;28(6):1135-1148

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.

Objective: Reticular pseudodrusen (RPD), a key feature of age-related macular degeneration (AMD), are poorly detected by human experts on standard color fundus photography (CFP) and typically require advanced imaging modalities such as fundus autofluorescence (FAF). The objective was to develop and evaluate the performance of a novel multimodal, multitask, multiattention (M3) deep learning framework on RPD detection.

Materials and Methods: A deep learning framework (M3) was developed to detect RPD presence accurately using CFP alone, FAF alone, or both, employing >8000 CFP-FAF image pairs obtained prospectively (Age-Related Eye Disease Study 2). The M3 framework includes multimodal (detection from single or multiple image modalities), multitask (training different tasks simultaneously to improve generalizability), and multiattention (improving ensembled feature representation) operation. Performance on RPD detection was compared with state-of-the-art deep learning models and 13 ophthalmologists; performance on detection of 2 other AMD features (geographic atrophy and pigmentary abnormalities) was also evaluated.

Results: For RPD detection, M3 achieved an area under the receiver-operating characteristic curve (AUROC) of 0.832, 0.931, and 0.933 for CFP alone, FAF alone, and both, respectively. M3 performance on CFP was substantially superior to that of human retinal specialists (median F1 score = 0.644 vs 0.350). External validation (the Rotterdam Study) demonstrated high accuracy on CFP alone (AUROC, 0.965). The M3 framework also accurately detected geographic atrophy and pigmentary abnormalities (AUROC, 0.909 and 0.912, respectively), demonstrating its generalizability.

Conclusions: This study demonstrates the successful development, robust evaluation, and external validation of a novel deep learning framework that enables accessible, accurate, and automated AMD diagnosis and prognosis.

Source
DOI: http://dx.doi.org/10.1093/jamia/ocaa302
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8200273
June 2021

NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature.

Sci Data 2021 Mar 25;8(1):91. Epub 2021 Mar 25.

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Automatically identifying chemical and drug names in scientific publications advances information access for this important class of entities in a variety of biomedical disciplines by enabling improved retrieval and linkage to related concepts. While current methods for tagging chemical entities were developed for the article title and abstract, their performance in the full article text is substantially lower. However, the full text frequently contains more detailed chemical information, such as the properties of chemical compounds, their biological effects and interactions with diseases, genes and other chemicals. We therefore present the NLM-Chem corpus, a full-text resource to support the development and evaluation of automated chemical entity taggers. The NLM-Chem corpus consists of 150 full-text articles, doubly annotated by ten expert NLM indexers, with ~5000 unique chemical name annotations, mapped to ~2000 MeSH identifiers. We also describe a substantially improved chemical entity tagger, with automated annotations for all of PubMed and PMC freely accessible through the PubTator web-based interface and API. The NLM-Chem corpus is freely available.

Source
DOI: http://dx.doi.org/10.1038/s41597-021-00875-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7994842
March 2021

Isomer of linker for NU-1000 yields a new she-type, catalytic, and hierarchically porous, Zr-based metal-organic framework.

Chem Commun (Camb) 2021 Apr 11;57(29):3571-3574. Epub 2021 Mar 11.

College of Mechanics and Materials, Hohai University, Nanjing 210098, China.

The well-known MOF (metal-organic framework) linker tetrakis(p-benzoate)pyrene (TBAPy) lacks steric hindrance between its benzoates. Changing the 1,3,6,8-siting of benzoates in TBAPy to 4,5,9,10-siting introduces substantial steric hindrance and, in turn, enables the synthesis of a new hierarchically porous, she-type MOF Zr(μ-O)(μ-OH)(CHCOO)(COO)(TBAPy-2) (NU-601), where TBAPy-2 is the 4,5,9,10 isomer of TBAPy. NU-601 shows high catalytic activity for degradative hydrolysis of a simulant for G-type fluoro-phosphorus nerve agents.

Source
DOI: http://dx.doi.org/10.1039/d0cc07974j
April 2021

THERAPY OF ENDOCRINE DISEASE: Novel protection and treatment strategies for chemotherapy-associated ovarian damage.

Eur J Endocrinol 2021 May;184(5):R177-R192

Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China.

Fertility and ovarian protection against chemotherapy-associated ovarian damage has formed a new field called oncofertility, which is driven by the pursuit of fertility protection as well as good life quality for numerous female cancer survivors. However, the choice of fertility and ovarian protection method is a difficult problem during chemotherapy and there is no uniform guideline at present. To alleviate ovarian toxicity caused by anticancer drugs, effective methods combined with an individualized treatment plan that integrates an optimal strategy for preserving and restoring reproductive function should be offered from well-established to experimental stages before, during, and after chemotherapy. Although embryo, oocyte, and ovarian tissue cryopreservation are the major methods that have been proven effective and feasible for fertility protection, they are also subject to many limitations. Therefore, this paper mainly discusses the future potential methods and corresponding mechanisms for fertility protection in chemotherapy-associated ovarian damage.

Source
DOI: http://dx.doi.org/10.1530/EJE-20-1178
May 2021

Generalized Zero-Shot Chest X-Ray Diagnosis Through Trait-Guided Multi-View Semantic Embedding With Self-Training.

IEEE Trans Med Imaging 2021 Oct 30;40(10):2642-2655. Epub 2021 Sep 30.

Zero-shot learning (ZSL) is one of the most promising avenues of annotation-efficient machine learning. In the era of deep learning, ZSL techniques have achieved unprecedented success. However, the developments of ZSL methods have taken place mostly for natural images. ZSL for medical images has remained largely unexplored. We design a novel strategy for generalized zero-shot diagnosis of chest radiographs. In doing so, we leverage the potential of multi-view semantic embedding, a useful yet less-explored direction for ZSL. Our design also incorporates a self-training phase to tackle the problem of noisy labels alongside improving the performance for classes not seen during training. Through rigorous experiments, we show that our model trained on one dataset can produce consistent performance across test datasets from different sources including those with very different quality. Comparisons with a number of state-of-the-art techniques show the superiority of the proposed method for generalized zero-shot chest x-ray diagnosis.

Source
DOI: http://dx.doi.org/10.1109/TMI.2021.3054817
October 2021

PhenoTagger: A Hybrid Method for Phenotype Concept Recognition using Human Phenotype Ontology.

Bioinformatics 2021 Jan 20. Epub 2021 Jan 20.

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA.

Motivation: Automatic phenotype concept recognition from unstructured text remains a challenging task in biomedical text mining research. Previous works that address the task typically use dictionary-based matching methods, which can achieve high precision but suffer from lower recall. Recently, machine learning-based methods have been proposed to identify biomedical concepts, which can recognize more unseen concept synonyms by automatic feature learning. However, most methods require large corpora of manually annotated data for model training, which is difficult to obtain due to the high cost of human annotation.

Results: In this paper, we propose PhenoTagger, a hybrid method that combines both dictionary and machine learning-based methods to recognize Human Phenotype Ontology (HPO) concepts in unstructured biomedical text. We first use all concepts and synonyms in HPO to construct a dictionary, which is then used to automatically build a distantly supervised training dataset for machine learning. Next, a cutting-edge deep learning model is trained to classify each candidate phrase (n-gram from input sentence) into a corresponding concept label. Finally, the dictionary and machine learning-based prediction results are combined for improved performance. Our method is validated with two HPO corpora, and the results show that PhenoTagger compares favorably to previous methods. In addition, to demonstrate the generalizability of our method, we retrained PhenoTagger using the disease ontology MEDIC for disease concept recognition to investigate the effect of training on different ontologies. Experimental results on the NCBI disease corpus show that PhenoTagger without requiring manually annotated training data achieves competitive performance as compared with state-of-the-art supervised methods.

Availability: The source code, API information and data for PhenoTagger are freely available at https://github.com/ncbi-nlp/PhenoTagger.

Supplementary Information: Supplementary data are available at Bioinformatics online.
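
The hybrid idea in the Results paragraph above (dictionary lookup over candidate n-grams, combined with a learned classifier) can be sketched roughly as follows. This is a minimal illustration, not PhenoTagger's actual code: the toy dictionary, the placeholder concept IDs, the `toy_classifier` stand-in, and the combination rule (dictionary hits take priority, model hits need a score threshold) are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy dictionary: lower-cased phrase -> placeholder concept ID.
# These IDs are illustrative only and are not real HPO identifiers.
TOY_DICTIONARY: Dict[str, str] = {
    "short stature": "CONCEPT:0001",
    "seizure": "CONCEPT:0002",
}


def ngrams(tokens: List[str], max_n: int) -> List[Tuple[int, int, str]]:
    """Return (start, end, phrase) for every n-gram of up to max_n tokens."""
    spans = []
    for start in range(len(tokens)):
        last = min(start + max_n, len(tokens))
        for end in range(start + 1, last + 1):
            spans.append((start, end, " ".join(tokens[start:end])))
    return spans


def toy_classifier(phrase: str) -> Tuple[str, float]:
    """Stand-in for a trained model: returns (label, confidence score)."""
    if phrase == "seizures":  # a variant the toy dictionary does not list
        return ("CONCEPT:0002", 0.9)
    return ("NONE", 0.0)


def tag(
    sentence: str,
    dictionary: Dict[str, str],
    classifier: Callable[[str], Tuple[str, float]],
    max_n: int = 3,
    threshold: float = 0.8,
) -> Dict[Tuple[int, int], Tuple[str, str]]:
    """Map token spans to (concept ID, source), combining both methods."""
    tokens = sentence.lower().split()
    results: Dict[Tuple[int, int], Tuple[str, str]] = {}
    for start, end, phrase in ngrams(tokens, max_n):
        if phrase in dictionary:
            # Exact dictionary match: accepted directly.
            results[(start, end)] = (dictionary[phrase], "dictionary")
            continue
        label, score = classifier(phrase)
        if label != "NONE" and score >= threshold:
            # Model prediction: accepted only above the assumed threshold.
            results[(start, end)] = (label, "model")
    return results


if __name__ == "__main__":
    found = tag("Patient has short stature and seizures",
                TOY_DICTIONARY, toy_classifier)
    for span, (concept, source) in sorted(found.items()):
        print(span, concept, source)
```

In this sketch the dictionary supplies exact matches while the classifier picks up a surface variant the dictionary misses, which mirrors the precision/recall trade-off the abstract describes.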

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btab019
January 2021

BERT-GT: Cross-sentence n-ary relation extraction with BERT and graph transformer.

Bioinformatics 2021 Jan 8. Epub 2021 Jan 8.

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, 20894, USA.

Motivation: A biomedical relation statement is commonly expressed in multiple sentences and consists of many concepts, including gene, disease, chemical, and mutation. To automatically extract information from biomedical literature, existing biomedical text-mining approaches typically formulate the problem as a cross-sentence n-ary relation-extraction task that detects relations among n entities across multiple sentences, and use either a graph neural network (GNN) with long short-term memory (LSTM) or an attention mechanism. Recently, Transformer has been shown to outperform LSTM on many natural language processing (NLP) tasks.

Results: In this work, we propose a novel architecture that combines Bidirectional Encoder Representations from Transformers with Graph Transformer (BERT-GT), through integrating a neighbor-attention mechanism into the BERT architecture. Unlike the original Transformer architecture, which utilizes the whole sentence(s) to calculate the attention of the current token, the neighbor-attention mechanism in our method calculates its attention utilizing only its neighbor tokens. Thus, each token can pay attention to its neighbor information with little noise. We show that this is critically important when the text is very long, as in cross-sentence or abstract-level relation-extraction tasks. Our benchmarking results show improvements of 5.44% and 3.89% in accuracy and F1-measure over the state-of-the-art on n-ary and chemical-protein relation datasets, suggesting BERT-GT is a robust approach that is applicable to other biomedical relation extraction tasks or datasets.

Availability and Implementation: The source code of BERT-GT will be made freely available at https://github.com/ncbi-nlp/bert_gt upon publication.
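
To make the neighbor-attention idea from the Results paragraph concrete, the sketch below restricts scaled dot-product attention so that each token attends only to a given list of neighbor tokens. It is an illustrative assumption rather than the BERT-GT implementation: how neighbors are chosen, the single-head form, and the `neighbors` adjacency list are all hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighbor_attention(
    queries: List[List[float]],
    keys: List[List[float]],
    values: List[List[float]],
    neighbors: List[List[int]],
) -> List[List[float]]:
    """Single-head attention where token i only sees tokens in neighbors[i].

    neighbors is a hypothetical adjacency list; every entry must be non-empty.
    """
    dim = len(queries[0])
    width = len(values[0])
    outputs = []
    for i, query in enumerate(queries):
        allowed = neighbors[i]
        if not allowed:
            raise ValueError(f"token {i} has no neighbors to attend to")
        # Scaled dot-product scores, computed over the neighbor set only.
        scores = [
            sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(dim)
            for j in allowed
        ]
        weights = softmax(scores)
        # Weighted sum of the neighbors' value vectors.
        outputs.append([
            sum(w * values[j][c] for w, j in zip(weights, allowed))
            for c in range(width)
        ])
    return outputs
```

Full self-attention is the special case in which every token's neighbor list contains all token indices; shrinking the lists is what limits the noise from distant tokens in long inputs.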

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa1087
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8023679
January 2021

Loss-Based Attention for Interpreting Image-Level Prediction of Convolutional Neural Networks.

IEEE Trans Image Process 2021 11;30:1662-1675. Epub 2021 Jan 11.

Although deep neural networks have achieved great success on numerous large-scale tasks, poor interpretability is still a notorious obstacle for practical applications. In this paper, we propose a novel and general attention mechanism, loss-based attention, upon which we modify deep neural networks to mine significant image patches for explaining which parts determine the image decision-making. This is inspired by the fact that some patches contain significant objects or their parts for image-level decision. Unlike previous attention mechanisms that adopt different layers and parameters to learn weights and image prediction, the proposed loss-based attention mechanism mines significant patches by utilizing the same parameters to learn patch weights and logits (class vectors), and image prediction simultaneously, so as to connect the attention mechanism with the loss function for boosting the patch precision and recall. Additionally, different from previous popular networks that utilize max-pooling or stride operations in convolutional layers without considering the spatial relationship of features, the modified deep architectures first remove them to preserve the spatial relationship of image patches and greatly reduce their dependencies, and then add two convolutional or capsule layers to extract their features. With the learned patch weights, the image-level decision of the modified deep architectures is the weighted sum on patches. Extensive experiments on large-scale benchmark databases demonstrate that the proposed architectures can obtain better or competitive performance to state-of-the-art baseline networks with better interpretability. The source codes are available on: https://github.com/xsshi2015/Loss-based-Attention-for-Interpreting-Image-level-Prediction-of-Convolutional-Neural-Networks.
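
The statement that the image-level decision is the weighted sum on patches can be illustrated with a short sketch. It assumes the patch weights and per-patch class logits are already available and that the weights sum to 1; how they are learned is not shown, and the function names are hypothetical rather than taken from the authors' repository.

```python
from typing import List


def image_logits(
    patch_logits: List[List[float]], patch_weights: List[float]
) -> List[float]:
    """Combine per-patch class logits into image-level logits.

    patch_logits[i][c] is the logit of class c for patch i, and
    patch_weights[i] is the (assumed normalized) weight of patch i.
    """
    if len(patch_logits) != len(patch_weights):
        raise ValueError("need exactly one weight per patch")
    n_classes = len(patch_logits[0])
    return [
        sum(w * logits[c] for w, logits in zip(patch_weights, patch_logits))
        for c in range(n_classes)
    ]


def predict(patch_logits: List[List[float]], patch_weights: List[float]) -> int:
    """Return the index of the class with the largest image-level logit."""
    combined = image_logits(patch_logits, patch_weights)
    return max(range(len(combined)), key=lambda c: combined[c])
```

Because the weights enter the prediction directly, a patch with a large weight is by construction one that contributed strongly to the decision, which is the link to interpretability that the abstract draws.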

DOI: http://dx.doi.org/10.1109/TIP.2020.3046875
January 2021

Unexpected "Spontaneous" Evolution of Catalytic, MOF-Supported Single Cu(II) Cations to Catalytic, MOF-Supported Cu(0) Nanoparticles.

J Am Chem Soc 2020 Dec 3;142(50):21169-21177. Epub 2020 Dec 3.

Department of Chemistry, Northwestern University, 2145 Sheridan Road, Evanston, Illinois 60208, United States.

A desirable feature of metal-organic frameworks (MOFs) is their well-defined structural periodicity and the presence of well-defined catalyst grafting sites (e.g., reactive -OH and -OH2 groups) that can support single-site heterogeneous catalysts. However, one should not overlook the potential role of residual organic moieties, specifically formate ions that can occupy the catalyst anchoring sites during MOF synthesis. Here we show how these residual formate species in a Zr-based MOF, NU-1000, critically alter the structure, redox capability, and catalytic activity of postsynthetically incorporated Cu(II) ions. Single-crystal X-ray diffraction measurements established that there are two structurally distinct types of Cu(II) ions in NU-1000: one type with residual formate and one without. In NU-1000 with formate, Cu(II) binds to the node solely via the formate-unoccupied, bridging μ-OH, whereas in the formate-free case, it displaces protons from two node hydroxo ligands and resides close to the terminal -OH. Under an inert atmosphere, node-bound formate facilitates the unanticipated reduction of isolated Cu(II) to nanoparticulate Cu(0), a behavior that is essentially absent in the formate-free analogue because no other sacrificial reductant is present. When the two MOFs were tested as benzyl alcohol oxidation catalysts, we observed that residual formate boosts the catalytic turnover frequency. Density functional calculations showed that node-bound formate acts as a sacrificial two-electron donor and assists in reducing Cu(II) to Cu(0) by a nonradical pathway. The negative Gibbs free energy of reaction (ΔG) and enthalpy of reaction (ΔH) indicate that the reduction is thermodynamically favorable. The work presented here highlights how the often-neglected residual formate prevalent in nearly all zirconium-based MOFs can significantly modulate the properties of supported catalysts.

DOI: http://dx.doi.org/10.1021/jacs.0c10367
December 2020

Node-Accessible Zirconium MOFs.

J Am Chem Soc 2020 Dec 2;142(50):21110-21121. Epub 2020 Dec 2.

Department of Chemistry, Northwestern University, 2145 Sheridan Road, Evanston, Illinois 60208, United States.

High-stability, zirconium-based metal-organic frameworks are attractive as heterogeneous catalysts and as model supports for uniform arrays of subsequently constructed heterogeneous catalysts, for example, MOF-node-grafted metal-oxy and metal-sulfur clusters. For hexa-Zr(IV)-MOFs characterized by nodes that are less than 12-connected, sites not used for linkers are ideally occupied by reactive and displaceable OH/H2O pairs. The desired pairs are ideal for grafting the aforementioned catalytic clusters, while aqua-ligand lability renders them effective for exposing highly Lewis-acidic Zr(IV) sites (catalytic sites) to candidate reactants. New single-crystal X-ray studies of an eight-connected Zr-MOF reveal that conventional activation fully removes modulator ligands, but replaces them with three node-blocking formate ligands (from solvent decomposition) and only one OH/H2O pair, not four, a largely overlooked complication that now appears to be general for Zr-MOFs. Here we describe an alternative activation protocol that effectively removes modulators, avoids formate, and installs the full complement of terminal OH/H2O pairs. It does so via an unusual isolatable intermediate featuring eight aqua ligands and four non-ligated chlorides, again as supported by single-crystal X-ray data. We find that complete replacement of node-blocking modulators/formate with the originally envisioned OH/H2O pairs has striking consequences; here we touch upon just three. First, elimination of unrecognized formate renders aqua ligands much more thermally labile, enabling open Zr(IV) sites to be obtained at lower temperature. Second, in the absence of formate, which otherwise links and locks pairs of node Zr(IV) ions, reversible removal of aqua ligands engenders reversible contraction of MOF meso- and micropores, as evidenced by X-ray diffraction. Third, formate replacement with OH/H2O pairs renders the MOF ca. 10× more active for catalytic hydrolytic degradation of a representative simulant of G-type chemical warfare agents.

DOI: http://dx.doi.org/10.1021/jacs.0c09782
December 2020

LitCovid: an open database of COVID-19 literature.

Nucleic Acids Res 2021 01;49(D1):D1534-D1540

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20892, USA.

Since the outbreak of the current pandemic in 2020, there has been a rapid growth of published articles on COVID-19 and SARS-CoV-2, with about 10,000 new articles added each month. This is causing an increasingly serious information overload, making it difficult for scientists, healthcare professionals and the general public to remain up to date on the latest SARS-CoV-2 and COVID-19 research. Hence, we developed LitCovid (https://www.ncbi.nlm.nih.gov/research/coronavirus/), a curated literature hub, to track up-to-date scientific information in PubMed. LitCovid is updated daily with newly identified relevant articles organized into curated categories. To support manual curation, advanced machine-learning and deep-learning algorithms have been developed, evaluated and integrated into the curation workflow. To the best of our knowledge, LitCovid is the first-of-its-kind COVID-19-specific literature resource, with all of its collected articles and curated data freely available. Since its release, LitCovid has been widely used, with millions of accesses by users worldwide for various information needs, such as evidence synthesis, drug discovery and text and data mining, among others.

DOI: http://dx.doi.org/10.1093/nar/gkaa952
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778958
January 2021

Database resources of the National Center for Biotechnology Information.

Nucleic Acids Res 2021 01;49(D1):D10-D17

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 34 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Custom implementations of the BLAST program provide sequence-based searching of many specialized datasets. New resources released in the past year include a new PubMed interface and NCBI datasets. Additional resources that were updated in the past year include PMC, Bookshelf, Genome Data Viewer, SRA, ClinVar, dbSNP, dbVar, Pathogen Detection, BLAST, Primer-BLAST, IgBLAST, iCn3D and PubChem. All of these resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
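
As an illustration of the E-utilities acting as the programming interface to Entrez, the sketch below builds an ESearch request URL without sending it. The endpoint and the `db`, `term`, `retmax` and `retmode` parameters follow the public E-utilities documentation rather than this abstract; the helper name `esearch_url` and the example query are hypothetical.

```python
from urllib.parse import urlencode

# Base address of the E-utilities (from the public NCBI documentation).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db, term, retmax=20):
    """Build an ESearch URL that would return matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"

# Hypothetical query: PubMed records matching the term "LitCovid".
url = esearch_url("pubmed", "LitCovid")
```

Fetching the URL with any HTTP client would return the list of matching identifiers; NCBI's usage guidelines on request rates and API keys apply to real queries.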

DOI: http://dx.doi.org/10.1093/nar/gkaa892
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778943
January 2021

Better synonyms for enriching biomedical search.

J Am Med Inform Assoc 2020 12;27(12):1894-1902

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.

Objective: In a biomedical literature search, the link between a query and a document is often not established, because they use different terms to refer to the same concept. Distributional word embeddings are frequently used for detecting related words by computing the cosine similarity between them. However, previous research has not established either the best embedding methods for detecting synonyms among related word pairs or how effective such methods may be.

Materials and Methods: In this study, we first create the BioSearchSyn set, a manually annotated set of synonyms, to assess and compare 3 widely used word-embedding methods (word2vec, fastText, and GloVe) in their ability to detect synonyms among related pairs of words. We demonstrate the shortcomings of the cosine similarity score between word embeddings for this task: the same scores have very different meanings for the different methods. To address the problem, we propose using pool adjacent violators (PAV), an isotonic regression algorithm, to transform a cosine similarity into a probability that 2 words are synonyms.

Results: Experimental results using the BioSearchSyn set as a gold standard reveal which embedding methods have the best performance in identifying synonym pairs. The BioSearchSyn set also allows converting cosine similarity scores into probabilities, which provides a uniform interpretation of the synonymy score over different methods.

Conclusions: We introduced the BioSearchSyn corpus of 1000 term pairs, which allowed us to identify the best embedding method for detecting synonymy for biomedical search. Using the proposed method, we created PubTermVariants2.0: a large, automatically extracted set of synonym pairs that have augmented PubMed searches since the spring of 2019.
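
The two ingredients named in this abstract, cosine similarity and pool adjacent violators (PAV), can be sketched in a few lines of pure Python. The toy similarity/label pairs and the function names are assumptions for illustration only; this is not the authors' implementation, and the BioSearchSyn data are not reproduced here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def pav(labels):
    """Pool adjacent violators: best non-decreasing fit to `labels`,
    which must already be ordered by increasing similarity score."""
    blocks = []  # each block holds [sum of labels, number of items]
    for y in labels:
        blocks.append([float(y), 1])
        # Merge backwards while a block's mean exceeds the mean of the next one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

# Toy data: (cosine similarity, 1 if annotated as a synonym pair, else 0).
pairs = [(0.2, 0), (0.4, 1), (0.5, 0), (0.7, 1), (0.9, 1)]
pairs.sort(key=lambda p: p[0])
probabilities = pav([label for _, label in pairs])
```

Each fitted value can be read as the estimated probability of synonymy at that similarity level; fitting one such curve per embedding method is what makes scores from different methods comparable.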

DOI: http://dx.doi.org/10.1093/jamia/ocaa151
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7727334
December 2020

Predicting risk of late age-related macular degeneration using deep learning.

NPJ Digit Med 2020 27;3:111. Epub 2020 Aug 27.

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA.

By 2040, age-related macular degeneration (AMD) will affect ~288 million people worldwide. Identifying individuals at high risk of progression to late AMD, the sight-threatening stage, is critical for clinical actions, including medical interventions and timely monitoring. Although deep learning has shown promise in diagnosing/screening AMD using color fundus photographs, it remains difficult to predict individuals' risks of late AMD accurately. For both tasks, these initial deep learning attempts have remained largely unvalidated in independent cohorts. Here, we demonstrate how deep learning and survival analysis can predict the probability of progression to late AMD using 3298 participants (over 80,000 images) from the Age-Related Eye Disease Studies AREDS and AREDS2, the largest longitudinal clinical trials in AMD. When validated against an independent test data set of 601 participants, our model achieved high prognostic accuracy (5-year C-statistic 86.4 (95% confidence interval 86.2-86.6)) that substantially exceeded that of retinal specialists using two existing clinical standards (81.3 (81.1-81.5) and 82.0 (81.8-82.3), respectively). Interestingly, our approach offers additional strengths over the existing clinical standards in AMD prognosis (e.g., risk ascertainment above 50%) and is likely to be highly generalizable, given the breadth of training data from 82 US retinal specialty clinics. Indeed, during external validation through training on AREDS and testing on AREDS2 as an independent cohort, our model retained substantially higher prognostic accuracy than existing clinical standards. These results highlight the potential of deep learning systems to enhance clinical decision-making in AMD patients.
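
The C-statistic reported in this abstract measures how well predicted risks order participants by time to progression. The sketch below computes Harrell's concordance index, one common definition; whether the paper used exactly this estimator is an assumption, and the toy follow-up data and the name `concordance_index` are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    times:  follow-up time for each participant.
    events: 1 if progression was observed, 0 if the record is censored.
    risks:  predicted risk score (higher means earlier expected progression).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i progressed strictly before j's last
            # observed time, so i is known to have progressed first.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable

# Toy cohort of four participants; the third is censored at year 5.
times = [2.0, 4.0, 5.0, 6.0]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_stat = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering; the abstract reports the statistic on a 0-100 scale.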

DOI: http://dx.doi.org/10.1038/s41746-020-00317-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7453007
August 2020

Recent advances of automated methods for searching and extracting genomic variant information from biomedical literature.

Brief Bioinform 2021 05;22(3)

National Center for Biotechnology Information.

Motivation: To obtain key information for personalized medicine and cancer research, clinicians and researchers in the biomedical field now have a greater need than ever to search the biomedical literature for genomic variant information. Because genomic variants are written in many different forms, however, it is difficult to locate the right information with a general literature search system. To address this difficulty, researchers have proposed various solutions based on automated literature-mining techniques. There is, however, no study that summarizes and compares existing genomic variant literature-mining tools in terms of how easily they let users search the literature for information on genomic variants.

Results: In this article, we systematically compare currently available genomic variant recognition and normalization tools, as well as the literature search engines that adopt these literature-mining techniques. First, we explain the problems caused by the use of non-standard formats of genomic variants in the PubMed literature, using examples from the literature, and show the prevalence of the problem. Second, we review and systematically compare literature-mining tools that address the problem by recognizing and normalizing the various forms of genomic variants in the literature. Third, we present and compare existing literature search engines that are designed for genomic variant search using these literature-mining techniques. We expect this work to be helpful for researchers who seek information about genomic variants from the literature and for developers who integrate genomic variant information from the literature and beyond.
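
To make the recognition-and-normalization problem concrete, the sketch below finds protein substitutions written in one-letter or three-letter form and rewrites them in a single HGVS-style notation. It is a deliberately small illustration: the regular expression, the choice of target notation, and the name `normalize_variants` are assumptions, and the tools reviewed in the article handle far more variant types and contexts.

```python
import re

# One-letter to three-letter amino acid codes.
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
ONE = "[" + "".join(AA3) + "]"
THREE = "|".join(AA3.values())

# Optional "p." prefix, then either V600E-style or Val600Glu-style mentions.
PATTERN = re.compile(
    rf"\b(?:p\.)?(?:({ONE})(\d+)({ONE})|({THREE})(\d+)({THREE}))\b"
)

def normalize_variants(text):
    """Return every protein substitution in `text` as p.<Ref><pos><Alt>."""
    found = []
    for m in PATTERN.finditer(text):
        if m.group(1):  # one-letter form: expand both residues
            ref, pos, alt = AA3[m.group(1)], m.group(2), AA3[m.group(3)]
        else:           # three-letter form: already expanded
            ref, pos, alt = m.group(4), m.group(5), m.group(6)
        found.append(f"p.{ref}{pos}{alt}")
    return found

mentions = normalize_variants(
    "BRAF V600E, also written p.Val600Glu, and EGFR p.L858R."
)
```

Here two different spellings of the same BRAF variant collapse to one normalized string, which is what lets a search engine match a query against documents that use a different written form.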

DOI: http://dx.doi.org/10.1093/bib/bbaa142
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8138883
May 2021

Large field of view beaconless laser nutation tracking sensor based on a micro-electro-mechanical system mirror.

Appl Opt 2020 Aug;59(22):6534-6539

We propose a laser nutation tracking sensor for beaconless laser communication, which uses a micro-electro-mechanical system (MEMS) mirror to achieve high-efficiency, large-amplitude nutation at its resonant frequency. We derive a new formula for the case in which the optical power is only partially detectable over the nutation cycle. In the experiment, we measure the performance of the sensor in calculating boresight error at three different nutation radii. Using the proposed algorithm for this new case, we achieve accurate boresight calculation in the range of ±200µ at a nutation radius of 4.9 µm. We expect that the receiving field of view (FOV) of this tracking sensor can be further expanded by increasing the nutation radius. The proposed sensor should help simplify future tracking systems.

DOI: http://dx.doi.org/10.1364/AO.396490
August 2020

Editor's introduction to the special issue of the 6th Biomedical Linked Annotation Hackathon (BLAH6).

Genomics Inform 2020 Jun 24;18(2):e12. Epub 2020 Jun 24.

Center for Convergence Research of Advanced Technologies, Ewha Womans University, Seoul 03760, Korea.

DOI: http://dx.doi.org/10.5808/GI.2020.18.2.e12
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7362940
June 2020

COVID-19-CT-CXR: a freely accessible and weakly labeled chest X-ray and CT image collection on COVID-19 from biomedical literature.

ArXiv 2020 Jun 11. Epub 2020 Jun 11.

The latest threat to global health is the COVID-19 outbreak. Although there exist large datasets of chest X-rays (CXR) and computed tomography (CT) scans, few COVID-19 image collections are currently available due to patient privacy. At the same time, there is a rapid growth of COVID-19-relevant articles in the biomedical literature. Here, we present COVID-19-CT-CXR, a public database of COVID-19 CXR and CT images, which are automatically extracted from COVID-19-relevant articles from the PubMed Central Open Access (PMC-OA) Subset. We extracted figures, associated captions, and relevant figure descriptions in the article and separated compound figures into subfigures. We also designed a deep-learning model to distinguish them from other figure types and to classify them accordingly. The final database includes 1,327 CT and 263 CXR images (as of May 9, 2020) with their relevant text. To demonstrate the utility of COVID-19-CT-CXR, we conducted four case studies. (1) We show that COVID-19-CT-CXR, when used as additional training data, is able to contribute to improved DL performance for the classification of COVID-19 and non-COVID-19 CT. (2) We collected CT images of influenza and trained a DL baseline to distinguish a diagnosis of COVID-19, influenza, or normal or other types of diseases on CT. (3) We trained an unsupervised one-class classifier from non-COVID-19 CXR and performed anomaly detection to detect COVID-19 CXR. (4) From text-mined captions and figure descriptions, we compared clinical symptoms and clinical findings of COVID-19 vs. those of influenza to demonstrate the disease differences in the scientific publications. We believe that our work is complementary to existing resources and hope that it will contribute to medical image analysis of the COVID-19 pandemic. The dataset, code, and DL models are publicly available at https://github.com/ncbi-nlp/COVID-19-CT-CXR.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7298731
June 2020

Phase-shift laser range finder technique based on optical carrier phase modulation.

Appl Opt 2020 Jun;59(17):5079-5085

A coherent laser range finder based on optical phase modulation and phase shift measurement is presented. In the proposed laser range finder, the emitted laser is modulated by an electro-optic phase modulator using a 20 MHz sine signal, and the received laser is mixed with a local oscillator using a 90° optical hybrid. Compared with traditional laser phase shift range finders, the proposed laser range finder can measure the velocity and range at high precision simultaneously. An algorithm to calculate the range and velocity is deduced. Our preliminary experiments on moving targets indicate that when the measurement rate is 100 kHz, the root mean square errors of range and velocity, respectively, are 9.35×10 and 4.74×10/.
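
For orientation, the conventional phase-shift ranging relation can be worked through for the 20 MHz modulation frequency quoted in this abstract. This is the textbook relation between modulation phase and distance, not the algorithm derived in the paper (which also recovers velocity and is not given in the abstract); the function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def unambiguous_range(f_mod_hz):
    """Largest distance measurable before the modulation phase wraps past 2*pi."""
    return C / (2.0 * f_mod_hz)

def range_from_phase(phase_rad, f_mod_hz):
    """Distance for a measured modulation phase shift.

    The round trip of length 2*d delays the modulation by 4*pi*f*d/c radians,
    so d = c * phase / (4*pi*f).
    """
    return C * phase_rad / (4.0 * math.pi * f_mod_hz)

f_mod = 20e6                                     # 20 MHz sine modulation
max_range = unambiguous_range(f_mod)             # about 7.49 m
half_turn = range_from_phase(math.pi, f_mod)     # phase of pi gives half of that
```

Under this relation a 20 MHz tone wraps roughly every 7.5 m, so longer distances need an additional coarse measurement or a lower modulation frequency.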

DOI: http://dx.doi.org/10.1364/AO.387196
June 2020

Ten tips for a text-mining-ready article: How to improve automated discoverability and interpretability.

PLoS Biol 2020 06 1;18(6):e3000716. Epub 2020 Jun 1.

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America.

Data-driven research in biomedical science requires structured, computable data. Increasingly, these data are created with support from automated text mining. Text-mining tools have rapidly matured: although not perfect, they now frequently provide outstanding results. We describe 10 straightforward writing tips, and a web tool, PubReCheck, that guide authors in addressing the most common cases that remain difficult for text-mining tools. We anticipate these guides will help authors' work be found more readily and used more widely, ultimately increasing the impact of their work and the overall benefit to both authors and readers. PubReCheck is available at http://www.ncbi.nlm.nih.gov/research/pubrecheck.

DOI: http://dx.doi.org/10.1371/journal.pbio.3000716
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7289435
June 2020
-->