Publications by authors named "Kaiyin Zhou"

2 Publications

  • Page 1 of 1

LitCovid-AGAC: cellular and molecular level annotation data set based on COVID-19.

Genomics Inform 2021 Sep 30;19(3):e23. Epub 2021 Sep 30.

Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, 430070 Wuhan, China.

Currently, coronavirus disease 2019 (COVID-19) literature has been increasing dramatically, and the increased text amount make it possible to perform large scale text mining and knowledge discovery. Therefore, curation of these texts becomes a crucial issue for Bio-medical Natural Language Processing (BioNLP) community, so as to retrieve the important information about the mechanism of COVID-19. PubAnnotation is an aligned annotation system which provides an efficient platform for biological curators to upload their annotations or merge other external annotations. Inspired by the integration among multiple useful COVID-19 annotations, we merged three annotations resources to LitCovid data set, and constructed a cross-annotated corpus, LitCovid-AGAC. This corpus consists of 12 labels including Mutation, Species, Gene, Disease from PubTator, GO, CHEBI from OGER, Var, MPA, CPA, NegReg, PosReg, Reg from AGAC, upon 50,018 COVID-19 abstracts in LitCovid. Contain sufficient abundant information being possible to unveil the hidden knowledge in the pathological mechanism of COVID-19.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.5808/gi.21013DOI Listing
September 2021

Bridging heterogeneous mutation data to enhance disease gene discovery.

Brief Bioinform 2021 09;22(5)

Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, P.R. China.

Bridging heterogeneous mutation data fills in the gap between various data categories and propels discovery of disease-related genes. It is known that genome-wide association study (GWAS) infers significant mutation associations that link genotype and phenotype. However, due to the differences of size and quality between GWAS studies, not all de facto vital variations are able to pass the multiple testing. In the meantime, mutation events widely reported in literature unveil typical functional biological process, including mutation types like gain of function and loss of function. To bring together the heterogeneous mutation data, we propose a 'Gene-Disease Association prediction by Mutation Data Bridging (GDAMDB)' pipeline with a statistic generative model. The model learns the distribution parameters of mutation associations and mutation types and recovers false-negative GWAS mutations that fail to pass significant test but represent supportive evidences of functional biological process in literature. Eventually, we applied GDAMDB in Alzheimer's disease (AD) and predicted 79 AD-associated genes. Besides, 12 of them from the original GWAS, 60 of them are supported to be AD-related by other GWAS or literature report, and rest of them are newly predicted genes. Our model is capable of enhancing the GWAS-based gene association discovery by well combining text mining results. The positive result indicates that bridging the heterogeneous mutation data is contributory for the novel disease-related gene discovery.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/bib/bbab079DOI Listing
September 2021
-->