Publications by authors named "Victor Maojo"

48 Publications

Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory.

JMIR Med Inform 2021 Feb 25;9(2):e22976. Epub 2021 Feb 25.

Biomedical Informatics Group, School of Computer Science, Universidad Politecnica de Madrid, Madrid, Spain.

Background: Currently, existing biomedical literature repositories do not commonly provide users with specific means to locate and remotely access biomedical databases.

Objective: To address this issue, we developed the Biomedical Database Inventory (BiDI), a repository linking to biomedical databases automatically extracted from the scientific literature. BiDI provides an index of data resources and a path to access them seamlessly.

Methods: We designed an ensemble of deep learning methods to extract database mentions. To train the system, we annotated a set of 1242 articles that included mentions of database publications. This data set was then used, together with transfer learning techniques, to train an ensemble of deep learning natural language processing models for database publication detection.
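
The article itself includes no code; as a rough illustration of the kind of ensemble described, the sketch below combines several fine-tuned text classifiers by majority vote. The checkpoint names and the "DATABASE" label are assumptions, not the authors' actual configuration.

```python
# Sketch: majority-vote ensembling of fine-tuned text classifiers for
# database-publication detection. Model names and labels are hypothetical.
from collections import Counter
from transformers import pipeline

MODEL_NAMES = [
    "my-org/biobert-database-mentions",     # hypothetical fine-tuned checkpoints
    "my-org/scibert-database-mentions",
    "my-org/pubmedbert-database-mentions",
]
classifiers = [pipeline("text-classification", model=name) for name in MODEL_NAMES]

def is_database_publication(abstract: str) -> bool:
    """Return True when most ensemble members label the abstract as a database paper."""
    votes = [clf(abstract, truncation=True)[0]["label"] for clf in classifiers]
    return Counter(votes).most_common(1)[0][0] == "DATABASE"

print(is_database_publication("We present FooDB, a manually curated database of ..."))
```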

Results: The system obtained an F1 score of 0.929 on database detection, showing high precision and recall values. When applying this model to the PubMed and PubMed Central databases, we identified over 10,000 unique databases. The ensemble model also extracted the weblinks to the reported databases and discarded irrelevant links. For the extraction of weblinks, the model achieved a cross-validated F1 score of 0.908. We show two use cases: one related to "omics" and the other related to the COVID-19 pandemic.

Conclusions: BiDI enables access to biomedical resources over the internet and facilitates data-driven research and other scientific initiatives. The repository is openly available online and will be regularly updated with an automatic text processing pipeline. The approach can be reused to create repositories of different types (ie, biomedical and others).
http://dx.doi.org/10.2196/22976
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7952234
February 2021

A deep learning approach using synthetic images for segmenting and estimating 3D orientation of nanoparticles in EM images.

Comput Methods Programs Biomed 2021 Apr 2;202:105958. Epub 2021 Feb 2.

Biomedical Informatics Group (GIB), Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, Madrid 28660, Spain.

Background And Objective: Nanoparticles present properties that can be applied to a wide range of fields such as biomedicine, electronics or optics. The type of properties depends on several characteristics, some of which are related to the particle structure. A proper characterization of nanoparticles is crucial since it could affect their applications. To characterize a particle's shape and size, nanotechnologists employ Electron Microscopy (EM) to obtain images of nanoparticles and perform measurements on them. Because this task is tedious, repetitive and slow, we present a Deep Learning method based on Convolutional Neural Networks (CNNs) to detect, segment, infer orientations and reconstruct microscope images of nanoparticles. Since machine learning algorithms depend on annotated data and there is a lack of annotated datasets of nanoparticles, our work makes use of artificial datasets of images resembling real nanoparticle photographs.

Methods: Our work is divided into three tasks. Firstly, we developed a method to create annotated datasets of artificial images resembling Scanning Electron Microscope (SEM) images. Secondly, two convolutional neural network models are trained using the artificial datasets previously generated: the first one detects and segments the nanoparticles, while the second infers the nanoparticle orientation. Finally, the 3D reconstruction module recreates the set of detected particles in a 3D scene.
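
As a rough illustration of the first task, the sketch below renders randomly placed bright discs on a noisy background and returns the paired annotation mask; the image size, noise model and particle radii are assumptions, not the generator used in the paper.

```python
# Sketch: generate one synthetic SEM-like image of spherical particles together
# with its ground-truth annotation mask.
import numpy as np

def synthetic_spheres(size=256, n_particles=10, rng=None):
    rng = rng or np.random.default_rng()
    yy, xx = np.mgrid[0:size, 0:size]
    image = rng.normal(0.2, 0.05, (size, size))        # noisy dark background
    mask = np.zeros((size, size), dtype=np.uint8)      # per-particle labels
    for label in range(1, n_particles + 1):
        cx, cy = rng.integers(20, size - 20, 2)
        radius = rng.integers(8, 20)
        disk = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
        image[disk] = rng.normal(0.8, 0.05, disk.sum())  # brighter particle interior
        mask[disk] = label
    return np.clip(image, 0.0, 1.0), mask

img, ann = synthetic_spheres()
print(img.shape, int(ann.max()))   # (256, 256) 10
```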

Results: We have tested our method with five different shapes of basic nanoparticles: spheres, cubes, ellipsoids, hexagonal discs and octahedrons. An analysis of the reconstructions was conducted by manually comparing each of them with the real images. The results obtained have been promising: the particles are segmented and reconstructed according to their shapes and orientations.

Conclusions: We have developed a method for nanoparticle detection and segmentation in microscope images. Moreover, we can also infer an approximation of the 3D orientation of the particles and, in conjunction with the detections, create a 3D reconstruction of the photographs. The novelty of our approach lies in the dataset used. Instead of using annotated images, we have created the datasets by simulating the microscope images with basic geometrical objects that imitate real nanoparticles.
http://dx.doi.org/10.1016/j.cmpb.2021.105958
April 2021

genoDraw: A Web Tool for Developing Pedigree Diagrams Using the Standardized Human Pedigree Nomenclature Integrated with Biomedical Vocabularies.

AMIA Annu Symp Proc 2019 4;2019:457-466. Epub 2020 Mar 4.

Biomedical Informatics Group, DIA & DLSIIS, ETSI Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Spain.

The integration of genetic information in current clinical routine has raised a need for tools to exploit family genetic knowledge. On the clinical side, an application for managing and visualizing pedigree diagrams could provide genetics specialists with an integrated environment with potential positive impact on their current practice. This article presents a web tool (genoDraw) that provides clinical practitioners with the ability to create, maintain and visualize patients' and their families' information in the form of pedigree diagrams. genoDraw implements a graph-based three-step process for generating diagrams according to a de facto standard in the area and clinical terminologies. It also complies with five characteristics identified as indispensable for the next-generation of pedigree drawing software: comprehensiveness, data-drivenness, automation, interactivity and compatibility with biomedical vocabularies. The platform was implemented and tested, confirming its potential interest to clinical routine.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7153108
August 2020

Carbon Nanotubes' Effect on Mitochondrial Oxygen Flux Dynamics: Polarography Experimental Study and Machine Learning Models using Star Graph Trace Invariants of Raman Spectra.

Nanomaterials (Basel) 2017 Nov 11;7(11). Epub 2017 Nov 11.

INIBIC Institute of Biomedical Research, CHUAC, UDC, 15006 Coruña, Spain.

This study presents the impact of carbon nanotubes (CNTs) on mitochondrial oxygen mass flux under three experimental conditions. New experimental results and a new methodology are reported for the first time; they are based on the CNT Raman spectra star graph transform (spectral moments) and perturbation theory. The experimental measurements of the oxygen mass flux showed that no tested CNT family can inhibit the oxygen consumption profiles of mitochondria. The best model for predicting the flux for other CNTs was provided by random forest using eight features, obtaining a test R-squared (R²) of 0.863 and a test root-mean-square error (RMSE) of 0.0461. The results demonstrate the capability of encoding CNT information into spectral moments of the Raman star graph (SG) transform, with potential applicability as predictive tools in nanotechnology and material risk assessments.
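
A minimal sketch of the modelling step reported here (a random forest fitted on eight features, scored by cross-validated R² and RMSE) is shown below; the feature matrix and target are random placeholders, not the published data.

```python
# Sketch: random forest regression on eight spectral-moment features with
# 10-fold cross-validated R2 and RMSE; X and y are random placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.random((120, 8))   # placeholder: eight Raman star-graph features per sample
y = rng.random(120)        # placeholder: measured mitochondrial oxygen mass flux

model = RandomForestRegressor(n_estimators=300, random_state=0)
y_pred = cross_val_predict(model, X, y, cv=10)
print("R2:  ", r2_score(y, y_pred))
print("RMSE:", mean_squared_error(y, y_pred) ** 0.5)
```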
http://dx.doi.org/10.3390/nano7110386
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5707603
November 2017

A semantic interoperability approach to support integration of gene expression and clinical data in breast cancer.

Comput Biol Med 2017 08 5;87:179-186. Epub 2017 Jun 5.

Biomedical Informatics Group, DIA & DLSIIS, ETSI Informáticos, Universidad Politécnica de Madrid, Spain. Electronic address:

Introduction: The introduction of omics data and advances in technologies involved in clinical treatment have led to a broad range of approaches to represent clinical information. Within this context, patient stratification across health institutions based on omic profiling presents a complex scenario for carrying out multi-center clinical trials.

Methods: This paper presents a standards-based approach to ensure the semantic integration required to facilitate the analysis of clinico-genomic clinical trials. To ensure interoperability across different institutions, we have developed a Semantic Interoperability Layer (SIL) to facilitate homogeneous access to clinical and genetic information, based on different well-established biomedical standards and following Integrating the Healthcare Enterprise (IHE) recommendations.

Results: The SIL has shown suitability for integrating biomedical knowledge and technologies to match the latest clinical advances in healthcare and the use of genomic information. This genomic data integration in the SIL has been tested with a diagnostic classifier tool that takes advantage of harmonized multi-center clinico-genomic data for training statistical predictive models.

Conclusions: The SIL has been adopted in national and international research initiatives, such as the EURECA-EU research project and the CIMED collaborative Spanish project, where the proposed solution has been applied and evaluated by clinical experts focused on clinico-genomic studies.
http://dx.doi.org/10.1016/j.compbiomed.2017.06.005
August 2017

Research Strategies for Biomedical and Health Informatics. Some Thought-provoking and Critical Proposals to Encourage Scientific Debate on the Nature of Good Research in Medical Informatics.

Methods Inf Med 2017 Jan 25;56(S 01):e1-e10. Epub 2017 Jan 25.

Background: Medical informatics, or biomedical and health informatics (BMHI), has become an established scientific discipline. In all such disciplines there is a certain inertia to persist in focusing on well-established research areas and to hold on to well-known research methodologies rather than adopting new ones, which may be more appropriate.

Objectives: To search for answers to the following questions: Which research fields in informatics are currently not adequately addressed, and which methodological approaches might be insufficiently used? Do we know the reasons? What could be the consequences of change for research and for education?

Methods: Outstanding informatics scientists were invited to three panel sessions on this topic in leading international conferences (MIE 2015, Medinfo 2015, HEC 2016) in order to get their answers to these questions.

Results: A variety of themes emerged in the set of answers provided by the panellists. Some panellists took the theoretical foundations of the field for granted, while several questioned whether the field was actually grounded in a strong theoretical foundation. Panellists proposed a range of suggestions for new or improved approaches, methodologies, and techniques to enhance the BMHI research agenda.

Conclusions: The field of BMHI is on the one hand maturing as an academic community and intellectual endeavour. On the other hand vendor-supplied solutions may be too readily and uncritically accepted in health care practice. There is a high chance that BMHI will continue to flourish as an important discipline; its innovative interventions might then reach the original objectives of advancing science and improving health care outcomes.
http://dx.doi.org/10.3414/ME16-01-0125
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5388922
January 2017

Discussion of "The New Role of Biomedical Informatics in the Age of Digital Medicine".

Methods Inf Med 2016 Oct 15;55(5):403-421. Epub 2016 Aug 15.

Najeeb Al-Shorbaji, Vice-President for Knowledge, Research, and Ethics, e-Marefa (www.e-marefa.net), P.O. Box 2351, Amman 11953, Jordan, E-mail:

This article is part of a For-Discussion-Section of Methods of Information in Medicine about the paper "The New Role of Biomedical Informatics in the Age of Digital Medicine" written by Fernando J. Martin-Sanchez and Guillermo H. Lopez-Campos [1]. It is introduced by an editorial. This article contains the combined commentaries invited to independently comment on the paper of Martin-Sanchez and Lopez-Campos. In subsequent issues the discussion can continue through letters to the editor.
http://dx.doi.org/10.3414/ME15-12-0005
October 2016

A method and software framework for enriching private biomedical sources with data from public online repositories.

J Biomed Inform 2016 Apr 10;60:177-86. Epub 2016 Feb 10.

Biomedical Informatics Group, Universidad Politécnica de Madrid, Spain.

Modern biomedical research relies on the semantic integration of heterogeneous data sources to find data correlations. Researchers access multiple datasets of disparate origin and identify elements (e.g., genes, compounds, pathways) that lead to interesting correlations. Normally, they must refer to additional public databases (e.g., scientific literature, published clinical trial results) to enrich the information about the identified entities. While semantic integration techniques have traditionally focused on providing homogeneous access to private datasets, thus helping automate the first part of the research, and different solutions exist for browsing public data, there is still a need for tools that facilitate merging public repositories with private datasets. This paper presents a framework that automatically locates public data of interest to the researcher and semantically integrates it with existing private datasets. The framework has been designed as an extension of traditional data integration systems, and has been validated with an existing data integration platform from a European research project by integrating a private biological dataset with data from the National Center for Biotechnology Information (NCBI).
http://dx.doi.org/10.1016/j.jbi.2016.02.004
April 2016

A machine learning approach to identify clinical trials involving nanodrugs and nanodevices from ClinicalTrials.gov.

PLoS One 2014 27;9(10):e110331. Epub 2014 Oct 27.

Biomedical Informatics Group, Dept. Inteligencia Artificial, Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain.

Background: Clinical Trials (CTs) are essential for bridging the gap between experimental research on new drugs and their clinical application. Just as CTs for traditional drugs and biologics have helped accelerate the translation of biomedical findings into medical practice, CTs for nanodrugs and nanodevices could advance novel nanomaterials as agents for diagnosis and therapy. Although there is publicly available information about nanomedicine-related CTs, the online archiving of this information is carried out without adhering to criteria that discriminate between studies involving nanomaterials or nanotechnology-based processes (nano) and CTs that do not involve nanotechnology (non-nano). Finding out from CT summaries alone whether nanodrugs and nanodevices were involved in a study is a challenging task. At the time of writing, CTs archived in the well-known online registry ClinicalTrials.gov cannot easily be told apart as nano or non-nano CTs, even by domain experts, due to the lack of both a common definition of nanotechnology and standards for reporting nanomedical experiments and results.

Methods: We propose a supervised learning approach for classifying CT summaries from ClinicalTrials.gov according to whether they fall into the nano or the non-nano categories. Our method involves several stages: i) extraction and manual annotation of CTs as nano vs. non-nano, ii) pre-processing and automatic classification, and iii) performance evaluation using several state-of-the-art classifiers under different transformations of the original dataset.
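
A minimal sketch of stages ii)-iii), assuming a simple bag-of-words representation and a single linear classifier rather than the several state-of-the-art classifiers actually compared in the study:

```python
# Sketch: TF-IDF + logistic regression to separate nano vs non-nano trial summaries.
# The two summaries and labels are invented placeholders for the annotated corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

summaries = [
    "Phase I trial of a liposomal doxorubicin nanoparticle formulation ...",  # nano
    "Randomized study of a physiotherapy programme after knee surgery ...",   # non-nano
]
labels = [1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(summaries, labels)
# With a realistically sized corpus one would report cross-validated AUC, e.g.:
# cross_val_score(clf, summaries, labels, cv=10, scoring="roc_auc")
print(clf.predict(["Gold nanoshell mediated photothermal ablation of tumours ..."]))
```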

Results And Conclusions: The performance of the best automated classifier closely matches that of experts (AUC over 0.95), suggesting that it is feasible to automatically detect the presence of nanotechnology products in CT summaries with a high degree of accuracy. This can significantly speed up the process of finding whether reports on ClinicalTrials.gov might be relevant to a particular nanoparticle or nanodevice, which is essential to discover any precedents for nanotoxicity events or advantages for targeted drug therapy.
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0110331
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4210133
June 2015

Towards the taxonomic categorization and recognition of nanoparticle shapes.

Nanomedicine 2015 Feb 27;11(2):457-65. Epub 2014 Jul 27.

Biomedical Informatics Group, Universidad Politécnica de Madrid, Campus de Montegancedo, L3204, Boadilla del Monte, Madrid, Spain. Electronic address:

Unlabelled: The shape of nanoparticles and nanomaterials is a fundamental characteristic that has been shown to influence a number of their properties and effects, particularly for nanomedical applications. The information related with this feature of nanoparticles and nanomaterials is, therefore, crucial to exploit and foster in existing and future research in this area. We have found that descriptions of morphological and spatial properties are consistently reported in the nanotechnology literature, and in general, these morphological properties can be observed and measured using various microscopy techniques. In this paper, we outline a taxonomy of nanoparticle shapes constructed according to nanotechnologists' descriptions and formal geometric concepts that can be used to address the problem of nanomaterial categorization. We employ an image segmentation technique, belonging to the mathematical morphology field, which is capable of identifying shapes in images that can be used to (semi-) automatically annotate nanoparticle images.
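
As a rough illustration of such a mathematical-morphology segmentation, the sketch below thresholds a grey-level image, cleans the mask by morphological opening, and returns per-region shape descriptors; the Otsu threshold and disk(3) structuring element are assumptions, not the published configuration.

```python
# Sketch: mathematical-morphology segmentation of bright particles in a grey-level image.
import numpy as np
from skimage import filters, measure, morphology

def particle_shape_descriptors(image):
    """Segment bright regions and return (label, area, eccentricity) per particle."""
    mask = image > filters.threshold_otsu(image)                 # global threshold
    mask = morphology.binary_opening(mask, morphology.disk(3))   # remove small noise
    labels = measure.label(mask)
    return [(r.label, r.area, r.eccentricity) for r in measure.regionprops(labels)]

image = np.random.default_rng(0).random((128, 128))              # placeholder image
print(particle_shape_descriptors(image)[:3])
```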

From The Clinical Editor: This team of authors outlines a taxonomy of nanoparticle shapes constructed according to nanotechnologists' descriptions and formal geometric concepts, enabling nanomaterial categorization. They also employ a mathematical morphology-based image segmentation system capable of identifying shapes, which can be utilized in semi-automated annotation of nanoparticle images.
http://dx.doi.org/10.1016/j.nano.2014.07.006
February 2015

Past and next 10 years of medical informatics.

J Med Syst 2014 Jul;38(7):74

Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center Mainz, Obere Zahlbacher Str. 69, 55131, Mainz, Germany,

More than 10 years ago, Haux et al. tried to answer the question of how health care provision would look in the year 2013. A follow-up workshop was held in Braunschweig, Germany, for 2 days in May 2013, with 20 invited international experts in biomedical and health informatics. Among other things, its objectives were to discuss the goals and measures suggested in 2002 and how priorities for MI research in this context should be set from today's viewpoint. The goals from 2002 are now as up-to-date as they were then. The experts stated that the three goals ("patient-centred recording and use of medical data for cooperative care", "process-integrated decision support through current medical knowledge" and "comprehensive use of patient data for research and health care reporting") have not been reached yet and are still relevant. A new goal for ICT in health care should be the support of patient-centred, personalized (individual) medicine. MI as an academic discipline carries out research concerning tools that support health care professionals in their work. This research should be carried out without the pressure that it should lead to systems that are immediately and directly accepted in practice.
http://dx.doi.org/10.1007/s10916-014-0074-5
July 2014

Assessing the prognoses on Health care in the information society 2013--thirteen years after.

J Med Syst 2014 Jul;38(7):73

University of Heidelberg, Institute for Medical Biometry and Informatics, Heidelberg, Germany,

Health care and information technology in health care are advancing at tremendous speed. We analysed whether the prognoses by Haux et al. - first presented in 2000 and published in 2002 - had been fulfilled by 2013 and what the reasons for match or mismatch might be. Twenty international experts in biomedical and health informatics met in May 2013 in a workshop to discuss the match or mismatch of each of the 71 prognoses. After this meeting, a web-based survey among workshop participants took place. Thirty-three prognoses were assessed as matching; they reflect, for example, that there is good progress in storing patient data electronically in health care institutions. Twenty-three prognoses were assessed as mismatching; they reflect, for example, that telemedicine and home monitoring, as well as electronic exchange of patient data between institutions, are not as widespread as expected. Fifteen prognoses were assessed as neither matching nor mismatching. ICT tools have considerably influenced health care in the last decade, but in many cases not to the extent expected by Haux et al. in 2002. In most cases this is not a matter of the availability of technical solutions but of organizational and ethical issues. We need innovative and modern information system architectures which support multiple uses of data for patient care as well as for research and reporting, and which are able to integrate data from home monitoring into a patient-centered health record. Since innovative technology is available, its efficient and widespread use in health care has to be enabled by systematic information management.
http://dx.doi.org/10.1007/s10916-014-0073-6
July 2014

NCBI2RDF: enabling full RDF-based access to NCBI databases.

Biomed Res Int 2013 28;2013:983805. Epub 2013 Jul 28.

Biomedical Informatics Group, Artificial Intelligence Laboratory, School of Computer Science, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, Boadilla del Monte, 28660 Madrid, Spain.

RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.
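
As an illustration of how a client could consume such a virtual endpoint, the sketch below sends a SPARQL query and reads the JSON results; the endpoint URL and the ex: vocabulary are hypothetical placeholders, not the actual NCBI2RDF schema.

```python
# Sketch: querying a virtual SPARQL endpoint with SPARQLWrapper.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://example.org/ncbi2rdf/sparql")   # hypothetical endpoint
endpoint.setQuery("""
    PREFIX ex: <http://example.org/ncbi#>
    SELECT ?gene ?pmid WHERE {
        ?gene ex:symbol "BRCA1" .
        ?gene ex:citedIn ?pmid .
    } LIMIT 10
""")
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["gene"]["value"], row["pmid"]["value"])
```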
http://dx.doi.org/10.1155/2013/983805
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3745940
March 2014

Enhancing research capacity of African institutions through social networking.

Stud Health Technol Inform 2013 ;192:1099

Biomedical Informatics Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain.

Traditionally, participation of African researchers in top Biomedical Informatics (BMI) scientific journals and conferences has been scarce. Looking beyond these numbers, an educational goal should be to improve overall research and, therefore, to increase the number of scientists/authors able to produce and publish high-quality research. In this scenario, we are carrying out various efforts to expand the capacities of several institutions located in four African countries - Egypt, Ghana, Cameroon and Mali - within the framework of a European Commission-funded project, AFRICA BUILD. This project is currently carrying out activities such as e-learning, collaborative development of informatics tools, mobility of researchers, various pilot projects, and others. Our main objective is to create a self-sustained South-South network of BMI developers.
April 2015

Analyzing SNOMED CT and HL7 terminology binding for semantic interoperability on post-genomic clinical trials.

Stud Health Technol Inform 2013 ;192:980

Biomedical Informatics Group, DIA & DLSIIS, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660 Boadilla del Monte, Madrid, Spain.

Current post-genomic clinical trials in cancer involve the collaboration of several institutions. Multi-centric retrospective analysis requires advanced methods to ensure semantic interoperability. In this scenario, the objective of the EU-funded INTEGRATE project is to provide an infrastructure to share knowledge and data in post-genomic breast cancer clinical trials. This paper presents the process carried out in this project to bind domain terminologies in the area, such as SNOMED CT, with the HL7 v3 Reference Information Model (RIM). The proposed terminology binding follows the HL7 recommendations, but should also consider important issues such as overlapping concepts and domain terminology coverage. Although there are limitations due to the large heterogeneity of the data in the area, the proposed process has been successfully applied within the context of the INTEGRATE project. Improving the semantic interoperability of patient data from modern breast cancer clinical trials aims to enhance clinical practice in oncology.
April 2015

A data model based on semantically enhanced HL7 RIM for sharing patient data of breast cancer clinical trials.

Stud Health Technol Inform 2013 ;192:971

Biomedical Informatics Group, DIA & DLSIIS, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo, Madrid, Spain.

Breast cancer clinical trial researchers have to handle heterogeneous data coming from different data sources, which overloads biomedical researchers when they need to query data for retrospective analysis. This paper presents the Common Data Model (CDM) proposed within the INTEGRATE EU project to homogenize data coming from different clinical partners. This CDM is based on the Reference Information Model (RIM) from Health Level 7 (HL7) version 3. Semantic capabilities through a SPARQL endpoint were also required to ensure the sustainability of the platform. For the SPARQL endpoint implementation, a comparison was carried out between a relational SQL database with D2R and an RDF database. The results show that the first option can store all clinical data received from institutions participating in the project with better performance. The platform has also been evaluated by the EU Commission within a patient recruitment demonstrator.
April 2015

Fostering ontology alignment sharing: a general-purpose RDF mapping format.

Stud Health Technol Inform 2013 ;192:970

Biomedical Informatics Group, Universidad Politécnica de Madrid, Campus de Montegancedo, 28660 Madrid, Spain.

RDF has become established in recent years as the language for describing, publishing and sharing biomedical resources. Following this trend, a great number of RDF-based data sources, as well as ontologies, have appeared. Using a common language such as RDF has provided a unified syntax for sharing resources, but semantics remains the main cause of heterogeneity, hampering data integration and homogenization efforts. To overcome this issue, ontology alignment-based solutions have typically been used. However, alignment information is usually codified using ad hoc formats. In this paper, we present a general-purpose ontology mapping format, totally independent of the homogenization approach to be applied. The format is accompanied by a Java API that offers mapping construction and parsing features, as well as some basic algorithms for applying it in data translation solutions.
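
The paper's own mapping format and Java API are not reproduced here; as a rough illustration of the general idea (publishing term-to-term alignments as plain RDF that any tool can consume), the sketch below records a SKOS exactMatch between two hypothetical ontology terms.

```python
# Sketch: expressing one ontology alignment as RDF triples with rdflib.
from rdflib import Graph, Namespace, URIRef

SKOS = Namespace("http://www.w3.org/2004/02/skos/core#")
term_a = URIRef("http://example.org/ontoA#MyocardialInfarction")  # hypothetical source term
term_b = URIRef("http://example.org/ontoB#HeartAttack")           # hypothetical target term

g = Graph()
g.bind("skos", SKOS)
g.add((term_a, SKOS.exactMatch, term_b))   # the alignment statement itself
print(g.serialize(format="turtle"))
```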
April 2015

Note on Friedman's 'what informatics is and isn't'.

J Am Med Inform Assoc 2013 Dec 16;20(e2):e365-6. Epub 2013 Apr 16.

Biomedical Informatics Group, Universidad Politecnica de Madrid, Madrid, Spain.

http://dx.doi.org/10.1136/amiajnl-2013-001807
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3861923
December 2013

RDFBuilder: a tool to automatically build RDF-based interfaces for MAGE-OM microarray data sources.

Comput Methods Programs Biomed 2013 Jul 11;111(1):220-7. Epub 2013 May 11.

Biomedical Informatics Group, Artificial Intelligence Laboratory, School of Computer Science, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660 Boadilla del Monte, Madrid, Spain.

This paper presents RDFBuilder, a tool that enables RDF-based access to MAGE-ML-compliant microarray databases. We have developed a system that automatically transforms the MAGE-OM model and microarray data stored in the ArrayExpress database into RDF format. Additionally, the system automatically enables a SPARQL endpoint. This allows users to execute SPARQL queries for retrieving microarray data, either from specific experiments or from more than one experiment at a time. Our system optimizes response times by caching and reusing information from previous queries. In this paper, we describe our methods for achieving this transformation. We show that our approach is complementary to other existing initiatives, such as Bio2RDF, for accessing and retrieving data from the ArrayExpress database.
http://dx.doi.org/10.1016/j.cmpb.2013.04.009
July 2013

The impact of computer science in molecular medicine: enabling high-throughput research.

Curr Top Med Chem 2013 ;13(5):526-75

Biomedical Informatics Group, Facultad de Informatica, Universidad Politécnica de Madrid, Spain.

The Human Genome Project and the explosion of high-throughput data have transformed the areas of molecular and personalized medicine, which are producing a wide range of studies and experimental results and providing new insights for developing medical applications. Research in many interdisciplinary fields is resulting in data repositories and computational tools that support a wide diversity of tasks: genome sequencing, genome-wide association studies, analysis of genotype-phenotype interactions, drug toxicity and side effects assessment, prediction of protein interactions and diseases, development of computational models, biomarker discovery, and many others. The authors of the present paper have developed several inventories covering tools, initiatives and studies in different computational fields related to molecular medicine: medical informatics, bioinformatics, clinical informatics and nanoinformatics. With these inventories, created by mining the scientific literature, we have carried out several reviews of these fields, providing researchers with a useful framework to locate, discover, search and integrate resources. In this paper we present an analysis of the state-of-the-art as it relates to computational resources for molecular medicine, based on results compiled in our inventories, as well as results extracted from a systematic review of the literature and other scientific media. The present review is based on the impact of their related publications and the available data and software resources for molecular medicine. It aims to provide information that can be useful to support ongoing research and work to improve diagnostics and therapeutics based on molecular-level insights.
http://dx.doi.org/10.2174/1568026611313050002
February 2014

Using nanoinformatics methods for automatically identifying relevant nanotoxicology entities from the literature.

Biomed Res Int 2013 27;2013:410294. Epub 2012 Dec 27.

Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte, 28660 Madrid, Spain.

Nanoinformatics is an emerging research field that uses informatics techniques to collect, process, store, and retrieve data, information, and knowledge on nanoparticles, nanomaterials, and nanodevices and their potential applications in health care. In this paper, we have focused on the solutions that nanoinformatics can provide to facilitate nanotoxicology research. For this, we have taken a computational approach to automatically recognize and extract nanotoxicology-related entities from the scientific literature. The desired entities belong to four different categories: nanoparticles, routes of exposure, toxic effects, and targets. The entity recognizer was trained using a corpus that we specifically created for this purpose and was validated by two nanomedicine/nanotoxicology experts. We evaluated the performance of our entity recognizer using 10-fold cross-validation. The precisions range from 87.6% (targets) to 93.0% (routes of exposure), while recall values range from 82.6% (routes of exposure) to 87.4% (toxic effects). These results prove the feasibility of using computational approaches to reliably perform different named entity recognition (NER)-dependent tasks, such as for instance augmented reading or semantic searches. This research is a "proof of concept" that can be expanded to stimulate further developments that could assist researchers in managing data, information, and knowledge at the nanolevel, thus accelerating research in nanomedicine.
http://dx.doi.org/10.1155/2013/410294
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591181
November 2013

Nanoinformatics: a new area of research in nanomedicine.

Int J Nanomedicine 2012 24;7:3867-90. Epub 2012 Jul 24.

Biomedical Informatics Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Spain.

Over a decade ago, nanotechnologists began research on applications of nanomaterials for medicine. This research has revealed a wide range of different challenges, as well as many opportunities. Some of these challenges are strongly related to informatics issues, dealing, for instance, with the management and integration of heterogeneous information, defining nomenclatures, taxonomies and classifications for various types of nanomaterials, and research on new modeling and simulation techniques for nanoparticles. Nanoinformatics has recently emerged in the USA and Europe to address these issues. In this paper, we present a review of nanoinformatics, describing its origins, the problems it addresses, areas of interest, and examples of current research initiatives and informatics resources. We suggest that nanoinformatics could accelerate research and development in nanomedicine, as has occurred in the past in other fields. For instance, biomedical informatics served as a fundamental catalyst for the Human Genome Project, and other genomic and -omics projects, as well as the translational efforts that link resulting molecular-level research to clinical problems and findings.
http://dx.doi.org/10.2147/IJN.S24582
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3410693
February 2013

e-MIR2: a public online inventory of medical informatics resources.

BMC Med Inform Decis Mak 2012 Aug 2;12:82. Epub 2012 Aug 2.

Biomedical Informatics Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte, 28660 Madrid, Spain.

Background: Over the past years, the number of available informatics resources in medicine has grown exponentially. While specific inventories of such resources have already begun to be developed for Bioinformatics (BI), comparable inventories are as yet not available for the Medical Informatics (MI) field, so that locating and accessing them currently remains a difficult and time-consuming task.

Description: We have created a repository of MI resources from the scientific literature, providing free access to its contents through a web-based service. We define informatics resources as all those elements that constitute, serve to define or are used by informatics systems, ranging from architectures or development methodologies to terminologies, vocabularies, databases or tools. Relevant information describing the resources is automatically extracted from manuscripts published in top-ranked MI journals. We used a pattern matching approach to detect the resources' names and their main features. Detected resources are classified according to three different criteria: functionality, resource type and domain. To facilitate these tasks, we have built three different classification schemas by following a novel approach based on folksonomies and social tagging. We adopted the terminology most frequently used by MI researchers in their publications to create the concepts and hierarchical relationships belonging to the classification schemas. The classification algorithm identifies the categories associated with resources and annotates them accordingly. The database is then populated with this data after manual curation and validation.
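
As a toy illustration of the pattern-matching idea, the sketch below detects a resource name and its type from a single cue-phrase pattern; the regular expression and example sentence are invented and far simpler than the published method.

```python
# Sketch: cue-phrase pattern matching for resource-name detection.
import re

PATTERN = re.compile(
    r"\b([A-Z][\w-]+)\s+is\s+an?\s+(?:[\w-]+\s+){0,3}"
    r"(tool|database|system|ontology|algorithm)\b"
)

sentence = "CDAPubMed is an open-source browser extension tool for literature retrieval."
for match in PATTERN.finditer(sentence):
    name, resource_type = match.groups()
    print(f"resource: {name}  type: {resource_type}")
```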

Conclusions: We have created an online repository of MI resources to assist researchers in locating and accessing the most suitable resources to perform specific tasks. The database contains 609 resources at the time of writing and is available at http://www.gib.fi.upm.es/eMIR2. We are continuing to expand the number of available resources by taking into account further publications as well as suggestions from users and resource developers.
http://dx.doi.org/10.1186/1472-6947-12-82
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441434
August 2012

CDAPubMed: a browser extension to retrieve EHR-based biomedical literature.

BMC Med Inform Decis Mak 2012 Apr 5;12:29. Epub 2012 Apr 5.

Biomedical Informatics Group, Facultad de Informática, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Campus de Montegancedo, s/n, 28660 Madrid, Spain.

Background: Over the last few decades, the ever-increasing output of scientific publications has led to new challenges to keep up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who have to locate the exact papers that they need for their clinical and research work amongst a huge number of publications. Against this backdrop, novel information retrieval methods are even more necessary. While web search engines are widespread in many areas, facilitating access to all kinds of information, additional tools are required to automatically link information retrieved from these engines to specific biomedical applications. In the case of clinical environments, this also means considering aspects such as patient data security and confidentiality or structured contents, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool to facilitate query building to retrieve scientific literature related to EHRs.

Results: We have developed CDAPubMed, an open-source web browser extension to integrate EHR features in biomedical literature retrieval approaches. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7-Clinical Document Architecture Standard (HL7-CDA), (ii) identify relevant terms for scientific literature search in these documents, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can optimize to adapt to each specific situation, and (iii) generate and launch literature search queries to a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination.
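
A rough sketch of step (iii) alone, turning a few extracted terms into a PubMed query through the public NCBI E-utilities esearch service; the term list is a placeholder and CDAPubMed's own query-building rules are more elaborate.

```python
# Sketch: combining MeSH-like terms extracted from an EHR into one PubMed esearch query.
import requests

terms = ["Breast Neoplasms", "Trastuzumab", "Postmenopause"]   # placeholder extracted terms
query = " AND ".join(f'"{t}"[MeSH Terms]' for t in terms)

resp = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    params={"db": "pubmed", "term": query, "retmode": "json", "retmax": 20},
    timeout=30,
)
ids = resp.json()["esearchresult"]["idlist"]
print(len(ids), "citations, e.g. PMIDs:", ids[:5])
```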

Conclusions: CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. CDAPubMed is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface. It has been tested on a public dataset of HL7-CDA documents, returning significantly fewer citations since queries are focused on characteristics identified within the EHR. For instance, compared with more than 200,000 citations retrieved by a query for breast neoplasm, fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. This is an open-source tool that can be freely used for non-profit purposes and integrated with other existing systems.
http://dx.doi.org/10.1186/1472-6947-12-29
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3366875
April 2012

A knowledge engineering approach to recognizing and extracting sequences of nucleic acids from scientific literature.

Annu Int Conf IEEE Eng Med Biol Soc 2010 ;2010:1081-4

Biomedical Informatics Group and the Dept. Inteligencia Artificial, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660 Boadilla del Monte, Spain.

In this paper we present a knowledge engineering approach to automatically recognize and extract genetic sequences from scientific articles. To carry out this task, we use a preliminary recognizer based on a finite state machine to extract all candidate DNA/RNA sequences. The latter are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We created the knowledge base by manually analyzing different manuscripts containing genetic sequences. Our approach was evaluated using a test set of 211 full-text articles in PDF format containing 3134 genetic sequences, on which we achieved 87.76% precision and 97.70% recall. This method can facilitate different research tasks, including text mining, information extraction, and information retrieval research dealing with large collections of documents containing genetic sequences.
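
As a rough illustration of the first stage only, the sketch below recognizes candidate nucleotide strings and applies one refinement rule; the character classes and length threshold are assumptions standing in for the finite state machine and the knowledge base.

```python
# Sketch: crude recognizer for candidate nucleotide sequences plus one refinement rule.
import re

CANDIDATE = re.compile(r"[ACGTUacgtu][ACGTUacgtu\s-]{9,}[ACGTUacgtu]")

def extract_sequences(text, min_length=15):
    """Return cleaned candidates, discarding short hits likely to be false positives."""
    sequences = []
    for match in CANDIDATE.finditer(text):
        seq = re.sub(r"[\s-]", "", match.group()).upper()   # join fragments split by spaces/hyphens
        if len(seq) >= min_length:                           # refinement rule (assumed threshold)
            sequences.append(seq)
    return sequences

text = "The forward primer 5'-ACG TGA CCT GGA TCC ATG-3' was used in all assays."
print(extract_sequences(text))   # ['ACGTGACCTGGATCCATG']
```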
http://dx.doi.org/10.1109/IEMBS.2010.5627316
March 2011

PubDNA Finder: a web database linking full-text articles to sequences of nucleic acids.

Bioinformatics 2010 Nov 9;26(21):2801-2. Epub 2010 Sep 9.

Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegacedo S/N, Boadilla del Monte, Madrid, Spain.

Summary: PubDNA Finder is an online repository that we have created to link PubMed Central manuscripts to the sequences of nucleic acids appearing in them. It extends the search capabilities provided by PubMed Central by enabling researchers to perform advanced searches involving sequences of nucleic acids. This includes, among other features (i) searching for papers mentioning one or more specific sequences of nucleic acids and (ii) retrieving the genetic sequences appearing in different articles. These additional query capabilities are provided by a searchable index that we created by using the full text of the 176 672 papers available at PubMed Central at the time of writing and the sequences of nucleic acids appearing in them. To automatically extract the genetic sequences occurring in each paper, we used an original method we have developed. The database is updated monthly by automatically connecting to the PubMed Central FTP site to retrieve and index new manuscripts. Users can query the database via the web interface provided.

Availability: PubDNA Finder can be freely accessed at http://servet.dia.fi.upm.es:8080/pubdnafinder
http://dx.doi.org/10.1093/bioinformatics/btq520
November 2010

A method for automatically extracting infectious disease-related primers and probes from the literature.

BMC Bioinformatics 2010 Aug 3;11:410. Epub 2010 Aug 3.

Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain.

Background: Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to be able to navigate this information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information.
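
The four phases can be pictured as a simple pipeline skeleton; in the sketch below every stage contains trivial placeholder logic, included only to show how the stages chain together, not the paper's actual section parser, recognizers, rule base or annotator.

```python
# Sketch: skeleton of the four-phase extraction pipeline (placeholder logic throughout).
import re

def split_into_sections(document):
    """Phase 1: build a (here flat) tree of paper sections."""
    return {"body": document}

def detect_candidates(sections):
    """Phase 2: stand-in for the finite state machine recognizers."""
    return re.findall(r"[ACGT]{10,}", sections["body"])

def refine(candidates):
    """Phase 3: stand-in for the rule-based expert system (drop homopolymer artefacts)."""
    return [c for c in candidates if len(set(c)) > 1]

def annotate(sequences):
    """Phase 4: attach organism/gene metadata (placeholder values)."""
    return [{"sequence": s, "organism": "unknown", "gene": "unknown"} for s in sequences]

doc = "Primer GGGAAACCCTTTGGGAAACC targets the 16S rRNA gene. Control: AAAAAAAAAAAA."
print(annotate(refine(detect_candidates(split_into_sections(doc)))))
```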

Results: We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name.

Conclusions: We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch.
http://dx.doi.org/10.1186/1471-2105-11-410
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2923139
August 2010

Nanoinformatics and DNA-based computing: catalyzing nanomedicine.

Pediatr Res 2010 May;67(5):481-9

Departamento de Inteligencia Artificial, Universidad Politecnica de Madrid, Madrid 28660 Spain.

Five decades of research and practical application of computers in biomedicine have given rise to the discipline of medical informatics, which has made many advances in genomic and translational medicine possible. Developments in nanotechnology are opening up the prospects for nanomedicine and regenerative medicine, where informatics and DNA computing can become the catalysts enabling health care applications at sub-molecular or atomic scales. Although nanomedicine promises a new exciting frontier for clinical practice and biomedical research, issues involving cost-effectiveness studies, clinical trials and toxicity assays, drug delivery methods, and the implementation of new personalized therapies still remain challenging. Nanoinformatics can accelerate the introduction of nano-related research and applications into clinical practice, leading to an area that could be called "translational nanoinformatics." At the same time, DNA and RNA computing presents an entirely novel paradigm for computation. Nanoinformatics and DNA-based computing are together likely to completely change the way we model and process information in biomedicine and to impact the emerging field of nanomedicine most strongly. In this article, we review work in nanoinformatics and DNA (and RNA)-based computing, including applications in nanopediatrics. We analyze their scientific foundations, current research and projects, envisioned applications, and potential problems that might arise from them.
http://dx.doi.org/10.1203/PDR.0b013e3181d6245e
May 2010

BIRI: a new approach for automatically discovering and indexing available public bioinformatics resources from the literature.

BMC Bioinformatics 2009 Oct 7;10:320. Epub 2009 Oct 7.

Dept Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660 Boadilla del Monte, Madrid, Spain.

Background: The rapid evolution of Internet technologies and the collaborative approaches that dominate the field have stimulated the development of numerous bioinformatics resources. To address this new framework, several initiatives have tried to organize these services and resources. In this paper, we present the BioInformatics Resource Inventory (BIRI), a new approach for automatically discovering and indexing available public bioinformatics resources using information extracted from the scientific literature. The index generated can be automatically updated by adding additional manuscripts describing new resources. We have developed web services and applications to test and validate our approach. It has not been designed to replace current indexes but to extend their capabilities with richer functionalities.

Results: We developed a web service to provide a set of high-level query primitives to access the index. The web service can be used by third-party web services or web-based applications. To test the web service, we created a pilot web application to access a preliminary knowledge base of resources. We tested our tool using an initial set of 400 abstracts. Almost 90% of the resources described in the abstracts were correctly classified. More than 500 descriptions of functionalities were extracted.

Conclusion: These experiments suggest the feasibility of our approach for automatically discovering and indexing current and future bioinformatics resources. Given the domain-independent characteristics of this tool, it is currently being applied by the authors in other areas, such as medical nanoinformatics. BIRI is available at http://edelman.dia.fi.upm.es/biri/.
http://dx.doi.org/10.1186/1471-2105-10-320
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2765974
October 2009