Publications by authors named "Frank J Manion"

16 Publications

COVID-19 SignSym: a fast adaptation of a general clinical NLP tool to identify and normalize COVID-19 signs and symptoms to OMOP common data model.

J Am Med Inform Assoc 2021 Mar 1. Epub 2021 Mar 1.

Melax Technologies, Inc, Houston, Texas, USA.

The COVID-19 pandemic swept across the world rapidly, infecting millions of people. An efficient tool that can accurately recognize important clinical concepts of COVID-19 from free text in electronic health records (EHRs) will be valuable to accelerate COVID-19 clinical research. To this end, this study aims at adapting the existing CLAMP natural language processing tool to quickly build COVID-19 SignSym, which can extract COVID-19 signs/symptoms and their 8 attributes (body location, severity, temporal expression, subject, condition, uncertainty, negation, and course) from clinical text. The extracted information is also mapped to standard concepts in the Observational Medical Outcomes Partnership common data model. A hybrid approach of combining deep learning-based models, curated lexicons, and pattern-based rules was applied to quickly build the COVID-19 SignSym from CLAMP, with optimized performance. Our extensive evaluation using 3 external sites with clinical notes of COVID-19 patients, as well as the online medical dialogues of COVID-19, shows COVID-19 SignSym can achieve high performance across data sources. The workflow used for this study can be generalized to other use cases, where existing clinical natural language processing tools need to be customized for specific information needs within a short time. COVID-19 SignSym is freely accessible to the research community as a downloadable package (https://clamp.uth.edu/covid/nlp.php) and has been used by 16 healthcare organizations to support clinical research of COVID-19.

Source
DOI: http://dx.doi.org/10.1093/jamia/ocab015
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7989301
March 2021

Quantitative Imaging Assessment for Clinical Trials in Oncology.

J Natl Compr Canc Netw 2019 Dec;17(12):1505-1511

Department of Internal Medicine, University of Michigan Medical School.

Background: Objective radiographic assessment is crucial for accurately evaluating therapeutic efficacy and patient outcomes in oncology clinical trials. Imaging assessment workflow can be complex; can vary with institution; may burden medical oncologists, who are often inadequately trained in radiology and response criteria; and can lead to high interobserver variability and investigator bias. This article reviews the development of a tumor response assessment core (TRAC) at a comprehensive cancer center with the goal of providing standardized, objective, unbiased tumor imaging assessments, and highlights the web-based platform and overall workflow. In addition, quantitative response assessments by the medical oncologists, radiologist, and TRAC are compared in a retrospective cohort of patients to determine concordance.

Patients and Methods: The TRAC workflow includes an image analyst who pre-reviews scans before review with a board-certified radiologist and then manually uploads annotated data on the proprietary TRAC web portal. Patients previously enrolled in 10 lung cancer clinical trials between January 2005 and December 2015 were identified, and the prospectively collected quantitative response assessments by the medical oncologists were compared with retrospective analysis of the same dataset by a radiologist and TRAC.

Results: This study enlisted 49 consecutive patients (53% female) with a median age of 60 years (range, 29-78 years); 2 patients did not meet study criteria and were excluded. A linearly weighted kappa test for concordance for TRAC versus radiologist was substantial at 0.65 (95% CI, 0.46-0.85; standard error [SE], 0.10). The kappa value was moderate at 0.42 (95% CI, 0.20-0.64; SE, 0.11) for TRAC versus oncologists and only fair at 0.34 (95% CI, 0.12-0.55; SE, 0.11) for oncologists versus radiologist.

Conclusions: Medical oncologists burdened with the task of tumor measurements in patients on clinical trials may introduce significant variability and investigator bias, with the potential to affect therapeutic response and clinical trial outcomes. Institutional imaging cores may help bridge the gap by providing unbiased and reproducible measurements and enable a leaner workflow.
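
The concordance statistic reported above, a linearly weighted kappa, can be illustrated with a minimal Python sketch. This is not the study's code: the function name, the response category labels, and the toy ratings are assumptions made for illustration only. The linear weights penalize a disagreement in proportion to how many ordinal steps apart the two raters are.

```python
from collections import Counter


def linear_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with linear disagreement weights for ordinal ratings.

    `categories` lists the rating levels in their ordinal order.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if len(categories) < 2:
        raise ValueError("need at least two ordered categories")

    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(rater_a)

    # Joint and marginal counts of the two raters' category indices.
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    # Linear disagreement weight: |i - j| / (k - 1), zero on the diagonal.
    observed = sum(abs(i - j) / (k - 1) * cnt / n for (i, j), cnt in joint.items())
    expected = sum(
        abs(i - j) / (k - 1) * (marg_a[i] / n) * (marg_b[j] / n)
        for i in range(k)
        for j in range(k)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed / expected


# Hypothetical ordinal response categories and toy ratings (not study data).
levels = ["CR", "PR", "SD", "PD"]
core = ["PR", "SD", "SD", "PD", "CR"]
radiologist = ["PR", "SD", "PD", "PD", "CR"]
kappa = linear_weighted_kappa(core, radiologist, levels)
```

The confidence intervals and standard errors quoted in the abstract would require an additional variance estimate that this sketch does not attempt.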

Source
DOI: http://dx.doi.org/10.6004/jnccn.2019.7331
December 2019

Novel Common Genetic Susceptibility Loci for Colorectal Cancer.

J Natl Cancer Inst 2019 Feb;111(2):146-157

Division of Research, Kaiser Permanente Medical Care Program of Northern California, Oakland, CA.

Background: Previous genome-wide association studies (GWAS) have identified 42 loci (P < 5 × 10⁻⁸) associated with risk of colorectal cancer (CRC). Expanded consortium efforts facilitating the discovery of additional susceptibility loci may capture unexplained familial risk.

Methods: We conducted a GWAS in European descent CRC cases and control subjects using a discovery-replication design, followed by examination of novel findings in a multiethnic sample (cumulative n = 163 315). In the discovery stage (36 948 case subjects/30 864 control subjects), we identified genetic variants with a minor allele frequency of 1% or greater associated with risk of CRC using logistic regression followed by a fixed-effects inverse variance weighted meta-analysis. All novel independent variants reaching genome-wide statistical significance (two-sided P < 5 × 10⁻⁸) were tested for replication in separate European ancestry samples (12 952 case subjects/48 383 control subjects). Next, we examined the generalizability of discovered variants in East Asians, African Americans, and Hispanics (12 085 case subjects/22 083 control subjects). Finally, we examined the contributions of novel risk variants to familial relative risk and examined the prediction capabilities of a polygenic risk score. All statistical tests were two-sided.

Results: The discovery GWAS identified 11 variants associated with CRC at P < 5 × 10⁻⁸, of which nine (at 4q22.2/5p15.33/5p13.1/6p21.31/6p12.1/10q11.23/12q24.21/16q24.1/20q13.13) independently replicated at a P value of less than .05. Multiethnic follow-up supported the generalizability of discovery findings. These results demonstrated a 14.7% increase in familial relative risk explained by common risk alleles from 10.3% (95% confidence interval [CI] = 7.9% to 13.7%; known variants) to 11.9% (95% CI = 9.2% to 15.5%; known and novel variants). A polygenic risk score identified 4.3% of the population at an odds ratio for developing CRC of at least 2.0.

Conclusions: This study provides insight into the architecture of common genetic variation contributing to CRC etiology and improves risk prediction for individualized screening.
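
The fixed-effects inverse variance weighted meta-analysis named in the Methods can be sketched in a few lines of Python. This is an illustrative sketch, not the consortium's pipeline: the function name and the per-study numbers are invented for the example. Each study's log odds ratio is weighted by the reciprocal of its variance, and a two-sided P value is taken from the normal distribution.

```python
import math


def fixed_effects_meta(estimates, std_errors):
    """Pool per-study effect estimates with inverse-variance weights.

    Returns (pooled estimate, pooled standard error, two-sided P value).
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")

    # Weight of each study is 1 / variance.
    weights = [1.0 / (se * se) for se in std_errors]
    total = sum(weights)

    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)

    # Two-sided P value for z = pooled / pooled_se under a standard normal.
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value


# Hypothetical per-study log odds ratios and standard errors for one variant.
log_odds = [0.10, 0.20]
ses = [0.10, 0.10]
pooled, pooled_se, p_value = fixed_effects_meta(log_odds, ses)
```

A variant would be carried forward in a design like the one described only if the pooled P value fell below the genome-wide threshold of 5 × 10⁻⁸; the toy numbers here are far from that.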

Source
DOI: http://dx.doi.org/10.1093/jnci/djy099
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6555904
February 2019

Expressing Biomedical Ontologies in Natural Language for Expert Evaluation.

Stud Health Technol Inform 2017;245:838-842

School of Biomedical Informatics, University of Texas Health Science Center, Houston, Texas, United States.

We report on a study of our custom Hootation software for the purposes of assessing its ability to produce clear and accurate natural language phrases from axioms embedded in three biomedical ontologies. Using multiple domain experts and three discrete rating scales, we evaluated the tool on clarity of the natural language produced, fidelity of the natural language produced from the ontology to the axiom, and the fidelity of the domain knowledge represented by the axioms. Results show that Hootation provided relatively clear natural language equivalents for a select set of OWL axioms, although the clarity of statements hinges on the accuracy and representation of axioms in the ontology.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6644701
June 2018

Genome-wide association study of colorectal cancer identifies six new susceptibility loci.

Nat Commun 2015 Jul 7;6:7138. Epub 2015 Jul 7.

Harvard Medical School, Boston, Massachusetts 02114, USA.

Genetic susceptibility to colorectal cancer is caused by rare pathogenic mutations and common genetic variants that contribute to familial risk. Here we report the results of a two-stage association study with 18,299 cases of colorectal cancer and 19,656 controls, with follow-up of the most statistically significant genetic loci in 4,725 cases and 9,969 controls from two Asian consortia. We describe six new susceptibility loci reaching a genome-wide threshold of P<5.0E-08. These findings provide additional insight into the underlying biological mechanisms of colorectal cancer and demonstrate the scientific value of large consortia-based genetic epidemiology studies.

Source
DOI: http://dx.doi.org/10.1038/ncomms8138
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4967357
July 2015

Hedging their mets: the use of uncertainty terms in clinical documents and its potential implications when sharing the documents with patients.

AMIA Annu Symp Proc 2012;2012:321-30. Epub 2012 Nov 3.

Department of Pediatrics, Univ. of Michigan, Ann Arbor, MI, USA.

In this study, we quantified the use of uncertainty expressions, referred to as 'hedge' phrases, among a corpus of 100,000 clinical documents retrieved from our institution's electronic health record system. The frequency of each hedge phrase appearing in the corpus was characterized across document types and clinical departments. We also used a natural language processing tool to identify clinical concepts that were spatially, and potentially semantically, associated with the hedge phrases identified. The objective was to delineate the prevalence of hedge phrase usage in clinical documentation which may have a profound impact on patient care and provider-patient communication, and may become a source of unintended consequences when such documents are made directly accessible to patients via patient portals.
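
The frequency analysis described above, counting each hedge phrase across document types, can be sketched as follows. The phrase list, document types, and sample texts are hypothetical stand-ins; the study's actual lexicon and tooling are not given in the abstract.

```python
import re
from collections import Counter

# Hypothetical hedge lexicon; the study's real phrase list is not given here.
HEDGE_PHRASES = ["may", "possibly", "cannot rule out", "suggestive of"]


def count_hedges(documents, phrases=HEDGE_PHRASES):
    """Count whole-word, case-insensitive hedge phrase matches per document type.

    `documents` is an iterable of (document_type, text) pairs.
    """
    patterns = {
        p: re.compile(r"\b" + re.escape(p) + r"\b", re.IGNORECASE) for p in phrases
    }
    counts = {}
    for doc_type, text in documents:
        per_type = counts.setdefault(doc_type, Counter())
        for phrase, pattern in patterns.items():
            per_type[phrase] += len(pattern.findall(text))
    return counts


# Toy corpus (invented text, not clinical data).
corpus = [
    ("radiology", "Findings suggestive of pneumonia. Cannot rule out effusion."),
    ("progress note", "May represent infection; may need follow-up."),
]
hedge_counts = count_hedges(corpus)
```

Word-boundary matching keeps "may" from matching inside words such as "dismay", but it would still count the month "May"; a real analysis would need to handle such ambiguity.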

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3540426
July 2013

Leveraging EHR data for outcomes and comparative effectiveness research in oncology.

Curr Oncol Rep 2012 Dec;14(6):494-501

University of Michigan Comprehensive Cancer Center, 1600 Huron Parkway, SPC 2800, Ann Arbor, MI 48109-2800, USA.

Along with the increasing adoption of electronic health records (EHRs) are expectations that data collected within EHRs will be readily available for outcomes and comparative effectiveness research. Yet the ability to effectively share and reuse data depends on implementing and configuring EHRs with these goals in mind from the beginning. Data sharing and integration must be planned both locally as well as nationally. The rich data transmission and semantic infrastructure developed by the National Cancer Institute (NCI) for research provides an excellent example of moving beyond paper-based paradigms and exploiting the power of semantically robust, network-based systems, and engaging both domain and informatics expertise. Similar efforts are required to address current challenges in sharing EHR data.

Source
DOI: http://dx.doi.org/10.1007/s11912-012-0272-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3490017
December 2012

Voice-dictated versus typed-in clinician notes: linguistic properties and the potential implications on natural language processing.

AMIA Annu Symp Proc 2011;2011:1630-8. Epub 2011 Oct 22.

School of Public Health Department of Health Management and Policy, University of Michigan, Ann Arbor, MI, USA.

In this study, we comparatively examined the linguistic properties of narrative clinician notes created through voice dictation versus those directly entered by clinicians via a computer keyboard. Intuitively, the nature of voice-dictated notes would resemble that of natural language, while typed-in notes may demonstrate distinctive language features for reasons such as intensive usage of acronyms. The study analyses were based on an empirical dataset retrieved from our institutional electronic health records system. The dataset contains 30,000 voice-dictated notes and 30,000 notes that were entered manually; both were encounter notes generated in ambulatory care settings. The results suggest that there are considerable lexical and distributional differences between the narrative clinician notes created via these two methods. Such differences could have a significant impact on the performance of natural language processing tools, so the two types of documents may need to be treated differently.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3243272
February 2013

Delivery of Internet-based cancer genetic counselling services to patients' homes: a feasibility study.

J Telemed Telecare 2011;17(1):36-40. Epub 2010 Nov 19.

University Hospitals Case Medical Center, Case Comprehensive Cancer Center, Case Western Reserve University, 11100 Euclid Avenue, Lakeside 1200, Cleveland, OH 44106-5065, USA.

We examined the feasibility of home videoconferencing for providing cancer genetic education and risk information to people at risk. Adults with possible hereditary colon or breast and ovarian cancer syndromes were offered Internet-based counselling. Participants were sent web cameras and software to install on their home PCs. They watched a prerecorded educational video and then took part in a live counselling session with a genetic counsellor. A total of 31 participants took part in Internet counselling sessions. Satisfaction with counselling was high in all domains studied, including technical (mean 4.3 on a 1-5 scale), education (mean 4.7), communication (mean 4.8), psychosocial (mean 4.1) and overall (mean 4.2). Qualitative data identified technical aspects that could be improved. All participants reported that they would recommend Internet-based counselling to others. Internet-based genetic counselling is feasible and associated with a high level of satisfaction among participants.

Source
DOI: http://dx.doi.org/10.1258/jtt.2010.100116
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3263376
May 2011

FuGEFlow: data model and markup language for flow cytometry.

BMC Bioinformatics 2009 Jun 16;10:184. Epub 2009 Jun 16.

Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA.

Background: Flow cytometry technology is widely used in both health care and research. The rapid expansion of flow cytometry applications has outpaced the development of data storage and analysis tools. Collaborative efforts being taken to eliminate this gap include building common vocabularies and ontologies, designing generic data models, and defining data exchange formats. The Minimum Information about a Flow Cytometry Experiment (MIFlowCyt) standard was recently adopted by the International Society for Advancement of Cytometry. This standard guides researchers on the information that should be included in peer reviewed publications, but it is insufficient for data exchange and integration between computational systems. The Functional Genomics Experiment (FuGE) formalizes common aspects of comprehensive and high throughput experiments across different biological technologies. We have extended the FuGE object model to accommodate flow cytometry data and metadata.

Methods: We used the MagicDraw modelling tool to design a UML model (Flow-OM) according to the FuGE extension guidelines and the AndroMDA toolkit to transform the model to a markup language (Flow-ML). We mapped each MIFlowCyt term to either an existing FuGE class or to a new FuGEFlow class. The development environment was validated by comparing the official FuGE XSD to the schema we generated from the FuGE object model using our configuration. After the Flow-OM model was completed, the final version of the Flow-ML was generated and validated against an example MIFlowCyt compliant experiment description.

Results: The extension of FuGE for flow cytometry has resulted in a generic FuGE-compliant data model (FuGEFlow), which accommodates and links together all information required by MIFlowCyt. The FuGEFlow model can be used to build software and databases using FuGE software toolkits to facilitate automated exchange and manipulation of potentially large flow cytometry experimental data sets. Additional project documentation, including reusable design patterns and a guide for setting up a development environment, was contributed back to the FuGE project.

Conclusion: We have shown that an extension of FuGE can be used to transform minimum information requirements in natural language to markup language in XML. Extending FuGE required significant effort, but in our experience the benefits outweighed the costs. FuGEFlow is expected to play a central role in describing flow cytometry experiments and ultimately facilitating data exchange, including with public flow cytometry repositories currently under development.

Source
DOI: http://dx.doi.org/10.1186/1471-2105-10-184
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2711079
June 2009

Security and privacy requirements for a multi-institutional cancer research data grid: an interview-based study.

BMC Med Inform Decis Mak 2009 Jun 15;9:31. Epub 2009 Jun 15.

Information Science and Technology, Fox Chase Cancer Center, Philadelphia, PA, USA.

Background: Data protection is important for all information systems that deal with human-subjects data. Grid-based systems--such as the cancer Biomedical Informatics Grid (caBIG)--seek to develop new mechanisms to facilitate real-time federation of cancer-relevant data sources, including sources protected under a variety of regulatory laws, such as HIPAA and 21CFR11. These systems embody new models for data sharing, and hence pose new challenges to the regulatory community, and to those who would develop or adopt them. These challenges must be understood by both systems developers and system adopters. In this paper, we describe our work collecting policy statements, expectations, and requirements from regulatory decision makers at academic cancer centers in the United States. We use these statements to examine fundamental assumptions regarding data sharing using data federations and grid computing.

Methods: An interview-based study of key stakeholders from a sample of US cancer centers. Interviews were structured, and used an instrument that was developed for the purpose of this study. The instrument included a set of problem scenarios--difficult policy situations that were derived during a full-day discussion of potentially problematic issues by a set of project participants with diverse expertise. Each problem scenario included a set of open-ended questions that were designed to elucidate stakeholder opinions and concerns. Interviews were transcribed verbatim and used for both qualitative and quantitative analysis. For quantitative analysis, data was aggregated at the individual or institutional unit of analysis, depending on the specific interview question.

Results: Thirty-one (31) individuals at six cancer centers were contacted to participate. Twenty-four out of thirty-one (24/31) individuals responded to our request, yielding a total response rate of 77%. Respondents included IRB directors and policy-makers, privacy and security officers, directors of offices of research, information security officers and university legal counsel. Nineteen total interviews were conducted over a period of 16 weeks. Respondents provided answers for all four scenarios (a total of 87 questions). Results were grouped by broad themes, including among others: governance, legal and financial issues, partnership agreements, de-identification, institutional technical infrastructure for security and privacy protection, training, risk management, auditing, IRB issues, and patient/subject consent.

Conclusion: The findings suggest that with additional work, large scale federated sharing of data within a regulated environment is possible. A key challenge is developing suitable models for authentication and authorization practices within a federated environment. Authentication--the recognition and validation of a person's identity--is in fact a global property of such systems, while authorization--the permission to access data or resources--mimics data sharing agreements in being best served at a local level. Nine specific recommendations result from the work and are discussed in detail. These include: (1) the necessity to construct separate legal or corporate entities for governance of federated sharing initiatives on this scale; (2) consensus on the treatment of foreign and commercial partnerships; (3) the development of risk models and risk management processes; (4) development of technical infrastructure to support the credentialing process associated with research including human subjects; (5) exploring the feasibility of developing large-scale, federated honest broker approaches; (6) the development of suitable, federated identity provisioning processes to support federated authentication and authorization; (7) community development of requisite HIPAA and research ethics training modules by federation members; (8) the recognition of the need for central auditing requirements and authority; and (9) use of two-protocol data exchange models where possible in the federation.

Source
DOI: http://dx.doi.org/10.1186/1472-6947-9-31
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2709611
June 2009

Integration of prostate cancer clinical data using an ontology.

J Biomed Inform 2009 Dec 2;42(6):1035-45. Epub 2009 Jun 2.

Fox Chase Cancer Center, Philadelphia, PA 19111, USA.

It is increasingly important for investigators to efficiently and effectively access, interpret, and analyze the data from diverse biological, literature, and annotation sources in a unified way. The heterogeneity of biomedical data and the lack of metadata are the primary sources of difficulty for integration, presenting major challenges to effective search and retrieval of the information. As a proof of concept, the Prostate Cancer Ontology (PCO) is created for the development of the Prostate Cancer Information System (PCIS). PCIS is applied to demonstrate how the ontology is utilized to solve the semantic heterogeneity problem arising from the integration of two prostate cancer related database systems at the Fox Chase Cancer Center. As a result of the integration process, the semantic query language SPARQL is applied to perform integrated queries across the two database systems based on PCO.

Source
DOI: http://dx.doi.org/10.1016/j.jbi.2009.05.007
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2784120
December 2009

WaveRead: automatic measurement of relative gene expression levels from microarrays using wavelet analysis.

J Biomed Inform 2006 Aug 15;39(4):379-88. Epub 2005 Nov 15.

Bioinformatics, Division of Population Science, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 19111-2497, USA.

Gene expression microarrays monitor the expression levels of thousands of genes in an experiment simultaneously. To utilize the information generated, each of the thousands of spots on a microarray image must be properly quantified, including background correction. Most present methods require manual alignment of grids to the image data, and still often require additional minor adjustments on a spot by spot basis to correct for spotting irregularities. Such intervention is time consuming and also introduces inconsistency in the handling of data. A fully automatic, tested system would increase throughput and reliability in this field. In this paper, we describe WaveRead, a fully automated, standalone, open-source system for quantifying gene expression array images. Through the use of wavelet analysis to identify the spot locations and diameters, the system is able to automatically grid the image and quantify signal intensities and background corrections without any user intervention. The ability of WaveRead to perform proper quantification is demonstrated by analysis of both simulated images containing spots with donut shapes, elliptical shapes, and Gaussian intensity distributions, as well as of standard images from the National Cancer Institute.
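
The idea of using a wavelet to locate spots can be illustrated on a one-dimensional intensity profile. The sketch below is not WaveRead's algorithm, which the abstract does not specify: it assumes a Ricker ("Mexican hat") wavelet of fixed width, a simple direct convolution, and a relative threshold, all chosen for illustration. Peaks in the wavelet response mark candidate spot centers along a row of the image.

```python
import math


def ricker(width, half_len):
    """Sample an unnormalized Ricker (Mexican hat) wavelet on integer offsets."""
    return [
        (1.0 - (t / width) ** 2) * math.exp(-(t * t) / (2.0 * width * width))
        for t in range(-half_len, half_len + 1)
    ]


def wavelet_response(profile, width):
    """Correlate the profile with the wavelet, treating out-of-range samples as zero."""
    half = 5 * width
    kernel = ricker(width, half)
    n = len(profile)
    response = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                acc += w * profile[j]
        response.append(acc)
    return response


def find_spot_centers(profile, width=3, rel_threshold=0.5):
    """Return indices of local maxima in the response above a relative threshold."""
    resp = wavelet_response(profile, width)
    cutoff = rel_threshold * max(resp)
    return [
        i
        for i in range(1, len(resp) - 1)
        if resp[i] > cutoff and resp[i] > resp[i - 1] and resp[i] >= resp[i + 1]
    ]


# Simulated row of three Gaussian spots (synthetic data, centers chosen arbitrarily).
true_centers = (20, 50, 80)
row = [
    sum(100.0 * math.exp(-((x - c) ** 2) / (2.0 * 9.0)) for c in true_centers)
    for x in range(100)
]
centers = find_spot_centers(row)
```

A full system of the kind described would work in two dimensions, estimate spot diameters by varying the wavelet width, and handle irregular spot shapes; none of that is attempted here.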

Source
DOI: http://dx.doi.org/10.1016/j.jbi.2005.10.001
August 2006

FGDP: functional genomics data pipeline for automated, multiple microarray data analyses.

Bioinformatics 2004 Jan;20(2):282-3

Bioinformatics, Department of Information Science and Technology, Division of Basic Science, Fox Chase Cancer Center, 7701 Burholme Avenue, Philadelphia, PA 19111, USA.

Gene expression microarrays and oligonucleotide GeneChips have provided biologists with a means of measuring, in a single experiment, the expression levels of entire genomes under a variety of conditions. As with any nascent field, there is no single accepted method for analyzing the new data types, with new methods appearing monthly. Investigators using the new technology must constantly seek access to the latest tools and explore their data in multiple ways. The functional genomics data pipeline provides an integrated, extendable analysis environment permitting multiple, simultaneous analyses to be automatically performed and provides a web server and interface for presenting results.

Availability: Source code and executables are available under the GNU public license at http://bioinformatics.fccc.edu/

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btg407
January 2004

ASAP: automated sequence annotation pipeline for web-based updating of sequence information with a local dynamic database.

Bioinformatics 2003 Mar;19(5):675-6

Cybernetics Department, Moscow Engineering Physics Institute, Moscow, Russian Federation.

The automated sequence annotation pipeline (ASAP) is designed to ease routine investigation of new functional annotations on unknown sequences, such as expressed sequence tags (ESTs), through querying of web-accessible resources and maintenance of a local database. The system allows easy use of the output from one search as the input for a new search, as well as the filtering of results. The database is used to store formats and parameters and information for parsing data from web sites. The database permits easy updating of format information should a site modify the format of a query or of a returned web page.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btg056
March 2003