Publications by authors named "Dana Ludwig"

7 Publications

Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes.

NPJ Digit Med 2020 Apr 14;3:57. Epub 2020 Apr 14.

Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA.

There is a great and growing need to ascertain what exactly is the state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the paucity of data available in structured medical record data. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, detection of adverse outcomes, efficacy of off-label drug use, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate to see any substantial real-world use, primarily because they have been trained with medical text corpora that were too small. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and developed a customizable open-source de-identification software called Philter ("Protected Health Information filter"). Here we describe the design and evaluation of Philter, and show how it offers substantial real-world improvements over prior methods.

Source
DOI: http://dx.doi.org/10.1038/s41746-020-0258-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7156708
April 2020

Scalable and accurate deep learning with electronic health records.

NPJ Digit Med 2018 May 8;1:18. Epub 2018 May 8.

Stanford University, Stanford, CA, USA.

Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient's chart.

Source
DOI: http://dx.doi.org/10.1038/s41746-018-0029-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6550175
May 2018

Using health-system-wide data to understand hepatitis B virus prophylaxis and reactivation outcomes in patients receiving rituximab.

Medicine (Baltimore) 2017 Mar;96(13):e6528

Division of Rheumatology, University of California-San Francisco; Veterans Affairs Medical Center - San Francisco; Center for Vulnerable Populations & Division of General Internal Medicine at the Zuckerberg San Francisco General Hospital, Department of Medicine, University of California-San Francisco; University of California-San Francisco, Enterprise Information and Analytics; Department of Epidemiology and Biostatistics, University of California-San Francisco; Institute for Computational Health Sciences, University of California-San Francisco; Center for Healthcare Value, Philip R. Lee Institute for Health Policy Studies, University of California-San Francisco; Department of Medicine, University of California-San Francisco.

Hepatitis B virus (HBV) reactivation in the setting of rituximab use is a potentially fatal but preventable safety event. The rate of HBV screening and the proportion of at-risk patients who receive antiviral prophylaxis among patients initiating rituximab are unknown. We analyzed electronic health record (EHR) data from 2 health systems, a university center and a safety net health system, including diagnosis grouper codes, problem lists, medications, laboratory results, procedure codes, clinical encounter notes, and scanned documents. We identified all patients who received rituximab between 6/1/2012 and 1/1/2016. We calculated the proportion of rituximab users with inadequate screening for HBV according to the Centers for Disease Control guidelines for detecting latent HBV infection before their first rituximab infusion during the study period. We also assessed the proportion of patients with positive hepatitis B screening tests who were prescribed antiviral prophylaxis. Finally, we characterized safety failures and adverse events. We included 926 patients from the university and 132 patients from the safety net health system. Sixty-one percent of patients from the university had adequate screening for HBV compared with 90% from the safety net. Among patients at risk for reactivation based on results of HBV testing, 66% and 92% received antiviral prophylaxis at the university and safety net, respectively. We found wide variations in hepatitis B screening practices among patients receiving rituximab, resulting in unnecessary risks to patients. Interventions should be developed to improve patient safety procedures in this high-risk patient population.

Source
DOI: http://dx.doi.org/10.1097/MD.0000000000006528
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5380298
March 2017

Genotyping Informatics and Quality Control for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort.

Genetics 2015 Aug 19;200(4):1051-60. Epub 2015 Jun 19.

Kaiser Permanente Northern California Division of Research, Oakland, California 94612.

The Kaiser Permanente (KP) Research Program on Genes, Environment and Health (RPGEH), in collaboration with the University of California-San Francisco, undertook genome-wide genotyping of >100,000 subjects that constitute the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. The project, which generated >70 billion genotypes, represents the first large-scale use of the Affymetrix Axiom Genotyping Solution. Because genotyping took place over a short 14-month period, creating a near-real-time analysis pipeline for experimental assay quality control and final optimized analyses was critical. Because of the multi-ethnic nature of the cohort, four different ethnic-specific arrays were employed to enhance genome-wide coverage. All assays were performed on DNA extracted from saliva samples. To improve sample call rates and significantly increase genotype concordance, we partitioned the cohort into disjoint packages of plates with similar assay contexts. Using strict QC criteria, the overall genotyping success rate was 103,067 of 109,837 samples assayed (93.8%), with a range of 92.1-95.4% for the four different arrays. Similarly, the SNP genotyping success rate ranged from 98.1 to 99.4% across the four arrays, the variation depending mostly on how many SNPs were included as single copy vs. double copy on a particular array. The high quality and large scale of genotype data created on this cohort, in conjunction with comprehensive longitudinal data from the KP electronic health records of participants, will enable a broad range of highly powered genome-wide association studies on a diversity of traits and conditions.

Source
DOI: http://dx.doi.org/10.1534/genetics.115.178905
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4574249
August 2015

Automated Assay of Telomere Length Measurement and Informatics for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort.

Genetics 2015 Aug 19;200(4):1061-72. Epub 2015 Jun 19.

Department of Biochemistry and Biophysics, University of California, San Francisco, California 94158-2517

The Kaiser Permanente Research Program on Genes, Environment, and Health (RPGEH) Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort includes DNA specimens extracted from saliva samples of 110,266 individuals. Because of its relationship to aging, telomere length measurement was considered an important biomarker to develop on these subjects. To assay relative telomere length (TL) on this large cohort over a short time period, we created a novel high throughput robotic system for TL analysis and informatics. Samples were run in triplicate, along with control samples, in a randomized design. As part of quality control, we determined the within-sample variability and employed thresholds for the elimination of outlying measurements. Of 106,902 samples assayed, 105,539 (98.7%) passed all quality control (QC) measures. As expected, TL in general showed a decline with age and a sex difference. While telomeres showed a negative correlation with age up to 75 years, in those older than 75 years, age positively correlated with longer telomeres, indicative of an association of longer telomeres with more years of survival in those older than 75. Furthermore, while females in general had longer telomeres than males, this difference was significant only for those older than age 50. An additional novel finding was that the variance of TL between individuals increased with age. This study establishes reliable assay and analysis methodologies for measurement of TL in large, population-based human studies. The GERA cohort represents the largest currently available such resource, linked to comprehensive electronic health and genotype data for analysis.

Source
DOI: http://dx.doi.org/10.1534/genetics.115.178624
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4574243
August 2015

Characterizing Race/Ethnicity and Genetic Ancestry for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort.

Genetics 2015 Aug 19;200(4):1285-95. Epub 2015 Jun 19.

Institute for Human Genetics, University of California, San Francisco, California 94143-0794; Department of Epidemiology and Biostatistics, University of California, San Francisco, California 94158-2549; Kaiser Permanente Northern California Division of Research, Oakland, California 94612-2304

Using genome-wide genotypes, we characterized the genetic structure of 103,006 participants in the Kaiser Permanente Northern California multi-ethnic Genetic Epidemiology Research on Adult Health and Aging Cohort and analyzed the relationship to self-reported race/ethnicity. Participants endorsed any of 23 race/ethnicity/nationality categories, which were collapsed into seven major race/ethnicity groups. By self-report the cohort is 80.8% white and 19.2% minority; 93.8% endorsed a single race/ethnicity group, while 6.2% endorsed two or more. Principal component (PC) and admixture analyses were generally consistent with prior studies. Approximately 17% of subjects had genetic ancestry from more than one continent, and 12% were genetically admixed, considering only nonadjacent geographical origins. Self-reported whites were spread on a continuum along the first two PCs, indicating extensive mixing among European nationalities. Self-identified East Asian nationalities correlated with genetic clustering, consistent with extensive endogamy. Individuals of mixed East Asian-European genetic ancestry were easily identified; we also observed a modest amount of European genetic ancestry in individuals self-identified as Filipinos. Self-reported African Americans and Latinos showed extensive European and African genetic ancestry, and Native American genetic ancestry for the latter. Among 3741 genetically identified parent-child pairs, 93% were concordant for self-reported race/ethnicity; among 2018 genetically identified full-sib pairs, 96% were concordant; the lower rate for parent-child pairs was largely due to intermarriage. The parent-child pairs revealed a trend toward increasing exogamy over time; the presence in the cohort of individuals endorsing multiple race/ethnicity categories creates interesting challenges and future opportunities for genetic epidemiologic studies.

Source
DOI: http://dx.doi.org/10.1534/genetics.115.178616
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4574246
August 2015

Next generation genome-wide association tool: design and coverage of a high-throughput European-optimized SNP array.

Genomics 2011 Aug 30;98(2):79-89. Epub 2011 Apr 30.

Institute for Human Genetics, University of California, San Francisco 94143-0794, CA, USA.

The success of genome-wide association studies has paralleled the development of efficient genotyping technologies. We describe the development of a next-generation microarray based on the new highly-efficient Affymetrix Axiom genotyping technology that we are using to genotype individuals of European ancestry from the Kaiser Permanente Research Program on Genes, Environment and Health (RPGEH). The array contains 674,517 SNPs, and provides excellent genome-wide as well as gene-based and candidate-SNP coverage. Coverage was calculated using an approach based on imputation and cross validation. Preliminary results for the first 80,301 saliva-derived DNA samples from the RPGEH demonstrate very high quality genotypes, with sample success rates above 94% and over 98% of successful samples having SNP call rates exceeding 98%. At steady state, we have produced 462 million genotypes per week for each Axiom system. The new array provides a valuable addition to the repertoire of tools for large scale genome-wide association studies.

Source
DOI: http://dx.doi.org/10.1016/j.ygeno.2011.04.005
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3146553
August 2011