Publications by authors named "Gail L Rosen"

32 Publications

Critical Assessment of Metagenome Interpretation: the second round of challenges.

Nat Methods 2022 04 8;19(4):429-440. Epub 2022 Apr 8.

University of California, Davis, Davis, CA, USA.

Evaluating metagenomic software is key for optimizing metagenome interpretation and focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains still were challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1038/s41592-022-01431-4DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9007738PMC
April 2022

Mapping Data to Deep Understanding: Making the Most of the Deluge of SARS-CoV-2 Genome Sequences.

mSystems 2022 Apr 21;7(2):e0003522. Epub 2022 Mar 21.

Drexel Universitygrid.166341.7, Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical & Computer Engineering, College of Engineering, Philadelphia, Pennsylvania, USA.

Next-generation sequencing has been essential to the global response to the COVID-19 pandemic. As of January 2022, nearly 7 million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences are available to researchers in public databases. Sequence databases are an abundant resource from which to extract biologically relevant and clinically actionable information. As the pandemic has gone on, SARS-CoV-2 has rapidly evolved, involving complex genomic changes that challenge current approaches to classifying SARS-CoV-2 variants. Deep sequence learning could be a potentially powerful way to build complex sequence-to-phenotype models. Unfortunately, while they can be predictive, deep learning typically produces "black box" models that cannot directly provide biological and clinical insight. Researchers should therefore consider implementing emerging methods for visualizing and interpreting deep sequence models. Finally, researchers should address important data limitations, including (i) global sequencing disparities, (ii) insufficient sequence metadata, and (iii) screening artifacts due to poor sequence quality control.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1128/msystems.00035-22DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9040592PMC
April 2022

MetaMutationalSigs: Comparison of mutational signature refitting results made easy.

Bioinformatics 2022 02 14. Epub 2022 Feb 14.

Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, PA, USA.

Motivation: The analysis of mutational signatures is becoming increasingly common in cancer genetics, with emerging implications in cancer evolution, classification, treatment decision and prognosis. Recently, several packages have been developed for mutational signature analysis, with each using different methodology and yielding significantly different results. Because of the nontrivial differences in tools' refitting results, researchers may desire to survey and compare the available tools, in order to objectively evaluate the results for their specific research question, such as which mutational signatures are prevalent in different cancer types.

Results: Due to the need for effective comparison of refitting mutational signatures, we introduce a user-friendly software that can aggregate and visually present results from different refitting packages.

Availability: MetaMutationalSigs is implemented using R and python and is available for installation using Docker and available at: https://github.com/EESI/MetaMutationalSigs.

Supplementary Information: More information about the package including test data and results are available at https://github.com/EESI/MetaMutationalSigs.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/bioinformatics/btac091DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9004636PMC
February 2022

Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network.

PLoS Comput Biol 2021 09 22;17(9):e1009345. Epub 2021 Sep 22.

Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America.

Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a python package) and https://github.com/EESI/seq2att (a command line tool).
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1371/journal.pcbi.1009345DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8496832PMC
September 2021

Examining Cultural Structures and Functions in Biology.

Integr Comp Biol 2022 02;61(6):2282-2293

Friday Harbor Laboratories, University of Washington, Friday Harbor, WA 98250, USA.

Scientific culture and structure organize biological sciences in many ways. We make choices concerning the systems and questions we study. Our research then amplifies these choices into factors that influence the directions of future research by shaping our hypotheses, data analyses, interpretation, publication venues, and dissemination via other methods. But our choices are shaped by more than objective curiosity-we are influenced by cultural paradigms reinforced by societal upbringing and scientific indoctrination during training. This extends to the systems and data that we consider to be ethically obtainable or available for study, and who is considered qualified to do research, ask questions, and communicate about research. It is also influenced by the profitability of concepts like open-access-a system designed to improve equity, but which enacts gatekeeping in unintended but foreseeable ways. Creating truly integrative biology programs will require more than intentionally developing departments or institutes that allow overlapping expertise in two or more subfields of biology. Interdisciplinary work requires the expertise of large and diverse teams of scientists working together-this is impossible without an authentic commitment to addressing, not denying, racism when practiced by individuals, institutions, and cultural aspects of academic science. We have identified starting points for remedying how our field has discouraged and caused harm, but we acknowledge there is a long path forward. This path must be paved with field-wide solutions and institutional buy-in: our solutions must match the scale of the problem. Together, we can integrate-not reintegrate-the nuances of biology into our field.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/icb/icab140DOI Listing
February 2022

Teaching Microbiome Analysis: From Design to Computation Through Inquiry.

Front Microbiol 2020 29;11:528051. Epub 2020 Oct 29.

School of Education, Drexel University, Philadelphia, PA, United States.

In this article, we present our three-class course sequence to educate students about microbiome analysis and metagenomics through experiential learning by taking them from inquiry to analysis of the microbiome: Molecular Ecology Lab, Bioinformatics, and Computational Microbiome Analysis. Students developed hypotheses, designed lab experiments, sequenced the DNA from microbiomes, learned basic python/R scripting, became proficient in at least one microbiome analysis software, and were able to analyze data generated from the microbiome experiments. While over 150 students (graduate and undergraduate) were impacted by the development of the series of courses, our assessment was only on undergraduate learning, where 45 students enrolled in at least one of the three courses and 4 students took all three. Students gained skills in bioinformatics through the courses, and several positive comments were received through surveys and private correspondence. Through a summative assessment, general trends show that students became more proficient in comparative genomic techniques and had positive attitudes toward their abilities to bridge biology and bioinformatics. While most students took individual or 2 of the courses, we show that pre- and post-surveys of these individual classes still showed progress toward learning objectives. It is expected that students trained will enter the workforce with skills needed to innovate in the biotechnology, health, and environmental industries. Students are trained to maximize impact and tackle real world problems in biology and medicine with their learned knowledge of data science and machine learning. The course materials for the new microbiome analysis course are available on Github: https://github.com/EESI/Comp_Metagenomics_resources.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.3389/fmicb.2020.528051DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7658192PMC
October 2020

Healthcare Shift Workers' Temporal Habits for Eating, Sleeping, and Light Exposure: A Multi-Instrument Pilot Study.

J Circadian Rhythms 2020 Oct 21;18. Epub 2020 Oct 21.

School of Nursing, University at Buffalo, Buffalo, NY, US.

Background: Circadian misalignment can impair healthcare shift workers' physical and mental health, resulting in sleep deprivation, obesity, and chronic disease. This multidisciplinary research team assessed eating patterns and sleep/physical activity of healthcare workers on three different shifts (day, night, and rotating-shift). To date, no study of real-world shift workers' daily eating and sleep has utilized a largely-objective measurement.

Method: During this fourteen-day observational study, participants wore two devices (Actiwatch and Bite Technologies counter) to measure physical activity, sleep, light exposure, and eating time. Participants also reported food intake via food diaries on personal mobile devices.

Results: In fourteen (5 day-, 5 night-, and 4 rotating-shift) participants, no baseline difference in BMI was observed. Overall, rotating-shift workers consumed fewer calories and had less activity and sleep than day- and night-shift workers. For eating patterns, compared to night- and rotating-shift, day-shift workers ate more frequently during work days. Night workers, however, consumed more calories at work relative to day and rotating workers. For physical activity and sleep, night-shift workers had the highest activity and least sleep on work days.

Conclusion: This pilot study utilized primarily objective measurement to examine shift workers' habits outside the laboratory. Although no association between BMI and eating patterns/activity/sleep was observed across groups, a small, homogeneous sample may have influenced this. Overall, shift work was associated with 1) increased calorie intake and higher-fat and -carbohydrate diets and 2) sleep deprivation. A larger, more diverse sample can participate in future studies that objectively measure shift workers' real-world habits.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.5334/jcr.199DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7583716PMC
October 2020

Amino Acid -mer Feature Extraction for Quantitative Antimicrobial Resistance (AMR) Prediction by Machine Learning and Model Interpretation for Biological Insights.

Biology (Basel) 2020 Oct 28;9(11). Epub 2020 Oct 28.

Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, PA 19104, USA.

Machine learning algorithms can learn mechanisms of antimicrobial resistance from the data of DNA sequence without any a priori information. Interpreting a trained machine learning algorithm can be exploited for validating the model and obtaining new information about resistance mechanisms. Different feature extraction methods, such as SNP calling and counting nucleotide -mers have been proposed for presenting DNA sequences to the model. However, there are trade-offs between interpretability, computational complexity and accuracy for different feature extraction methods. In this study, we have proposed a new feature extraction method, counting amino acid -mers or oligopeptides, which provides easier model interpretation compared to counting nucleotide -mers and reaches the same or even better accuracy in comparison with different methods. Additionally, we have trained machine learning algorithms using different feature extraction methods and compared the results in terms of accuracy, model interpretability and computational complexity. We have built a new feature selection pipeline for extraction of important features so that new AMR determinants can be discovered by analyzing these features. This pipeline allows the construction of models that only use a small number of features and can predict resistance accurately.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.3390/biology9110365DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7694136PMC
October 2020

Genetic grouping of SARS-CoV-2 coronavirus sequences using informative subtype markers for pandemic spread visualization.

PLoS Comput Biol 2020 09 17;16(9):e1008269. Epub 2020 Sep 17.

Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, PA, USA.

We propose an efficient framework for genetic subtyping of SARS-CoV-2, the novel coronavirus that causes the COVID-19 pandemic. Efficient viral subtyping enables visualization and modeling of the geographic distribution and temporal dynamics of disease spread. Subtyping thereby advances the development of effective containment strategies and, potentially, therapeutic and vaccine strategies. However, identifying viral subtypes in real-time is challenging: SARS-CoV-2 is a novel virus, and the pandemic is rapidly expanding. Viral subtypes may be difficult to detect due to rapid evolution; founder effects are more significant than selection pressure; and the clustering threshold for subtyping is not standardized. We propose to identify mutational signatures of available SARS-CoV-2 sequences using a population-based approach: an entropy measure followed by frequency analysis. These signatures, Informative Subtype Markers (ISMs), define a compact set of nucleotide sites that characterize the most variable (and thus most informative) positions in the viral genomes sequenced from different individuals. Through ISM compression, we find that certain distant nucleotide variants covary, including non-coding and ORF1ab sites covarying with the D614G spike protein mutation which has become increasingly prevalent as the pandemic has spread. ISMs are also useful for downstream analyses, such as spatiotemporal visualization of viral dynamics. By analyzing sequence data available in the GISAID database, we validate the utility of ISM-based subtyping by comparing spatiotemporal analyses using ISMs to epidemiological studies of viral transmission in Asia, Europe, and the United States. In addition, we show the relationship of ISMs to phylogenetic reconstructions of SARS-CoV-2 evolution, and therefore, ISMs can play an important complementary role to phylogenetic tree-based analysis, such as is done in the Nextstrain project. The developed pipeline dynamically generates ISMs for newly added SARS-CoV-2 sequences and updates the visualization of pandemic spatiotemporal dynamics, and is available on Github at https://github.com/EESI/ISM (Jupyter notebook), https://github.com/EESI/ncov_ism (command line tool) and via an interactive website at https://covid19-ism.coe.drexel.edu/.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1371/journal.pcbi.1008269DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7523987PMC
September 2020

Emerging Priorities for Microbiome Research.

Front Microbiol 2020 19;11:136. Epub 2020 Feb 19.

School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States.

Microbiome research has increased dramatically in recent years, driven by advances in technology and significant reductions in the cost of analysis. Such research has unlocked a wealth of data, which has yielded tremendous insight into the nature of the microbial communities, including their interactions and effects, both within a host and in an external environment as part of an ecological community. Understanding the role of microbiota, including their dynamic interactions with their hosts and other microbes, can enable the engineering of new diagnostic techniques and interventional strategies that can be used in a diverse spectrum of fields, spanning from ecology and agriculture to medicine and from forensics to exobiology. From June 19-23 in 2017, the NIH and NSF jointly held an Innovation Lab on . This review is inspired by some of the topics that arose as priority areas from this unique, interactive workshop. The goal of this review is to summarize the Innovation Lab's findings by introducing the reader to emerging challenges, exciting potential, and current directions in microbiome research. The review is broken into five key topic areas: (1) interactions between microbes and the human body, (2) evolution and ecology of microbes, including the role played by the environment and microbe-microbe interactions, (3) analytical and mathematical methods currently used in microbiome research, (4) leveraging knowledge of microbial composition and interactions to develop engineering solutions, and (5) interventional approaches and engineered microbiota that may be enabled by selectively altering microbial composition. As such, this review seeks to arm the reader with a broad understanding of the priorities and challenges in microbiome research today and provide inspiration for future investigation and multi-disciplinary collaboration.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.3389/fmicb.2020.00136DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7042322PMC
February 2020

Exploring thematic structure and predicted functionality of 16S rRNA amplicon data.

PLoS One 2019 11;14(12):e0219235. Epub 2019 Dec 11.

Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America.

Analysis of microbiome data involves identifying co-occurring groups of taxa associated with sample features of interest (e.g., disease state). Elucidating such relations is often difficult as microbiome data are compositional, sparse, and have high dimensionality. Also, the configuration of co-occurring taxa may represent overlapping subcommunities that contribute to sample characteristics such as host status. Preserving the configuration of co-occurring microbes rather than detecting specific indicator species is more likely to facilitate biologically meaningful interpretations. Additionally, analyses that use taxonomic relative abundances to predict the abundances of different gene functions aggregate predicted functional profiles across taxa. This precludes straightforward identification of predicted functional components associated with subsets of co-occurring taxa. We provide an approach to explore co-occurring taxa using "topics" generated via a topic model and link these topics to specific sample features (e.g., disease state). Rather than inferring predicted functional content based on overall taxonomic relative abundances, we instead focus on inference of functional content within topics, which we parse by estimating interactions between topics and pathways through a multilevel, fully Bayesian regression model. We apply our methods to three publicly available 16S amplicon sequencing datasets: an inflammatory bowel disease dataset, an oral cancer dataset, and a time-series dataset. Using our topic model approach to uncover latent structure in 16S rRNA amplicon surveys, investigators can (1) capture groups of co-occurring taxa termed topics; (2) uncover within-topic functional potential; (3) link taxa co-occurrence, gene function, and environmental/host features; and (4) explore the way in which sets of co-occurring taxa behave and evolve over time. These methods have been implemented in a freely available R package: https://cran.r-project.org/package=themetagenomics, https://github.com/EESI/themetagenomics.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0219235PLOS
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6905537PMC
March 2020

Correction to: Comprehensive benchmarking and ensemble approaches for metagenomic classifiers.

Genome Biol 2019 04 5;20(1):72. Epub 2019 Apr 5.

Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10021, USA.

Following publication of the original article [1], the authors would like to highlight the following two corrections.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13059-019-1687-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6450011PMC
April 2019

16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses.

PLoS Comput Biol 2019 02 26;15(2):e1006721. Epub 2019 Feb 26.

Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America.

Advances in high-throughput sequencing have increased the availability of microbiome sequencing data that can be exploited to characterize microbiome community structure in situ. We explore using word and sentence embedding approaches for nucleotide sequences since they may be a suitable numerical representation for downstream machine learning applications (especially deep learning). This work involves first encoding ("embedding") each sequence into a dense, low-dimensional, numeric vector space. Here, we use Skip-Gram word2vec to embed k-mers, obtained from 16S rRNA amplicon surveys, and then leverage an existing sentence embedding technique to embed all sequences belonging to specific body sites or samples. We demonstrate that these representations are meaningful, and hence the embedding space can be exploited as a form of feature extraction for exploratory analysis. We show that sequence embeddings preserve relevant information about the sequencing data such as k-mer context, sequence taxonomy, and sample class. Specifically, the sequence embedding space resolved differences among phyla, as well as differences among genera within the same family. Distances between sequence embeddings had similar qualities to distances between alignment identities, and embedding multiple sequences can be thought of as generating a consensus sequence. In addition, embeddings are versatile features that can be used for many downstream tasks, such as taxonomic and sample classification. Using sample embeddings for body site classification resulted in negligible performance loss compared to using OTU abundance data, and clustering embeddings yielded high fidelity species clusters. Lastly, the k-mer embedding space captured distinct k-mer profiles that mapped to specific regions of the 16S rRNA gene and corresponded with particular body sites. Together, our results show that embedding sequences results in meaningful representations that can be used for exploratory analyses or for downstream machine learning applications that require numeric data. Moreover, because the embeddings are trained in an unsupervised manner, unlabeled data can be embedded and used to bolster supervised machine learning tasks.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1371/journal.pcbi.1006721DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6407789PMC
February 2019

Opportunities and obstacles for deep learning in biology and medicine.

J R Soc Interface 2018 04;15(141)

Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA.

Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems-patient classification, fundamental biological processes and treatment of patients-and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1098/rsif.2017.0387DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5938574PMC
April 2018

Comprehensive benchmarking and ensemble approaches for metagenomic classifiers.

Genome Biol 2017 09 21;18(1):182. Epub 2017 Sep 21.

Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10021, USA.

Background: One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited.

Results: In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages.

Conclusions: This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13059-017-1299-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5609029PMC
September 2017

Nanopore sequencing in microgravity.

NPJ Microgravity 2016 20;2:16035. Epub 2016 Oct 20.

Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY, USA.

Rapid DNA sequencing and analysis has been a long-sought goal in remote research and point-of-care medicine. In microgravity, DNA sequencing can facilitate novel astrobiological research and close monitoring of crew health, but spaceflight places stringent restrictions on the mass and volume of instruments, crew operation time, and instrument functionality. The recent emergence of portable, nanopore-based tools with streamlined sample preparation protocols finally enables DNA sequencing on missions in microgravity. As a first step toward sequencing in space and aboard the International Space Station (ISS), we tested the Oxford Nanopore Technologies MinION during a parabolic flight to understand the effects of variable gravity on the instrument and data. In a successful proof-of-principle experiment, we found that the instrument generated DNA reads over the course of the flight, including the first ever sequenced in microgravity, and additional reads measured after the flight concluded its parabolas. Here we detail modifications to the sample-loading procedures to facilitate nanopore sequencing aboard the ISS and in other microgravity environments. We also evaluate existing analysis methods and outline two new approaches, the first based on a wave-fingerprint method and the second on entropy signal mapping. Computationally light analysis methods offer the potential for species identification, but are limited by the error profiles (stays, skips, and mismatches) of older nanopore data. Higher accuracies attainable with modified sample processing methods and the latest version of flow cells will further enable the use of nanopore sequencers for diagnostics and research in space.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1038/npjmgrav.2016.35DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5515536PMC
October 2016

Fizzy: feature subset selection for metagenomics.

BMC Bioinformatics 2015 Nov 4;16:358. Epub 2015 Nov 4.

Department of Electrical & Computer Engineering, Drexel University, 3141 Chestnut St., Philadelphia, 19104, PA, USA.

Background: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- & β-diversity. Feature subset selection--a sub-field of machine learning--can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high-level of influence on the condition being studied. For example, in a previous study we have used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome.

Results: We have developed a new Python command line tool, which is compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tools capabilities on publicly available datasets.

Conclusions: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1186/s12859-015-0793-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4634798PMC
November 2015

A toolkit for ARB to integrate custom databases and externally built phylogenies.

PLoS One 2015 21;10(1):e0109277. Epub 2015 Jan 21.

Department of Electrical & Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America.

Unlabelled: Researchers are perpetually amassing biological sequence data. The computational approaches employed by ecologists for organizing this data (e.g. alignment, phylogeny, etc.) typically scale nonlinearly in execution time with the size of the dataset. This often serves as a bottleneck for processing experimental data since many molecular studies are characterized by massive datasets. To keep up with experimental data demands, ecologists are forced to choose between continually upgrading expensive in-house computer hardware or outsourcing the most demanding computations to the cloud. Outsourcing is attractive since it is the least expensive option, but does not necessarily allow direct user interaction with the data for exploratory analysis. Desktop analytical tools such as ARB are indispensable for this purpose, but they do not necessarily offer a convenient solution for the coordination and integration of datasets between local and outsourced destinations. Therefore, researchers are currently left with an undesirable tradeoff between computational throughput and analytical capability. To mitigate this tradeoff we introduce a software package to leverage the utility of the interactive exploratory tools offered by ARB with the computational throughput of cloud-based resources. Our pipeline serves as middleware between the desktop and the cloud allowing researchers to form local custom databases containing sequences and metadata from multiple resources and a method for linking data outsourced for computation back to the local database. A tutorial implementation of the toolkit is provided in the supporting information, S1 Tutorial.

Availability: http://www.ece.drexel.edu/gailr/EESI/tutorial.php.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0109277PLOS
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4301908PMC
December 2015

Characterizing the empirical distribution of prokaryotic genome n-mers in the presence of nullomers.

J Comput Biol 2014 Oct 30;21(10):732-40. Epub 2014 Jul 30.

1 Department of Epidemiology & Biostatistics, Drexel University , Philadelphia, Pennsylvania.

Characterizing the empirical distribution of the frequency of n-mers is a vital step in understanding the entire genome. This will allow for researchers to examine how complex the genome really is, and move beyond simple, traditional modeling frameworks that are often biased in the presence of abundant and/or extremely rare words. We hypothesize that models based on the negative binomial distribution and its zero-inflated counterpart will characterize the n-mer distributions of genomes better than the Poisson. Our study examined the empirical distribution of the frequency of n-mers (6 ≤ n ≤ 11) in 2,199 genomes. We considered four distributions: Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial (ZINB). The number of genomes that have nullomers in 6-, 7-, and 8-mers was 150, 602 and 2,012, respectively, whereas all of the genomes for the 9-, 10-, and 11-mers had nullomers. In each n-mer considered, the negative binomial model performed the best for at least 93% of the 2,199 genomes; however, a small percentage (i.e., <7%) of the genomes did prefer the ZINB. The negative binomial and zero-inflation distributions extend the traditional Poisson setting and are more flexible in handling overdispersion that can be caused by an increase in nullomers. In an effort to characterize the distribution of the frequency of n-mers, researchers should also consider other discrete distributions that are more flexible and adjust for possible overdispersion.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1089/cmb.2014.0108DOI Listing
October 2014

Selecting age-related functional characteristics in the human gut microbiome.

Microbiome 2013 Jan 9;1(1). Epub 2013 Jan 9.

Background: Human gut microbial functions are often associated with various diseases and host physiologies. Aging, a less explored factor, is also suspected to affect or be affected by microbiome alterations. By combining functional feature selection with supervised classification, we aim to facilitate identification of age-related functional characteristics in metagenomes from several human gut microbiome studies (MetaHIT, MicroAge, MicroObes, Kurokawa et al.'s and Gill et al.'s dataset).

Results: We apply two feature selection methods, term frequency-inverse document frequency (TF-iDF) and minimum-redundancy maximum-relevancy (mRMR), to identify functional signatures that differentiate metagenomes by age. After features are reduced, we use a support vector machine (SVM) to predict host age of new metagenomes. Functional features are from protein families (Pfams), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, KEGG ontologies and the Gene Ontology (GO) database. Initial investigations demonstrate that ordination of the functional principal components shows great overlap between different age groups. However, when feature selection is applied, mRMR tightens the ordination cluster for each age group, and TF-iDF offers better linear separation. Both TF-iDF and mRMR were used in conjunction with a SVM classifier and achieved areas under receiver operating characteristic curves (AUCs) 10 to 15% above chance to classify individuals above/below mid-ages (about 38 to 43 years old) using Pfams. Better performance around mid-ages is also observed when using other functional categories and age-balanced dataset. We also identified some age-related Pfams that improved age discrimination at age 65 with another feature selection method called LEfSe, on an age-balanced dataset. The selected functional characteristics identify a broad range of age-relevant metabolisms, such as reduced vitamin B12 synthesis, reduced activity of reductases, increased DNA damage, occurrences of stress responses and immune system compromise, and upregulated glycosyltransferases in the aging population.

Conclusions: Feature selection can yield biologically meaningful results when used in conjunction with classification, and makes age classification of new human gut metagenomes feasible. While we demonstrate the promise of this approach, the data-dependent prediction performance could be further improved. We hypothesize that while the Qin et al. dataset is the most comprehensive to date, even deeper sampling is needed to better characterize and predict the microbiomes' functional content.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1186/2049-2618-1-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3869192PMC
January 2013

POGO-DB--a database of pairwise-comparisons of genomes and conserved orthologous genes.

Nucleic Acids Res 2014 Jan 5;42(Database issue):D625-32. Epub 2013 Nov 5.

School of Biomedical Engineering, Science and Health Systems, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA, Electrical & Computer Engineering Department, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA and Rachel & Menachem Mendelovitch Evolutionary Processes of Mutation & Natural Selection Research Laboratory, Department of Genetics, the Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 31096, Israel.

POGO-DB (http://pogo.ece.drexel.edu/) provides an easy platform for comparative microbial genomics. POGO-DB allows users to compare genomes using pre-computed metrics that were derived from extensive computationally intensive BLAST comparisons of >2000 microbes. These metrics include (i) average protein sequence identity across all orthologs shared by two genomes, (ii) genomic fluidity (a measure of gene content dissimilarity), (iii) number of 'orthologs' shared between two genomes, (iv) pairwise identity of the 16S ribosomal RNA genes and (v) pairwise identity of an additional 73 marker genes present in >90% prokaryotes. Users can visualize these metrics against each other in a 2D plot for exploratory analysis of genome similarity and of how different aspects of genome similarity relate to each other. The results of these comparisons are fully downloadable. In addition, users can download raw BLAST results for all or user-selected comparisons. Therefore, we provide users with full flexibility to carry out their own downstream analyses, by creating easy access to data that would normally require heavy computational resources to generate. POGO-DB should prove highly useful for researchers interested in comparative microbiology and benefit the microbiome/metagenomic communities by providing the information needed to select suitable phylogenetic marker genes within particular lineages.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkt1094DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964953PMC
January 2014

Environmental and ecological factors that shape the gut bacterial communities of fish: a meta-analysis.

Mol Ecol 2012 Jul 4;21(13):3363-78. Epub 2012 Apr 4.

Department of Biology, Drexel University, Philadelphia, PA 19104, USA.

Symbiotic bacteria often help their hosts acquire nutrients from their diet, showing trends of co-evolution and independent acquisition by hosts from the same trophic levels. While these trends hint at important roles for biotic factors, the effects of the abiotic environment on symbiotic community composition remain comparably understudied. In this investigation, we examined the influence of abiotic and biotic factors on the gut bacterial communities of fish from different taxa, trophic levels and habitats. Phylogenetic and statistical analyses of 25 16S rRNA libraries revealed that salinity, trophic level and possibly host phylogeny shape the composition of fish gut bacteria. When analysed alongside bacterial communities from other environments, fish gut communities typically clustered with gut communities from mammals and insects. Similar consideration of individual phylotypes (vs. communities) revealed evolutionary ties between fish gut microbes and symbionts of animals, as many of the bacteria from the guts of herbivorous fish were closely related to those from mammals. Our results indicate that fish harbour more specialized gut communities than previously recognized. They also highlight a trend of convergent acquisition of similar bacterial communities by fish and mammals, raising the possibility that fish were the first to evolve symbioses resembling those found among extant gut fermenting mammals.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1111/j.1365-294X.2012.05552.xDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3882143PMC
July 2012

Using the RDP classifier to predict taxonomic novelty and reduce the search space for finding novel organisms.

PLoS One 2012 5;7(3):e32491. Epub 2012 Mar 5.

School of Biomedical Engineering, Science, and Health Systems, Drexel University, Philadelphia, Pennsylvania, United States of America.

Background: Currently, the naïve Bayesian classifier provided by the Ribosomal Database Project (RDP) is one of the most widely used tools to classify 16S rRNA sequences, mainly collected from environmental samples. We show that RDP has 97+% assignment accuracy and is fast for 250 bp and longer reads when the read originates from a taxon known to the database. Because most environmental samples will contain organisms from taxa whose 16S rRNA genes have not been previously sequenced, we aim to benchmark how well the RDP classifier and other competing methods can discriminate these novel taxa from known taxa.

Principal Findings: Because each fragment is assigned a score (containing likelihood or confidence information such as the boostrap score in the RDP classifier), we "train" a threshold to discriminate between novel and known organisms and observe its performance on a test set. The threshold that we determine tends to be conservative (low sensitivity but high specificity) for naïve Bayesian methods. Nonetheless, our method performs better with the RDP classifier than the other methods tested, measured by the f-measure and the area-under-the-curve on the receiver operating characteristic of the test set. By constraining the database to well-represented genera, sensitivity improves 3-15%. Finally, we show that the detector is a good predictor to determine novel abundant taxa (especially for finer levels of taxonomy where novelty is more likely to be present).

Conclusions: We conclude that selecting a read-length appropriate RDP bootstrap score can significantly reduce the search space for identifying novel genera and higher levels in taxonomy. In addition, having a well-represented database significantly improves performance while having genera that are "highly" similar does not make a significant improvement. On a real dataset from an Amazon Terra Preta soil sample, we show that the detector can predict (or correlates to) whether novel sequences will be assigned to new taxa when the RDP database "doubles" in the future.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0032491PLOS
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3293824PMC
July 2012

NBC update: The addition of viral and fungal databases to the Naïve Bayes classification tool.

BMC Res Notes 2012 Jan 31;5:81. Epub 2012 Jan 31.

Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA.

Background: Classifying the fungal and viral content of a sample is an important component of analyzing microbial communities in environmental media. Therefore, a method to classify any fragment from these organisms' DNA should be implemented.

Results: We update the näive Bayes classification (NBC) tool to classify reads originating from viral and fungal organisms. NBC classifies a fungal dataset similarly to Basic Local Alignment Search Tool (BLAST) and the Ribosomal Database Project (RDP) classifier. We also show NBC's similarities and differences to RDP on a fungal large subunit (LSU) ribosomal DNA dataset. For viruses in the training database, strain classification accuracy is 98%, while for those reads originating from sequences not in the database, the order-level accuracy is 78%, where order indicates the taxonomic level in the tree of life.

Conclusions: In addition to being competitive to other classifiers available, NBC has the potential to handle reads originating from any location in the genome. We recommend using the Bacteria/Archaea, Fungal, and Virus databases separately due to algorithmic biases towards long genomes. The tool is publicly available at: http://nbc.ece.drexel.edu.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1186/1756-0500-5-81DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3284397PMC
January 2012

Exploiting the functional and taxonomic structure of genomic data by probabilistic topic modeling.

IEEE/ACM Trans Comput Biol Bioinform 2012 Jul-Aug;9(4):980-91

College of Information Science & Technology, Drexel University, Philadelphia, PA 19104, USA.

In this paper, we present a method that enable both homology-based approach and composition-based approach to further study the functional core (i.e., microbial core and gene core, correspondingly). In the proposed method, the identification of major functionality groups is achieved by generative topic modeling, which is able to extract useful information from unlabeled data. We first show that generative topic model can be used to model the taxon abundance information obtained by homology-based approach and study the microbial core. The model considers each sample as a “document,” which has a mixture of functional groups, while each functional group (also known as a “latent topic”) is a weight mixture of species. Therefore, estimating the generative topic model for taxon abundance data will uncover the distribution over latent functions (latent topic) in each sample. Second, we show that, generative topic model can also be used to study the genome-level composition of “N-mer” features (DNA subreads obtained by composition-based approaches). The model consider each genome as a mixture of latten genetic patterns (latent topics), while each functional pattern is a weighted mixture of the “N-mer” features, thus the existence of core genomes can be indicated by a set of common N-mer features. After studying the mutual information between latent topics and gene regions, we provide an explanation of the functional roles of uncovered latten genetic patterns. The experimental results demonstrate the effectiveness of proposed method.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1109/TCBB.2011.113DOI Listing
October 2012

Discovering the unknown: improving detection of novel species and genera from short reads.

J Biomed Biotechnol 2011 23;2011:495849. Epub 2011 Mar 23.

Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104, USA.

High-throughput sequencing technologies enable metagenome profiling, simultaneous sequencing of multiple microbial species present within an environmental sample. Since metagenomic data includes sequence fragments ("reads") from organisms that are absent from any database, new algorithms must be developed for the identification and annotation of novel sequence fragments. Homology-based techniques have been modified to detect novel species and genera, but, composition-based methods, have not been adapted. We develop a detection technique that can discriminate between "known" and "unknown" taxa, which can be used with composition-based methods, as well as a hybrid method. Unlike previous studies, we rigorously evaluate all algorithms for their ability to detect novel taxa. First, we show that the integration of a detector with a composition-based method performs significantly better than homology-based methods for the detection of novel species and genera, with best performance at finer taxonomic resolutions. Most importantly, we evaluate all the algorithms by introducing an "unknown" class and show that the modified version of PhymmBL has similar or better overall classification performance than the other modified algorithms, especially for the species-level and ultrashort reads. Finally, we evaluate the performance of several algorithms on a real acid mine drainage dataset.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1155/2011/495849DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3085467PMC
August 2011

Combining gene prediction methods to improve metagenomic gene annotation.

BMC Bioinformatics 2011 Jan 13;12:20. Epub 2011 Jan 13.

Genomic Signal Processing Laboratory, Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104, USA.

Background: Traditional gene annotation methods rely on characteristics that may not be available in short reads generated from next generation technology, resulting in suboptimal performance for metagenomic (environmental) samples. Therefore, in recent years, new programs have been developed that optimize performance on short reads. In this work, we benchmark three metagenomic gene prediction programs and combine their predictions to improve metagenomic read gene annotation.

Results: We not only analyze the programs' performance at different read-lengths like similar studies, but also separate different types of reads, including intra- and intergenic regions, for analysis. The main deficiencies are in the algorithms' ability to predict non-coding regions and gene edges, resulting in more false-positives and false-negatives than desired. In fact, the specificities of the algorithms are notably worse than the sensitivities. By combining the programs' predictions, we show significant improvement in specificity at minimal cost to sensitivity, resulting in 4% improvement in accuracy for 100 bp reads with ~1% improvement in accuracy for 200 bp reads and above. To correctly annotate the start and stop of the genes, we find that a consensus of all the predictors performs best for shorter read lengths while a unanimous agreement is better for longer read lengths, boosting annotation accuracy by 1-8%. We also demonstrate use of the classifier combinations on a real dataset.

Conclusions: To optimize the performance for both prediction and annotation accuracies, we conclude that the consensus of all methods (or a majority vote) is the best for reads 400 bp and shorter, while using the intersection of GeneMark and Orphelia predictions is the best for reads 500 bp and longer. We demonstrate that most methods predict over 80% coding (including partially coding) reads on a real human gut sample sequenced by Illumina technology.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1186/1471-2105-12-20DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3042383PMC
January 2011

NBC: the Naive Bayes Classification tool webserver for taxonomic classification of metagenomic reads.

Bioinformatics 2011 Jan 8;27(1):127-9. Epub 2010 Nov 8.

Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA.

Motivation: Datasets from high-throughput sequencing technologies have yielded a vast amount of data about organisms in environmental samples. Yet, it is still a challenge to assess the exact organism content in these samples because the task of taxonomic classification is too computationally complex to annotate all reads in a dataset. An easy-to-use webserver is needed to process these reads. While many methods exist, only a few are publicly available on webservers, and out of those, most do not annotate all reads.

Results: We introduce a webserver that implements the naïve Bayes classifier (NBC) to classify all metagenomic reads to their best taxonomic match. Results indicate that NBC can assign next-generation sequencing reads to their taxonomic classification and can find significant populations of genera that other classifiers may miss.

Availability: Publicly available at: http://nbc.ece.drexel.edu.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/bioinformatics/btq619DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3008645PMC
January 2011

Comparison of statistical methods to classify environmental genomic fragments.

IEEE Trans Nanobioscience 2010 Dec 27;9(4):310-6. Epub 2010 Sep 27.

Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104, USA.

"Binning" (or taxonomic classification) of DNA sequence reads is an initial step to analyzing an environmental biological sample. Currently, a homology-based tool, BLAST, is one of the most commonly used tools to label DNA reads, but it is argued that BLAST will quickly lose its classification ability as the genome databases grow. In this paper, we compare the accuracies of a naïve Bayes classifier (NBC) and statistical language model to BLAST for binning reads and demonstrate that NBC obtains good performance for the low cost of computational complexity. On the other hand, the back-off n-gram language model can improve accuracy when only partial training data is available (such as in-progress sequencing projects). NBC demonstrates comparable performance to BLAST and can also be optimized on partial training datasets by adjusting the word feature size. A fivefold cross validation is conducted to compare each method's accuracy for determining novel genomes at different taxonomic levels, with NBC outperforming BLAST for species-level classification but BLAST outperforming NBC for genus-level and phyla-level classification. In conclusion, the NBC is a competitive taxonomic classifier, and language models can improve performance when only partial training data is available.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1109/TNB.2010.2081375DOI Listing
December 2010

Signal processing for metagenomics: extracting information from the soup.

Curr Genomics 2009 Nov;10(7):493-510

Electrical and Computer Engineering Department, Drexel University, Philadelphia, PA, USA.

Traditionally, studies in microbial genomics have focused on single-genomes from cultured species, thereby limiting their focus to the small percentage of species that can be cultured outside their natural environment. Fortunately, recent advances in high-throughput sequencing and computational analyses have ushered in the new field of metagenomics, which aims to decode the genomes of microbes from natural communities without the need for cultivation. Although metagenomic studies have shed a great deal of insight into bacterial diversity and coding capacity, several computational challenges remain due to the massive size and complexity of metagenomic sequence data. Current tools and techniques are reviewed in this paper which address challenges in 1) genomic fragment annotation, 2) phylogenetic reconstruction, 3) functional classification of samples, and 4) interpreting complementary metaproteomics and metametabolomics data. Also surveyed are important applications of metagenomic studies, including microbial forensics and the roles of microbial communities in shaping human health and soil ecology.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.2174/138920209789208255DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808676PMC
November 2009
-->