Publications by authors named "David J Galas"

53 Publications

Cerebrospinal Fluid MicroRNA Changes in Cognitively Normal Veterans With a History of Deployment-Associated Mild Traumatic Brain Injury.

Front Neurosci 2021 9;15:720778. Epub 2021 Sep 9.

Department of Anesthesiology & Perioperative Medicine, Oregon Health & Science University, Portland, OR, United States.

A history of traumatic brain injury (TBI) increases the odds of developing Alzheimer's disease (AD). The long latent period between injury and dementia makes it difficult to study molecular changes initiated by TBI that may increase the risk of developing AD. MicroRNA (miRNA) levels are altered in TBI at acute times post-injury (<4 weeks), and in AD. We hypothesized that miRNA levels in cerebrospinal fluid (CSF) following TBI in veterans may be indicative of increased risk for developing AD. Our population of interest is cognitively normal veterans with a history of one or more mild TBI (mTBI) at a chronic time following TBI. We measured miRNA levels in CSF from three groups of participants: (1) community controls with no lifetime history of TBI (ComC); (2) deployed Iraq/Afghanistan veterans with no lifetime history of TBI (DepC); and (3) deployed Iraq/Afghanistan veterans with a history of repetitive blast mTBI (DepTBI). CSF samples were collected at the baseline visit in a longitudinal, multimodal assessment of Gulf War veterans, and represent a heterogeneous group of male veterans and community controls. The average time since the last blast mTBI experienced was 4.7 ± 2.2 years [1.5 - 11.5]. Statistical analysis of TaqMan miRNA array data revealed 18 miRNAs with significant differential expression in the group comparisons: 10 between DepTBI and ComC, 7 between DepC and ComC, and 8 between DepTBI and DepC. We also identified 8 miRNAs with significant differential detection in the group comparisons: 5 in DepTBI vs. ComC, 3 in DepC vs. ComC, and 2 in DepTBI vs. DepC. When we applied our previously developed multivariable dependence analysis, we found 13 miRNAs (6 of which are altered in levels or detection) that show dependencies with participant phenotypes, e.g., ApoE. Target prediction and pathway analysis with miRNAs differentially expressed in DepTBI vs. either DepC or ComC identified canonical pathways highly relevant to TBI including senescence and ephrin receptor signaling, respectively. This study shows that both TBI and deployment result in persistent changes in CSF miRNA levels that are relevant to known miRNA-mediated AD pathology, and which may reflect early events in AD.

Source
DOI: http://dx.doi.org/10.3389/fnins.2021.720778
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8463659
September 2021

Optimized permutation testing for information theoretic measures of multi-gene interactions.

BMC Bioinformatics 2021 Apr 7;22(1):180. Epub 2021 Apr 7.

Pacific Northwest Research Institute, 720 Broadway, Seattle, WA, 98122, USA.

Background: Permutation testing is often considered the "gold standard" for multi-test significance analysis, as it is an exact test requiring few assumptions about the distribution being computed. However, it can be computationally very expensive, particularly in its naive form in which the full analysis pipeline is re-run after permuting the phenotype labels. This can become intractable in multi-locus genome-wide association studies (GWAS), in which the number of potential interactions to be tested is combinatorially large.

Results: In this paper, we develop an approach for permutation testing in multi-locus GWAS, specifically focusing on SNP-SNP-phenotype interactions using multivariable measures that can be computed from frequency count tables, such as those based on information theory. We find that the computational bottleneck in this process is the construction of the count tables themselves, and that this step can be eliminated at each iteration of the permutation testing by transforming the count tables directly. This leads to a speed-up by a factor of over 10 for a typical permutation test compared to the naive approach. Additionally, this approach is insensitive to the number of samples, making it suitable for datasets with a large number of samples.

Conclusions: The proliferation of large-scale datasets with genotype data for hundreds of thousands of individuals enables new and more powerful approaches for the detection of multi-locus genotype-phenotype interactions. Our approach significantly improves the computational tractability of permutation testing for these studies. Moreover, our approach is insensitive to the large number of samples in these modern datasets. The code for performing these computations and replicating the figures in this paper is freely available at https://github.com/kunert/permute-counts .
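
For orientation, the naive baseline that this abstract contrasts against can be sketched as follows. This is only an illustration under stated assumptions, not the optimized count-table transformation from the paper or the code in the linked repository: the function names `mutual_information` and `naive_permutation_pvalue` are hypothetical, a single SNP-phenotype pair stands in for the SNP-SNP-phenotype case, and the add-one p-value is one common convention.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits, estimated from the joint count table of xs and ys."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # frequency count table
    px = Counter(xs)               # marginal counts of xs
    py = Counter(ys)               # marginal counts of ys
    mi = 0.0
    for (x, y), c in joint.items():
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def naive_permutation_pvalue(genotypes, phenotypes, n_perm=1000, seed=0):
    """Naive permutation test: shuffle the phenotype labels and rebuild the
    count table from scratch on every iteration (the costly step)."""
    rng = random.Random(seed)
    observed = mutual_information(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): the phenotype is fully determined by the genotype.
genotypes = [0, 1, 2] * 40
phenotypes = [int(g > 0) for g in genotypes]
print(naive_permutation_pvalue(genotypes, phenotypes, n_perm=200))
```

Every iteration here recounts all samples, so the cost grows with both the number of permutations and the number of samples, which is the bottleneck the abstract describes.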

Source
DOI: http://dx.doi.org/10.1186/s12859-021-04107-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8028212
April 2021

Toward an Information Theory of Quantitative Genetics.

J Comput Biol 2021 06 31;28(6):527-559. Epub 2020 Dec 31.

Pacific Northwest Research Institute, Seattle, Washington, USA.

Source
DOI: http://dx.doi.org/10.1089/cmb.2020.0032
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8220575
June 2021

Complex genetic dependencies among growth and neurological phenotypes in healthy children: Towards deciphering developmental mechanisms.

PLoS One 2020 3;15(12):e0242684. Epub 2020 Dec 3.

Pacific Northwest Research Institute, Seattle, Washington, United States of America.

The genetic mechanisms of childhood development in its many facets remain largely undeciphered. In the population of healthy infants studied in the Growing Up in Singapore Towards Healthy Outcomes (GUSTO) program, we have identified a range of dependencies among the observed phenotypes of fetal and early childhood growth, neurological development, and a number of genetic variants. We have quantified these dependencies using our information theory-based methods. The genetic variants show dependencies with single phenotypes as well as pleiotropic effects on more than one phenotype and thereby point to a large number of brain-specific and brain-expressed gene candidates. These dependencies provide a basis for connecting a range of variants with a spectrum of phenotypes (pleiotropy) as well as with each other. A broad survey of known regulatory expression characteristics, and other function-related information from the literature for these sets of candidate genes allowed us to assemble an integrated body of evidence, including a partial regulatory network, that points towards the biological basis of these general dependencies. Notable among the implicated loci are RAB11FIP4 (next to NF1), MTMR7 and PLD5, all highly expressed in the brain; DNMT1 (DNA methyl transferase), highly expressed in the placenta; and PPP1R12B and DMD (dystrophin), known to be important growth and development genes. While we cannot specify and decipher the mechanisms responsible for the phenotypes in this study, a number of connections for further investigation of fetal and early childhood growth and neurological development are indicated. These results and this approach open the door to new explorations of early human development.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0242684
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7714163
January 2021

Extracting Reproducible Time-Resolved Resting State Networks Using Dynamic Mode Decomposition.

Front Comput Neurosci 2019 31;13:75. Epub 2019 Oct 31.

Department of Biology, University of Washington, Seattle, WA, United States.

Resting state networks (RSNs) extracted from functional magnetic resonance imaging (fMRI) scans are believed to reflect the intrinsic organization and network structure of brain regions. Most traditional methods for computing RSNs typically assume these functional networks are static throughout the duration of a scan lasting 5-15 min. However, they are known to vary on timescales ranging from seconds to years; in addition, the dynamic properties of RSNs are affected in a wide variety of neurological disorders. Recently, there has been a proliferation of methods for characterizing RSN dynamics, yet it remains a challenge to extract reproducible time-resolved networks. In this paper, we develop a novel method based on dynamic mode decomposition (DMD) to extract networks from short windows of noisy, high-dimensional fMRI data, allowing RSNs from single scans to be resolved robustly at a temporal resolution of seconds. After validating the method on a synthetic dataset, we analyze data from 120 individuals from the Human Connectome Project and show that unsupervised clustering of DMD modes discovers RSNs at both the group (gDMD) and the single subject (sDMD) levels. The gDMD modes closely resemble canonical RSNs. Compared to established methods, sDMD modes capture individualized RSN structure that both better resembles the population RSN and better captures subject-level variation. We further leverage this time-resolved sDMD analysis to infer occupancy and transitions among RSNs with high reproducibility. This automated DMD-based method is a powerful tool to characterize spatial and temporal structures of RSNs in individual subjects.

Source
DOI: http://dx.doi.org/10.3389/fncom.2019.00075
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6834549
October 2019

Computational Inference Software for Tetrad Assembly from Randomly Arrayed Yeast Colonies.

G3 (Bethesda) 2019 07 9;9(7):2071-2088. Epub 2019 Jul 9.

Pacific Northwest Research Institute, Seattle, WA 98122.

We describe an information-theory-based method and associated software for computationally identifying sister spores derived from the same meiotic tetrad. The method exploits specific DNA sequence features of tetrads that result from meiotic centromere and allele segregation patterns. Because the method uses only the genomic sequence, it alleviates the need for tetrad-specific barcodes or other genetic modifications to the strains. Using this method, strains derived from randomly arrayed spores can be efficiently grouped back into tetrads.

Source
DOI: http://dx.doi.org/10.1534/g3.119.400166
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6643883
July 2019

Phospho-RNA-seq: a modified small RNA-seq method that reveals circulating mRNA and lncRNA fragments as potential biomarkers in human plasma.

EMBO J 2019 06 3;38(11). Epub 2019 May 3.

Department of Internal Medicine, Hematology/Oncology Division, University of Michigan, Ann Arbor, MI, USA.

Extracellular RNAs (exRNAs) in biofluids have attracted great interest as potential biomarkers. Although extracellular microRNAs in blood plasma are extensively characterized, extracellular messenger RNA (mRNA) and long non-coding RNA (lncRNA) studies are limited. We report that plasma contains fragmented mRNAs and lncRNAs that are missed by standard small RNA-seq protocols due to lack of 5' phosphate or presence of 3' phosphate. These fragments were revealed using a modified protocol ("phospho-RNA-seq") incorporating RNA treatment with T4-polynucleotide kinase, which we compared with standard small RNA-seq for sequencing synthetic RNAs with varied 5' and 3' ends, as well as human plasma exRNA. Analyzing phospho-RNA-seq data using a custom, high-stringency bioinformatic pipeline, we identified mRNA/lncRNA transcriptome fingerprints in plasma, including tissue-specific gene sets. In a longitudinal study of hematopoietic stem cell transplant patients, bone marrow- and liver-enriched exRNA genes were tracked with bone marrow recovery and liver injury, respectively, providing proof-of-concept validation as a biomarker approach. By enabling access to an unexplored realm of mRNA and lncRNA fragments, phospho-RNA-seq opens up new possibilities for plasma transcriptomic biomarker development.

Source
DOI: http://dx.doi.org/10.15252/embj.2019101695
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6545557
June 2019

exRNA Atlas Analysis Reveals Distinct Extracellular RNA Cargo Types and Their Carriers Present across Human Biofluids.

Cell 2019 04;177(2):463-477.e15

Department of Urology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Sema4, Stamford, CT 06902, USA.

To develop a map of cell-cell communication mediated by extracellular RNA (exRNA), the NIH Extracellular RNA Communication Consortium created the exRNA Atlas resource (https://exrna-atlas.org). The Atlas version 4P1 hosts 5,309 exRNA-seq and exRNA qPCR profiles from 19 studies and a suite of analysis and visualization tools. To analyze variation between profiles, we apply computational deconvolution. The analysis leads to a model with six exRNA cargo types (CT1, CT2, CT3A, CT3B, CT3C, CT4), each detectable in multiple biofluids (serum, plasma, CSF, saliva, urine). Five of the cargo types associate with known vesicular and non-vesicular (lipoprotein and ribonucleoprotein) exRNA carriers. To validate utility of this model, we re-analyze an exercise response study by deconvolution to identify physiologically relevant response pathways that were not detected previously. To enable wide application of this model, as part of the exRNA Atlas resource, we provide tools for deconvolution and analysis of user-provided case-control studies.

Source
DOI: http://dx.doi.org/10.1016/j.cell.2019.02.018
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6616370
April 2019

The Extracellular RNA Communication Consortium: Establishing Foundational Knowledge and Technologies for Extracellular RNA Research.

Cell 2019 04;177(2):231-242

Department of Obstetrics, Gynecology, and Reproductive Sciences and Sanford Consortium for Regenerative Medicine, University of California, San Diego, La Jolla, CA 92093, USA.

The Extracellular RNA Communication Consortium (ERCC) was launched to accelerate progress in the new field of extracellular RNA (exRNA) biology and to establish whether exRNAs and their carriers, including extracellular vesicles (EVs), can mediate intercellular communication and be utilized for clinical applications. Phase 1 of the ERCC focused on exRNA/EV biogenesis and function, discovery of exRNA biomarkers, development of exRNA/EV-based therapeutics, and construction of a robust set of reference exRNA profiles for a variety of biofluids. Here, we present progress by ERCC investigators in these areas, and we discuss collaborative projects directed at development of robust methods for EV/exRNA isolation and analysis and tools for sharing and computational analysis of exRNA profiling data.

Source
DOI: http://dx.doi.org/10.1016/j.cell.2019.03.023
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6601620
April 2019

Symmetries among Multivariate Information Measures Explored Using Möbius Operators.

Entropy (Basel) 2019 Jan 18;21(1). Epub 2019 Jan 18.

Pacific Northwest Research Institute, 720 Broadway, Seattle, WA 98122, USA.

Relations between common information measures include the duality relations based on Möbius inversion on lattices, which are the direct consequence of the symmetries of the lattices of the sets of variables (subsets ordered by inclusion). In this paper we use the lattice and functional symmetries to provide a unifying formalism that reveals some new relations and systematizes the symmetries of the information functions. To our knowledge, this is the first systematic examination of the full range of relationships of this class of functions. We define operators on functions on these lattices based on the Möbius inversions that map functions into one another, which we call Möbius operators, and show that they form a simple group isomorphic to the symmetric group S. Relations among the set of functions on the lattice are transparently expressed in terms of the operator algebra, and, when applied to the information measures, can be used to derive a wide range of relationships among diverse information measures. The Möbius operator algebra is then naturally generalized which yields an even wider range of new relationships.
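
The basic Möbius inversion on the subset lattice that this abstract builds on can be illustrated with a small sketch. It is not the operator algebra developed in the paper. It assumes one particular sign convention, C(τ) = -Σ over σ ⊆ τ of (-1)^|σ| H(σ), under which the two-variable case reduces to mutual information; other sources use the opposite sign. The helper names `subsets`, `joint_entropies`, and `mobius_transform` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def subsets(items):
    """Yield every subset of items as a frozenset, including the empty set."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def joint_entropies(samples, n_vars):
    """Joint entropy (bits) of every subset of variable indices 0..n_vars-1."""
    n = len(samples)
    table = {}
    for s in subsets(range(n_vars)):
        idx = sorted(s)
        counts = Counter(tuple(row[i] for i in idx) for row in samples)
        table[s] = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return table


def mobius_transform(f):
    """Map f to g with g(t) = -sum over subsets s of t of (-1)^|s| * f(s).
    Under this sign convention the transform is its own inverse."""
    return {t: -sum((-1) ** len(s) * f[s] for s in subsets(t)) for t in f}


# Toy data: X and Y are independent fair bits, and Z = X XOR Y.
samples = [(x, y, x ^ y) for x in (0, 1) for y in (0, 1)]
H = joint_entropies(samples, 3)      # entropies on the subset lattice
C = mobius_transform(H)              # interaction-type measure per subset
H_back = mobius_transform(C)         # applying the transform again recovers H
print(C[frozenset({0, 1, 2})], H_back[frozenset({0, 1, 2})])
```

The round trip from entropies to the interaction measure and back is the kind of duality relation the abstract refers to; the choice of sign only changes which function is called the "interaction" and does not affect the inversion itself.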

Source
DOI: http://dx.doi.org/10.3390/e21010088
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514198
January 2019

Multivariate Analysis of Data Sets with Missing Values: An Information Theory-Based Reliability Function.

J Comput Biol 2019 02 29;26(2):152-171. Epub 2018 Nov 29.

Pacific Northwest Research Institute, Seattle, Washington.

Missing values in complex biological data sets have significant impacts on our ability to correctly detect and quantify interactions in biological systems and to infer relationships accurately. In this article, we propose a useful metaphor to show that information theory measures, such as mutual information and interaction information, can be employed directly for evaluating multivariable dependencies even if data contain some missing values. The metaphor is that of thinking of variable dependencies as information channels between and among variables. In this view, missing data can be thought of as noise that reduces the channel capacity in predictable ways. We extract the available information in the data even if there are missing values and use the notion of channel capacity to assess the reliability of the result. This avoids the common practice (in the absence of prior knowledge of random imputation) of eliminating samples entirely, thus losing the information they can provide. We show how this reliability function can be implemented for pairs of variables, and generalize it for an arbitrary number of variables. Illustrations of the reliability functions for several cases are provided using simulated data.
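
As a rough illustration of using the available information for a pair of variables, the sketch below estimates mutual information only from the samples in which both values are observed, and reports the fraction of samples that contributed. This is an assumption-laden stand-in: the fraction is a crude indicator and is not the channel-capacity-based reliability function described in the abstract. The function name `available_case_mi` and the use of `None` to mark missing values are hypothetical.

```python
import math
from collections import Counter


def available_case_mi(xs, ys):
    """Return (mutual information in bits, fraction of usable samples),
    using only positions where both xs[i] and ys[i] are observed."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return 0.0, 0.0
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )
    return mi, n / len(xs)


# Toy data (hypothetical): None marks a missing observation.
xs = [0, 1, 0, 1, None, 1]
ys = [0, 1, 0, None, 1, 1]
mi, usable = available_case_mi(xs, ys)
print(mi, usable)
```

A sample that is missing for one pair of variables can still contribute to estimates for other pairs, which is the advantage over discarding the whole sample.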

Source
DOI: http://dx.doi.org/10.1089/cmb.2018.0179
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6383577
February 2019

Comprehensive multi-center assessment of small RNA-seq methods for quantitative miRNA profiling.

Nat Biotechnol 2018 09 16;36(8):746-757. Epub 2018 Jul 16.

Department of Internal Medicine, Hematology/Oncology Division, University of Michigan, Ann Arbor, Michigan, USA.

RNA-seq is increasingly used for quantitative profiling of small RNAs (for example, microRNAs, piRNAs and snoRNAs) in diverse sample types, including isolated cells, tissues and cell-free biofluids. The accuracy and reproducibility of the currently used small RNA-seq library preparation methods have not been systematically tested. Here we report results obtained by a consortium of nine labs that independently sequenced reference, 'ground truth' samples of synthetic small RNAs and human plasma-derived RNA. We assessed three commercially available library preparation methods that use adapters of defined sequence and six methods using adapters with degenerate bases. Both protocol- and sequence-specific biases were identified, including biases that reduced the ability of small RNA-seq to accurately measure adenosine-to-inosine editing in microRNAs. We found that these biases were mitigated by library preparation methods that incorporate adapters with degenerate bases. MicroRNA relative quantification between samples using small RNA-seq was accurate and reproducible across laboratories and methods.

Source
DOI: http://dx.doi.org/10.1038/nbt.4183
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6078798
September 2018

Small RNA profiling of low biomass samples: identification and removal of contaminants.

BMC Biol 2018 05 14;16(1):52. Epub 2018 May 14.

Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 4362, Esch-sur-Alzette, Luxembourg.

Background: Sequencing-based analyses of low-biomass samples are known to be prone to misinterpretation due to the potential presence of contaminating molecules derived from laboratory reagents and environments. DNA contamination has been previously reported, yet contamination with RNA is usually considered to be very unlikely due to its inherent instability. Small RNAs (sRNAs) identified in tissues and bodily fluids, such as blood plasma, have implications for physiology and pathology, and therefore the potential to act as disease biomarkers. Thus, the possibility for RNA contaminants demands careful evaluation.

Results: Herein, we report on the presence of small RNA (sRNA) contaminants in widely used microRNA extraction kits and propose an approach for their depletion. We sequenced sRNAs extracted from human plasma samples and detected important levels of non-human (exogenous) sequences whose source could be traced to the microRNA extraction columns through a careful qPCR-based analysis of several laboratory reagents. Furthermore, we also detected the presence of artefactual sequences related to these contaminants in a range of published datasets, thereby arguing in particular for a re-evaluation of reports suggesting the presence of exogenous RNAs of microbial and dietary origin in blood plasma. To avoid artefacts in future experiments, we also devise several protocols for the removal of contaminant RNAs, define minimal amounts of starting material for artefact-free analyses, and confirm the reduction of contaminant levels for identification of bona fide sequences using 'ultra-clean' extraction kits.

Conclusion: This is the first report on the presence of RNA molecules as contaminants in RNA extraction kits. The described protocols should be applied in the future to avoid confounding sRNA studies.

Source
DOI: http://dx.doi.org/10.1186/s12915-018-0522-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5952572
May 2018

sRNAnalyzer-a flexible and customizable small RNA sequencing data analysis pipeline.

Nucleic Acids Res 2017 Dec;45(21):12140-12151

Institute for Systems Biology, Seattle, WA 98109, USA.

Although many tools have been developed to analyze small RNA sequencing (sRNA-Seq) data, it remains challenging to accurately analyze the small RNA population, mainly due to multiple sequence ID assignment caused by short read length. Additional issues in small RNA analysis include low consistency of microRNA (miRNA) measurement results across different platforms, miRNA mapping associated with miRNA sequence variation (isomiR) and RNA editing, and the origin of those unmapped reads after screening against all endogenous reference sequence databases. To address these issues, we built a comprehensive and customizable sRNA-Seq data analysis pipeline, sRNAnalyzer, which enables: (i) comprehensive miRNA profiling strategies to better handle isomiRs and summarization based on each nucleotide position to detect potential SNPs in miRNAs, (ii) different sequence mapping result assignment approaches to simulate results from microarray/qRT-PCR platforms and a local probabilistic model to assign mapping results to the most-likely IDs, and (iii) comprehensive ribosomal RNA filtering for accurate mapping of exogenous RNAs and summarization based on taxonomy annotation. We evaluated our pipeline on both artificial samples (including synthetic miRNA and Escherichia coli cultures) and biological samples (human tissue and plasma). sRNAnalyzer is implemented in Perl and available at: http://srnanalyzer.systemsbiology.net/.

Source
DOI: http://dx.doi.org/10.1093/nar/gkx999
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5716150
December 2017

The Information Content of Discrete Functions and Their Application in Genetic Data Analysis.

J Comput Biol 2017 Dec 13;24(12):1153-1178. Epub 2017 Oct 13.

Pacific Northwest Research Institute, Seattle, Washington.

The complex of central problems in data analysis consists of three components: (1) detecting the dependence of variables using quantitative measures, (2) defining the significance of these dependence measures, and (3) inferring the functional relationships among dependent variables. We have argued previously that an information theory approach allows separation of the detection problem from the inference of functional form problem. We approach here the third component of inferring functional forms based on information encoded in the functions. We present here a direct method for classifying the functional forms of discrete functions of three variables represented in data sets. Discrete variables are frequently encountered in data analysis, both as the result of inherently categorical variables and from the binning of continuous numerical variables into discrete alphabets of values. The fundamental question of how much information is contained in a given function is answered for these discrete functions, and their surprisingly complex relationships are illustrated. The all-important effect of noise on the inference of function classes is found to be highly heterogeneous and reveals some unexpected patterns. We apply this classification approach to an important area of biological data analysis: the inference of genetic interactions. Genetic analysis provides a rich source of real and complex biological data analysis problems, and our general methods provide an analytical basis and tools for characterizing genetic problems and for analyzing genetic data. We illustrate the functional description and the classes of a number of common genetic interaction modes and also show how different modes vary widely in their sensitivity to noise.

Source
DOI: http://dx.doi.org/10.1089/cmb.2017.0143
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5729883
December 2017

Reprogramming progeria fibroblasts re-establishes a normal epigenetic landscape.

Aging Cell 2017 08 8;16(4):870-887. Epub 2017 Jun 8.

The Sprott Centre for Stem Cell Research, Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada, K1H 8L6.

Ideally, disease modeling using patient-derived induced pluripotent stem cells (iPSCs) enables analysis of disease initiation and progression. This requires any pathological features of the patient cells used for reprogramming to be eliminated during iPSC generation. Hutchinson-Gilford progeria syndrome (HGPS) is a segmental premature aging disorder caused by the accumulation of the truncated form of Lamin A known as Progerin within the nuclear lamina. Cellular hallmarks of HGPS include nuclear blebbing, loss of peripheral heterochromatin, defective epigenetic inheritance, altered gene expression, and senescence. To model HGPS using iPSCs, detailed genome-wide and structural analysis of the epigenetic landscape is required to assess the initiation and progression of the disease. We generated a library of iPSC lines from fibroblasts of patients with HGPS and controls, including one family trio. HGPS patient-derived iPSCs are nearly indistinguishable from controls in terms of pluripotency, nuclear membrane integrity, as well as transcriptional and epigenetic profiles, and can differentiate into affected cell lineages recapitulating disease progression, despite the nuclear aberrations, altered gene expression, and epigenetic landscape inherent to the donor fibroblasts. These analyses demonstrate the power of iPSC reprogramming to reset the epigenetic landscape to a revitalized pluripotent state in the face of widespread epigenetic defects, validating their use to model the initiation and progression of disease in affected cell lineages.

Source
DOI: http://dx.doi.org/10.1111/acel.12621
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5506428
August 2017

A genome wide dosage suppressor network reveals genomic robustness.

Nucleic Acids Res 2017 Jan 29;45(1):255-270. Epub 2016 Nov 29.

Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA.

Genomic robustness is the extent to which an organism has evolved to withstand the effects of deleterious mutations. We explored the extent of genomic robustness in budding yeast by genome-wide dosage suppressor analysis of 53 conditional lethal mutations in cell division cycle and RNA synthesis related genes, revealing 660 suppressor interactions of which 642 are novel. This collection has several distinctive features, including high co-occurrence of mutant-suppressor pairs within protein modules, highly correlated functions between the pairs and higher diversity of functions among the co-suppressors than previously observed. Dosage suppression of essential genes encoding RNA polymerase subunits and chromosome cohesion complex suggests a surprising degree of functional plasticity of macromolecular complexes, and the existence of numerous degenerate pathways for circumventing the effects of potentially lethal mutations. These results imply that organisms and cancer are likely able to exploit the genomic robustness properties, due to the persistence of cryptic gene and pathway functions, to generate variation and adapt to selective pressures.

Source
DOI: http://dx.doi.org/10.1093/nar/gkw1148
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5224485
January 2017

Biological data analysis as an information theory problem: multivariable dependence measures and the shadows algorithm.

J Comput Biol 2015 Nov 3;22(11):1005-24. Epub 2015 Sep 3.

Pacific Northwest Diabetes Research Institute, Seattle, Washington.

Information theory is valuable in multiple-variable analysis for being model-free and nonparametric, and for its modest sensitivity to undersampling. We previously introduced a general approach to finding multiple dependencies that provides accurate measures of levels of dependency for subsets of variables in a data set, which is significantly nonzero only if the subset of variables is collectively dependent. This is useful, however, only if we can avoid a combinatorial explosion of calculations for increasing numbers of variables. The proposed dependence measure for a subset of variables, τ, differential interaction information, Δ(τ), has the property that for subsets of τ some of the factors of Δ(τ) are significantly nonzero, when the full dependence includes more variables. We use this property to suppress the combinatorial explosion by following the "shadows" of multivariable dependency on smaller subsets. Rather than calculating the marginal entropies of all subsets at each degree level, we need to consider only calculations for subsets of variables with appropriate "shadows." The number of calculations for n variables at a degree level of d therefore grows at a much smaller rate than the binomial coefficient (n, d), but depends on the parameters of the "shadows" calculation. This approach, avoiding a combinatorial explosion, enables the use of our multivariable measures on very large data sets. We demonstrate this method on simulated data sets, and characterize the effects of noise and sample numbers. In addition, we analyze a data set of a few thousand mutant yeast strains interacting with a few thousand chemical compounds.

Source
DOI: http://dx.doi.org/10.1089/cmb.2015.0051
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642827
November 2015

Systems genomics evaluation of the SH-SY5Y neuroblastoma cell line as a model for Parkinson's disease.

BMC Genomics 2014 Dec 20;15:1154. Epub 2014 Dec 20.

Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch-sur-Alzette, Luxembourg.

Background: The human neuroblastoma cell line, SH-SY5Y, is a commonly used cell line in studies related to neurotoxicity, oxidative stress, and neurodegenerative diseases. Although this cell line is often used as a cellular model for Parkinson's disease, the relevance of this cellular model in the context of Parkinson's disease (PD) and other neurodegenerative diseases has not yet been systematically evaluated.

Results: We have used a systems genomics approach to characterize the SH-SY5Y cell line using whole-genome sequencing to determine the genetic content of the cell line and used transcriptomics and proteomics data to determine molecular correlations. Further, we integrated genomic variants using a network analysis approach to evaluate the suitability of the SH-SY5Y cell line for perturbation experiments in the context of neurodegenerative diseases, including PD.

Conclusions: The systems genomics approach showed consistency across different biological levels (DNA, RNA and protein concentrations). Most of the genes belonging to the major Parkinson's disease pathways and modules were intact in the SH-SY5Y genome. Specifically, each analysed gene related to PD has at least one intact copy in SH-SY5Y. The disease-specific network analysis approach ranked the genetic integrity of SH-SY5Y as higher for PD than for Alzheimer's disease but lower than for Huntington's disease and Amyotrophic Lateral Sclerosis for loss of function perturbation experiments.

Source
http://dx.doi.org/10.1186/1471-2164-15-1154 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4367834 (PMC)
December 2014

Mutations in STX1B, encoding a presynaptic protein, cause fever-associated epilepsy syndromes.

Nat Genet 2014 Dec 2;46(12):1327-32. Epub 2014 Nov 2.

Section of Complex Genetics, Department of Medical Genetics, University Medical Center Utrecht, Utrecht, the Netherlands.

Febrile seizures affect 2-4% of all children and have a strong genetic component. Recurrent mutations in three main genes (SCN1A, SCN1B and GABRG2) have been identified that cause febrile seizures with or without epilepsy. Here we report the identification of mutations in STX1B, encoding syntaxin-1B, that are associated with both febrile seizures and epilepsy. Whole-exome sequencing in independent large pedigrees identified cosegregating STX1B mutations predicted to cause an early truncation or an in-frame insertion or deletion. Three additional nonsense or missense mutations and a de novo microdeletion encompassing STX1B were then identified in 449 familial or sporadic cases. Video and local field potential analyses of zebrafish larvae with antisense knockdown of stx1b showed seizure-like behavior and epileptiform discharges that were highly sensitive to increased temperature. Wild-type human syntaxin-1B but not a mutated protein rescued the effects of stx1b knockdown in zebrafish. Our results thus implicate STX1B and the presynaptic release machinery in fever-associated epilepsy syndromes.

Source
http://dx.doi.org/10.1038/ng.3130 (DOI)
December 2014

A unified test of linkage analysis and rare-variant association for analysis of pedigree sequence data.

Nat Biotechnol 2014 Jul 18;32(7):663-9. Epub 2014 May 18.

Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA.

High-throughput sequencing of related individuals has become an important tool for studying human disease. However, owing to technical complexity and lack of available tools, most pedigree-based sequencing studies rely on an ad hoc combination of suboptimal analyses. Here we present pedigree-VAAST (pVAAST), a disease-gene identification tool designed for high-throughput sequence data in pedigrees. pVAAST uses a sequence-based model to perform variant and gene-based linkage analysis. Linkage information is then combined with functional prediction and rare variant case-control association information in a unified statistical framework. pVAAST outperformed linkage and rare-variant association tests in simulations and identified disease-causing genes from whole-genome sequence data in three human pedigrees with dominant, recessive and de novo inheritance patterns. The approach is robust to incomplete penetrance and locus heterogeneity and is applicable to a wide variety of genetic traits. pVAAST maintains high power across studies of monogenic, high-penetrance phenotypes in a single pedigree to highly polygenic, common phenotypes involving hundreds of pedigrees.

Source
http://dx.doi.org/10.1038/nbt.2895 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4157619 (PMC)
July 2014

Discovering pair-wise genetic interactions: an information theory-based approach.

PLoS One 2014 26;9(3):e92310. Epub 2014 Mar 26.

Luxembourg Centre for Systems Biomedicine, Esch-sur-Alzette, Luxembourg; Pacific Northwest Diabetes Research Institute, Seattle, Washington, United States of America.

Phenotypic variation, including that which underlies health and disease in humans, results in part from multiple interactions among both genetic variation and environmental factors. While diseases or phenotypes caused by single gene variants can be identified by established association methods and family-based approaches, complex phenotypic traits resulting from multi-gene interactions remain very difficult to characterize. Here we describe a new method based on information theory, and demonstrate how it improves on previous approaches to identifying genetic interactions, including both synthetic and modifier kinds of interactions. We apply our measure, called interaction distance, to previously analyzed data sets of yeast sporulation efficiency, lipid related mouse data and several human disease models to characterize the method. We show how the interaction distance can reveal novel gene interaction candidates in experimental and simulated data sets, and outperforms other measures in several circumstances. The method also allows us to optimize case/control sample composition for clinical studies.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0092310 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3966778 (PMC)
December 2015

Molecular evidence of stress-induced acute heart injury in a mouse model simulating posttraumatic stress disorder.

Proc Natl Acad Sci U S A 2014 Feb 10;111(8):3188-93. Epub 2014 Feb 10.

Institute for Systems Biology, Seattle, WA 98109.

Posttraumatic stress disorder (PTSD) is a common condition induced by life-threatening stress, such as that experienced by soldiers under battlefield conditions. Other than the commonly recognized behavioral and psychological dysfunction, epidemiological studies have also revealed that PTSD patients have a higher risk of other diseases, such as cardiovascular disorders. Using a PTSD mouse model, we investigated the longitudinal transcriptomic changes in heart tissues after the exposure to stress through intimidation. Our results revealed acute heart injury associated with the traumatic experience, reflecting the underlying biological injury processes of the immune response, extracellular matrix remodeling, epithelial-to-mesenchymal cell transitions, and cell proliferation. Whether this type of injury has any long-term effects on heart function is yet to be determined. The differing responses to stress leading to acute heart injury in different inbred strains of mice also suggest that this response has a genetic as well as an environmental component. Accordingly, the results from this study suggest a molecular basis for the observed higher risk of cardiovascular disorders in PTSD patients, which raises the likelihood of cardiac dysfunction induced by long-term stress exposures.

Source
http://dx.doi.org/10.1073/pnas.1400113111 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3939897 (PMC)
February 2014

Describing the complexity of systems: multivariable "set complexity" and the information basis of systems biology.

J Comput Biol 2014 Feb 30;21(2):118-40. Epub 2013 Dec 30.

Pacific Northwest Diabetes Research Institute, Seattle, Washington.

Context dependence is central to the description of complexity. Keying on the pairwise definition of "set complexity," we use an information theory approach to formulate general measures of systems complexity. We examine the properties of multivariable dependency starting with the concept of interaction information. We then present a new measure for unbiased detection of multivariable dependency, "differential interaction information." This quantity for two variables reduces to the pairwise "set complexity" previously proposed as a context-dependent measure of information in biological systems. We generalize it here to an arbitrary number of variables. Critical limiting properties of the "differential interaction information" are key to the generalization. This measure extends previous ideas about biological information and provides a more sophisticated basis for the study of complexity. The properties of "differential interaction information" also suggest new approaches to data analysis. Given a data set of system measurements, differential interaction information can provide a measure of collective dependence, which can be represented in hypergraphs describing complex system interaction patterns. We investigate this kind of analysis using simulated data sets. The conjoining of a generalized set complexity measure, multivariable dependency analysis, and hypergraphs is our central result. While our focus is on complex biological systems, our results are applicable to any complex system.

Source
http://dx.doi.org/10.1089/cmb.2013.0039 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3904535 (PMC)
February 2014
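
The interaction information that the abstract above takes as its starting point can be estimated from data as an alternating sum of joint entropies. The sketch below is a minimal plug-in estimate under one common sign convention (single-variable entropies added, pairwise entropies subtracted, three-way entropies added, and so on). That convention is an assumption here, since the abstract does not state one, and the helper names `entropy` and `interaction_information` are hypothetical. It does not implement the differential interaction information itself, whose definition the abstract does not give.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows, cols):
    """Plug-in Shannon entropy (bits) of the joint distribution of `cols`."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum((k / n) * log2(k / n) for k in counts.values())


def interaction_information(rows, cols):
    """Alternating sum of joint entropies over all non-empty subsets of `cols`.

    Assumed sign convention: + for single variables, - for pairs,
    + for triples, and so on. For two variables this is the mutual
    information.
    """
    total = 0.0
    for size in range(1, len(cols) + 1):
        sign = (-1) ** (size + 1)
        for subset in combinations(cols, size):
            total += sign * entropy(rows, subset)
    return total


if __name__ == "__main__":
    # Toy data: Z = X XOR Y, with every (X, Y) pair equally likely.
    xor_rows = [(x, y, x ^ y) for x in (0, 1) for y in (0, 1)]
    print(interaction_information(xor_rows, (0, 1, 2)))
```

On the XOR toy data no pair of variables is dependent, yet the three are collectively dependent, and the estimate evaluates to -1 bit under the assumed convention. This is the kind of purely multivariable dependence that pairwise measures miss.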

An evaluation of high-throughput approaches to QTL mapping in Saccharomyces cerevisiae.

Genetics 2014 Mar 27;196(3):853-65. Epub 2013 Dec 27.

European Molecular Biology Laboratory, Genome Biology Unit, 69117 Heidelberg, Germany.

Dissecting the molecular basis of quantitative traits is a significant challenge and is essential for understanding complex diseases. Even in model organisms, precisely determining causative genes and their interactions has remained elusive, due in part to difficulty in narrowing intervals to single genes and in detecting epistasis or linked quantitative trait loci. These difficulties are exacerbated by limitations in experimental design, such as low numbers of analyzed individuals or of polymorphisms between parental genomes. We address these challenges by applying three independent high-throughput approaches for QTL mapping to map the genetic variants underlying 11 phenotypes in two genetically distant Saccharomyces cerevisiae strains, namely (1) individual analysis of >700 meiotic segregants, (2) bulk segregant analysis, and (3) reciprocal hemizygosity scanning, a new genome-wide method that we developed. We reveal differences in the performance of each approach and, by combining them, identify eight polymorphic genes that affect eight different phenotypes: colony shape, flocculation, growth on two nonfermentable carbon sources, and resistance to two drugs, salt, and high temperature. Our results demonstrate the power of individual segregant analysis to dissect QTL and address the underestimated contribution of interactions between variants. We also reveal confounding factors like mutations and aneuploidy in pooled approaches, providing valuable lessons for future designs of complex trait mapping studies.

Source
http://dx.doi.org/10.1534/genetics.113.160291 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3948811 (PMC)
March 2014

RCytoscape: tools for exploratory network analysis.

BMC Bioinformatics 2013 Jul 9;14:217. Epub 2013 Jul 9.

Fred Hutchinson Cancer Research Institute, Seattle, WA, USA.

Background: Biomolecular pathways and networks are dynamic and complex, and the perturbations to them which cause disease are often multiple, heterogeneous and contingent. Pathway and network visualizations, rendered on a computer or published on paper, however, tend to be static, lacking in detail, and ill-equipped to explore the variety and quantities of data available today, and the complex causes we seek to understand.

Results: RCytoscape integrates R (an open-ended programming environment rich in statistical power and data-handling facilities) and Cytoscape (powerful network visualization and analysis software). RCytoscape extends Cytoscape's functionality beyond what is possible with the Cytoscape graphical user interface. To illustrate the power of RCytoscape, a portion of the Glioblastoma multiforme (GBM) data set from the Cancer Genome Atlas (TCGA) is examined. Network visualization reveals previously unreported patterns in the data suggesting heterogeneous signaling mechanisms active in GBM Proneural tumors, with possible clinical relevance.

Conclusions: Progress in bioinformatics and computational biology depends upon exploratory and confirmatory data analysis, upon inference, and upon modeling. These activities will eventually permit the prediction and control of complex biological systems. Network visualizations--molecular maps--created from an open-ended programming environment rich in statistical power and data-handling facilities, such as RCytoscape, will play an essential role in this progression.

Source
http://dx.doi.org/10.1186/1471-2105-14-217 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3751905 (PMC)
July 2013

The spectrum of circulating RNA: a window into systems toxicology.

Toxicol Sci 2013 Apr 28;132(2):478-92. Epub 2013 Jan 28.

Institute for Systems Biology, Seattle, Washington 98109, USA.

Adverse effects caused by therapeutic drugs are a serious and costly health concern. Despite the body's systemic responses to therapeutics, the liver is often the focus of damage and is usually the focus of studies of toxic effects due to its active roles in the metabolism of xenobiotics. It is extremely difficult, however, to assess systemic responses with currently available methods. Comprehensive cataloging of cell-free circulating RNAs using next-generation sequencing technology may open a window to assess drug-associated adverse effects at the systems level. To explore this potential, we conducted an RNA profiling study using the well-characterized acetaminophen overdose mouse model on liver and plasma with microarray and next-generation sequencing platforms, respectively. After drug treatment, the levels of a number of transcripts, both endogenous and exogenous RNAs, showed significant changes in plasma, reflecting not only the classical liver injury induced by acetaminophen overdose but also damage in tissues other than the liver. The changes in exogenous RNAs also reflect alterations in dietary behavior after acetaminophen overdose. Besides reporting an extensive list of circulating RNA-based biomarker candidates, this study illustrates the possibility of using circulating RNAs to assess global effects of therapeutics. This could also lead to a new approach for a more comprehensive assessment of the efficacy and safety of therapeutics.

Source
http://dx.doi.org/10.1093/toxsci/kft014 (DOI)
April 2013

Relations between the set-complexity and the structure of graphs and their sub-graphs.

EURASIP J Bioinform Syst Biol 2012 Sep 21;2012(1):13. Epub 2012 Sep 21.

Institute for Systems Biology, 401 N, Terry Avenue, Seattle, WA 98109, USA.

We describe some new conceptual tools for the rigorous, mathematical description of the "set-complexity" of graphs. This set-complexity has been shown previously to be a useful measure for analyzing some biological networks, and in discussing biological information in a quantitative fashion. The advances described here allow us to define some significant relationships between the set-complexity measure and the structure of graphs, and of their component sub-graphs. We show here that modular graph structures tend to maximize the set-complexity of graphs. We point out the relationship between modularity and redundancy, and discuss the significance of set-complexity in this regard. We specifically discuss the relationship between complexity and entropy in the case of complete-bipartite graphs, and present a new method for constructing highly complex, binary graphs. These results can be extended to the case of ternary graphs, and to other multi-edge graphs, which are fundamentally more relevant to biological structures and systems. Finally, our results lead us to an approach for extracting high complexity modular graphs from large, noisy graphs with low information content. We illustrate this approach with two examples.

Source
http://dx.doi.org/10.1186/1687-4153-2012-13 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3610188 (PMC)
September 2012

Comparing the MicroRNA spectrum between serum and plasma.

PLoS One 2012 31;7(7):e41561. Epub 2012 Jul 31.

Institute for Systems Biology, Seattle, Washington, United States of America.

MicroRNAs (miRNAs) are small, non-coding RNAs that regulate various biological processes, primarily through interaction with messenger RNAs. The levels of specific, circulating miRNAs in blood have been shown to associate with various pathological conditions including cancers. These miRNAs have great potential as biomarkers for various pathophysiological conditions. In this study we focused on different sample types' effects on the spectrum of circulating miRNA in blood. Using serum and corresponding plasma samples from the same individuals, we observed higher miRNA concentrations in serum samples compared to the corresponding plasma samples. The difference between serum and plasma miRNA concentration showed some associations with miRNA from platelets, which may indicate that the coagulation process may affect the spectrum of extracellular miRNA in blood. Several miRNAs also showed platform dependent variations in measurements. Our results suggest that there are a number of factors that might affect the measurement of circulating miRNA concentration. Caution must be taken when comparing miRNA data generated from different sample types or measurement platforms.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0041561 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3409228 (PMC)
April 2013
-->