Publications by authors named "Nicolo Fusi"

12 Publications

Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs.

Nat Biomed Eng 2018 Jan 10;2(1):38-47. Epub 2018 Jan 10.

Microsoft Research, Cambridge, MA, USA.

The CRISPR-Cas9 system provides unprecedented genome editing capabilities. However, off-target effects lead to sub-optimal usage and additionally are a bottleneck in the development of therapeutic uses. Herein, we introduce the first machine learning-based approach to off-target prediction, yielding a state-of-the-art model for CRISPR-Cas9 that outperforms all other guide design services. Our approach, Elevation, consists of two interdependent machine learning models: one for scoring individual guide-target pairs, and another which aggregates these guide-target scores into a single, overall summary guide score. Through systematic investigation, we demonstrate that Elevation performs substantially better than competing approaches on both tasks. Additionally, we are the first to systematically evaluate approaches on the guide summary score problem; we show that the most widely-used method performs no better than random at times, whereas Elevation consistently outperformed it, sometimes by an order of magnitude. We also introduce an evaluation method that balances errors between active and inactive guides, thereby encapsulating a range of practical use cases; Elevation is consistently superior to other methods across the entire range. Finally, because of the large scale and computational demands of off-target prediction, we have developed a cloud-based service for quick retrieval. This service provides end-to-end guide design by also incorporating our previously reported on-target model, Azimuth (https://crispr.ml).

Source
DOI: http://dx.doi.org/10.1038/s41551-017-0178-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6037314
January 2018
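
The two-stage structure described in the Elevation abstract above (score each guide-target pair, then aggregate the pair scores into one summary score per guide) can be illustrated with a minimal sketch. Both functions below are hypothetical placeholders chosen for simplicity; they are not the Elevation models, whose features and learned parameters are not given in this listing.

```python
from typing import List


def pair_score(guide: str, target: str) -> float:
    """Hypothetical stand-in for a learned guide-target model.

    Returns the fraction of matching positions (1.0 means identical).
    """
    if len(guide) != len(target):
        raise ValueError("guide and target must have equal length")
    matches = sum(g == t for g, t in zip(guide, target))
    return matches / len(guide)


def summary_score(guide: str, off_targets: List[str]) -> float:
    """Hypothetical aggregator over all candidate off-target sites.

    Uses 1 / (1 + sum of pair scores) as a placeholder, so a guide with
    no candidate off-targets scores 1.0 and the score falls as predicted
    off-target activity accumulates.
    """
    total = sum(pair_score(guide, t) for t in off_targets)
    return 1.0 / (1.0 + total)


# Toy 4-nt sequences, for illustration only.
guide = "ACGT"
candidates = ["ACGA", "ACGT"]
print(pair_score(guide, candidates[0]))
print(summary_score(guide, candidates))
```

The point of the split is that the pair-level model and the aggregation step can be evaluated separately, which is the distinction the abstract draws between its two tasks.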

Orthologous CRISPR-Cas9 enzymes for combinatorial genetic screens.

Nat Biotechnol 2018 Feb 18;36(2):179-189. Epub 2017 Dec 18.

Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.

Combinatorial genetic screening using CRISPR-Cas9 is a useful approach to uncover redundant genes and to explore complex gene networks. However, current methods suffer from interference between the single-guide RNAs (sgRNAs) and from limited gene targeting activity. To increase the efficiency of combinatorial screening, we employ orthogonal Cas9 enzymes from Staphylococcus aureus and Streptococcus pyogenes. We used machine learning to establish S. aureus Cas9 sgRNA design rules and paired S. aureus Cas9 with S. pyogenes Cas9 to achieve dual targeting in a high fraction of cells. We also developed a lentiviral vector and cloning strategy to generate high-complexity pooled dual-knockout libraries to identify synthetic lethal and buffering gene pairs across multiple cell types, including MAPK pathway genes and apoptotic genes. Our orthologous approach also enabled a screen combining gene knockouts with transcriptional activation, which revealed genetic interactions with TP53. The "Big Papi" (paired aureus and pyogenes for interactions) approach described here will be widely applicable for the study of combinatorial phenotypes.

Source
DOI: http://dx.doi.org/10.1038/nbt.4048
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5800952
February 2018

Flexible Modeling of Genetic Effects on Function-Valued Traits.

J Comput Biol 2017 Jun 5;24(6):524-535. Epub 2017 Jan 5.

Microsoft Research, Cambridge, Massachusetts.

Genome-wide association studies commonly examine one trait at a time. Occasionally they examine several related traits with the hope of increasing power; in such a setting, the traits are not generally smoothly varying in any way such as time or space. However, for function-valued traits, the trait is often smoothly varying along the axis of interest, such as space or time. For instance, in the case of longitudinal traits such as growth curves, the axis of interest is time; for spatially varying traits such as chromatin accessibility, it would be position along the genome. Although there have been efforts to perform genome-wide association studies with such function-valued traits, the statistical approaches developed for this purpose often have limitations such as requiring the trait to behave linearly in time or space, or constraining the genetic effect itself to be constant or linear in time. Herein, we present a flexible model for this problem, the Partitioned Gaussian Process, which removes many such limitations and is especially effective as the number of time points increases. The theoretical basis of this model provides machinery for handling missing and unaligned function values such as would occur when not all individuals are measured at the same time points. Furthermore, we make use of algebraic refactorizations to substantially reduce the time complexity of our model beyond the naive implementation. Finally, we apply our approach and several others to synthetic data before closing with some directions for improved modeling and statistical testing.

Source
DOI: http://dx.doi.org/10.1089/cmb.2016.0174
June 2017

Impact of pre-adapted HIV transmission.

Nat Med 2016 Jun 16;22(6):606-13. Epub 2016 May 16.

Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA.

Human leukocyte antigen class I (HLA)-restricted CD8+ T lymphocyte (CTL) responses are crucial to HIV-1 control. Although HIV can evade these responses, the longer-term impact of viral escape mutants remains unclear, as these variants can also reduce intrinsic viral fitness. To address this, we developed a metric to determine the degree of HIV adaptation to an HLA profile. We demonstrate that transmission of viruses that are pre-adapted to the HLA molecules expressed in the recipient is associated with impaired immunogenicity, elevated viral load and accelerated CD4+ T cell decline. Furthermore, the extent of pre-adaptation among circulating viruses explains much of the variation in outcomes attributed to the expression of certain HLA alleles. Thus, viral pre-adaptation exploits 'holes' in the immune response. Accounting for these holes may be key for vaccine strategies seeking to elicit functional responses from viral variants, and to HIV cure strategies that require broad CTL responses to achieve successful eradication of HIV reservoirs.

Source
DOI: http://dx.doi.org/10.1038/nm.4100
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4899163
June 2016

Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9.

Nat Biotechnol 2016 Feb 18;34(2):184-191. Epub 2016 Jan 18.

Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

CRISPR-Cas9-based genetic screens are a powerful new tool in biology. By simply altering the sequence of the single-guide RNA (sgRNA), one can reprogram Cas9 to target different sites in the genome with relative ease, but the on-target activity and off-target effects of individual sgRNAs can vary widely. Here, we use recently devised sgRNA design rules to create human and mouse genome-wide libraries, perform positive and negative selection screens and observe that the use of these rules produced improved results. Additionally, we profile the off-target activity of thousands of sgRNAs and develop a metric to predict off-target sites. We incorporate these findings from large-scale, empirical data to improve our computational design rules and create optimized sgRNA libraries that maximize on-target activity and minimize off-target effects to enable more effective and efficient genetic screens and genome engineering.

Source
DOI: http://dx.doi.org/10.1038/nbt.3437
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4744125
February 2016

Further improvements to linear mixed models for genome-wide association studies.

Sci Rep 2014 Nov 12;4:6874. Epub 2014 Nov 12.

eScience Group, Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA, 90024, United States.

We examine improvements to the linear mixed model (LMM) that better correct for population structure and family relatedness in genome-wide association studies (GWAS). LMMs rely on the estimation of a genetic similarity matrix (GSM), which encodes the pairwise similarity between every two individuals in a cohort. These similarities are estimated from single nucleotide polymorphisms (SNPs) or other genetic variants. Traditionally, all available SNPs are used to estimate the GSM. In empirical studies across a wide range of synthetic and real data, we find that modifications to this approach improve GWAS performance as measured by type I error control and power. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM. Software implementing these improvements and the experimental comparisons are available at http://microsoft.com/science.

Source
DOI: http://dx.doi.org/10.1038/srep06874
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4230738
November 2014
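
The genetic similarity matrix (GSM) and the two-component mixture described in the linear mixed model abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the estimator K = Z Zᵀ / m on standardized genotypes is one common choice and is not confirmed by this listing as the paper's estimator, the mixing weight is fixed by hand rather than fitted, and the set of "phenotype-predictive" SNPs (`snp_idx`) is a hypothetical input.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def gsm(genotypes: Sequence[Sequence[float]],
        snp_idx: Optional[Sequence[int]] = None) -> Matrix:
    """Pairwise similarity K = Z Z^T / m over the chosen SNP columns.

    Z holds per-SNP standardized genotypes (assumed estimator).
    Rows are individuals, columns are SNPs.
    """
    n = len(genotypes)
    cols = range(len(genotypes[0])) if snp_idx is None else snp_idx
    z_cols = []
    for j in cols:
        col = [row[j] for row in genotypes]
        mu, sd = mean(col), pstdev(col)
        if sd == 0:
            continue  # a SNP with no variation carries no similarity signal
        z_cols.append([(v - mu) / sd for v in col])
    if not z_cols:
        raise ValueError("no polymorphic SNPs selected")
    m = len(z_cols)
    return [[sum(z[i] * z[k] for z in z_cols) / m for k in range(n)]
            for i in range(n)]


def mix_gsm(k_all: Matrix, k_sel: Matrix, weight: float) -> Matrix:
    """Convex mixture (1 - weight) * K_all + weight * K_sel."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [[(1 - weight) * a + weight * b for a, b in zip(ra, rb)]
            for ra, rb in zip(k_all, k_sel)]


# Toy data: 3 individuals x 3 SNPs, genotypes coded 0/1/2.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
k_all = gsm(genotypes)               # GSM from all SNPs
k_sel = gsm(genotypes, snp_idx=[0])  # pretend SNP 0 was selected as predictive
k_mix = mix_gsm(k_all, k_sel, 0.5)   # hand-picked weight, for illustration
```

Because each SNP column is standardized, the diagonal of every matrix built this way averages to 1, so the two components are on a comparable scale before mixing.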

Warped linear mixed models for the genetic analysis of transformed phenotypes.

Nat Commun 2014 Sep 19;5:4890. Epub 2014 Sep 19.

European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge CB10 1SD, UK.

Linear mixed models (LMMs) are a powerful and established tool for studying genotype-phenotype relationships. A limitation of the LMM is that the model assumes Gaussian distributed residuals, a requirement that rarely holds in practice. Violations of this assumption can lead to false conclusions and loss in power. To mitigate this problem, it is common practice to pre-process the phenotypic values to make them as Gaussian as possible, for instance by applying logarithmic or other nonlinear transformations. Unfortunately, different phenotypes require different transformations, and choosing an appropriate transformation is challenging and subjective. Here we present an extension of the LMM that estimates an optimal transformation from the observed data. In simulations and applications to real data from human, mouse and yeast, we show that using transformations inferred by our model increases power in genome-wide association studies and increases the accuracy of heritability estimation and phenotype prediction.

Source
DOI: http://dx.doi.org/10.1038/ncomms5890
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4199105
September 2014
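
The pre-processing practice that the warped linear mixed model abstract above sets out to replace (manually transforming a phenotype so that it looks more Gaussian) can be illustrated with a small sketch. It only shows the manual step: a hand-picked logarithm and sample skewness as a crude symmetry check. It does not implement the paper's model, which estimates the transformation from the data; the toy phenotype values are invented for illustration.

```python
import math
from statistics import mean
from typing import Sequence


def skewness(values: Sequence[float]) -> float:
    """Sample skewness m3 / m2**1.5 (0 for a symmetric sample)."""
    mu = mean(values)
    m2 = mean([(v - mu) ** 2 for v in values])
    m3 = mean([(v - mu) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Invented, strongly right-skewed toy phenotype.
raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]

# Hand-picked transformation: the subjective choice the abstract criticizes.
logged = [math.log(v) for v in raw]

print(skewness(raw))     # clearly positive: long right tail
print(skewness(logged))  # approximately zero: symmetric after the log
```

A different phenotype would need a different transformation, which is the motivation the abstract gives for inferring it within the model.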

A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control.

Elife 2013 Oct 29;2:e01123. Epub 2013 Oct 29.

School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Institute of Microbiology, University Hospital and University of Lausanne, Lausanne, Switzerland; Research Group of Theoretical Biology and Evolutionary Ecology, Eötvös Loránd University and the Hungarian Academy of Sciences, Budapest, Hungary; Swiss Institute of Bioinformatics, Lausanne, Switzerland.

HIV-1 sequence diversity is affected by selection pressures arising from host genomic factors. Using paired human and viral data from 1071 individuals, we ran >3000 genome-wide scans, testing for associations between host DNA polymorphisms, HIV-1 sequence variation and plasma viral load (VL), while considering human and viral population structure. We observed significant human SNP associations to a total of 48 HIV-1 amino acid variants (p < 2.4 × 10^-12). All associated SNPs mapped to the HLA class I region. Clinical relevance of host and pathogen variation was assessed using VL results. We identified two critical advantages to the use of viral variation for identifying host factors: (1) association signals are much stronger for HIV-1 sequence variants than VL, reflecting the 'intermediate phenotype' nature of viral variation; (2) association testing can be run without any clinical data. The proposed genome-to-genome approach highlights sites of genomic conflict and is a strategy generally applicable to studies of host-pathogen interaction.

Source
DOI: http://dx.doi.org/10.7554/eLife.01123
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3807812
October 2013

Transcriptomic indices of fast and slow disease progression in two mouse models of amyotrophic lateral sclerosis.

Brain 2013 Nov 24;136(Pt 11):3305-32. Epub 2013 Sep 24.

Laboratory of Molecular Neurobiology, Department of Neuroscience, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via La Masa, 19, 20156 Milan, Italy.

Amyotrophic lateral sclerosis is heterogeneous with high variability in the speed of progression even in cases with a defined genetic cause such as superoxide dismutase 1 (SOD1) mutations. We reported that SOD1(G93A) mice on distinct genetic backgrounds (C57 and 129Sv) show consistent phenotypic differences in speed of disease progression and life-span that are not explained by differences in human SOD1 transgene copy number or the burden of mutant SOD1 protein within the nervous system. We aimed to compare the gene expression profiles of motor neurons from these two SOD1(G93A) mouse strains to discover the molecular mechanisms contributing to the distinct phenotypes and to identify factors underlying fast and slow disease progression. Lumbar spinal motor neurons from the two SOD1(G93A) mouse strains were isolated by laser capture microdissection and transcriptome analysis was conducted at four stages of disease. We identified marked differences in the motor neuron transcriptome between the two mouse strains at disease onset, with a dramatic reduction of gene expression in the rapidly progressive (129Sv-SOD1(G93A)) compared with the slowly progressing mutant SOD1 mice (C57-SOD1(G93A)) (1276 versus 346; Q-value ≤ 0.01). Gene ontology pathway analysis of the transcriptional profile from 129Sv-SOD1(G93A) mice showed marked downregulation of specific pathways involved in mitochondrial function, as well as predicted deficiencies in protein degradation and axonal transport mechanisms. In contrast, the transcriptional profile from C57-SOD1(G93A) mice with the more benign disease course, revealed strong gene enrichment relating to immune system processes compared with 129Sv-SOD1(G93A) mice. Motor neurons from the more benign mutant strain demonstrated striking complement activation, over-expressing genes normally involved in immune cell function.
We validated through immunohistochemistry increased expression of the C3 complement subunit and major histocompatibility complex I within motor neurons. In addition, we demonstrated that motor neurons from the slowly progressing mice activate a series of genes with neuroprotective properties such as angiogenin and the nuclear factor (erythroid-derived 2)-like 2 transcriptional regulator. In contrast, the faster progressing mice show dramatically reduced expression at disease onset of cell pathways involved in neuroprotection. This study highlights a set of key gene and molecular pathway indices of fast or slow disease progression which may prove useful in identifying potential disease modifiers responsible for the heterogeneity of human amyotrophic lateral sclerosis and which may represent valid therapeutic targets for ameliorating the disease course in humans.

Source
DOI: http://dx.doi.org/10.1093/brain/awt250
November 2013

Detecting regulatory gene-environment interactions with unmeasured environmental factors.

Bioinformatics 2013 Jun 4;29(11):1382-9. Epub 2013 Apr 4.

Department of Computer Science, University of Sheffield, Sheffield, UK.

Motivation: Genomic studies have revealed a substantial heritable component of the transcriptional state of the cell. To fully understand the genetic regulation of gene expression variability, it is important to study the effect of genotype in the context of external factors such as alternative environmental conditions. In model systems, explicit environmental perturbations have been considered for this purpose, allowing direct tests for environment-specific genetic effects. However, such experiments are limited to species that can be profiled in controlled environments, hampering their use in important systems such as humans. Moreover, even in seemingly tightly regulated experimental conditions, subtle environmental perturbations cannot be ruled out, and hence unknown environmental influences are frequent. Here, we propose a model-based approach to simultaneously infer unmeasured environmental factors from gene expression profiles and use them in genetic analyses, identifying environment-specific associations between polymorphic loci and individual gene expression traits.

Results: In extensive simulation studies, we show that our method is able to accurately reconstruct environmental factors and their interactions with genotype in a variety of settings. We further illustrate the use of our model in a real-world dataset in which one environmental factor has been explicitly experimentally controlled. Our method is able to accurately reconstruct the true underlying environmental factor even if it is not given as an input, allowing detection of genuine genotype-environment interactions. In addition to the known environmental factor, we find unmeasured factors involved in novel genotype-environment interactions. Our results suggest that interactions with both known and unknown environmental factors significantly contribute to gene expression variability.

Availability and implementation: Software available at http://pmbio.github.io/envGPLVM/.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btt148
June 2013

Unravelling the enigma of selective vulnerability in neurodegeneration: motor neurons resistant to degeneration in ALS show distinct gene expression characteristics and decreased susceptibility to excitotoxicity.

Acta Neuropathol 2013 Jan 13;125(1):95-109. Epub 2012 Nov 13.

Academic Neurology Unit, Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, 385A Glossop Road, Sheffield S10 2HQ, UK.

A consistent clinical feature of amyotrophic lateral sclerosis (ALS) is the sparing of eye movements and the function of external sphincters, with corresponding preservation of motor neurons in the brainstem oculomotor nuclei, and of Onuf's nucleus in the sacral spinal cord. Studying the differences in properties of neurons that are vulnerable and resistant to the disease process in ALS may provide insights into the mechanisms of neuronal degeneration, and identify targets for therapeutic manipulation. We used microarray analysis to determine the differences in gene expression between oculomotor and spinal motor neurons, isolated by laser capture microdissection from the midbrain and spinal cord of neurologically normal human controls. We compared these to transcriptional profiles of oculomotor nuclei and spinal cord from rat and mouse, obtained from the GEO omnibus database. We show that oculomotor neurons have a distinct transcriptional profile, with significant differential expression of 1,757 named genes (q < 0.001). Differentially expressed genes are enriched for the functional categories of synaptic transmission, ubiquitin-dependent proteolysis, mitochondrial function, transcriptional regulation, immune system functions, and the extracellular matrix. Marked differences are seen, across the three species, in genes with a function in synaptic transmission, including several glutamate and GABA receptor subunits. Using patch clamp recording in acute spinal and brainstem slices, we show that resistant oculomotor neurons show a reduced AMPA-mediated inward calcium current, and a higher GABA-mediated chloride current, than vulnerable spinal motor neurons. The findings suggest that reduced susceptibility to excitotoxicity, mediated in part through enhanced GABAergic transmission, is an important determinant of the relative resistance of oculomotor neurons to degeneration in ALS.

Source
DOI: http://dx.doi.org/10.1007/s00401-012-1058-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3535376
January 2013

Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies.

PLoS Comput Biol 2012 Jan 5;8(1):e1002330. Epub 2012 Jan 5.

Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, United Kingdom.

Expression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is the presence of hidden confounding factors, such as unobserved covariates or unknown subtle environmental perturbations. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, this new model can more accurately distinguish true genetic association signals from confounding variation. We applied our model and compared it to existing methods on different datasets and biological systems. PANAMA consistently performs better than alternative methods, and finds in particular substantially more trans regulators. Importantly, our approach not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies. A software implementation of PANAMA is freely available online at http://ml.sheffield.ac.uk/qtl/.

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1002330
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3252274
January 2012