Publications by authors named "Jennifer Listgarten"

38 Publications

Rethinking drug design in the artificial intelligence era.

Nat Rev Drug Discov 2020 May 4;19(5):353-364. Epub 2019 Dec 4.

ETH Zurich, RETHINK, Department of Chemistry and Applied Biosciences, Zurich, Switzerland.

Artificial intelligence (AI) tools are increasingly being applied in drug discovery. While some protagonists point to vast opportunities potentially offered by such tools, others remain sceptical, waiting for a clear impact to be shown in drug discovery projects. The reality is probably somewhere in between these extremes, yet it is clear that AI is providing new challenges not only for the scientists involved but also for the biopharma industry and its established processes for discovering and developing new medicines. This article presents the views of a diverse group of international experts on the 'grand challenges' in small-molecule drug discovery with AI and the approaches to address them.

Source
DOI: http://dx.doi.org/10.1038/s41573-019-0050-3
May 2020

Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs.

Nat Biomed Eng 2018 Jan 10;2(1):38-47. Epub 2018 Jan 10.

Microsoft Research, Cambridge, MA, USA.

The CRISPR-Cas9 system provides unprecedented genome editing capabilities. However, off-target effects lead to sub-optimal usage and additionally are a bottleneck in the development of therapeutic uses. Herein, we introduce the first machine learning-based approach to off-target prediction, yielding a state-of-the-art model for CRISPR-Cas9 that outperforms all other guide design services. Our approach, Elevation, consists of two interdependent machine learning models: one for scoring individual guide-target pairs, and another which aggregates these guide-target scores into a single, overall summary guide score. Through systematic investigation, we demonstrate that Elevation performs substantially better than competing approaches on both tasks. Additionally, we are the first to systematically evaluate approaches on the guide summary score problem; we show that the most widely used method performs no better than random at times, whereas Elevation consistently outperformed it, sometimes by an order of magnitude. We also introduce an evaluation method that balances errors between active and inactive guides, thereby encapsulating a range of practical use cases; Elevation is consistently superior to other methods across the entire range. Finally, because of the large scale and computational demands of off-target prediction, we have developed a cloud-based service for quick retrieval. This service provides end-to-end guide design by also incorporating our previously reported on-target model, Azimuth (https://crispr.ml).

Source
DOI: http://dx.doi.org/10.1038/s41551-017-0178-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6037314
January 2018

Orthologous CRISPR-Cas9 enzymes for combinatorial genetic screens.

Nat Biotechnol 2018 Feb 18;36(2):179-189. Epub 2017 Dec 18.

Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.

Combinatorial genetic screening using CRISPR-Cas9 is a useful approach to uncover redundant genes and to explore complex gene networks. However, current methods suffer from interference between the single-guide RNAs (sgRNAs) and from limited gene targeting activity. To increase the efficiency of combinatorial screening, we employ orthogonal Cas9 enzymes from Staphylococcus aureus and Streptococcus pyogenes. We used machine learning to establish S. aureus Cas9 sgRNA design rules and paired S. aureus Cas9 with S. pyogenes Cas9 to achieve dual targeting in a high fraction of cells. We also developed a lentiviral vector and cloning strategy to generate high-complexity pooled dual-knockout libraries to identify synthetic lethal and buffering gene pairs across multiple cell types, including MAPK pathway genes and apoptotic genes. Our orthologous approach also enabled a screen combining gene knockouts with transcriptional activation, which revealed genetic interactions with TP53. The "Big Papi" (paired aureus and pyogenes for interactions) approach described here will be widely applicable for the study of combinatorial phenotypes.

Source
DOI: http://dx.doi.org/10.1038/nbt.4048
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5800952
February 2018

Flexible Modeling of Genetic Effects on Function-Valued Traits.

J Comput Biol 2017 Jun 5;24(6):524-535. Epub 2017 Jan 5.

Microsoft Research, Cambridge, Massachusetts.

Genome-wide association studies commonly examine one trait at a time. Occasionally they examine several related traits with the hope of increasing power; in such a setting, the traits are not generally smoothly varying in any way such as time or space. However, for function-valued traits, the trait is often smoothly varying along the axis of interest, such as space or time. For instance, in the case of longitudinal traits such as growth curves, the axis of interest is time; for spatially varying traits such as chromatin accessibility, it would be position along the genome. Although there have been efforts to perform genome-wide association studies with such function-valued traits, the statistical approaches developed for this purpose often have limitations such as requiring the trait to behave linearly in time or space, or constraining the genetic effect itself to be constant or linear in time. Herein, we present a flexible model for this problem, the Partitioned Gaussian Process, which removes many such limitations and is especially effective as the number of time points increases. The theoretical basis of this model provides machinery for handling missing and unaligned function values such as would occur when not all individuals are measured at the same time points. Furthermore, we make use of algebraic refactorizations to substantially reduce the time complexity of our model beyond the naive implementation. Finally, we apply our approach and several others to synthetic data before closing with some directions for improved modeling and statistical testing.

Source
DOI: http://dx.doi.org/10.1089/cmb.2016.0174
June 2017

Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9.

Nat Biotechnol 2016 Feb 18;34(2):184-191. Epub 2016 Jan 18.

Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

CRISPR-Cas9-based genetic screens are a powerful new tool in biology. By simply altering the sequence of the single-guide RNA (sgRNA), one can reprogram Cas9 to target different sites in the genome with relative ease, but the on-target activity and off-target effects of individual sgRNAs can vary widely. Here, we use recently devised sgRNA design rules to create human and mouse genome-wide libraries, perform positive and negative selection screens and observe that the use of these rules produced improved results. Additionally, we profile the off-target activity of thousands of sgRNAs and develop a metric to predict off-target sites. We incorporate these findings from large-scale, empirical data to improve our computational design rules and create optimized sgRNA libraries that maximize on-target activity and minimize off-target effects to enable more effective and efficient genetic screens and genome engineering.

Source
DOI: http://dx.doi.org/10.1038/nbt.3437
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4744125
February 2016

Personalized medicine: from genotypes, molecular phenotypes and the quantified self, towards improved medicine.

Pac Symp Biocomput 2015:342-6

Icahn School of Medicine at Mount Sinai, 1425 Madison Ave., New York, USA.

Advances in molecular profiling and sensor technologies are expanding the scope of personalized medicine beyond genotypes, providing new opportunities for developing richer and more dynamic multi-scale models of individual health. Recent studies demonstrate the value of scoring high-dimensional microbiome, immune, and metabolic traits from individuals to inform personalized medicine. Efforts to integrate multiple dimensions of clinical and molecular data towards predictive multi-scale models of individual health and wellness are already underway. Improved methods for mining and discovery of clinical phenotypes from electronic medical records and technological developments in wearable sensor technologies present new opportunities for mapping and exploring the critical yet poorly characterized "phenome" and "envirome" dimensions of personalized medicine. There are ambitious new projects underway to collect multi-scale molecular, sensor, clinical, behavioral, and environmental data streams from large population cohorts longitudinally to enable more comprehensive and dynamic models of individual biology and personalized health. Personalized medicine stands to benefit from inclusion of rich new sources and dimensions of data. However, realizing these improvements in care relies upon novel informatics methodologies, tools, and systems to make full use of these data to advance both the science and translational applications of personalized medicine.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5893135
April 2016

Further improvements to linear mixed models for genome-wide association studies.

Sci Rep 2014 Nov 12;4:6874. Epub 2014 Nov 12.

eScience Group, Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA, 90024, United States.

We examine improvements to the linear mixed model (LMM) that better correct for population structure and family relatedness in genome-wide association studies (GWAS). LMMs rely on the estimation of a genetic similarity matrix (GSM), which encodes the pairwise similarity between every two individuals in a cohort. These similarities are estimated from single nucleotide polymorphisms (SNPs) or other genetic variants. Traditionally, all available SNPs are used to estimate the GSM. In empirical studies across a wide range of synthetic and real data, we find that modifications to this approach improve GWAS performance as measured by type I error control and power. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM. Software implementing these improvements and the experimental comparisons are available at http://microsoft.com/science.
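
To make the genetic similarity matrix (GSM) idea in the abstract above concrete, the following is a minimal sketch in plain Python. It assumes a common convention (standardize each SNP, then average the per-SNP products between individuals), which may differ from the estimator used in the paper. The function names, the toy genotypes, the choice of "selected" SNP, and the mixing weight are all hypothetical.

```python
from math import sqrt


def standardize_snp(column):
    """Center one SNP column to mean 0 and scale it to unit variance."""
    n = len(column)
    mean = sum(column) / n
    var = sum((g - mean) ** 2 for g in column) / n
    if var == 0.0:
        # A monomorphic SNP carries no similarity information; zero it out.
        return [0.0] * n
    sd = sqrt(var)
    return [(g - mean) / sd for g in column]


def genetic_similarity_matrix(genotypes):
    """Return the n x n GSM for an (n individuals) x (m SNPs) genotype matrix.

    Entry (i, j) is the average over SNPs of the product of the
    standardized genotypes of individuals i and j (an assumed convention).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    columns = [standardize_snp([row[s] for row in genotypes]) for s in range(m)]
    return [
        [sum(col[i] * col[j] for col in columns) / m for j in range(n)]
        for i in range(n)
    ]


def mix_gsms(gsm_all, gsm_selected, weight):
    """Convex combination of two GSMs; `weight` is a hypothetical mixing parameter."""
    n = len(gsm_all)
    return [
        [(1.0 - weight) * gsm_all[i][j] + weight * gsm_selected[i][j]
         for j in range(n)]
        for i in range(n)
    ]


# Toy data: 4 individuals, 3 SNPs coded as minor-allele counts (0, 1, 2).
genotypes = [
    [0, 1, 2],
    [0, 1, 2],
    [2, 0, 0],
    [1, 2, 1],
]
gsm_all = genetic_similarity_matrix(genotypes)
# Pretend only the first SNP was selected as predictive of the phenotype.
gsm_selected = genetic_similarity_matrix([[row[0]] for row in genotypes])
gsm_mixed = mix_gsms(gsm_all, gsm_selected, weight=0.5)
```

In this sketch the first two individuals have identical genotypes, so their rows in `gsm_all` coincide. How the predictive SNPs are selected and how the mixing weight is chosen are left out, since the abstract does not specify them.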

Source
DOI: http://dx.doi.org/10.1038/srep06874
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4230738
November 2014

Greater power and computational efficiency for kernel-based association testing of sets of genetic variants.

Bioinformatics 2014 Nov 29;30(22):3206-14. Epub 2014 Jul 29.

eScience Research Group, Microsoft Research, Los Angeles, CA, 90024 and eScience Research Group, Microsoft Research, Redmond, WA, 98052, USA.

Motivation: Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored even though it may play an important role in obtaining optimal power. We compared a standard statistical test, a score test, with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene-gene interactions are sought, state-of-the-art algorithms for both the score and LR tests can be computationally impractical. Thus we develop new computationally efficient methods.

Results: After reviewing theoretical differences in performance between the score and LR tests, we find empirically on real data that the LR test generally has more power. In particular, on 15 of 17 real datasets, the LR test yielded at least as many associations as the score test (up to 23 more associations), whereas the score test yielded at most one more association than the LR test in the two remaining datasets. On synthetic data, we find that the LR test yielded up to 12% more associations, consistent with our results on real data, but also observe a regime of extremely small signal where the score test yielded up to 25% more associations than the LR test, consistent with theory. Finally, our computational speedups now enable (i) efficient LR testing when the background kernel is full rank, and (ii) efficient score testing when the background kernel changes with each test, as for gene-gene interaction tests. The latter yielded a factor of 2000 speedup on a cohort of size 13,500.

Availability: Software available at http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/.

Contact: heckerma@microsoft.com

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btu504
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4221116
November 2014

Epigenome-wide association studies without the need for cell-type composition.

Nat Methods 2014 Mar 26;11(3):309-11. Epub 2014 Jan 26.

eScience Research Group, Microsoft Research, Los Angeles, California, USA.

In epigenome-wide association studies, cell-type composition often differs between cases and controls, yielding associations that simply tag cell type rather than reveal fundamental biology. Current solutions require actual or estimated cell-type composition, information that is not easily obtainable for many samples of interest. We propose a method, FaST-LMM-EWASher, that automatically corrects for cell-type composition without the need for explicit knowledge of it, and then validate our method by comparison with the state-of-the-art approach. Corresponding software is available from http://www.microsoft.com/science/.

Source
DOI: http://dx.doi.org/10.1038/nmeth.2815
March 2014

A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control.

Elife 2013 Oct 29;2:e01123. Epub 2013 Oct 29.

School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Institute of Microbiology, University Hospital and University of Lausanne, Lausanne, Switzerland; Research Group of Theoretical Biology and Evolutionary Ecology, Eötvös Loránd University and the Hungarian Academy of Sciences, Budapest, Hungary; Swiss Institute of Bioinformatics, Lausanne, Switzerland.

HIV-1 sequence diversity is affected by selection pressures arising from host genomic factors. Using paired human and viral data from 1071 individuals, we ran >3000 genome-wide scans, testing for associations between host DNA polymorphisms, HIV-1 sequence variation and plasma viral load (VL), while considering human and viral population structure. We observed significant human SNP associations to a total of 48 HIV-1 amino acid variants (p < 2.4 × 10^-12). All associated SNPs mapped to the HLA class I region. Clinical relevance of host and pathogen variation was assessed using VL results. We identified two critical advantages to the use of viral variation for identifying host factors: (1) association signals are much stronger for HIV-1 sequence variants than VL, reflecting the 'intermediate phenotype' nature of viral variation; (2) association testing can be run without any clinical data. The proposed genome-to-genome approach highlights sites of genomic conflict and is a strategy generally applicable to studies of host-pathogen interaction. DOI: http://dx.doi.org/10.7554/eLife.01123.001.

Source
DOI: http://dx.doi.org/10.7554/eLife.01123
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3807812
October 2013

The benefits of selecting phenotype-specific variants for applications of mixed models in genomics.

Sci Rep 2013;3:1815

eScience Group, Microsoft Research, Los Angeles, CA 90024, United States.

Applications of linear mixed models (LMMs) to problems in genomics include phenotype prediction, correction for confounding in genome-wide association studies, estimation of narrow sense heritability, and testing sets of variants (e.g., rare variants) for association. In each of these applications, the LMM uses a genetic similarity matrix, which encodes the pairwise similarity between every two individuals in a cohort. Although ideally these similarities would be estimated using strictly variants relevant to the given phenotype, the identity of such variants is typically unknown. Consequently, relevant variants are excluded and irrelevant variants are included, both having deleterious effects. For each application of the LMM, we review known effects and describe new effects showing how variable selection can be used to mitigate them.

Source
DOI: http://dx.doi.org/10.1038/srep01815
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3648840
October 2013

A powerful and efficient set test for genetic markers that handles confounders.

Bioinformatics 2013 Jun 18;29(12):1526-33. Epub 2013 Apr 18.

eScience Group, Microsoft Research, Los Angeles, CA 90024, USA.

Motivation: Approaches for testing sets of variants, such as a set of rare or common variants within a gene or pathway, for association with complex traits are important. In particular, set tests allow for aggregation of weak signal within a set, can capture interplay among variants and reduce the burden of multiple hypothesis testing. Until now, these approaches did not address confounding by family relatedness and population structure, a problem that is becoming more important as larger datasets are used to increase power.

Results: We introduce a new approach for set tests that handles confounders. Our model is based on the linear mixed model and uses two random effects: one to capture the set association signal and one to capture confounders. We also introduce a computational speedup for two-random-effects models that makes this approach feasible even for extremely large cohorts. Using this model with both the likelihood ratio test and score test, we find that the former yields more power while controlling type I error. Application of our approach to richly structured Genetic Analysis Workshop 14 data demonstrates that our method successfully corrects for population structure and family relatedness, whereas application of our method to a 15,000-individual Crohn's disease case-control cohort demonstrates that it additionally recovers genes not recoverable by univariate analysis.

Availability: A Python-based library implementing our approach is available at http://mscompbio.codeplex.com.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btt177
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3673214
June 2013

The future of genome-based medicine.

Pac Symp Biocomput 2013:456-8

University of Toronto, Donnelly Centre, 160 College Street, Toronto, ON M5S 3E1, Canada.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5894348
December 2013

Personalized medicine: from genotypes and molecular phenotypes towards computed therapy.

Pac Symp Biocomput 2013;18:171-174

Max Planck Institutes Tübingen, 72076 Tübingen, Germany.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5894351
January 2013

An exhaustive epistatic SNP association analysis on expanded Wellcome Trust data.

Sci Rep 2013 22;3:1099. Epub 2013 Jan 22.

Microsoft Research, Los Angeles, CA, USA.

We present an approach for genome-wide association analysis with improved power on the Wellcome Trust data consisting of seven common phenotypes and shared controls. We achieved improved power by expanding the control set to include other disease cohorts, multiple races, and closely related individuals. Within this setting, we conducted exhaustive univariate and epistatic interaction association analyses. Use of the expanded control set identified more known associations with Crohn's disease and potential new biology, including several plausible epistatic interactions in several diseases. Our work suggests that carefully combining data from large repositories could reveal many new biological insights through increased power. As a community resource, all results have been made available through an interactive web server.

Source
DOI: http://dx.doi.org/10.1038/srep01099
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3551227
September 2013

Patterns of methylation heritability in a genome-wide analysis of four brain regions.

Nucleic Acids Res 2013 Feb 8;41(4):2095-104. Epub 2013 Jan 8.

eScience Group, Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA 90024, USA.

DNA methylation has been implicated in a number of diseases and other phenotypes. It is, therefore, of interest to identify and understand the genetic determinants of methylation and epigenomic variation. We investigated the extent to which genetic variation in cis-DNA sequence explains variation in CpG dinucleotide methylation in publicly available data for four brain regions from unrelated individuals, finding that 3-4% of CpG loci assayed were heritable, with a mean estimated narrow-sense heritability of 30% over the heritable loci. Over all loci, the mean estimated heritability was 3%, as compared with a recent twin-based study reporting 18%. Heritable loci were enriched for open chromatin regions and binding sites of CTCF, an influential regulator of transcription and chromatin architecture. Additionally, heritable loci were proximal to genes enriched in several known pathways, suggesting a possible functional role for these loci. Our estimates of heritability are conservative, and we suspect that the number of identified heritable loci will increase as the methylome is assayed across a broader range of cell types and the density of the tested loci is increased. Finally, we show that the number of heritable loci depends on the window size parameter commonly used to identify candidate cis-acting single-nucleotide polymorphism variants.

Source
DOI: http://dx.doi.org/10.1093/nar/gks1449
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3575819
February 2013

Co-operative additive effects between HLA alleles in control of HIV-1.

PLoS One 2012 19;7(10):e47799. Epub 2012 Oct 19.

Department of Paediatrics, University of Oxford, Oxford, United Kingdom.

Background: HLA class I genotype is a major determinant of the outcome of HIV infection, and the impact of certain alleles on HIV disease outcome is well studied. Recent studies have demonstrated that certain HLA class I alleles that are in linkage disequilibrium, such as HLA-A*74 and HLA-B*57, appear to function co-operatively to result in greater immune control of HIV than mediated by either single allele alone. We here investigate the extent to which HLA alleles, irrespective of linkage disequilibrium, function co-operatively.

Methodology/Principal Findings: We here refined a computational approach to the analysis of >2000 subjects infected with C-clade HIV, first to discern the individual effect of each allele on disease control, and second to identify pairs of alleles that mediate 'co-operative additive' effects, either to improve disease suppression or to contribute to immunological failure. We identified six pairs of HLA class I alleles that have a co-operative additive effect in mediating HIV disease control and four hazardous pairs of alleles that, occurring together, are predictive of worse disease outcomes (q<0.05 in each case). We developed a novel 'sharing score' to quantify the breadth of CD8+ T cell responses made by pairs of HLA alleles across the HIV proteome, and used this to demonstrate that successful viraemic suppression correlates with breadth of unique CD8+ T cell responses (p = 0.03).

Conclusions/Significance: These results identify co-operative effects between HLA class I alleles in the control of HIV-1 in an extended Southern African cohort, and underline complementarity and breadth of the CD8+ T cell targeting as one potential mechanism for this effect.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0047799
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3477121
April 2013

Correlates of protective cellular immunity revealed by analysis of population-level immune escape pathways in HIV-1.

J Virol 2012 Dec 10;86(24):13202-16. Epub 2012 Oct 10.

Microsoft Research, Los Angeles, California, USA.

HLA class I-associated polymorphisms identified at the population level mark viral sites under immune pressure by individual HLA alleles. As such, analysis of their distribution, frequency, location, statistical strength, sequence conservation, and other properties offers a unique perspective from which to identify correlates of protective cellular immunity. We analyzed HLA-associated HIV-1 subtype B polymorphisms in 1,888 treatment-naïve, chronically infected individuals using phylogenetically informed methods and identified characteristics of HLA-associated immune pressures that differentiate protective and nonprotective alleles. Over 2,100 HLA-associated HIV-1 polymorphisms were identified, approximately one-third of which occurred inside or within 3 residues of an optimally defined cytotoxic T-lymphocyte (CTL) epitope. Differential CTL escape patterns between closely related HLA alleles were common and increased with greater evolutionary distance between allele group members. Among 9-mer epitopes, mutations at HLA-specific anchor residues represented the most frequently detected escape type: these occurred nearly 2-fold more frequently than expected by chance and were computationally predicted to reduce peptide-HLA binding nearly 10-fold on average. Characteristics associated with protective HLA alleles (defined using hazard ratios for progression to AIDS from natural history cohorts) included the potential to mount broad immune selection pressures across all HIV-1 proteins except Nef, the tendency to drive multisite and/or anchor residue escape mutations within known CTL epitopes, and the ability to strongly select mutations in conserved regions within HIV's structural and functional proteins. Thus, the factors defining protective cellular immune responses may be more complex than simply targeting conserved viral regions. The results provide new information to guide vaccine design and immunogenicity studies.

Source
DOI: http://dx.doi.org/10.1128/JVI.01998-12
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3503140
December 2012

Learning transcriptional regulatory relationships using sparse graphical models.

PLoS One 2012 7;7(5):e35762. Epub 2012 May 7.

Microsoft Research, Los Angeles, California, USA.

Understanding the organization and function of transcriptional regulatory networks by analyzing high-throughput gene expression profiles is a key problem in computational biology. The challenges in this work are 1) the lack of complete knowledge of the regulatory relationship between the regulators and the associated genes, 2) the potential for spurious associations due to confounding factors, and 3) a number of parameters to learn that is usually larger than the number of available microarray experiments. We present a sparse (L1 regularized) graphical model to address these challenges. Our model incorporates known transcription factors and introduces hidden variables to represent possible unknown transcription and confounding factors. The expression level of a gene is modeled as a linear combination of the expression levels of known transcription factors and hidden factors. Using gene expression data covering 39,296 oligonucleotide probes from 1109 human liver samples, we demonstrate that our model better predicts out-of-sample data than a model with no hidden variables. We also show that some of the gene sets associated with hidden variables are strongly correlated with Gene Ontology categories. The software, including source code, is available at http://grnl1.codeplex.com.
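
As a rough illustration of the sparse linear piece described in the abstract above, the sketch below fits one gene's expression as an L1-regularized linear combination of known transcription factor levels. It is a simplification under stated assumptions: the hidden variables are omitted entirely, the solver is textbook coordinate descent rather than whatever the authors used, and all names and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; the proximal step for an L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize (1/(2n)) * sum((y - X b)^2) + lam * sum(|b|) by cyclic coordinate descent.

    X is a list of n rows (samples), each with p transcription factor levels;
    y holds the target gene's expression in the same n samples.
    """
    n = len(X)
    p = len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero column cannot explain anything
            # Correlation of column j with the residual that excludes column j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Toy data (hypothetical): 4 samples, 3 transcription factors with orthogonal profiles.
tf_levels = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# Target gene driven strongly by TF 0 and only faintly by TF 1.
gene_expression = [2.05, 1.95, -1.95, -2.05]
weights = lasso_coordinate_descent(tf_levels, gene_expression, lam=0.1)
```

With these toy inputs the penalty keeps only the weight on the first transcription factor and sets the other two to exactly zero, which is the sparsity behaviour the L1 term is meant to produce.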

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0035762
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3346750
September 2012

Widespread impact of HLA restriction on immune control and escape pathways of HIV-1.

J Virol 2012 May 29;86(9):5230-43. Epub 2012 Feb 29.

Microsoft Research, eScience Group, Los Angeles, California, USA.

The promiscuous presentation of epitopes by similar HLA class I alleles holds promise for a universal T-cell-based HIV-1 vaccine. However, in some instances, cytotoxic T lymphocytes (CTL) restricted by HLA alleles with similar or identical binding motifs are known to target epitopes at different frequencies, with different functional avidities and with different apparent clinical outcomes. Such differences may be illuminated by the association of similar HLA alleles with distinctive escape pathways. Using a novel computational method featuring phylogenetically corrected odds ratios, we systematically analyzed differential patterns of immune escape across all optimally defined epitopes in Gag, Pol, and Nef in 2,126 HIV-1 clade C-infected adults. Overall, we identified 301 polymorphisms in 90 epitopes associated with HLA alleles belonging to shared supertypes. We detected differential escape in 37 of 38 epitopes restricted by more than one allele, which included 278 instances of differential escape at the polymorphism level. The majority (66 to 97%) of these resulted from the selection of unique HLA-specific polymorphisms rather than differential epitope targeting rates, as confirmed by gamma interferon (IFN-γ) enzyme-linked immunosorbent spot assay (ELISPOT) data. Discordant associations between HLA alleles and viral load were frequently observed between allele pairs that selected for differential escape. Furthermore, the total number of associated polymorphisms strongly correlated with average viral load. These studies confirm that differential escape is a widespread phenomenon and may be the norm when two alleles present the same epitope. Given the clinical correlates of immune escape, such heterogeneity suggests that certain epitopes will lead to discordant outcomes if applied universally in a vaccine.

Source
DOI: http://dx.doi.org/10.1128/JVI.06728-11
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3347390
May 2012

FaST linear mixed models for genome-wide association studies.

Nat Methods 2011 Sep 4;8(10):833-5. Epub 2011 Sep 4.

Microsoft Research, Los Angeles, California, USA.

We describe factored spectrally transformed linear mixed models (FaST-LMM), an algorithm for genome-wide association studies (GWAS) that scales linearly with cohort size in both run time and memory use. On Wellcome Trust data for 15,000 individuals, FaST-LMM ran an order of magnitude faster than current efficient algorithms. Our algorithm can analyze data for 120,000 individuals in just a few hours, whereas current algorithms fail on data for even 20,000 individuals (http://mscompbio.codeplex.com/).

Source
DOI: http://dx.doi.org/10.1038/nmeth.1681
September 2011

HLA-A*7401-mediated control of HIV viremia is independent of its linkage disequilibrium with HLA-B*5703.

J Immunol 2011 May 15;186(10):5675-86. Epub 2011 Apr 15.

Department of Paediatrics, University of Oxford, Oxford OX1 3SY, United Kingdom.

The potential contribution of HLA-A alleles to viremic control in chronic HIV type 1 (HIV-1) infection has been relatively understudied compared with HLA-B. In these studies, we show that HLA-A*7401 is associated with favorable viremic control in extended southern African cohorts of >2100 C-clade-infected subjects. We present evidence that HLA-A*7401 operates an effect that is independent of HLA-B*5703, with which it is in linkage disequilibrium in some populations, to mediate lowered viremia. We describe a novel statistical approach to detecting additive effects between class I alleles in control of HIV-1 disease, highlighting improved viremic control in subjects with HLA-A*7401 combined with HLA-B*57. In common with HLA-B alleles that are associated with effective control of viremia, HLA-A*7401 presents highly targeted epitopes in several proteins, including Gag, Pol, Rev, and Nef, of which the Gag epitopes appear immunodominant. We identify eight novel putative HLA-A*7401-restricted epitopes, of which three have been defined to the optimal epitope. In common with HLA-B alleles linked with slow progression, viremic control through an HLA-A*7401-restricted response appears to be associated with the selection of escape mutants within Gag epitopes that reduce viral replicative capacity. These studies highlight the potentially important contribution of an HLA-A allele to immune control of HIV infection, which may have been concealed by a stronger effect mediated by an HLA-B allele with which it is in linkage disequilibrium. In addition, these studies identify a factor contributing to different HIV disease outcomes in individuals expressing HLA-B*5703.

Source
DOI: http://dx.doi.org/10.4049/jimmunol.1003711
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3738002
May 2011

Correction for hidden confounders in the genetic analysis of gene expression.

Proc Natl Acad Sci U S A 2010 Sep 1;107(38):16465-70. Epub 2010 Sep 1.

Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA, USA.

Understanding the genetic underpinnings of disease is important for screening, treatment, drug development, and basic biological insight. One way of getting at such an understanding is to find out which parts of our DNA, such as single-nucleotide polymorphisms, affect particular intermediary processes such as gene expression. Naively, such associations can be identified using a simple statistical test on all paired combinations of genetic variants and gene transcripts. However, a wide variety of confounders lie hidden in the data, leading to both spurious associations and missed associations if not properly addressed. We present a statistical model that jointly corrects for two particular kinds of hidden structure, population structure (e.g., race, family-relatedness) and microarray expression artifacts (e.g., batch effects), when these confounders are unknown. Applying our method to both real and synthetic, human and mouse data, we demonstrate the need for such a joint correction of confounders, and also the disadvantages of other possible approaches based on those in the current literature. In particular, we show that our class of models has maximum power to detect eQTL on synthetic data, and has the best performance on a bronze standard applied to real data. Lastly, our software and the associations we found with it are available at http://www.microsoft.com/science.
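To make the naive baseline mentioned in the abstract concrete, the sketch below scores every variant-transcript pair with a Pearson correlation. This is only the uncorrected approach that the abstract argues against, not the authors' joint model; the function names and the dictionary-based data layout are assumptions for illustration, and the code assumes no variant or transcript is constant across samples.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither sequence is constant (otherwise this divides by zero).
    return sxy / math.sqrt(sxx * syy)


def naive_eqtl_scan(genotypes, expression):
    """Test all variant-transcript pairs with no confounder correction.

    genotypes:  dict mapping variant id -> allele counts per sample
    expression: dict mapping transcript id -> expression level per sample
    Returns (variant, transcript, r) tuples, strongest |r| first.
    """
    results = []
    for variant, g in genotypes.items():
        for transcript, e in expression.items():
            results.append((variant, transcript, pearson_r(g, e)))
    return sorted(results, key=lambda item: -abs(item[2]))
```

Because this scan ignores population structure and batch effects, a strong correlation here may reflect a hidden confounder rather than a genuine association, which is the problem the abstract's model is meant to address.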

Source
DOI: http://dx.doi.org/10.1073/pnas.1002425107
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2944732
September 2010

Additive contribution of HLA class I alleles in the immune control of HIV-1 infection.

J Virol 2010 Oct 21;84(19):9879-88. Epub 2010 Jul 21.

Weatherall Institute of Molecular Medicine, University of Oxford, UK.

Previous studies have identified a central role for HLA-B alleles in influencing control of HIV infection. An alternative possibility is that a small number of HLA-B alleles may have a very strong impact on HIV disease outcome, dominating the contribution of other HLA alleles. Here, we find that even following the exclusion of subjects expressing any of the HLA-B class I alleles (B*57, B*58, and B*18) identified to have the strongest influence on control, the dominant impact of HLA-B alleles on virus set point and absolute CD4 count variation remains significant. However, we also find that the influence of HLA on HIV control in this C-clade-infected cohort from South Africa extends beyond HLA-B as HLA-Cw type remains a significant predictor of virus and CD4 count following exclusion of the strongest HLA-B associations. Furthermore, there is evidence of interdependent protective effects of the HLA-Cw*0401-B*8101, HLA-Cw*1203-B*3910, and HLA-A*7401-B*5703 haplotypes that cannot be explained solely by linkage to a protective HLA-B allele. Analysis of individuals expressing both protective and detrimental alleles shows that even the strongest HLA alleles appear to have an additive rather than dominant effect on HIV control at the individual level. Finally, weak but significant frequency-dependent effects in this cohort can be detected only by looking at an individual's combined HLA allele frequencies. Taken together, these data suggest that although individual HLA alleles, particularly HLA-B, can have a strong impact, HIV control overall is likely to be influenced by the additive effect of some or all of the other HLA alleles present.

Source
DOI: http://dx.doi.org/10.1128/JVI.00320-10
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2937780
October 2010

Rare HLA drive additional HIV evolution compared to more frequent alleles.

AIDS Res Hum Retroviruses 2009 Mar;25(3):297-303

Department of Microbiology, University of Washington, Seattle, Washington 98103, USA.

HIV-1 can evolve HLA-specific escape variants in response to HLA-mediated cellular immunity. HLA alleles that are common in the host population may increase the frequency of such escape variants at the population level. When loss of viral fitness is caused by immune escape variation, these variants may revert upon infection of a new host who does not have the corresponding HLA allele. Furthermore, additional escape variants may appear in response to the nonconcordant HLA alleles. Because individuals with rare HLA alleles are less likely to be infected by a partner with concordant HLA alleles, viral populations infecting hosts with rare HLA alleles may undergo a greater amount of evolution than those infecting hosts with common alleles due to the loss of preexisting escape variants followed by new immune escape. This hypothesis was evaluated using maximum likelihood phylogenetic trees of each gene from 272 full-length HIV-1 sequences. Recent viral evolution, as measured by the external branch length, was found to be inversely associated with HLA frequency in nef (p < 0.02), env (p < 0.03), and pol (p ≤ 0.05), suggesting that rare HLA alleles provide a disproportionate force driving viral evolution compared to common alleles, likely due to the loss of preexisting escape variants during early stages postinfection.

Source
DOI: http://dx.doi.org/10.1089/aid.2008.0208
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2693345
March 2009

Statistical resolution of ambiguous HLA typing data.

PLoS Comput Biol 2008 Feb 29;4(2):e1000016. Epub 2008 Feb 29.

Microsoft Research, Redmond, Washington, United States of America.

High-resolution HLA typing plays a central role in many areas of immunology, such as in identifying immunogenetic risk factors for disease, in studying how the genomes of pathogens evolve in response to immune selection pressures, and also in vaccine design, where identification of HLA-restricted epitopes may be used to guide the selection of vaccine immunogens. Perhaps one of the most immediate applications is in direct medical decisions concerning the matching of stem cell transplant donors to unrelated recipients. However, high-resolution HLA typing is frequently unavailable due to its high cost or the inability to re-type historical data. In this paper, we introduce and evaluate a method for statistical, in silico refinement of ambiguous and/or low-resolution HLA data. Our method, which requires an independent, high-resolution training data set drawn from the same population as the data to be refined, uses linkage disequilibrium in HLA haplotypes as well as four-digit allele frequency data to probabilistically refine HLA typings. Central to our approach is the use of haplotype inference. We introduce new methodology to this area, improving upon the Expectation-Maximization (EM)-based approaches currently used within the HLA community. Our improvements are achieved by using a parsimonious parameterization for haplotype distributions and by smoothing the maximum likelihood (ML) solution. These improvements make it possible to scale the refinement to a larger number of alleles and loci in a more computationally efficient and stable manner. We also show how to augment our method in order to incorporate ethnicity information (as HLA allele distributions vary widely according to race/ethnicity as well as geographic area), and demonstrate the potential utility of this experimentally. A tool based on our approach is freely available for research purposes at http://microsoft.com/science.
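For context on the EM-based haplotype inference that the abstract takes as its starting point, the sketch below runs a basic Expectation-Maximization loop over unphased two-locus genotypes. It illustrates the conventional baseline only, not the authors' parsimonious parameterization or smoothing; the two-locus restriction, the function names, and the allele labels used in any call are assumptions for illustration.

```python
from collections import defaultdict


def phasings(genotype):
    """Enumerate the distinct haplotype pairs consistent with a genotype.

    genotype: ((a1, a2), (b1, b2)), unphased alleles at two loci.
    """
    (a1, a2), (b1, b2) = genotype
    options = {
        tuple(sorted([(a1, b1), (a2, b2)])),
        tuple(sorted([(a1, b2), (a2, b1)])),
    }
    return sorted(options)


def em_haplotype_frequencies(genotypes, n_iter=50):
    """Estimate haplotype frequencies from unphased genotypes with plain EM."""
    candidates = {h for g in genotypes for pair in phasings(g) for h in pair}
    # Start from a uniform distribution over every haplotype that could occur.
    freqs = {h: 1.0 / len(candidates) for h in candidates}

    for _ in range(n_iter):
        counts = defaultdict(float)
        for g in genotypes:
            options = phasings(g)
            # E-step: posterior weight of each phasing under current freqs.
            weights = [freqs[h1] * freqs[h2] for h1, h2 in options]
            total = sum(weights)
            for (h1, h2), w in zip(options, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by total haplotypes.
        n_haplotypes = 2 * len(genotypes)
        freqs = {h: counts[h] / n_haplotypes for h in candidates}
    return freqs
```

The number of candidate haplotypes grows quickly as loci and alleles are added, which is consistent with the scaling concern the abstract raises about this family of methods.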

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1000016
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2289775
February 2008

A statistical framework for modeling HLA-dependent T cell response data.

PLoS Comput Biol 2007 Oct;3(10):1879-86

Microsoft Research, Redmond, Washington, USA.

The identification of T cell epitopes and their HLA (human leukocyte antigen) restrictions is important for applications such as the design of cellular vaccines for HIV. Traditional methods for such identification are costly and time-consuming. Recently, a more expeditious laboratory technique using ELISpot assays has been developed that allows for rapid screening of specific responses. However, this assay does not directly provide information concerning the HLA restriction of a response, a critical piece of information for vaccine design. Thus, we introduce, apply, and validate a statistical model for identifying HLA-restricted epitopes from ELISpot data. By looking at patterns across a broad range of donors, in conjunction with our statistical model, we can determine (probabilistically) which of the HLA alleles are likely to be responsible for the observed reactivities. Additionally, we can provide a good estimate of the number of false positives generated by our analysis (i.e., the false discovery rate). This model allows us to learn about new HLA-restricted epitopes from ELISpot data in an efficient, cost-effective, and high-throughput manner. We applied our approach to data from donors infected with HIV and identified many potential new HLA restrictions. Among 134 such predictions, six were confirmed in the lab and the remainder could not be ruled as invalid. These results shed light on the extent of HLA class I promiscuity, which has significant implications for the understanding of HLA class I antigen presentation and vaccine development.

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.0030188
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2014793
October 2007