Publications by authors named "Mauno Vihinen"

164 Publications

Circulating Plasma microRNAs In Systemic Sclerosis-Associated Pulmonary Arterial Hypertension.

Rheumatology (Oxford) 2021 Mar 30. Epub 2021 Mar 30.

Department of Clinical Sciences Lund, Rheumatology, Lund University and Skåne University Hospital, SE-22185 Lund, Sweden.

Objectives: Systemic sclerosis-associated pulmonary arterial hypertension (SSc-APAH) is a late but devastating complication of systemic sclerosis (SSc). Early identification of SSc-APAH may improve survival. We examined the role of circulating micro-RNAs (miRNAs) in SSc-APAH.

Methods: Using quantitative RT-PCR, the abundance of mature miRNAs in plasma was determined in 85 female patients with anti-centromere antibody positive limited cutaneous SSc. Twenty-two of the patients had SSc-APAH. Sixty-three SSc controls without PAH were matched for disease duration. Forty-six selected miRNA plasma levels were correlated with clinical data. Longitudinal samples were analysed from 14 SSc-APAH and 27 SSc patients.

Results: The disease duration was 12 years for the SSc-APAH patients and 12.7 years for the SSc controls. Plasma expression levels of 11 miRNAs were lower in patients with SSc-APAH. Four miRNAs displayed higher plasma levels in SSc-APAH patients compared with SSc controls. There was a significant difference between groups for miR-20a-5p and miR-203a-3p when correcting for multiple comparisons (p = 0.002 for both). Receiver operating characteristic curves showed AUC = 0.69-0.83 for miR-21-5p and miR-20a-5p or their combination. miR-20a-5p and miR-203a-3p correlated inversely with NT-pro-BNP levels (r = -0.42 and -0.47). Mixed effect model analysis could not identify any miRNA as a predictor of PAH development. However, miR-20a-5p plasma levels were lower in the longitudinal samples of SSc-APAH patients than in the SSc controls.

Conclusions: Our study links expression levels of the circulating plasma miRNAs, especially miR-20a-5p and miR-203a-3p, to the occurrence of SSc-APAH in female patients with anti-centromere antibody positive limited cutaneous SSc.
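
For readers unfamiliar with the area under the receiver operating characteristic curve (AUC) reported in this abstract, the following is a minimal sketch of how such a value can be computed from two groups of measurements. It is not the authors' analysis code: the function name `roc_auc` and the example numbers are hypothetical. Because the marker here is lower in cases than in controls, the values are negated before scoring so that "higher score" means "more likely a case".

```python
def roc_auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical plasma levels (arbitrary units); lower values in cases.
cases = [0.2, 0.4, 0.7]
controls = [0.5, 0.9, 1.1]

# Negate so that a lower marker level gives a higher "case" score.
auc = roc_auc([-v for v in cases], [-v for v in controls])
```

An AUC of 0.5 corresponds to no discrimination between the groups and 1.0 to perfect separation.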

Source
DOI: http://dx.doi.org/10.1093/rheumatology/keab300
March 2021

TNF-α and α-synuclein fibrils differently regulate human astrocyte immune reactivity and impair mitochondrial respiration.

Cell Rep 2021 Mar;34(12):108895

Cell Stem Cell Laboratory for CNS Disease Modeling, Department of Experimental Medical Science, BMC D10, Lund University, 22184 Lund, Sweden; MultiPark and the Lund Stem Cell Center, Lund University, 22184 Lund, Sweden.

Here, we examine the cellular changes triggered by tumor necrosis factor alpha (TNF-α) and different alpha-synuclein (αSYN) species in astrocytes derived from induced pluripotent stem cells. Human astrocytes treated with TNF-α display a strong reactive pro-inflammatory phenotype with upregulation of pro-inflammatory gene networks, activation of the nuclear factor κB (NF-κB) pathway, and release of pro-inflammatory cytokines, whereas those treated with high-molecular-weight αSYN fibrils acquire a reactive antigen (cross)-presenting phenotype with upregulation of major histocompatibility complex (MHC) genes and increased human leukocyte antigen (HLA) molecules at the cell surface. Surprisingly, the cell surface location of MHC proteins is abrogated by larger F110 fibrillar polymorphs, despite the upregulation of MHC genes. Interestingly, TNF-α and αSYN fibrils compete to drive the astrocyte immune reactive response. The astrocyte immune responses are accompanied by an impaired mitochondrial respiration, which is exacerbated in Parkinson's disease (PD) astrocytes. Our data provide evidence for astrocytic involvement in PD pathogenesis and reveal their complex immune reactive responses to exogenous stressors.

Source
DOI: http://dx.doi.org/10.1016/j.celrep.2021.108895
March 2021

BTK gatekeeper residue variation combined with cysteine 481 substitution causes super-resistance to irreversible inhibitors acalabrutinib, ibrutinib and zanubrutinib.

Leukemia 2021 May 1;35(5):1317-1329. Epub 2021 Feb 1.

Department of Laboratory Medicine, Clinical Research Center, Karolinska Institutet, Karolinska University Hospital Huddinge, SE-141 86, Huddinge, Sweden.

Irreversible inhibitors of Bruton tyrosine kinase (BTK), pioneered by ibrutinib, have become breakthrough drugs in the treatment of leukemias and lymphomas. Resistance variants (mutations) occur, but in contrast to those identified for many other tyrosine kinase inhibitors, they less frequently affect the "gatekeeper" residue in the catalytic domain. In this study we carried out variation scanning by creating 11 substitutions at the gatekeeper amino acid, threonine 474 (T474). These variants were subsequently combined with replacement of the cysteine 481 residue to which irreversible inhibitors, such as ibrutinib, acalabrutinib and zanubrutinib, bind. We found that certain double mutants, such as threonine 474 to isoleucine (T474I) or methionine (T474M) combined with catalytically active cysteine 481 to serine (C481S), are insensitive to ≥16-fold the pharmacological serum concentration, and are therefore defined as super-resistant to irreversible inhibitors. Conversely, reversible inhibitors showed a variable pattern, from resistance to no resistance, collectively demonstrating the structural constraints for different classes of inhibitors, which may affect their clinical application.

Source
DOI: http://dx.doi.org/10.1038/s41375-021-01123-6
May 2021

Functional effects of protein variants.

Authors:
Mauno Vihinen

Biochimie 2021 Jan 23;180:104-120. Epub 2020 Oct 23.

Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184, Lund, Sweden.

Genetic and other variations frequently affect protein functions. Scientific articles can contain confusing descriptions about which function or property is affected, and in many cases the statements are pure speculation without any experimental evidence. To clarify functional effects of protein variations of genetic or non-genetic origin, a systematic conceptualisation and framework are introduced. This framework describes protein functional effects on abundance, activity, specificity and affinity, along with countermeasures, which allow cells, tissues and organisms to tolerate, avoid, repair, attenuate or resist (TARAR) the effects. Effects on abundance discussed include gene dosage, restricted expression, mis-localisation and degradation. Enzymopathies, effects on kinetics, allostery and regulation of protein activity are subtopics for the effects of variants on activity. Variation outcomes on specificity and affinity comprise promiscuity, specificity, affinity and moonlighting. TARAR mechanisms redress variations with active and passive processes including chaperones, redundancy, robustness, canalisation and metabolic and signalling rewiring. A framework for pragmatic protein function analysis and presentation is introduced. All of the mechanisms and effects are described along with representative examples, most often in relation to diseases. In addition, protein function is discussed from an evolutionary point of view. Application of the presented framework facilitates unambiguous, detailed and specific description of functional effects and their systematic study.

Source
DOI: http://dx.doi.org/10.1016/j.biochi.2020.10.009
January 2021

Poikilosis - pervasive biological variation.

Authors:
Mauno Vihinen

F1000Res 2020 12;9:602. Epub 2020 Jun 12.

Department of Experimental Medical Science, Lund University, Lund, 22184, Sweden.

Biological systems are dynamic and display heterogeneity at all levels. Ubiquitous heterogeneity, here called poikilosis, is an integral and important property of organisms and of the molecules, systems and processes within them. Traditionally, heterogeneity in biology and experiments has been considered unwanted noise; here, poikilosis is shown to be the normal state. Acceptable variation ranges are called lagom. Non-lagom variations, those that are too extensive, have negative effects that influence interconnected levels and, once the variation is large enough, cause disease and can even lead to death. Poikilosis has numerous applications and consequences, e.g. for how to design, analyze and report experiments, how to develop and apply prediction and modelling methods, and in the diagnosis and treatment of diseases. Poikilosis-aware new and practical definitions are provided for life, death, senescence, disease, and lagom. Poikilosis is the first new unifying theory in biology since evolution and should be considered in every scientific study.

Source
DOI: http://dx.doi.org/10.12688/f1000research.24173.2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7463298
April 2021

Systematics for types and effects of RNA variations.

Authors:
Mauno Vihinen

RNA Biol 2021 04 20;18(4):481-498. Epub 2020 Sep 20.

Department of Experimental Medical Science, Lund University, Lund, Sweden.

Systematics is described for annotation of variations in RNA molecules. The conceptual framework is part of Variation Ontology (VariO) and facilitates depiction of types of variations, their functional and structural effects and other consequences in any RNA molecule in any organism. There are more than 150 RNA related VariO terms in seven levels, which can be further combined to generate even more complicated and detailed annotations. The terms are described together with examples, usually for variations and effects in humans and in diseases. RNA variation type has two subcategories: variation classification and origin with subterms. Altogether six terms are available for function description. Several terms are available for affected RNA properties. The ontology also contains terms for structural description of the affected RNA type, post-transcriptional RNA modifications, secondary and tertiary structure effects and RNA sugar variations. Together with the DNA and protein concepts and annotations, RNA terms allow comprehensive description of variations of genetic and non-genetic origin at all possible levels. The VariO annotations are readable both for humans and computer programs for advanced data integration and mining.

Source
DOI: http://dx.doi.org/10.1080/15476286.2020.1817266
April 2021

Strategy for Disease Diagnosis, Progression Prediction, Risk Group Stratification and Treatment-Case of COVID-19.

Authors:
Mauno Vihinen

Front Med (Lausanne) 2020 16;7:294. Epub 2020 Jun 16.

Department of Experimental Medical Science, BMC B13, Lund University, Lund, Sweden.

A novel strategy is presented for reliable diagnosis and progression prediction of diseases, with special attention to the COVID-19 pandemic. A plan is presented for how the model can be implemented worldwide in healthcare and how novel treatments and targets can be detected. The idea is based on poikilosis, pervasive heterogeneity, and variation at all levels, systems, and mechanisms. Poikilosis in diseases can be taken into account in the pathogenicity model, which is based on the distribution of three independent condition measures: extent, modulation, and severity. The pathogenicity model is a population or cohort-based description of disease components. Evidence-based thresholds can be applied to the pathogenicity model and used for diagnosis as well as for early detection of patients at risk of developing the most severe forms of the disease. Analysis of patients with differential course of disease can help in detecting biomarkers of diagnostic and prognostic significance. A practical and feasible plan is presented for how the concepts can be implemented in practice. Collaboration of many actors, including the World Health Organization and national health authorities, will be essential for success.

Source
DOI: http://dx.doi.org/10.3389/fmed.2020.00294
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7308420
June 2020

Guidelines for systematic reporting of sequence alignments.

Authors:
Mauno Vihinen

Biol Methods Protoc 2020 30;5(1):bpaa001. Epub 2020 Jan 30.

Department of Experimental Medical Science, Lund University, Lund, Sweden.

Bioinformatics methods are increasingly needed and used to analyze and interpret extensive datasets, many of which are produced by diverse high-throughput technologies. Unfortunately, it is quite common that published articles do not contain sufficient information to allow the reader to fully comprehend and repeat computational and other studies. Guidelines were developed for reporting studies and results from sequence alignment. A brief and concise checklist of required data items was compiled, making it easy to provide the necessary details. Implementation of the guidelines requires a similarly meticulous attitude toward details as other parts of publications. If the journal does not allow reporting full details in the main article, they can be provided in supplementary material. It is important to make the alignments available. Systematic and detailed description of bioinformatics analyses adds to the value of papers and makes it easier for the scientific community to evaluate, understand, verify, and extend the published articles and their results.

Source
DOI: http://dx.doi.org/10.1093/biomethods/bpaa001
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6994045
January 2020

Problems in variation interpretation guidelines and in their implementation in computational tools.

Authors:
Mauno Vihinen

Mol Genet Genomic Med 2020 09 11;8(9):e1206. Epub 2020 Mar 11.

Department of Experimental Medical Science, Lund University, Lund, Sweden.

Background: ACMG/AMP and AMP/ASCO/CAP have released guidelines for variation interpretation, and ESHG for diagnostic sequencing. These guidelines contain recommendations including the use of computational prediction methods. The guidelines per se and the way they are implemented cause some problems.

Methods: Logical reasoning based on domain knowledge.

Results: According to the guidelines, several methods have to be used and they have to agree. This means that the methods with the poorest performance overrule the better ones. The choice of the prediction method(s) should be made by experts based on systematic benchmarking studies reporting all the relevant performance measures. Currently variation interpretation methods have been applied mainly to amino acid substitutions and splice site variants; however, predictors for some other types of variations are available and there will be tools for new application areas in the near future. Common problems in prediction method usage are discussed. The number of features used for method training or the number of variation types predicted by a tool are not indicators of method performance. Many published gene, protein or disease-specific benchmark studies suffer from datasets that are too small, rendering the results useless. In the case of binary predictors, an equal number of positive and negative cases is beneficial for training, and any imbalance has to be corrected for in performance assessment. Predictors cannot be better than the data they are based on and used for training and testing. Minor allele frequency (MAF) can help to detect likely benign cases, but the recommended MAF threshold is apparently too high. The fact that many rare variants are disease-causing or -related does not mean that rare variants in general would be harmful. How large a portion of the tested variants a tool can predict (coverage) is not a quality measure.

Conclusion: Methods used for variation interpretation have to be carefully selected. It should be possible to use only one predictor with proven good performance, or a limited number of complementary predictors with state-of-the-art performance. Bear in mind that diseases and pathogenicity form a continuum and that variants are not dichotomous, i.e. simply either pathogenic or benign.
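
The remark on class imbalance in performance assessment can be illustrated with a short sketch. It is not taken from the article: the function names and the example counts are hypothetical, and rescaling the negative class to the size of the positive class is just one common way to balance a confusion matrix. The sketch shows that sensitivity and specificity are unaffected by the rescaling, while accuracy and positive predictive value (PPV) change.

```python
import math


def normalize_confusion(tp, fn, tn, fp):
    """Rescale the negative class so that both classes have equal size."""
    scale = (tp + fn) / (tn + fp)
    return tp, fn, tn * scale, fp * scale


def measures(tp, fn, tn, fp):
    """Common performance measures for a binary predictor."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "ppv": tp / (tp + fp),
        # Matthews correlation coefficient; defined as 0 when undefined.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical test set: 100 pathogenic and 1000 benign variants.
counts = dict(tp=90, fn=10, tn=800, fp=200)
raw = measures(**counts)
balanced = measures(*normalize_confusion(**counts))
```

With these made-up counts the PPV rises from about 0.31 on the raw counts to about 0.82 after balancing, which is why reporting only uncorrected values on an imbalanced test set can mislead.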

Source
DOI: http://dx.doi.org/10.1002/mgg3.1206
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7507483
September 2020

Variation benchmark datasets: update, criteria, quality and applications.

Database (Oxford) 2020 01;2020

Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184 Lund, Sweden.

Development of new computational methods and testing their performance has to be carried out using experimental data. Only in comparison to existing knowledge can method performance be assessed. For that purpose, benchmark datasets with known and verified outcome are needed. High-quality benchmark datasets are valuable and may be difficult, laborious and time consuming to generate. VariBench and VariSNP are the two existing databases for sharing variation benchmark datasets used mainly for variation interpretation. They have been used for training and benchmarking predictors for various types of variations and their effects. VariBench was updated with 419 new datasets from 109 papers containing altogether 329 014 152 variants; however, there is plenty of redundancy between the datasets. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. The contents of the datasets vary depending on information in the original source. The available datasets have been categorized into 20 groups and subgroups. There are datasets for insertions and deletions, substitutions in coding and non-coding region, structure mapped, synonymous and benign variants. Effect-specific datasets include DNA regulatory elements, RNA splicing, and protein property for aggregation, binding free energy, disorder and stability. Then there are several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. Variants are often described at three molecular levels (DNA, RNA and protein) and sometimes also at the protein structural level including relevant cross references and variant descriptions. The updated VariBench facilitates development and testing of new methods and comparison of obtained performances to previously published methods. We compared the performance of the pathogenicity/tolerance predictor PON-P2 to several benchmark studies, and show that such comparisons are feasible and useful; however, there may be limitations due to lack of provided details and shared data. Database URL: http://structure.bmc.lu.se/VariBench.

Source
DOI: http://dx.doi.org/10.1093/database/baz117
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6997940
January 2020

ProTstab - predictor for cellular protein stability.

BMC Genomics 2019 Nov 4;20(1):804. Epub 2019 Nov 4.

Department of Experimental Medical Science, BMC B13, Lund University, Lund, Sweden.

Background: Stability is one of the most fundamental intrinsic characteristics of proteins and can be determined with various methods. Characterization of protein properties does not keep pace with the increase in new sequence data, and therefore even basic properties are not known for the vast majority of identified proteins. There have been some attempts to develop predictors for protein stabilities; however, they have suffered from small numbers of known examples.

Results: We took advantage of results from a recently developed cellular stability method, which is based on limited proteolysis and mass spectrometry, and developed a machine learning method using gradient boosting of regression trees. The ProTstab method has high performance and is well suited for large-scale prediction of protein stabilities.

Conclusions: The Pearson's correlation coefficient was 0.793 in 10-fold cross validation and 0.763 in independent blind test. The corresponding values for mean absolute error are 0.024 and 0.036, respectively. Comparison with a previously published method indicated ProTstab to have superior performance. We used the method to predict stabilities of all the remaining proteins in the entire human proteome and then correlated the predicted stabilities to protein chain lengths of isoforms and to localizations of proteins.
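
The two measures quoted in this abstract, Pearson's correlation coefficient and mean absolute error, can be computed as in the sketch below. This is an illustration only, not code from ProTstab: the function names and the example values are hypothetical, and the units of the stability values are not stated in the abstract.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(xs, ys):
    """Average absolute difference between measured and predicted values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical measured and predicted stability values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r = pearson_r(measured, predicted)
mae = mean_absolute_error(measured, predicted)
```

The correlation reflects how well predictions track the trend of the measurements, whereas the mean absolute error reflects how far off individual predictions are, so the two are usually reported together.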

Source
DOI: http://dx.doi.org/10.1186/s12864-019-6138-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6830000
November 2019

Checklist for gene/disease-specific variation database curators to enable ethical data management.

Hum Mutat 2019 10 17;40(10):1634-1640. Epub 2019 Aug 17.

Department of Experimental Medical Science, BMC B13, Lund University, Lund, Sweden.

Databases with variant and phenotype information are essential for advancing research and improving the health and welfare of individuals. These resources require data to be collected, curated, and shared among relevant specialties to maximize impact. The increasing generation of data which must be shared both nationally and globally for maximal effect presents important ethical and privacy concerns. Database curators need to ensure that their work conforms to acceptable ethical standards. A Working Group of the Human Variome Project had the task of updating and streamlining ethical guidelines for locus-specific/gene variant database curators. In this article, we present practical and achievable steps which should assist database curators in carrying out their responsibilities within acceptable ethical norms.

Source
DOI: http://dx.doi.org/10.1002/humu.23881
October 2019

Benchmarking subcellular localization and variant tolerance predictors on membrane proteins.

BMC Genomics 2019 Jul 16;20(Suppl 8):547. Epub 2019 Jul 16.

Department of Experimental Medical Science, BMC B13, Lund University, SE-22184, Lund, Sweden.

Background: Membrane proteins constitute up to 30% of the human proteome. These proteins have special properties because the transmembrane segments are embedded in the lipid bilayer while extramembranous parts are in different environments. Membrane proteins have several functions and are involved in numerous diseases. A large number of prediction methods have been introduced to predict protein subcellular localization as well as the tolerance or pathogenicity of amino acid substitutions.

Results: We tested the performance of 22 tolerance predictors by collecting information on membrane proteins and variants in them. The analysis indicated that the best tools had similar prediction performance on transmembrane, inside and outside regions of transmembrane proteins, comparable to their overall prediction performance for all types of proteins. PON-P2 had the highest performance followed by REVEL, MetaSVM and VEST3. Further, using the high-quality dataset, we also tested the performance of seven subcellular localization predictors on membrane proteins. We assessed separately the performance for single pass and multi pass membrane proteins. Predictions for multi pass proteins were more reliable than those for single pass proteins.

Conclusions: The predictors for variant effects had better performance than subcellular localization tools. The best tolerance predictors are highly reliable. As there are large differences in the performances of tools, end-users have to be cautious in method selection.

Source
DOI: http://dx.doi.org/10.1186/s12864-019-5865-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6631444
July 2019

Assessing computational predictions of the phenotypic effect of cystathionine-beta-synthase variants.

Hum Mutat 2019 09 3;40(9):1530-1545. Epub 2019 Sep 3.

Institute of Medical Technology, University of Tampere, Tampere, Finland.

Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine, and in which variations are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, as well as identify methodological and experimental constraints with lessons to be learned for future challenges.

Source
DOI: http://dx.doi.org/10.1002/humu.23868
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7325732
September 2019

FGF family members differentially regulate maturation and proliferation of stem cell-derived astrocytes.

Sci Rep 2019 07 3;9(1):9610. Epub 2019 Jul 3.

Department of Experimental Medical Science, BMC D10, Faculty of Medicine, Lund University, SE-22184, Lund, Sweden.

The glutamate transporter 1 (GLT1) is upregulated during astrocyte development and maturation in vivo and is vital for astrocyte function. Yet it is expressed at low levels by most cultured astrocytes. We previously showed that maturation of human and mouse stem cell-derived astrocytes, including functional glutamate uptake, could be enhanced by fibroblast growth factor (FGF)1 or FGF2. Here, we examined the specificity and mechanism of action of FGF2 and other FGF family members, as well as neurotrophic and differentiation factors, on mouse embryonic stem cell-derived astrocytes. We found that some FGFs, including FGF2, strongly increased GLT1 expression and enhanced astrocyte proliferation, while others (FGF16 and FGF18) mainly affected maturation. Interestingly, BMP4 increased astrocytic GFAP expression, and BMP4-treated astrocytes failed to promote the survival of motor neurons in vitro. Whole transcriptome analysis showed that FGF2 treatment regulated multiple genes linked to cell division, and that the mRNA encoding GLT1 was one of the most strongly upregulated of all astrocyte canonical markers. Since GLT1 is expressed at reduced levels in many neurodegenerative diseases, activation of this pathway is of potential therapeutic interest. Furthermore, treatment with FGFs provides a robust means for expansion of functionally mature stem cell-derived astrocytes for preclinical investigation.

Source
DOI: http://dx.doi.org/10.1038/s41598-019-46110-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6610107
July 2019

How good are pathogenicity predictors in detecting benign variants?

PLoS Comput Biol 2019 02 11;15(2):e1006481. Epub 2019 Feb 11.

Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, Lund, Sweden.

Computational tools are widely used for interpreting variants detected in sequencing projects. The choice of these tools is critical for reliable variant impact interpretation for precision medicine and should be based on systematic performance assessment. The performance of the methods varies widely in different performance assessments, for example due to the contents and sizes of test datasets. To address this issue, we obtained 63,160 common amino acid substitutions (allele frequency ≥1% and <25%) from the Exome Aggregation Consortium (ExAC) database, which contains variants from 60,706 genomes or exomes. We evaluated the specificity, the capability to detect benign variants, for 10 variant interpretation tools. In addition to overall specificity of the tools, we tested their performance for variants in six geographical populations. PON-P2 had the best performance (95.5%) followed by FATHMM (86.4%) and VEST (83.5%). While these tools had excellent performance, the poorest method predicted more than one third of the benign variants to be disease-causing. The results allow choosing reliable methods for benign variant interpretation, for both research and clinical purposes, as well as provide a benchmark for method developers.
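
Specificity on a dataset that contains only benign variants reduces to the fraction of variants a tool calls benign. The sketch below illustrates that calculation overall and per population. It is not the study's code: the function names, labels and population codes are hypothetical, and it assumes that every variant receives a binary call, which may not hold for tools that leave some variants unclassified.

```python
def specificity(predicted_labels, benign_label="benign"):
    """Fraction of known-benign variants that the tool also calls benign."""
    if not predicted_labels:
        raise ValueError("no predictions given")
    true_negatives = sum(1 for label in predicted_labels if label == benign_label)
    return true_negatives / len(predicted_labels)


def specificity_by_population(records):
    """Specificity per population from (population, predicted_label) pairs."""
    grouped = {}
    for population, label in records:
        grouped.setdefault(population, []).append(label)
    return {pop: specificity(labels) for pop, labels in grouped.items()}


# Hypothetical predictions for benign variants from two populations.
records = [
    ("POP_A", "benign"),
    ("POP_A", "benign"),
    ("POP_B", "benign"),
    ("POP_B", "pathogenic"),
]
overall = specificity([label for _, label in records])
per_population = specificity_by_population(records)
```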

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1006481
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6386394
February 2019

Systematics for types and effects of DNA variations.

Authors:
Mauno Vihinen

BMC Genomics 2018 Dec 28;19(1):974. Epub 2018 Dec 28.

Department of Experimental Medical Science, Lund University, BMC B13, SE-22184, Lund, Sweden.

Background: Numerous different types of variations can occur in DNA and have diverse effects and consequences. The Variation Ontology (VariO) was developed for systematic descriptions of variations and their effects at DNA, RNA and protein levels.

Results: VariO use and terms for DNA variations are described in depth. VariO provides systematic names for variation types and detailed descriptions for changes in DNA function, structure and properties. The principles of VariO are presented along with examples from published articles or databases, most often in relation to human diseases. VariO terms describe local DNA changes, chromosome number and structure variants, chromatin alterations, as well as genomic changes, whether of genetic or non-genetic origin.

Conclusions: DNA variation systematics facilitates unambiguous descriptions of variations and their effects and further reuse and integration of data from different sources by both humans and computers.

Source
DOI: http://dx.doi.org/10.1186/s12864-018-5262-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6309100
December 2018

Representativeness of variation benchmark datasets.

BMC Bioinformatics 2018 Nov 29;19(1):461. Epub 2018 Nov 29.

Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, BMC B13, SE-221 84, Lund, Sweden.

Background: Benchmark datasets are essential for both method development and performance assessment. These datasets have numerous requirements, representativeness being one. In the case of variant tolerance/pathogenicity prediction, representativeness means that the dataset covers the space of variations and their effects.

Results: We performed the first analysis of the representativeness of variation benchmark datasets. We used statistical approaches to investigate how proteins in the benchmark datasets were representative for the entire human protein universe. We investigated the distributions of variants in chromosomes, protein structures, CATH domains and classes, Pfam protein families, Enzyme Commission (EC) classifications and Gene Ontology annotations in 24 datasets that have been used for training and testing variant tolerance prediction methods. All the datasets were available in VariBench or VariSNP databases. We tested also whether the pathogenic variant datasets contained neutral variants defined as those that have high minor allele frequency in the ExAC database. The distributions of variants over the chromosomes and proteins varied greatly between the datasets.

Conclusions: None of the datasets was found to be highly representative, although many of the tested datasets had quite good coverage of the different protein characteristics. Dataset size correlates with representativeness but only weakly with the performance of methods trained on them. The results imply that dataset representativeness is an important factor and should be taken into account in predictor development and testing.

Source
DOI: http://dx.doi.org/10.1186/s12859-018-2478-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6267811
November 2018

Pan-cancer analysis of neoepitopes.

Sci Rep 2018 08 24;8(1):12735. Epub 2018 Aug 24.

Department of Experimental Medical Science, BMC B13, Lund University, SE-22184, Lund, Sweden.

Somatic variations are frequent and important drivers in cancers. Amino acid substitutions can yield neoantigens that are detected by the immune system. Neoantigens can lead to immune response and tumor rejection. Although neoantigen load and occurrence have been widely studied, a detailed pan-cancer analysis of the occurrence and characterization of neoepitopes is missing. We investigated the proteome-wide amino acid substitutions in 8-, 9-, 10-, and 11-mer peptides in 30 cancer types with the NetMHC 4.0 software. Of the predicted 8-, 9-, 10-, and 11-mer peptides, 11,316,078 (0.24%) were highly likely neoepitope candidates and were derived from 95.44% of human proteins. Binding affinity to MHC molecules is just one of the many epitope features. The most likely epitopes are those which are detected by several MHCs and are of several peptide lengths. 9-mer peptides are the most common among the high binding neoantigens. Of all variants, 0.17% yield more than 100 neoepitopes and are considered the best candidates for any application. We characterized properties of neoepitopes in 30 cancer types and estimated the likely numbers of tumor-derived epitopes that could induce an immune response. We found that amino acid distributions, at all positions in neoepitopes of all lengths, contain more hydrophobic residues than the wild-type sequences, implying that the hydropathic nature of neoepitopes is an important property. The neoepitope characteristics can be employed for various applications including targeted cancer vaccine development for precision medicine.
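
To make the peptide enumeration step concrete, the following is a minimal sketch of how the 8-, 9-, 10-, and 11-mer peptides covering a single amino acid substitution could be listed before being passed to a binding predictor. It is an illustration only, not the authors' pipeline: the function name `peptides_covering`, the 0-based position convention and the example sequence are all assumptions, and no call to NetMHC is made.

```python
def peptides_covering(seq, pos, new_residue, lengths=(8, 9, 10, 11)):
    """List every variant peptide of the given lengths that contains `pos`.

    `seq` is the wild-type protein sequence, `pos` is a 0-based index
    (an assumption of this sketch) and `new_residue` is the substituting
    amino acid.
    """
    # Build the variant sequence with the single substitution applied.
    variant = seq[:pos] + new_residue + seq[pos + 1:]
    peptides = []
    for k in lengths:
        # A window [start, start + k) contains pos when
        # pos - k + 1 <= start <= pos, clipped to the sequence ends.
        first = max(0, pos - k + 1)
        last = min(pos, len(variant) - k)
        for start in range(first, last + 1):
            peptides.append(variant[start:start + k])
    return peptides


# Hypothetical 30-residue sequence used only for illustration.
example_seq = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
example_peptides = peptides_covering(example_seq, 15, "A")
```

For a substitution far from either end of the protein, each length k contributes k windows, so one variant yields 8 + 9 + 10 + 11 = 38 candidate peptides per MHC allele tested.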

Source
DOI: http://dx.doi.org/10.1038/s41598-018-30724-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6109115
August 2018

Simulation of the Dynamics of Primary Immunodeficiencies in B Cells.

Front Immunol 2018 2;9:1785. Epub 2018 Aug 2.

Department of Experimental Medical Science, BMC B13, Lund University, Lund, Sweden.

Primary immunodeficiencies (PIDs) are a group of over 300 hereditary, heterogeneous, and mainly rare disorders that affect the immune system. Various aspects of the immune system and of PID proteins and genes have been investigated, which facilitates systems biological studies of the effects of PIDs on B cell physiology and response. We reconstructed a B cell network model based on data for the core B cell receptor activation and response processes and performed semi-quantitative dynamic simulations for normal and B cell PID failure modes. The results for several knockout simulations correspond to previously reported molecular studies and reveal novel mechanisms for PIDs. The simulations for CD21, CD40, LYN, MS4A1, ORAI1, PLCG2, PTPRC, and STIM1 indicated profound changes to major transcription factor signaling and to the network. Significant effects were also observed in the BCL10, BLNK, BTK, loss-of-function CARD11, IKKB, MALT1, and NEMO simulations, whereas only minor effects were detected for PIDs that are caused by constitutively active proteins (PI3K, gain-of-function CARD11, KRAS, and NFKBIA). This study reveals the underlying dynamics of PID diseases, confirms previous observations, and identifies novel candidates for PID diagnostics and therapy.

Source
DOI: http://dx.doi.org/10.3389/fimmu.2018.01785
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6082931
October 2019

NDDVD: an integrated and manually curated Neurodegenerative Diseases Variation Database.

Database (Oxford) 2018 01;2018

Center for Systems Biology, Soochow University, No1. Shizi Street, Suzhou, Jiangsu 215006, China.

URL: http://bioinf.suda.edu.cn/NDDvarbase/LOVDv.3.0.

Source
DOI: http://dx.doi.org/10.1093/database/bay018
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5841369
January 2018

PON-tstab: Protein Variant Stability Predictor. Importance of Training Data Quality.

Int J Mol Sci 2018 Mar 28;19(4). Epub 2018 Mar 28.

Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184 Lund, Sweden.

Several methods have been developed to predict effects of amino acid substitutions on protein stability. Benchmark datasets are essential for method training and testing and have numerous requirements including that the data is representative for the investigated phenomenon. Available machine learning algorithms for variant stability have all been trained with ProTherm data. We noticed a number of issues with the contents, quality and relevance of the database. There were errors, but also features that had not been clearly communicated. Consequently, all machine learning variant stability predictors have been trained on biased and incorrect data. We obtained a corrected dataset and trained a random forests-based tool, PON-tstab, applicable to variants in any organism. Our results highlight the importance of the benchmark quality, suitability and appropriateness. Predictions are provided for three categories: stability decreasing, increasing and those not affecting stability.

Source
DOI: http://dx.doi.org/10.3390/ijms19041009
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5979465
March 2018

Mu transpososome activity-profiling yields hyperactive MuA variants for highly efficient genetic and genome engineering.

Nucleic Acids Res 2018 05;46(9):4649-4661

Division of Genetics and Physiology, Department of Biology, FI-20014 University of Turku, Turku, Finland.

The phage Mu DNA transposition system provides a versatile species non-specific tool for molecular biology, genetic engineering and genome modification applications. Mu transposition is catalyzed by MuA transposase, with DNA cleavage and integration reactions ultimately attaching the transposon DNA to target DNA. To improve the activity of the Mu DNA transposition machinery, we mutagenized MuA protein and screened for hyperactivity-causing substitutions using an in vivo assay. The individual activity-enhancing substitutions were mapped onto the MuA-DNA complex structure, containing a tetramer of MuA transposase, two Mu end segments and a target DNA. This analysis, combined with the varying effect of the mutations in different assays, implied that the mutations exert their effects in several ways, including optimizing protein-protein and protein-DNA contacts. Based on these insights, we engineered highly hyperactive versions of MuA, by combining several synergistically acting substitutions located in different subdomains of the protein. Purified hyperactive MuA variants are now ready for use as second-generation tools in a variety of Mu-based DNA transposition applications. These variants will also widen the scope of Mu-based gene transfer technologies toward medical applications such as human gene therapy. Moreover, the work provides a platform for further design of custom transposases.

Source
DOI: http://dx.doi.org/10.1093/nar/gkx1281
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5961161
May 2018

PON-SC - program for identifying steric clashes caused by amino acid substitutions.

BMC Bioinformatics 2017 Nov 29;18(1):531. Epub 2017 Nov 29.

Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, BMC B13, SE-22 184, Lund, Sweden.

Background: Amino acid substitutions due to DNA nucleotide replacements are frequently disease-causing because they affect functionally important sites. If the substituting amino acid does not fit into the protein, it causes structural alterations that are often harmful. Clashes of amino acids cause local or global structural changes. Testing structural compatibility of variations has been difficult due to the lack of a dedicated method that could handle the vast amounts of variation data produced by next generation sequencing technologies.

Results: We developed a method, PON-SC, for detecting protein structural clashes due to amino acid substitutions. The method utilizes a side chain rotamer library and tests whether any of the common rotamers can be fitted into the protein structure. The tool was tested with variants that cause clashes and variants that do not, and was found to have an accuracy of 0.71 over five test datasets.

Conclusions: We developed a fast method for residue side chain clash detection. In addition to the prediction, the method provides a visualization of the variant in the three-dimensional structure.
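
The rotamer test described above can be sketched roughly as follows. This is a simplified illustration and not the PON-SC implementation: the function names, the representation of atoms as bare (x, y, z) tuples and the 3.0 Å distance cutoff are all assumptions made for the example. The idea it shows is only that a substitution is treated as fitting if at least one candidate rotamer avoids every clash with the surrounding atoms.

```python
import math


def has_clash(rotamer_atoms, environment_atoms, min_distance=3.0):
    """Return True if any rotamer atom lies closer than `min_distance`
    (in angstroms; the 3.0 default is an assumed, illustrative cutoff)
    to any atom of the surrounding structure."""
    return any(
        math.dist(a, b) < min_distance
        for a in rotamer_atoms
        for b in environment_atoms
    )


def substitution_fits(rotamers, environment_atoms, min_distance=3.0):
    """Return True if at least one candidate rotamer is free of clashes."""
    return any(
        not has_clash(rotamer, environment_atoms, min_distance)
        for rotamer in rotamers
    )


# Toy coordinates: one environment atom at the origin and two
# hypothetical single-atom rotamers.
environment = [(0.0, 0.0, 0.0)]
close_rotamer = [(1.0, 0.0, 0.0)]
distant_rotamer = [(5.0, 0.0, 0.0)]
```

With these toy coordinates, `substitution_fits([close_rotamer, distant_rotamer], environment)` is true because the second rotamer is clear of the environment atom, whereas `substitution_fits([close_rotamer], environment)` is false.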

Source
DOI: http://dx.doi.org/10.1186/s12859-017-1947-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5707825
November 2017

Working toward precision medicine: Predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges.

Hum Mutat 2017 09 7;38(9):1182-1192. Epub 2017 Jul 7.

Department of Information Engineering, University of Padova, Padova, Italy.

Precision medicine aims to predict a patient's disease risk and best therapeutic options by using that individual's genetic sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. For CAGI 4, three challenges involved using exome-sequencing data: Crohn's disease, bipolar disorder, and warfarin dosing. Previous CAGI challenges included prior versions of the Crohn's disease challenge. Here, we discuss the range of techniques used for phenotype prediction as well as the methods used for assessing predictive models. Additionally, we outline some of the difficulties associated with making predictions and evaluating them. The lessons learned from the exome challenges can be applied to both research and clinical efforts to improve phenotype prediction from genotype. In addition, these challenges serve as a vehicle for sharing clinical and research exome data in a secure manner with scientists who have a broad range of expertise, contributing to a collaborative effort to advance our understanding of genotype-phenotype relationships.

Source
DOI: http://dx.doi.org/10.1002/humu.23280
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5600620
September 2017

Simulation of the dynamics of primary immunodeficiencies in CD4+ T-cells.

PLoS One 2017 27;12(4):e0176500. Epub 2017 Apr 27.

Department of Experimental Medical Science, Lund University, Lund, Sweden.

Primary immunodeficiencies (PIDs) form a large and heterogeneous group of mainly rare disorders that affect the immune system. T-cell deficiencies account for about one-tenth of PIDs, most of them being monogenic. Apart from genetic and clinical information, lots of other data are available for PID proteins and genes, including functions and interactions. Thus, it is possible to perform systems biology studies on the effects of PIDs on T-cell physiology and response. To achieve this, we reconstructed a T-cell network model based on literature mining and TPPIN, a previously published core T-cell network, and performed semi-quantitative dynamic network simulations on both normal and T-cell PID failure modes. The results for several loss-of-function PID simulations correspond to results of previously reported molecular studies. The simulations for TCR, PTPRC, LCK, ZAP70 and ITK indicate profound changes to numerous proteins in the network. Significant effects were also observed in the BCL10, CARD11, MALT1, NEMO, IKKB and MAP3K14 simulations. No major effects were observed for PIDs that are caused by constitutively active proteins. The T-cell model facilitates the understanding of the underlying dynamics of PID disease processes. The approach confirms previous knowledge about the T-cell signaling network and indicates several new important proteins that may be of interest when developing novel diagnosis and therapies to treat immunological defects.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0176500
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5407609
September 2017

Large differences in proportions of harmful and benign amino acid substitutions between proteins and diseases.

Hum Mutat 2017 07 30;38(7):839-848. Epub 2017 May 30.

Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, Lund, Sweden.

Genes and proteins are known to have differences in their sensitivity to alterations. Despite numerous sequencing studies, proportions of harmful and harmless substitutions are not known for proteins and groups of proteins. To address this question, we predicted the outcome for all possible single amino acid substitutions (AASs) in nine representative protein groups by using the PON-P2 method. The effects on 996 proteins were studied and vast differences were noticed. Proteins in the cancer group harbor the largest proportion of harmful variants (42.1%), whereas the non-disease group of proteins not known to have a disease association and not involved in housekeeping functions had the lowest proportion of harmful variants (4.2%). Differences in the proportions of the harmful and benign variants are wide within each group, but they still show clear differences between the groups. Frequently appearing protein domains show a wide spectrum of variant frequencies, whereas no major protein structural class-specific differences were noticed. AAS types in the original and variant residues showed distinctive patterns, which are shared by all the protein groups. The observations are relevant for understanding genetic bases of diseases, variation interpretation, and for the development of methods for that purpose.
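
As an illustration of what "all possible single amino acid substitutions" means in practice, the sketch below enumerates the 19 alternative residues at every position of a sequence and computes the fraction labelled harmful by some classifier. It is a hedged example: the function names are hypothetical, and the `is_harmful` argument is a stand-in callable supplied by the caller, not an interface to PON-P2.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def all_single_substitutions(seq):
    """Yield every single amino acid substitution as (position, original, new),
    with 1-based positions."""
    for position, original in enumerate(seq, start=1):
        for new in AMINO_ACIDS:
            if new != original:
                yield position, original, new


def harmful_proportion(seq, is_harmful):
    """Fraction of all possible substitutions that `is_harmful` flags.

    `is_harmful` is a placeholder for any predictor: a callable taking
    (position, original, new) and returning True or False.
    """
    variants = list(all_single_substitutions(seq))
    if not variants:
        return 0.0
    return sum(1 for v in variants if is_harmful(*v)) / len(variants)


# Toy example: a two-residue sequence has 2 * 19 = 38 possible substitutions.
toy_variants = list(all_single_substitutions("MK"))
```

A protein of length n therefore has 19n possible substitutions, which is why exhaustive prediction across 996 proteins yields proportions rather than counts as the natural unit of comparison.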

Source
DOI: http://dx.doi.org/10.1002/humu.23236
July 2017

Performance of in silico tools for the evaluation of p16INK4a (CDKN2A) variants in CAGI.

Hum Mutat 2017 09 16;38(9):1042-1050. Epub 2017 May 16.

Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland.

Correct phenotypic interpretation of variants of unknown significance for cancer-associated genes is a diagnostic challenge as genetic screenings gain in popularity in the next-generation sequencing era. The Critical Assessment of Genome Interpretation (CAGI) experiment aims to test and define the state of the art of genotype-phenotype interpretation. Here, we present the assessment of the CAGI p16INK4a challenge. Participants were asked to predict the effect on cellular proliferation of 10 variants for the p16INK4a tumor suppressor, a cyclin-dependent kinase inhibitor encoded by the CDKN2A gene. Twenty-two pathogenicity predictors were assessed with a variety of accuracy measures for reliability in a medical context. Different assessment measures were combined in an overall ranking to provide more robust results. The R scripts used for assessment are publicly available from a GitHub repository for future use in similar assessment exercises. Despite a limited test-set size, our findings show a variety of results, with some methods performing significantly better. Methods combining different strategies frequently outperform simpler approaches. The best predictor, Yang&Zhou lab, uses a machine learning method combining an empirical energy function measuring protein stability with an evolutionary conservation term. The p16INK4a challenge highlights how subtle structural effects can neutralize otherwise deleterious variants.
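
One simple way to combine several assessment measures into an overall ranking is to rank the methods under each measure and order them by mean rank. The sketch below shows that idea only; the abstract does not state which combination scheme was used, so the mean-rank rule, the function name and the example scores are all assumptions, and the authors' own assessment code is in R rather than Python.

```python
def overall_ranking(scores):
    """Order methods by their mean rank across all measures.

    `scores` maps method name -> {measure name -> value}, where a higher
    value is assumed to be better for every measure. Tied values are not
    given shared ranks in this sketch; they keep their input order.
    """
    methods = list(scores)
    measures = list(next(iter(scores.values())))
    rank_sums = {method: 0 for method in methods}
    for measure in measures:
        # Rank 1 goes to the best-scoring method for this measure.
        ordered = sorted(methods, key=lambda m: scores[m][measure], reverse=True)
        for rank, method in enumerate(ordered, start=1):
            rank_sums[method] += rank
    return sorted(methods, key=lambda m: rank_sums[m] / len(measures))


# Invented scores for three hypothetical predictors.
example_scores = {
    "method_A": {"accuracy": 0.9, "mcc": 0.8},
    "method_B": {"accuracy": 0.7, "mcc": 0.6},
    "method_C": {"accuracy": 0.8, "mcc": 0.7},
}
example_order = overall_ranking(example_scores)
```

Ranking per measure before averaging keeps measures on different scales from dominating one another, which is one reason a combined ranking can be more robust than any single measure.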

Source
DOI: http://dx.doi.org/10.1002/humu.23235
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5561474
September 2017

PON-P and PON-P2 predictor performance in CAGI challenges: Lessons learned.

Hum Mutat 2017 09 2;38(9):1085-1091. Epub 2017 May 2.

Protein Structure and Bioinformatics Group, Department of Experimental Medical Science, Lund University, Lund, Sweden.

Computational tools are widely used for ranking and prioritizing variants for characterizing their disease relevance. Since numerous tools have been developed, they have to be properly assessed before being applied. Critical Assessment of Genome Interpretation (CAGI) experiments have significantly contributed toward the assessment of prediction methods for various tasks. Within and outside the CAGI, we have addressed several questions that facilitate development and assessment of variation interpretation tools. These areas include collection and distribution of benchmark datasets, their use for systematic large-scale method assessment, and the development of guidelines for reporting methods and their performance. For us, CAGI has provided a chance to experiment with new ideas, test the application areas of our methods, and network with other prediction method developers. In this article, we discuss our experiences and lessons learned from the various CAGI challenges. We describe our approaches, their performance, and impact of CAGI on our research. Finally, we discuss some of the possibilities that CAGI experiments have opened up and make some suggestions for future experiments.

Source
DOI: http://dx.doi.org/10.1002/humu.23199
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5561442
September 2017