Publications by authors named "Piotr Dittwald"

19 Publications

Computational planning of the synthesis of complex natural products.

Nature 2020 12 13;588(7836):83-88. Epub 2020 Oct 13.

Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland.

Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years. However, the field has progressed greatly since the development of early programs such as LHASA, for which reaction choices at each step were made by human operators. Multiple software platforms are now capable of completely autonomous planning. But these programs 'think' only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program's knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships, allowing it to 'strategize' over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization.
http://dx.doi.org/10.1038/s41586-020-2855-y
December 2020

Computer-generated "synthetic contingency" plans at times of logistics and supply problems: scenarios for hydroxychloroquine and remdesivir.

Chem Sci 2020 Jul 10;11(26):6736-6744. Epub 2020 Jun 10.

Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland. Email:

A computer program for retrosynthetic planning helps develop multiple "synthetic contingency" plans for hydroxychloroquine and also routes leading to remdesivir, both promising but yet unproven medications against COVID-19. These plans are designed to navigate, as much as possible, around known and patented routes and to commence from inexpensive and diverse starting materials, so as to ensure supply in case of anticipated market shortages of commonly used substrates. Looking beyond the current COVID-19 pandemic, development of similar contingency syntheses is advocated for other already-approved medications, in case such medications become urgently needed in mass quantities to face other public-health emergencies.
http://dx.doi.org/10.1039/d0sc01799j
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500088
July 2020

Computational design of syntheses leading to compound libraries or isotopically labelled targets.

Chem Sci 2019 Oct 16;10(40):9219-9232. Epub 2019 Aug 16.

Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland. Email:

Although computer programs for retrosynthetic planning have shown improved and in some cases quite satisfactory performance in designing routes leading to specific, individual targets, no algorithms capable of planning syntheses of entire target libraries - important in modern drug discovery - have yet been reported. This study describes how network-search routines underlying existing retrosynthetic programs can be adapted and extended to multi-target design operating on one common search graph, benefitting from the use of common intermediates and reducing the overall synthetic cost. Implementation in the Chematica platform illustrates the usefulness of such algorithms in the syntheses of either (i) all members of a user-defined library, or (ii) the most synthetically accessible members of this library. In the latter case, algorithms are also readily adapted to the identification of the most facile syntheses of isotopically labelled targets. These examples are industrially relevant in the context of hit-to-lead optimization and syntheses of isotopomers of various bioactive molecules.
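The economics of planning a whole library on one common search graph can be illustrated with a toy model; all target names, reaction names, and costs below are hypothetical, and this is only a sketch of the cost-sharing idea, not the Chematica algorithm. The cost of a library plan is the cost of the union of reactions across routes, so reactions leading to shared intermediates are paid for only once.

```python
# Toy illustration (hypothetical routes and costs): planning a library on one
# common search graph pays for shared reactions only once, unlike planning
# each target independently.
routes = {   # target -> set of (reaction, cost) pairs along its route
    "T1": {("A->I", 3), ("I->T1", 2)},
    "T2": {("A->I", 3), ("I->T2", 4)},   # reuses intermediate I
    "T3": {("B->T3", 6)},
}

def per_target_cost(targets):
    """Total cost when each target is planned and executed independently."""
    return sum(sum(c for _, c in routes[t]) for t in targets)

def library_cost(targets):
    """Cost on a common graph: the union of reactions is executed once."""
    union = set().union(*(routes[t] for t in targets))
    return sum(c for _, c in union)

print(per_target_cost(["T1", "T2", "T3"]))  # 5 + 7 + 6 = 18
print(library_cost(["T1", "T2", "T3"]))     # A->I shared once: 3+2+4+6 = 15
```

The same bookkeeping extends to choosing the most synthetically accessible subset of a library: rank subsets by their union cost rather than by the sum of individual route costs.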
http://dx.doi.org/10.1039/c9sc02678a
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6979321
October 2019

MIND: A Double-Linear Model To Accurately Determine Monoisotopic Precursor Mass in High-Resolution Top-Down Proteomics.

Anal Chem 2019 08 23;91(15):10310-10319. Epub 2019 Jul 23.

UA-VITO Center for Proteomics, University of Antwerp, 2000 Antwerp, Belgium.

Top-down proteomics approaches are becoming ever more popular, due to the advantages offered by knowledge of the intact protein mass in correctly identifying the various proteoforms that potentially arise due to point mutation, alternative splicing, post-translational modifications, etc. Usually, the average mass is used in this context; however, it is known that this can fluctuate significantly due to both natural and technical causes. Ideally, one would prefer to use the monoisotopic precursor mass, but this falls below the detection limit for all but the smallest proteins. Methods that predict the monoisotopic mass based on the average mass are potentially affected by imprecisions associated with the average mass. To address this issue, we have developed a framework based on simple, linear models that allows prediction of the monoisotopic mass based on the exact mass of the most-abundant (aggregated) isotope peak, which is a robust measure of mass, insensitive to the aforementioned natural and technical causes. This linear model was tested experimentally, as well as in silico, and typically predicts monoisotopic masses with an accuracy of only a few parts per million. A confidence measure is associated with the predicted monoisotopic mass to handle the off-by-one-Da prediction error. Furthermore, we introduce a correction function to extract the "true" (i.e., theoretically) most-abundant isotope peak from a spectrum, even if the observed isotope distribution is distorted by noise or poor ion statistics. The method is available online as an R shiny app: https://valkenborg-lab.shinyapps.io/mind/.
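The prediction scheme can be sketched in a few lines. The coefficients below are hypothetical placeholders, not the fitted values from the paper; the sketch only shows the structure of a linear model plus integer rounding, which is also where the off-by-one-Da failure mode enters.

```python
# Sketch of a MIND-style prediction (coefficients alpha/beta are hypothetical,
# for illustration only; the paper fits its linear models to large data sets).
NEUTRON = 1.00235  # approximate spacing between aggregated isotope peaks (Da)

def predict_monoisotopic(m_abundant, alpha=6.1e-4, beta=-0.5):
    """Predict the monoisotopic mass from the most-abundant-peak mass.

    A linear model estimates how many aggregated-isotope peaks separate the
    most abundant peak from the monoisotopic one; rounding that estimate to
    an integer is what can produce an off-by-one-Da error, which the paper
    handles with an associated confidence measure.
    """
    k = round(alpha * m_abundant + beta)   # integer number of peaks
    return m_abundant - k * NEUTRON

print(predict_monoisotopic(10000.0))   # 6 peaks below the most abundant one
print(predict_monoisotopic(1000.0))    # small molecule: mono peak dominates
```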
http://dx.doi.org/10.1021/acs.analchem.9b02682
August 2019

Discovery and Enumeration of Organic-Chemical and Biomimetic Reaction Cycles within the Network of Chemistry.

Angew Chem Int Ed Engl 2018 02 6;57(9):2367-2371. Epub 2018 Feb 6.

Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, Warsaw, 01-224, Poland.

Analysis of the chemical-organic knowledge represented as a giant network reveals that it contains millions of reaction sequences closing into cycles. Without realizing it, independent chemists working at different times have jointly created cyclic sequences that allow for the recovery of useful reagents or the autoamplification of synthetically important molecules, sequences that mimic biological cycles, and sequences that can be operated in one pot.
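The underlying graph problem can be illustrated on a toy network; the compounds and reactions below are hypothetical, and this depth-first search is only a minimal sketch of cycle enumeration, not the paper's large-scale procedure. A reaction cycle is a directed sequence that regenerates one of its own substrates.

```python
# Toy reaction network (hypothetical compounds): edges are reactions, and a
# cycle is a reaction sequence that regenerates a starting material.
network = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "D"],   # C -> A closes a 3-step cycle
    "D": [],
}

def find_cycles(graph):
    """Enumerate each simple directed cycle once: only the lexicographically
    smallest node of a cycle is allowed to serve as its starting point."""
    cycles = []
    def dfs(start, node, path):
        for nxt in graph[node]:
            if nxt == start:
                cycles.append(path + [nxt])
            elif nxt not in path and nxt > start:
                dfs(start, nxt, path + [nxt])
    for start in graph:
        dfs(start, start, [start])
    return cycles

print(find_cycles(network))  # the single cycle A -> B -> C -> A
```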
http://dx.doi.org/10.1002/anie.201712052
February 2018

Computer-Assisted Synthetic Planning: The End of the Beginning.

Angew Chem Int Ed Engl 2016 05 8;55(20):5904-37. Epub 2016 Apr 8.

Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland.

Exactly half a century has passed since the launch of the first documented research project (1965 Dendral) on computer-assisted organic synthesis. Many more programs were created in the 1970s and 1980s but the enthusiasm of these pioneering days had largely dissipated by the 2000s, and the challenge of teaching the computer how to plan organic syntheses earned itself the reputation of a "mission impossible". This is quite curious given that, in the meantime, computers have "learned" many other skills that had been considered exclusive domains of human intellect and creativity-for example, machines can nowadays play chess better than human world champions and they can compose classical music pleasant to the human ear. Although there have been no similar feats in organic synthesis, this Review argues that to concede defeat would be premature. Indeed, bringing together the combination of modern computational power and algorithms from graph/network theory, chemical rules (with full stereo- and regiochemistry) coded in appropriate formats, and the elements of quantum mechanics, the machine can finally be "taught" how to plan syntheses of non-trivial organic molecules in a matter of seconds to minutes. The Review begins with an overview of some basic theoretical concepts essential for the big-data analysis of chemical syntheses. It progresses to the problem of optimizing pathways involving known reactions. It culminates with discussion of algorithms that allow for a completely de novo and fully automated design of syntheses leading to relatively complex targets, including those that have not been made before. Of course, there are still things to be improved, but computers are finally becoming relevant and helpful to the practice of organic-synthetic planning. 
Paraphrasing Churchill's famous words after the Allies' first major victory over the Axis forces in Africa: it is not the end, it is not even the beginning of the end, but it is the end of the beginning for computer-assisted synthesis planning. The machine is here to stay.
http://dx.doi.org/10.1002/anie.201506101
May 2016

On the Fine Isotopic Distribution and Limits to Resolution in Mass Spectrometry.

J Am Soc Mass Spectrom 2015 Oct 12;26(10):1732-45. Epub 2015 Aug 12.

Institute of Informatics, University of Warsaw, Warsaw, Poland.

Mass spectrometry enables the study of increasingly large biomolecules at increasingly high resolution, capable of distinguishing between fine isotopic variants that have the same additional nucleon count but slightly different masses. The analysis of the fine isotopic distribution therefore becomes an interesting research topic with important practical applications. In this paper, we propose a comprehensive methodology for studying the basic characteristics of the fine isotopic distribution. Our approach uses a broad spectrum of methods, ranging from generating functions, which allow us to estimate the variance and the information-theoretic entropy of the distribution, to the theory of thermal energy fluctuations. Having characterized the variance, spread, shape, and size of the fine isotopic distribution, we are able to indicate limitations to high-resolution mass spectrometry. Moreover, the analysis of "thermorelativistic" effects (i.e., mass uncertainty attributable to relativistic effects coupled with the statistical-mechanical uncertainty of the energy of an isolated ion) gives an estimate of impassable limits of isotopic resolution (understood as the ability to distinguish fine-structure peaks), which can be moved further only by cooling the ions. The presented approach highlights the potential of theoretical analysis of the fine isotopic distribution, allowing the data to be modeled more accurately in support of successful experimental measurements.
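One elementary consequence in the spirit of the paper's generating-function analysis can be sketched directly: because the isotopes of different atoms are drawn independently, the mass variance of a molecule's fine isotopic distribution is the sum of the per-atom mass variances. The isotope masses and abundances below are standard terrestrial values, truncated for brevity; this is an illustrative sketch, not the paper's full methodology.

```python
# Variance of the fine isotopic distribution as a sum of per-atom variances
# (independence of atomic isotope draws makes variances additive).
ISOTOPES = {  # element -> [(isotope mass in Da, abundance)], terrestrial values
    "C": [(12.0, 0.9893), (13.00335, 0.0107)],
    "H": [(1.00783, 0.999885), (2.01410, 0.000115)],
    "O": [(15.99491, 0.99757), (16.99913, 0.00038), (17.99916, 0.00205)],
}

def atom_variance(element):
    """Mass variance of a single atom of the given element."""
    masses = ISOTOPES[element]
    mean = sum(m * p for m, p in masses)
    return sum(p * (m - mean) ** 2 for m, p in masses)

def molecule_mass_variance(formula):
    """Variance of the fine isotopic distribution of {element: count}."""
    return sum(n * atom_variance(e) for e, n in formula.items())

# More atoms, broader fine structure: water vs. glucose.
print(molecule_mass_variance({"H": 2, "O": 1}))
print(molecule_mass_variance({"C": 6, "H": 12, "O": 6}))
```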
http://dx.doi.org/10.1007/s13361-015-1180-4
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4565875
October 2015

A Priori Estimation of Organic Reaction Yields.

Angew Chem Int Ed Engl 2015 Sep 21;54(37):10797-801. Epub 2015 Jul 21.

Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw (Poland).

A thermodynamically guided calculation of free energies of substrate and product molecules allows for the estimation of the yields of organic reactions. The non-ideality of the system and the solvent effects are taken into account through the activity coefficients calculated at the molecular level by perturbed-chain statistical associating fluid theory (PC-SAFT). The model is iteratively trained using a diverse set of reactions with yields that have been reported previously. This trained model can then estimate a priori the yields of reactions not included in the training set with an accuracy of ca. ±15 %. This ability has the potential to translate into significant economic savings through the selection and then execution of only those reactions that can proceed in good yields.
http://dx.doi.org/10.1002/anie.201503890
September 2015

Towards automated discrimination of lipids versus peptides from full scan mass spectra.

EuPA Open Proteom 2014 Sep;4:87-100

Applied Bio & Molecular Systems, VITO, Mol, Belgium; Center for Proteomics, Antwerp, Belgium; Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium.

Although physicochemical fractionation techniques play a crucial role in the analysis of complex mixtures, they are not necessarily the best solution to separate specific molecular classes, such as lipids and peptides. Any physical fractionation step such as, for example, those based on liquid chromatography, will introduce its own variation and noise. In this paper we investigate to what extent the high sensitivity and resolution of contemporary mass spectrometers offers viable opportunities for computational separation of signals in full scan spectra. We introduce an automatic method that can discriminate peptide from lipid peaks in full scan mass spectra, based on their isotopic properties. We systematically evaluate which features maximally contribute to a peptide versus lipid classification. The selected features are subsequently used to build a random forest classifier that enables almost perfect separation between lipid and peptide signals without requiring ion fragmentation and classical tandem MS-based identification approaches. The classifier is trained on data, but is also capable of discriminating signals in real world experiments. We evaluate the influence of typical data inaccuracies of common classes of mass spectrometry instruments on the optimal set of discriminant features. Finally, the method is successfully extended towards the classification of individual lipid classes from full scan mass spectral features, based on input data defined by the Lipid Maps Consortium.
http://dx.doi.org/10.1016/j.euprot.2014.05.002
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4234154
September 2014

Human endogenous retroviral elements promote genome instability via non-allelic homologous recombination.

BMC Biol 2014 Sep 23;12:74. Epub 2014 Sep 23.

Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Rm ABBR-R809, Houston, TX, USA.

Background: Recurrent rearrangements of the human genome resulting in disease or variation are mainly mediated by non-allelic homologous recombination (NAHR) between low-copy repeats. However, other genomic structures, including AT-rich palindromes and retroviruses, have also been reported to underlie recurrent structural rearrangements. Notably, recurrent deletions of Yq12 conveying azoospermia, as well as non-pathogenic reciprocal duplications, are mediated by human endogenous retroviral elements (HERVs). We hypothesized that HERV elements throughout the genome can serve as substrates for genomic instability and result in human copy-number variation (CNV).

Results: We developed parameters to identify HERV elements similar to those that mediate Yq12 rearrangements as well as recurrent deletions of 3q13.2q13.31. We used these parameters to identify HERV pairs genome-wide that may cause instability. Our analysis highlighted 170 pairs, flanking 12.1% of the genome. We cross-referenced these predicted susceptibility regions with CNVs from our clinical databases for potentially HERV-mediated rearrangements and identified 78 CNVs. We subsequently molecularly confirmed recurrent deletion and duplication rearrangements at four loci in ten individuals, including reciprocal rearrangements at two loci. Breakpoint sequencing revealed clustering in regions of high sequence identity enriched in PRDM9-mediated recombination hotspot motifs.

Conclusions: The presence of deletions and reciprocal duplications suggests NAHR as the causative mechanism of HERV-mediated CNV, even though the length and the sequence homology of the HERV elements are less than currently thought to be required for NAHR. We propose that in addition to HERVs, other repetitive elements, such as long interspersed elements, may also be responsible for the formation of recurrent CNVs via NAHR.
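The genome-wide screen for NAHR-prone repeat pairs can be sketched as a filter over annotated repeats; the coordinates, element names, and thresholds below are hypothetical placeholders, not the parameters the study derived from the Yq12 and 3q13.2q13.31 HERVs.

```python
# Toy screen (hypothetical repeats and thresholds): flag directly oriented
# repeat pairs that are long enough, similar enough, and close enough to
# serve as substrates for non-allelic homologous recombination (NAHR).
repeats = [  # (name, chromosome, start, strand, length_bp, pct_identity)
    ("H1", "chr3", 1_000_000, "+", 5000, 96.0),
    ("H2", "chr3", 4_400_000, "+", 5100, 96.5),   # pairs with H1
    ("H3", "chr3", 9_000_000, "-", 5000, 97.0),   # wrong orientation vs H1/H2
    ("H4", "chr7",   500_000, "+",  900, 99.0),   # too short
]

def nahr_candidate_pairs(reps, min_len=4000, min_ident=95.0, max_dist=10_000_000):
    """Return pairs of repeats satisfying all NAHR-substrate criteria."""
    pairs = []
    for i, a in enumerate(reps):
        for b in reps[i + 1:]:
            same_chrom = a[1] == b[1]
            direct = a[3] == b[3]                    # directly oriented
            long_enough = min(a[4], b[4]) >= min_len
            similar = min(a[5], b[5]) >= min_ident
            close = abs(a[2] - b[2]) <= max_dist
            if same_chrom and direct and long_enough and similar and close:
                pairs.append((a[0], b[0]))
    return pairs

print(nahr_candidate_pairs(repeats))  # only the H1/H2 pair passes
```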
http://dx.doi.org/10.1186/s12915-014-0074-4
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4195946
September 2014

BRAIN 2.0: time and memory complexity improvements in the algorithm for calculating the isotope distribution.

J Am Soc Mass Spectrom 2014 Apr 12;25(4):588-94. Epub 2014 Feb 12.

College of Inter-faculty Individual Studies in Mathematics and Natural Sciences, University of Warsaw, Warsaw, Poland.

Recently, an elegant iterative algorithm called BRAIN (Baffling Recursive Algorithm for Isotopic distributioN calculations) was presented. The algorithm is based on the classic polynomial method for calculating aggregated isotope distributions, and it introduces algebraic identities using the Newton-Girard and Viète formulae to solve the problem of polynomial expansion. Due to the iterative nature of the BRAIN method, the calculations must start from the lightest isotope variant. As such, the complexity of BRAIN scales quadratically with the mass of the putative molecule, since it depends on the number of aggregated peaks that need to be calculated. In this manuscript, we suggest two improvements to the algorithm that decrease both the time and memory complexity of obtaining the aggregated isotope distribution. We also illustrate a concept for representing the element isotope distribution in a generic manner. This representation allows for omitting the root calculation of the element polynomial required in the original BRAIN method. A generic formulation for the roots is of special interest for higher-order element polynomials, as root-finding algorithms and their inaccuracies can be avoided.
http://dx.doi.org/10.1007/s13361-013-0796-5
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3953541
April 2014

Comment on "Computation of isotopic peak center-mass distribution by fourier transform".

Anal Chem 2013 Dec 4;85(24):12189-12192. Epub 2013 Dec 4.

Applied Bio & Molecular Systems, Vlaamse Instelling Voor Technologisch Onderzoek (VITO), Mol, Belgium.

http://dx.doi.org/10.1021/ac402731h
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4119064
December 2013

Recurrent HERV-H-mediated 3q13.2-q13.31 deletions cause a syndrome of hypotonia and motor, language, and cognitive delays.

Hum Mutat 2013 Oct 13;34(10):1415-23. Epub 2013 Aug 13.

Signature Genomic Laboratories, PerkinElmer, Inc, Spokane, Washington.

We describe the molecular and clinical characterization of nine individuals with recurrent, 3.4-Mb, de novo deletions of 3q13.2-q13.31 detected by chromosomal microarray analysis. All individuals have hypotonia and language and motor delays; they variably express mild to moderate cognitive delays (8/9), abnormal behavior (7/9), and autism spectrum disorders (3/9). Common facial features include downslanting palpebral fissures with epicanthal folds, a slightly bulbous nose, and relative macrocephaly. Twenty-eight genes map to the deleted region, including four strong candidate genes, DRD3, ZBTB20, GAP43, and BOC, with important roles in neural and/or muscular development. Analysis of the breakpoint regions based on array data revealed directly oriented human endogenous retrovirus (HERV-H) elements of ~5 kb in size and of >95% DNA sequence identity flanking the deletion. Subsequent DNA sequencing revealed different deletion breakpoints and suggested nonallelic homologous recombination (NAHR) between HERV-H elements as a mechanism of deletion formation, analogous to HERV-I-flanked and NAHR-mediated AZFa deletions. We propose that similar HERV elements may also mediate other recurrent deletion and duplication events on a genome-wide scale. Observation of rare recurrent chromosomal events such as these deletions helps to further the understanding of mechanisms behind naturally occurring variation in the human genome and its contribution to genetic disease.
http://dx.doi.org/10.1002/humu.22384
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4599348
October 2013

NAHR-mediated copy-number variants in a clinical population: mechanistic insights into both genomic disorders and Mendelizing traits.

Genome Res 2013 Sep 8;23(9):1395-409. Epub 2013 May 8.

Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.

We delineated and analyzed directly oriented paralogous low-copy repeats (DP-LCRs) in the most recent version of the human haploid reference genome. The computationally defined DP-LCRs were cross-referenced with our chromosomal microarray analysis (CMA) database of 25,144 patients subjected to genome-wide assays. This computationally guided approach to the empirically derived large data set allowed us to investigate genomic rearrangement relative frequencies and identify new loci for recurrent nonallelic homologous recombination (NAHR)-mediated copy-number variants (CNVs). The most commonly observed recurrent CNVs were NPHP1 duplications (233), CHRNA7 duplications (175), and 22q11.21 deletions (DiGeorge/velocardiofacial syndrome, 166). In the ∼25% of CMA cases for which parental studies were available, we identified 190 de novo recurrent CNVs. In this group, the most frequently observed events were deletions of 22q11.21 (48), 16p11.2 (autism, 34), and 7q11.23 (Williams-Beuren syndrome, 11). Several features of DP-LCRs, including length, distance between NAHR substrate elements, DNA sequence identity (fraction matching), GC content, and concentration of the homologous recombination (HR) hot spot motif 5'-CCNCCNTNNCCNC-3', correlate with the frequencies of the recurrent CNV events. Four novel adjacent DP-LCR-flanked and NAHR-prone regions, involving 2q12.2q13, were elucidated in association with novel genomic disorders. Our study quantitates genome architectural features responsible for NAHR-mediated genomic instability and further elucidates the role of NAHR in human disease.
http://dx.doi.org/10.1101/gr.152454.112
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3759717
September 2013

BRAIN: a universal tool for high-throughput calculations of the isotopic distribution for mass spectrometry.

Anal Chem 2013 Feb 31;85(4):1991-4. Epub 2013 Jan 31.

This Letter presents the R-package implementation of the recently introduced polynomial method for calculating the aggregated isotopic distribution, called BRAIN (Baffling Recursive Algorithm for Isotopic distributioN calculations). The algorithm is simple, easy to understand, highly accurate, fast, and memory-efficient. The method is based on the application of the Newton-Girard theorem and Viète's formulae to the polynomial coding of different aggregated isotopic variants. As a result, an elegant recursive equation is obtained for computing the occurrence probabilities of consecutive aggregated isotopic peaks. Additionally, the algorithm allows calculating the center-masses of the aggregated isotopic variants. We propose an implementation that is suitable for high-throughput processing and easily customizable for application in different areas of mass spectral data analysis. A case study demonstrates how the R-package can be applied in the context of protein research, but the software can also be used for calculating the isotopic distribution in the context of lipidomics, metabolomics, glycoscience, or even space exploration. More materials (reference manual, vignette, and the package itself) are available at Bioconductor online (http://www.bioconductor.org/packages/release/bioc/html/BRAIN.html).
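The classic polynomial expansion that BRAIN reformulates can be sketched as coefficient-wise convolution of element polynomials indexed by extra-nucleon count. This naive baseline is not the package's recursive algorithm, only an illustration of what the aggregated distribution is; the isotope abundances below are standard terrestrial values.

```python
# Aggregated isotopic distribution by brute-force polynomial multiplication:
# the coefficient of x^k is P(molecule carries k extra nucleons).
ISOTOPES = {                      # element -> [(extra nucleons, abundance)]
    "C": [(0, 0.9893), (1, 0.0107)],
    "H": [(0, 0.999885), (1, 0.000115)],
    "N": [(0, 0.99636), (1, 0.00364)],
    "O": [(0, 0.99757), (1, 0.00038), (2, 0.00205)],
}

def convolve(a, b):
    """Multiply two polynomials stored as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def aggregated_distribution(formula, n_peaks=5):
    """First n_peaks aggregated-peak probabilities for {element: count}."""
    dist = [1.0]
    for elem, count in formula.items():
        poly = [0.0] * (max(k for k, _ in ISOTOPES[elem]) + 1)
        for k, p in ISOTOPES[elem]:
            poly[k] = p
        for _ in range(count):
            dist = convolve(dist, poly)
    return dist[:n_peaks]

# Glycine, C2H5NO2: the monoisotopic peak dominates.
print(aggregated_distribution({"C": 2, "H": 5, "N": 1, "O": 2}))
```

BRAIN's contribution is to avoid this expansion entirely via a recursion over power sums, which is what makes large proteins tractable.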
http://dx.doi.org/10.1021/ac303439m
February 2013

Small noncoding differentially methylated copy-number variants, including lncRNA genes, cause a lethal lung developmental disorder.

Genome Res 2013 Jan 3;23(1):23-33. Epub 2012 Oct 3.

Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.

An unanticipated and tremendous amount of the noncoding sequence of the human genome is transcribed. Long noncoding RNAs (lncRNAs) constitute a significant fraction of non-protein-coding transcripts; however, their functions remain enigmatic. We demonstrate that deletions of a small noncoding differentially methylated region at 16q24.1, including lncRNA genes, cause a lethal lung developmental disorder, alveolar capillary dysplasia with misalignment of pulmonary veins (ACD/MPV), with parent-of-origin effects. We identify overlapping deletions 250 kb upstream of FOXF1 in nine patients with ACD/MPV that arose de novo specifically on the maternally inherited chromosome and delete lung-specific lncRNA genes. These deletions define a distant cis-regulatory region that harbors, besides lncRNA genes, also a differentially methylated CpG island, binds GLI2 depending on the methylation status of this CpG island, and physically interacts with and up-regulates the FOXF1 promoter. We suggest that lung-transcribed 16q24.1 lncRNAs may contribute to long-range regulation of FOXF1 by GLI2 and other transcription factors. Perturbation of lncRNA-mediated chromatin interactions may, in general, be responsible for position effect phenomena and potentially cause many disorders of human development.
http://dx.doi.org/10.1101/gr.141887.112
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3530681
January 2013

Inverted low-copy repeats and genome instability--a genome-wide analysis.

Hum Mutat 2013 Jan 11;34(1):210-20. Epub 2012 Oct 11.

Institute of Informatics, University of Warsaw, Warsaw, Poland.

Inverse paralogous low-copy repeats (IP-LCRs) can cause genome instability by nonallelic homologous recombination (NAHR)-mediated balanced inversions. When disrupting a dosage-sensitive gene(s), balanced inversions can lead to abnormal phenotypes. We delineated the genome-wide distribution of IP-LCRs >1 kb in size with >95% sequence identity and mapped the genes, potentially intersected by an inversion, that overlap at least one of the IP-LCRs. Remarkably, our results show that 12.0% of the human genome is potentially susceptible to such inversions and 942 genes, 99 of which are on the X chromosome, are predicted to be disrupted secondary to such an inversion! In addition, IP-LCRs larger than 800 bp with at least 98% sequence identity (duplication/triplication facilitating IP-LCRs, DTIP-LCRs) were recently implicated in the formation of complex genomic rearrangements with a duplication-inverted triplication-duplication (DUP-TRP/INV-DUP) structure by a replication-based mechanism involving a template switch between such inverted repeats. We identified 1,551 DTIP-LCRs that could facilitate DUP-TRP/INV-DUP formation. Remarkably, 1,445 disease-associated genes are at risk of undergoing copy-number gain as they map to genomic intervals susceptible to the formation of DUP-TRP/INV-DUP complex rearrangements. We implicate inverted LCRs as a human genome architectural feature that could potentially be responsible for genomic instability associated with many human disease traits.
http://dx.doi.org/10.1002/humu.22217
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3738003
January 2013

Inferring serum proteolytic activity from LC-MS/MS data.

BMC Bioinformatics 2012 Apr 12;13 Suppl 5:S7. Epub 2012 Apr 12.

Institute of Informatics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland.

Background: In this paper we deal with modeling the serum proteolysis process from tandem mass spectrometry data. The parameters of the peptide degradation process inferred from LC-MS/MS data correspond directly to the activity of specific enzymes present in the serum samples of patients and healthy donors. Our approach integrates the existing knowledge about peptidase activity stored in the MEROPS database with an efficient procedure for estimating the model parameters.

Results: Taking into account the inherent stochasticity of the process, the proteolytic activity is modeled with the use of the Chemical Master Equation (CME). Assuming the stationarity of the Markov process, we calculate the expected values of digested peptides in the model. The parameters are fitted to minimize the discrepancy between those expected values and the peptide activities observed in the MS data. The constrained optimization problem is solved with the Levenberg-Marquardt algorithm.

Conclusions: Our results demonstrate the feasibility and potential of high-level analysis for LC-MS proteomic data. The estimated enzyme activities give insights into the molecular pathology of colorectal cancer. Moreover, the developed framework is general and can be applied to study proteolytic activity in different systems.
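The fitting step can be miniaturized to a one-parameter example on synthetic data. The single-enzyme model below (digestion extent 1 - exp(-k·x) for a peptide with x cleavage sites) is a hypothetical stand-in for the paper's CME-based expected values; the damped Gauss-Newton update is the core of the Levenberg-Marquardt scheme, here with a fixed damping factor rather than the adaptive one.

```python
import math

# Synthetic data from a hypothetical one-enzyme model: expected digestion
# extent of peptide i is 1 - exp(-k * x_i), where x_i counts cleavage sites.
x = [1, 2, 3, 5]
k_true = 0.4
y = [1 - math.exp(-k_true * xi) for xi in x]   # noise-free "observations"

def fit_k(x, y, k=0.1, lam=1e-3, iters=50):
    """Fit the activity k by a 1-D damped Gauss-Newton (Levenberg-Marquardt
    style) iteration minimizing the sum of squared residuals."""
    for _ in range(iters):
        r = [yi - (1 - math.exp(-k * xi)) for xi, yi in zip(x, y)]  # residuals
        J = [xi * math.exp(-k * xi) for xi in x]                    # d(model)/dk
        g = sum(Ji * ri for Ji, ri in zip(J, r))                    # gradient term
        H = sum(Ji * Ji for Ji in J)                                # Gauss-Newton Hessian
        k += g / (H + lam)                                          # damped step
    return k

print(fit_k(x, y))   # recovers the activity used to generate the data
```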
http://dx.doi.org/10.1186/1471-2105-13-S5-S7
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3358667
April 2012

An efficient method to calculate the aggregated isotopic distribution and exact center-masses.

J Am Soc Mass Spectrom 2012 Apr 15;23(4):753-63. Epub 2012 Feb 15.

I-BioStat, Hasselt University, Diepenbeek, Belgium.

In this article, we present a computation- and memory-efficient method to calculate the probabilities of occurrence and exact center-masses of the aggregated isotopic distribution of a molecule. The method uses fundamental mathematical properties of polynomials given by the Newton-Girard theorem and Viète's formulae. The calculation is based on the atomic composition of the molecule and the natural abundances of the elemental isotopes in normal terrestrial matter. To evaluate the performance of the proposed method, which we named BRAIN, we compare it with the results obtained from five existing software packages (IsoPro, Mercury, Emass, NeutronCluster, and IsoDalton) for 10 biomolecules. Additionally, we compare the computed mass centers with the results obtained by calculating, and subsequently aggregating, the fine isotopic distribution for two of the exemplary biomolecules. The algorithm will be made available as a Bioconductor package in R, and is also available upon request.
http://dx.doi.org/10.1007/s13361-011-0326-2
April 2012