Publications by authors named "Aleksi Kallio"

16 Publications

Multisubstituted pyrimidines effectively inhibit bacterial growth and biofilm formation of Staphylococcus aureus.

Sci Rep 2021 Apr 12;11(1):7931. Epub 2021 Apr 12.

Drug Research Program, Division of Pharmaceutical Chemistry and Technology, Faculty of Pharmacy, University of Helsinki, P.O. Box 56, (Viikinkaari 5 E), FI-00014, Helsinki, Finland.

Biofilms are multicellular communities of microorganisms that generally attach to surfaces in a self-produced matrix. Unlike planktonic cells, biofilms can withstand conventional antibiotics, causing significant challenges in the healthcare system. Currently, new chemical entities are urgently needed to develop novel anti-biofilm agents. In this study, we designed and synthesized a set of 2,4,5,6-tetrasubstituted pyrimidines and assessed their antibacterial activity against planktonic cells and biofilms formed by Staphylococcus aureus. Compounds 9e, 10d, and 10e displayed potent activity both in inhibiting the onset of biofilm formation and in killing pre-formed biofilms of S. aureus ATCC 25923 and Newman strains, with half-maximal inhibitory concentration (IC50) values ranging from 11.6 to 62.0 µM. At 100 µM, these pyrimidines not only decreased the number of viable bacteria within the pre-formed biofilm by 2-3 log but also reduced the amount of total biomass by 30-50%. Furthermore, these compounds were effective against planktonic cells, with minimum inhibitory concentration (MIC) values lower than 60 µM for both staphylococcal strains. Compound 10d inhibited the growth of S. aureus ATCC 25923 in a concentration-dependent manner and displayed bactericidal anti-staphylococcal activity. Taken together, our study highlights the value of multisubstituted pyrimidines for developing novel anti-biofilm agents.

Source
DOI: http://dx.doi.org/10.1038/s41598-021-86852-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8041844
April 2021

Properties of Fixed-Fixed Models and Alternatives in Presence-Absence Data Analysis.

Authors:
Aleksi Kallio

PLoS One 2016 Nov 3;11(11):e0165456. Epub 2016 Nov 3.

Department of Computer Science, Aalto University, Espoo, Finland.

Assessing the significance of patterns in presence-absence data is an important question in ecological data analysis, e.g., when studying nestedness. Significance testing can be performed with the commonly used fixed-fixed models, which preserve the row and column sums while permuting the data. The manuscript considers the properties of fixed-fixed models and points out how their strict constraints can lead to limited randomizability. It also considers relaxing the row and column sum constraints of the fixed-fixed models. Rasch models are presented as an alternative with relaxed constraints and sound statistical properties. The models are compared on presence-absence data, and, surprisingly, the fixed-fixed models are observed to produce unreasonably optimistic measures of statistical significance, giving interesting insight into the practical effects of limited randomizability.
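Fixed-fixed null models are commonly sampled by repeated checkerboard swaps. The following is a minimal Python sketch, assuming a presence-absence matrix stored as a list of 0/1 lists. The function names `swap_randomize` and `margins` are hypothetical, and this is not the code used in the article. The last lines illustrate limited randomizability: a perfectly nested matrix contains no swappable 2x2 checkerboard, so every "randomized" copy is identical to the input.

```python
import random


def margins(matrix):
    """Row sums and column sums of a 0/1 matrix."""
    return [sum(row) for row in matrix], [sum(col) for col in zip(*matrix)]


def swap_randomize(matrix, n_swaps, seed=None):
    """Return a copy of `matrix` after attempting `n_swaps` checkerboard swaps.

    A swap flips a 2x2 submatrix [[1, 0], [0, 1]] <-> [[0, 1], [1, 0]],
    which leaves every row sum and column sum unchanged (fixed-fixed model).
    """
    rng = random.Random(seed)
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_swaps):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        a, b = m[r1][c1], m[r1][c2]
        c, d = m[r2][c1], m[r2][c2]
        # Only a checkerboard (a == d, b == c, a != b) can be flipped.
        if a == d and b == c and a != b:
            m[r1][c1], m[r1][c2] = b, a
            m[r2][c1], m[r2][c2] = d, c
    return m


data = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
randomized = swap_randomize(data, 1000, seed=42)
assert margins(randomized) == margins(data)  # margins are preserved

nested = [[1, 1, 1], [1, 1, 0], [1, 0, 0]]
assert swap_randomize(nested, 1000, seed=42) == nested  # nothing can move
```

Sampling many such randomized copies and recomputing a nestedness score on each gives an empirical null distribution under the fixed-fixed constraints.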

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0165456
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5094661
July 2017

Recommendations on e-infrastructures for next-generation sequencing.

Gigascience 2016 Jun 7;5:26. Epub 2016 Jun 7.

CSC - IT Center for Science Ltd., Espoo, P.O. Box 405, FI-02101, Finland.

With ever-increasing amounts of data being produced by next-generation sequencing (NGS) experiments, the requirements placed on supporting e-infrastructures have grown. In this work, we provide recommendations based on the collective experiences from participants in the EU COST Action SeqAhead for the tasks of data preprocessing, upstream processing, data delivery, and downstream analysis, as well as long-term storage and archiving. We cover demands on computational and storage resources, networks, software stacks, automation of analysis, education, and also discuss emerging trends in the field. E-infrastructures for NGS require substantial effort to set up and maintain over time, and with sequencing technologies and best practices for data analysis evolving rapidly it is important to prioritize both processing capacity and e-infrastructure flexibility when making strategic decisions to support the data analysis demands of tomorrow. Due to increasingly demanding technical requirements we recommend that e-infrastructure development and maintenance be handled by a professional service unit, be it internal or external to the organization, and emphasis should be placed on collaboration between researchers and IT professionals.

Source
DOI: http://dx.doi.org/10.1186/s13742-016-0132-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4897895
June 2016

Individual FEV1 Trajectories Can Be Identified from a COPD Cohort.

COPD 2016 Aug 25;13(4):425-30. Epub 2016 Jan 25.

Clinical Research Unit for Pulmonary Diseases and Division of Pulmonology, Helsinki University Central Hospital, Helsinki, Finland.

Objective: We aim to make use of clinical spirometry data in order to identify individual COPD patients with divergent trajectories of lung function over time.

Study Design And Setting: A hospital-based COPD cohort (N = 607) was followed for an average of 4.6 years. Each patient had a mean of 8.4 spirometries available. We used a Hierarchical Bayesian Model (HBM) to identify the individuals presenting constant trends in lung function.

Results: At a probability level of 95%, one third of the patients (180/607) presented rapidly declining FEV1 (mean -78 ml/year, 95% CI -73 to -83 ml/year) compared with the rest of the patients (mean -26 ml/year, 95% CI -23 to -29 ml/year; p ≤ 2.2 × 10^-16). Constant improvement of FEV1 was very rare. The rapid decliners more frequently suffered from exacerbations, as measured by various outcome markers.

Conclusion: Clinical data of individual patients can be used to identify diverging FEV1 trajectories with high probability. Frequent exacerbations were more prevalent in FEV1 decliners than in the rest of the patients. The result confirmed the previously reported association between FEV1 decline and exacerbation rate and further suggested that, in clinical practice, HBM could improve the identification of high-risk individuals at early stages of the disease.

Source
DOI: http://dx.doi.org/10.3109/15412555.2015.1043423
August 2016

BioImg.org: A Catalog of Virtual Machine Images for the Life Sciences.

Bioinform Biol Insights 2015 Sep 10;9:125-8. Epub 2015 Sep 10.

SNIC-UPPMAX, Department of Information Technology, Uppsala University, Uppsala, Sweden; Science for Life Laboratory, Uppsala University, Uppsala, Sweden; Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden.

Virtualization is becoming increasingly important in bioscience, enabling assembly and provisioning of complete computer setups, including operating system, data, software, and services packaged as virtual machine images (VMIs). We present an open catalog of VMIs for the life sciences, where scientists can share information about images and optionally upload them to a server equipped with a large file system and fast Internet connection. Other scientists can then search for and download images that can be run on the local computer or in a cloud computing environment, providing easy access to bioinformatics environments. We also describe applications where VMIs aid life science research, including distributing tools and data, supporting reproducible analysis, and facilitating education. BioImg.org is freely available at: https://bioimg.org.

Source
DOI: http://dx.doi.org/10.4137/BBI.S28636
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4567039
September 2015

Experiences with workflows for automating data-intensive bioinformatics.

Biol Direct 2015 Aug 19;10:43. Epub 2015 Aug 19.

AgroBioInstitute and Joint Genomic Centre, Sofia, Bulgaria.

High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and to carry out data management and analysis tasks on a large scale. Workflow systems can be useful to simplify the construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault tolerance. However, workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are often still built without them. We present experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST Action SeqAhead. The organizations are working on similar problems, but we have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences, we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution.

Source
DOI: http://dx.doi.org/10.1186/s13062-015-0071-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4539931
August 2015

Quantitative analysis of colony morphology in yeast.

Biotechniques 2014 Jan;56(1):18-27

Pacific Northwest Diabetes Research Institute, Seattle, WA; Molecular and Cellular Biology Program, University of Washington, Seattle, WA.

Microorganisms often form multicellular structures such as biofilms and structured colonies that can influence the organism's virulence, drug resistance, and adherence to medical devices. Phenotypic classification of these structures has traditionally relied on qualitative scoring systems that limit detailed phenotypic comparisons between strains. Automated imaging and quantitative analysis have the potential to improve the speed and accuracy of experiments designed to study the genetic and molecular networks underlying different morphological traits. For this reason, we have developed a platform that uses automated image analysis and pattern recognition to quantify phenotypic signatures of yeast colonies. Our strategy enables quantitative analysis of individual colonies, measured at a single time point or over a series of time-lapse images, as well as the classification of distinct colony shapes based on image-derived features. Phenotypic changes in colony morphology can be expressed as changes in feature space trajectories over time, thereby enabling the visualization and quantitative analysis of morphological development. To facilitate data exploration, results are plotted dynamically through an interactive Yeast Image Analysis web application (YIMAA; http://yimaa.cs.tut.fi) that integrates the raw and processed images across all time points, allowing exploration of the image-based features and principal components associated with morphological development.

Source
DOI: http://dx.doi.org/10.2144/000114123
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3996921
January 2014

POMO--Plotting Omics analysis results for Multiple Organisms.

BMC Genomics 2013 Dec 24;14:918. Epub 2013 Dec 24.

Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Luxembourg, Luxembourg.

Background: Systems biology experiments studying different topics and organisms produce thousands of data values across different types of genomic data. Further, data mining analyses are yielding ranked and heterogeneous results and association networks distributed over the entire genome. The visualization of these results is often difficult and standalone web tools allowing for custom inputs and dynamic filtering are limited.

Results: We have developed POMO (http://pomo.cs.tut.fi), an interactive web-based application to visually explore omics data analysis results and associations in circular, network and grid views. The circular graph represents chromosome lengths as perimeter segments forming a reference outer ring, such as cytobands for human. The inner arcs between nodes represent the uploaded network. Further, multiple annotation rings, for example depicting gene copy number changes, can be uploaded as text files and represented as bar, histogram or heatmap rings. POMO has built-in references for human, mouse, nematode, fly, yeast, zebrafish, rice, tomato, Arabidopsis, and Escherichia coli. In addition, POMO provides custom options that allow integrated plotting of unsupported strains or closely related species associations, such as human and mouse orthologs or two yeast wild types, studied together within a single analysis. The web application also supports interactive label and weight filtering. Every iteratively filtered result in POMO can be exported as image and text files for sharing or for direct use as future input.

Conclusions: The POMO web application is a unique tool for omics data analysis, which can be used to visualize and filter genome-wide networks in the context of chromosomal locations as well as in multiple network layouts. With its several illustration and filtering options, the tool supports the analysis and visualization of any heterogeneous omics data analysis association results for many organisms. POMO is freely available and does not require any installation or registration.
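The circular view places chromosomes as perimeter segments of a reference ring, with network edges drawn as inner arcs between loci. As a hedged illustration (POMO's own code is not described in the abstract), a locus can be mapped to an angle from cumulative chromosome offsets. The function names and the toy chromosome lengths below are hypothetical, and the sketch assumes segments proportional to length with no gaps between chromosomes.

```python
import math


def genome_to_angle(chrom_lengths, chrom, pos):
    """Angle in degrees for a 1-based position, with the circle's perimeter
    split among chromosomes in proportion to their lengths (in list order)."""
    total = sum(length for _, length in chrom_lengths)
    offset = 0
    for name, length in chrom_lengths:
        if name == chrom:
            if not 1 <= pos <= length:
                raise ValueError(f"{chrom}:{pos} is outside the chromosome")
            return 360.0 * (offset + pos - 1) / total
        offset += length
    raise KeyError(chrom)


def arc_endpoints(chrom_lengths, locus_a, locus_b, radius=1.0):
    """Cartesian endpoints on the inner circle for an edge between two loci."""
    points = []
    for chrom, pos in (locus_a, locus_b):
        theta = math.radians(genome_to_angle(chrom_lengths, chrom, pos))
        points.append((radius * math.cos(theta), radius * math.sin(theta)))
    return points


toy_genome = [("chrA", 100), ("chrB", 300)]  # toy lengths, not a real reference
print(genome_to_angle(toy_genome, "chrB", 1))  # 90.0
```

Keeping chromosomes in a fixed list order makes the layout deterministic, so the same network always produces the same arcs.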

Source
DOI: http://dx.doi.org/10.1186/1471-2164-14-918
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3880012
December 2013

SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop.

Bioinformatics 2014 Jan 22;30(1):119-20. Epub 2013 Oct 22.

Aalto University School of Science and Helsinki Institute for Information Technology HIIT, Finland, International Computer Science Institute, Berkeley, CA, USA, CRS4-Center for Advanced Studies, Research and Development in Sardinia, Italy and CSC-IT Center for Science, Finland.

Summary: Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach for many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig's scalability over many computing nodes and illustrate its use with example scripts.

Availability And Implementation: Available under the open source MIT license at http://sourceforge.net/projects/seqpig/

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btt601
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866557
January 2014

Optimizing detection of transcription factor-binding sites in ChIP-seq experiments.

Methods Mol Biol 2013;1038:181-91

CSC-IT Center for Science Ltd, Espoo, Finland.

Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) offers a powerful means to study transcription factor binding on a genome-wide scale. While a number of advanced software packages have already become available for identifying ChIP-seq binding sites, it has become evident that the choice of the package, together with its adjustable parameters, can considerably affect the biological conclusions drawn from the data. Therefore, to aid these choices, we have recently introduced a reproducibility-optimization procedure, which computationally adjusts the parameters of the popular peak detection algorithms for each ChIP-seq dataset separately. Here, we provide a detailed description of the procedure together with practical guidelines on how to apply its implementation, the peakROTS R-package, in a given ChIP-seq experiment.

Source
DOI: http://dx.doi.org/10.1007/978-1-62703-514-9_11
February 2014

Hadoop-BAM: directly manipulating next generation sequencing data in the cloud.

Bioinformatics 2012 Mar 2;28(6):876-7. Epub 2012 Feb 2.

Aalto University, Department of Information and Computer Science, Aalto, Finland.

Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps.
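To illustrate the map and reduce functions mentioned above for a coverage summary, here is a local, pure-Python simulation of the MapReduce pattern. It is only a sketch under stated assumptions: reads are reduced to hypothetical (chromosome, start, end) tuples with 0-based, end-exclusive coordinates, and `BIN_SIZE` is an arbitrary summary resolution. It does not use Hadoop-BAM's Java API, which operates directly on BAM records.

```python
from collections import defaultdict
from itertools import chain

BIN_SIZE = 1000  # hypothetical summary resolution in bases


def map_read(chrom, start, end, bin_size=BIN_SIZE):
    """Map step: emit ((chrom, bin_index), bases_in_bin) for one aligned read."""
    pos = start
    while pos < end:
        bin_index = pos // bin_size
        step_end = min(end, (bin_index + 1) * bin_size)
        yield (chrom, bin_index), step_end - pos
        pos = step_end


def reduce_coverage(pairs):
    """Reduce step: sum covered bases per (chrom, bin_index) key."""
    totals = defaultdict(int)
    for key, bases in pairs:
        totals[key] += bases
    return dict(totals)


def mean_depth(totals, bin_size=BIN_SIZE):
    """Average read depth per bin, e.g. for a genome browser coverage track."""
    return {key: bases / bin_size for key, bases in totals.items()}


reads = [("chr1", 950, 1050), ("chr1", 1000, 1100)]
totals = reduce_coverage(chain.from_iterable(map_read(*r) for r in reads))
print(totals)  # {('chr1', 0): 50, ('chr1', 1): 150}
```

Because the map step emits only per-bin partial sums, a distributed framework can combine them in any order, which is what lets this kind of summary scale across nodes.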

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bts054
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3307120
March 2012

Optimized detection of transcription factor-binding sites in ChIP-seq experiments.

Nucleic Acids Res 2012 Jan 18;40(1):e1. Epub 2011 Oct 18.

Department of Mathematics, University of Turku, FI-20014 Turku, Finland.

We developed a computational procedure for optimizing the binding site detections in a given ChIP-seq experiment by maximizing their reproducibility under bootstrap sampling. We demonstrate how the procedure can improve the detection accuracies beyond those obtained with the default settings of popular peak calling software, or inform the user whether the peak detection results are compromised, circumventing the need for arbitrary re-iterative peak calling under varying parameter settings. The generic, open-source implementation is easily extendable to accommodate additional features and to promote its widespread application in future ChIP-seq studies. The peakROTS R-package and user guide are freely available at http://www.nic.funet.fi/pub/sci/molbio/peakROTS.
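The core idea of maximizing reproducibility under bootstrap sampling can be sketched generically: resample replicates twice, run the detector on each resample, and measure how much the two top-k result lists overlap. The setting with the highest average overlap is preferred. This is a simplified Python sketch, not the peakROTS implementation. The detector `toy_detector`, its parameter `s0`, and the toy replicate data are hypothetical, and the sketch omits any comparison against a null reproducibility level.

```python
import random


def top_k(scores, k):
    """Indices of the k highest-scoring items."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def reproducibility(replicates, score_fn, param, k, n_boot=100, seed=0):
    """Mean top-k overlap between detections made on pairs of bootstrap resamples.

    replicates: list of replicate profiles (each a list with one value per item).
    score_fn(profiles, param): hypothetical detector returning one score per item.
    """
    rng = random.Random(seed)
    n = len(replicates)
    overlap = 0.0
    for _ in range(n_boot):
        boot_a = [replicates[rng.randrange(n)] for _ in range(n)]
        boot_b = [replicates[rng.randrange(n)] for _ in range(n)]
        hits_a = top_k(score_fn(boot_a, param), k)
        hits_b = top_k(score_fn(boot_b, param), k)
        overlap += len(hits_a & hits_b) / k
    return overlap / n_boot


def best_param(replicates, score_fn, params, k, **kwargs):
    """Parameter value whose detections are most reproducible."""
    return max(params, key=lambda p: reproducibility(replicates, score_fn, p, k, **kwargs))


def toy_detector(profiles, s0):
    """Hypothetical detector: mean signal damped by replicate spread plus offset s0."""
    scores = []
    for values in zip(*profiles):
        spread = max(values) - min(values)
        scores.append(sum(values) / len(values) / (s0 + spread))
    return scores


replicates = [[9, 1, 5, 2, 8], [8, 2, 6, 1, 9], [10, 1, 4, 2, 7]]
print(best_param(replicates, toy_detector, [0.1, 1.0, 10.0], k=2))
```

Fixing the random seed keeps the comparison between parameter values fair, since every candidate is evaluated on the same sequence of bootstrap resamples.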

Source
DOI: http://dx.doi.org/10.1093/nar/gkr839
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245948
January 2012

Chipster: user-friendly analysis software for microarray and other high-throughput data.

BMC Genomics 2011 Oct 14;12:507. Epub 2011 Oct 14.

CSC - IT Center for Science, Keilaranta 14, Keilaniemi, Espoo, Finland.

Background: The growth of high-throughput technologies such as microarrays and next generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. In order to enable non-programming biologists to benefit from the method development in a timely manner, we have created the Chipster software.

Results: Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within the reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select datapoints and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies.

Conclusions: Chipster is a user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools, and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-12-507
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3215701
October 2011

Randomization techniques for assessing the significance of gene periodicity results.

BMC Bioinformatics 2011 Aug 9;12:330. Epub 2011 Aug 9.

Research Environment Services, CSC-IT Center for Science Ltd, P.O. Box 405, Espoo 02101, Finland.

Background: Modern high-throughput measurement technologies such as DNA microarrays and next-generation sequencers produce extensive datasets. With large datasets, the emphasis has been moving from traditional statistical tests to new data mining methods that are capable of detecting complex patterns, such as clusters, regulatory networks, or time series periodicity. The study of periodic gene expression is an interesting research question that is also a good example of the challenges involved in the analysis of high-throughput data in general. Unlike for classical statistical tests, the distribution of the test statistic for data mining methods cannot be derived analytically.

Results: We describe the randomization-based approach to significance testing and show how it can be applied to detect periodically expressed genes. We present four randomization methods, three of which have previously been used for gene cycle data. We propose a new method for testing the significance of periodicity in short time series of gene expression data, such as those from gene cycle and circadian clock studies. We argue that the underlying assumptions behind existing significance testing approaches are problematic and some of them unrealistic. We analyze the theoretical properties of the existing and proposed methods, showing how our method can be robustly used to detect genes with exceptionally high periodicity. We also demonstrate the large differences in the number of significant results depending on the chosen randomization methods and parameters of the testing framework. By reanalyzing gene cycle data from various sources, we show that previous estimates of the number of gene cycle-controlled genes are not supported by the data. Our randomization approach combined with the widely adopted Benjamini-Hochberg multiple testing method yields better predictive power and produces more accurate null distributions than previous methods.

Conclusions: Existing methods for testing the significance of periodic gene expression patterns are simplistic and optimistic. Our testing framework allows strict levels of statistical significance with more realistic underlying assumptions, without losing predictive power. As DNA microarrays have now become mainstream and new high-throughput methods are rapidly being adopted, we argue that not only will there be a need for data mining methods capable of coping with immense datasets, but there will also be a need for solid methods for significance testing.
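The general randomization workflow can be sketched as follows: compute a periodicity statistic, build its null distribution by randomizing the series, convert to empirical p-values, and then apply Benjamini-Hochberg across genes. This Python sketch uses the simplest randomization (shuffling time points) and a Fourier-power statistic at a single assumed period. It is not the method proposed in the article, which argues that the assumptions behind some existing randomization schemes are unrealistic. The gene names and series are toy values.

```python
import math
import random


def periodic_power(series, period):
    """Squared magnitude of the Fourier component at `period` (in samples)."""
    n = len(series)
    mean = sum(series) / n
    re = sum((x - mean) * math.cos(2 * math.pi * t / period) for t, x in enumerate(series))
    im = sum((x - mean) * math.sin(2 * math.pi * t / period) for t, x in enumerate(series))
    return (re * re + im * im) / n


def permutation_pvalue(series, period, n_perm=999, seed=0):
    """Empirical p-value: share of shuffled series at least as periodic as observed."""
    rng = random.Random(seed)
    observed = periodic_power(series, period)
    shuffled = list(series)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if periodic_power(shuffled, period) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


genes = {"periodic": [0, 1, 0, -1] * 4, "flat": [0.0] * 16}
pvals = [permutation_pvalue(s, period=4) for s in genes.values()]
print(benjamini_hochberg(pvals))
```

With 999 permutations the smallest attainable p-value is 0.001, which limits how strict a multiple-testing threshold can be; the number of randomizations therefore has to grow with the number of genes tested.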

Source
DOI: http://dx.doi.org/10.1186/1471-2105-12-330
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3199764
August 2011

MicroRNA expression profiling reveals miRNA families regulating specific biological pathways in mouse frontal cortex and hippocampus.

PLoS One 2011 Jun 22;6(6):e21495. Epub 2011 Jun 22.

Institute of Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland.

MicroRNAs (miRNAs) are small regulatory molecules that cause post-transcriptional gene silencing. Although some miRNAs are known to have region-specific expression patterns in the adult brain, the functional consequences of the region-specificity to the gene regulatory networks of the brain nuclei are not clear. Therefore, we studied miRNA expression patterns by miRNA-Seq and microarrays in two brain regions, frontal cortex (FCx) and hippocampus (HP), which have separate biological functions. We identified 354 miRNAs from FCx and 408 from HP using miRNA-Seq, and 245 from FCx and 238 from HP with microarrays. Several miRNA families and clusters were differentially expressed between FCx and HP, including the miR-8 family, miR-182|miR-96|miR-183 cluster, and miR-212|miR-312 cluster overexpressed in FCx and miR-34 family overexpressed in HP. To visualize the clusters, we developed support for viewing genomic alignments of miRNA-Seq reads in the Chipster genome browser. We carried out pathway analysis of the predicted target genes of differentially expressed miRNA families and clusters to assess their putative biological functions. Interestingly, several miRNAs from the same family/cluster were predicted to regulate specific biological pathways. We have developed a miRNA-Seq approach with a bioinformatic analysis workflow that is suitable for studying miRNA expression patterns from specific brain nuclei. FCx and HP were shown to have distinct miRNA expression patterns which were reflected in the predicted gene regulatory pathways. This methodology can be applied for the identification of brain region-specific and phenotype-specific miRNA-mRNA-regulatory networks from the adult and developing rodent brain.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0021495
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3120887
November 2011

Optimized detection of differential expression in global profiling experiments: case studies in clinical transcriptomic and quantitative proteomic datasets.

Brief Bioinform 2009 Sep 23;10(5):547-55. Epub 2009 Jun 23.

Department of Mathematics, University of Turku, FI-20014 Turku, Finland.

Identification of reliable molecular markers that show differential expression between distinct groups of samples has remained a fundamental research problem in many large-scale profiling studies, such as those based on DNA microarray or mass-spectrometry technologies. Despite the availability of a wide spectrum of statistical procedures, the users of the high-throughput platforms are still facing the crucial challenge of deciding which test statistic is best adapted to the intrinsic properties of their own datasets. To meet this challenge, we recently introduced an adaptive procedure, named ROTS (Reproducibility-Optimized Test Statistic), which learns an optimal statistic directly from the given data, and whose relative benefits have previously been shown in comparison with state-of-the-art procedures for detecting differential expression. Using gene expression microarray and mass-spectrometry (MS)-based protein expression datasets as case studies, we illustrate here the practical usage and advantages of ROTS toward detecting reliable marker lists in clinical transcriptomic and proteomic studies. In a public leukemia microarray dataset, the procedure could improve the sensitivity of the gene marker lists detected with high specificity. When applied to a recent LC-MS dataset, involving plasma samples from severe burn patients, the procedure could identify several peptide markers that remained undetected in the conventional analysis, thus demonstrating the effectiveness of ROTS also for global quantitative proteomic studies. To promote its widespread usage, we have made freely available efficient implementations of ROTS, which are easily accessible either as a stand-alone R-package or as integrated in the open-source data analysis software Chipster.
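ROTS is commonly described as ranking features with a family of statistics of the form d = |m1 - m2| / (alpha1 + alpha2 * s), where m1 and m2 are group means and s is the standard error of their difference. alpha1 and alpha2 are chosen by maximizing the reproducibility of top-ranked lists. The abstract above does not state this formula, so treat its exact form, and the pooled-variance choice for s, as assumptions. The Python sketch below shows only the statistic family, not the ROTS R-package, and omits the reproducibility-based parameter search.

```python
import math
from statistics import mean, variance


def pooled_se(x, y):
    """Standard error of mean(x) - mean(y) using a pooled variance (Student's t)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))


def rots_like_statistic(x, y, alpha1, alpha2):
    """d = |mean(x) - mean(y)| / (alpha1 + alpha2 * s) for one feature."""
    return abs(mean(x) - mean(y)) / (alpha1 + alpha2 * pooled_se(x, y))


treated = [1.0, 2.0, 3.0]
control = [4.0, 5.0, 6.0]
print(rots_like_statistic(treated, control, alpha1=1.0, alpha2=0.0))  # 3.0
print(rots_like_statistic(treated, control, alpha1=0.0, alpha2=1.0))  # about 3.674, |t|
```

The two calls show the extremes of the family: with alpha2 = 0 the score is a plain mean difference, and with alpha1 = 0, alpha2 = 1 it is the absolute t statistic. Intermediate alpha1 values damp features whose variance estimate is unreliably small.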

Source
DOI: http://dx.doi.org/10.1093/bib/bbp033
September 2009