Publications by authors named "Avraam Tapinos"

6 Publications

Whole-genome analysis of Nigerian patients with breast cancer reveals ethnic-driven somatic evolution and distinct genomic subtypes.

Nat Commun 2021 11 26;12(1):6946. Epub 2021 Nov 26.

Center for Clinical Cancer Genetics and Global Health, Department of Medicine, The University of Chicago, Chicago, IL, 60637, USA.

Black women across the African diaspora experience more aggressive breast cancer with higher mortality rates than white women of European ancestry. Although inter-ethnic germline variation is known, differential somatic evolution has not been investigated in detail. Analysis of deep whole genomes of 97 breast cancers, with RNA-seq in a subset, from women in Nigeria in comparison with The Cancer Genome Atlas (n = 76) reveals a higher rate of genomic instability and increased intra-tumoral heterogeneity as well as a unique genomic subtype defined by early clonal GATA3 mutations with a 10.5-year younger age at diagnosis. We also find non-coding mutations in bona fide drivers (ZNF217 and SYPL1) and a previously unreported INDEL signature strongly associated with African ancestry proportion, underscoring the need to expand inclusion of diverse populations in biomedical research. Finally, we demonstrate that characterizing tumors for homologous recombination deficiency has significant clinical relevance in stratifying patients for potentially life-saving therapies.

Source
http://dx.doi.org/10.1038/s41467-021-27079-w (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8626467 (PMC)
November 2021

Investigation of Salmonella Phage-Bacteria Infection Profiles: Network Structure Reveals a Gradient of Target-Range from Generalist to Specialist Phage Clones in Nested Subsets.

Viruses 2021 06 28;13(7). Epub 2021 Jun 28.

Division of Evolution and Genomic Sciences, The University of Manchester, Manchester M13 9GB, UK.

Bacteriophages that lyse Salmonella enterica are potential tools to target and control Salmonella infections. Investigating the host range of Salmonella phages is key to understanding their impact on bacterial ecology and coevolution and to informing their use in intervention strategies. Virus-host infection networks have been used to characterize the "predator-prey" interactions between phages and bacteria and provide insights into host range and specificity. Here, we characterize the target-range and infection profiles of 13 Salmonella phage clones against a diverse set of 141 Salmonella strains. The environmental source and taxonomy contributed to the observed infection profiles, and genetically proximal phages shared similar infection profiles. Using in vitro infection data, we analyzed the structure of the Salmonella phage-bacteria infection network. The network has a non-random nested organization and weak modularity, suggesting a gradient of target-range from generalist to specialist species with nested subsets, which are also observed within and across the different phage infection profile groups. Our results have implications for our understanding of the coevolutionary mechanisms shaping the ecological interactions between Salmonella phages and their bacterial hosts and can inform strategies for targeting Salmonella enterica with specific phage preparations.

Source
http://dx.doi.org/10.3390/v13071261 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8310288 (PMC)
June 2021

The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences.

Viruses 2019 04 26;11(5). Epub 2019 Apr 26.

School of Biological Sciences, The University of Manchester, Manchester M13 9PT, UK.

Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.

Source
http://dx.doi.org/10.3390/v11050394 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6563281 (PMC)
April 2019

A signal processing method for alignment-free metagenomic binning: multi-resolution genomic binary patterns.

Sci Rep 2019 02 15;9(1):2159. Epub 2019 Feb 15.

Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, M13 9PT, UK.

Algorithms in bioinformatics use textual representations of genetic information, sequences of the characters A, T, G and C represented computationally as strings or sub-strings. Signal and related image processing methods offer a rich source of alternative descriptors as they are designed to work in the presence of noisy data without the need for exact matching. Here we introduce a method, multi-resolution local binary patterns (MLBP), adapted from image processing to extract local 'texture' changes from nucleotide sequence data. We apply this feature space to the alignment-free binning of metagenomic data. The effectiveness of MLBP is demonstrated using both simulated and real human gut microbial communities. Sequence reads or contigs can be represented as vectors and their 'texture' compared efficiently using machine learning algorithms to perform dimensionality reduction to capture eigengenome information and perform clustering (here using randomized singular value decomposition and BH-tSNE). The intuition behind our method is that the MLBP feature vectors permit sequence comparisons without the need for explicit pairwise matching. We demonstrate that this approach outperforms existing methods based on k-mer frequencies. The signal processing method, MLBP, thus offers a viable alternative feature space to textual representations of sequence data. The source code for our Multi-resolution Genomic Binary Patterns method can be found at https://github.com/skouchaki/MrGBP.
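
To illustrate the general idea of a local binary pattern applied to a nucleotide sequence, here is a minimal Python sketch. It is not the authors' implementation (see the MrGBP repository for that). The numeric encoding `ENCODING`, the `>=` comparison rule, the default radii, and all function names are assumptions made for this example only; the abstract does not specify them.

```python
from collections import Counter

# Hypothetical numeric encoding of bases; the abstract does not specify one.
# Ambiguous bases such as N are not handled and raise KeyError.
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}


def lbp_codes(seq, radius):
    """Return 1-D local binary pattern codes for each interior position.

    Each neighbour within `radius` positions of the centre contributes
    one bit: 1 if its value is >= the centre value, otherwise 0.
    """
    values = [ENCODING[base] for base in seq.upper()]
    codes = []
    for i in range(radius, len(values) - radius):
        center = values[i]
        neighbours = values[i - radius:i] + values[i + 1:i + radius + 1]
        code = 0
        for v in neighbours:
            code = (code << 1) | (1 if v >= center else 0)
        codes.append(code)
    return codes


def lbp_histogram(seq, radius):
    """Normalised histogram of pattern codes at one resolution."""
    codes = lbp_codes(seq, radius)
    n_bins = 1 << (2 * radius)  # 2 * radius neighbours, one bit each
    counts = Counter(codes)
    total = len(codes) or 1  # avoid division by zero for short sequences
    return [counts.get(b, 0) / total for b in range(n_bins)]


def multi_resolution_features(seq, radii=(1, 2, 3)):
    """Concatenate histograms computed at several radii (resolutions)."""
    features = []
    for r in radii:
        features.extend(lbp_histogram(seq, r))
    return features
```

Under these assumptions, every read maps to a fixed-length vector regardless of its length, so reads can be compared without pairwise alignment. Such vectors could then be passed to a dimensionality reduction and clustering step like the one the abstract describes.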

Source
http://dx.doi.org/10.1038/s41598-018-38197-9 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6377666 (PMC)
February 2019

Challenges in the analysis of viral metagenomes.

Virus Evol 2016 Jul 3;2(2):vew022. Epub 2016 Aug 3.

BioInfoExperts, Norfolk, VA, USA.

Genome sequencing technologies continue to develop with remarkable pace, yet analytical approaches for reconstructing and classifying viral genomes from mixed samples remain limited in their performance and usability. Existing solutions generally target expert users and often have unclear scope, making it challenging to critically evaluate their performance. There is a growing need for intuitive analytical tooling for researchers lacking specialist computing expertise and that is applicable in diverse experimental circumstances. Notable technical challenges have impeded progress; for example, fragments of viral genomes are typically orders of magnitude less abundant than those of host, bacteria, and/or other organisms in clinical and environmental metagenomes; observed viral genomes often deviate considerably from reference genomes demanding use of exhaustive alignment approaches; high intrapopulation viral diversity can lead to ambiguous sequence reconstruction; and finally, the relatively few documented viral reference genomes compared to the estimated number of distinct viral taxa renders classification problematic. Various software tools have been developed to accommodate the unique challenges and use cases associated with characterizing viral sequences; however, the quality of these tools varies, and their use often necessitates computing expertise or access to powerful computers, thus limiting their usefulness to many researchers. In this review, we consider the general and application-specific challenges posed by viral sequencing and analysis, outline the landscape of available tools and methodologies, and propose ways of overcoming the current barriers to effective analysis.

Source
http://dx.doi.org/10.1093/ve/vew022 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5822887 (PMC)
July 2016

A method for comparing multivariate time series with different dimensions.

PLoS One 2013 5;8(2):e54201. Epub 2013 Feb 5.

School of Computer Science and Manchester Institute of Biotechnology, The University of Manchester, Manchester, United Kingdom.

In many situations it is desirable to compare dynamical systems based on their behavior. Similarity of behavior often implies similarity of internal mechanisms or dependency on common extrinsic factors. While there are widely used methods for comparing univariate time series, most dynamical systems are characterized by multivariate time series. Yet, comparison of multivariate time series has been limited to cases where they share a common dimensionality. A semi-metric is a distance function that has the properties of non-negativity, symmetry and reflexivity, but not sub-additivity. Here we develop a semi-metric--SMETS--that can be used for comparing groups of time series that may have different dimensions. To demonstrate its utility, the method is applied to dynamic models of biochemical networks and to portfolios of shares. The former is an example of a case where the dependencies between system variables are known, while in the latter the system is treated (and behaves) as a black box.
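
The semi-metric properties named in the abstract can be written out as follows. The symbols d, x, y and z are notation chosen here for illustration and are not taken from the paper.

```latex
\begin{align*}
d(x, y) &\ge 0      && \text{(non-negativity)} \\
d(x, y) &= d(y, x)  && \text{(symmetry)} \\
d(x, x) &= 0        && \text{(reflexivity)}
\end{align*}
% Sub-additivity (the triangle inequality),
%   d(x, z) \le d(x, y) + d(y, z),
% is the one metric property a semi-metric is not required to satisfy.
```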

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0054201 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3564859 (PMC)
August 2013