An improved compound Poisson model for the number of motif hits in DNA sequences.

Bioinformatics 2017 Dec;33(24):3929-3937

Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany.

Motivation: Transcription factors play a crucial role in gene regulation by binding to specific regulatory sequences. The sequence motifs recognized by a transcription factor can be described in terms of position frequency matrices. When scanning a sequence for matches to a position frequency matrix, one must choose a score cut-off, which in turn determines the number of hits. In this paper we describe how to compute the distribution of match scores and of the number of motif hits, both of which are prerequisites for motif hit enrichment analysis.
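As an illustration of the scanning step described above, the following is a minimal Python sketch, not the authors' implementation. The position frequency matrix, the pseudocount, the uniform background, the cut-off and the example sequence are all hypothetical values chosen for the example; only one strand is scanned.

```python
import math

# Hypothetical position frequency matrix (counts per base and position).
# It is made up for this example and not taken from the paper.
PFM = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 9, 1],
    "G": [1, 10, 0, 0],
    "T": [1, 0, 1, 8],
}

# Assumption: a uniform order-0 background. The paper works with
# general order-d Markov backgrounds instead.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(pfm, background, pseudocount=1.0):
    """Turn counts into per-position log-odds scores against the background."""
    length = len(next(iter(pfm.values())))
    matrix = []
    for pos in range(length):
        column_total = sum(pfm[base][pos] + pseudocount for base in pfm)
        matrix.append({
            base: math.log(((pfm[base][pos] + pseudocount) / column_total)
                           / background[base])
            for base in pfm
        })
    return matrix


def scan(sequence, matrix, cutoff):
    """Return start positions whose window score reaches the cut-off."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= cutoff:
            hits.append(start)
    return hits


matrix = log_odds_matrix(PFM, BACKGROUND)
hits = scan("TTAGCTAGCTGGAGCAAGCT", matrix, cutoff=3.0)
print(hits, len(hits))
```

Raising or lowering `cutoff` changes how many windows count as hits, which is why the distribution of the hit count has to be derived for the chosen cut-off.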

Results: We put forward an improved compound Poisson model that supports general order-d Markov background models and describes the distribution of the number of motif hits more accurately than earlier models. We compared the accuracy of the improved compound Poisson model with previously proposed models across a range of parameters and motifs, demonstrating the improvement. The importance of the order-d model is supported in a case study using CpG-island sequences.
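To make the term "compound Poisson" concrete, here is a minimal Python sketch of a generic compound Poisson distribution: the total count is a sum of clump sizes, where the number of clumps is Poisson distributed. It uses the standard Panjer-type recursion and is not the improved model from the paper; the rate `lam` and the clump-size distribution `clump_sizes` are hypothetical values, whereas the paper derives such quantities from the motif, the cut-off and the background model.

```python
import math


def compound_poisson_pmf(lam, clump_size_pmf, n_max):
    """Return [P(S=0), ..., P(S=n_max)] for S = X_1 + ... + X_N.

    N ~ Poisson(lam) is the number of clumps and the X_i are i.i.d.
    clump sizes with P(X = k) = clump_size_pmf[k] for k >= 1.
    Recursion: P(S=n) = (lam / n) * sum_{k=1..n} k * q(k) * P(S=n-k).
    """
    pmf = [math.exp(-lam)]  # no clumps means no hits
    for n in range(1, n_max + 1):
        total = 0.0
        for k in range(1, n + 1):
            total += k * clump_size_pmf.get(k, 0.0) * pmf[n - k]
        pmf.append(lam / n * total)
    return pmf


# Hypothetical parameters, chosen only for illustration.
lam = 2.0
clump_sizes = {1: 0.8, 2: 0.2}  # a clump holds one hit or two overlapping hits
pmf = compound_poisson_pmf(lam, clump_sizes, n_max=10)

for n, p in enumerate(pmf[:4]):
    print(n, round(p, 4))
```

If every clump has size one, the recursion reduces to the ordinary Poisson distribution, which is a quick way to sanity-check an implementation.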

Availability and Implementation: The method is available as a Bioconductor package named 'motifcounter' at https://bioconductor.org/packages/motifcounter.

Contact: kopp@molgen.mpg.de.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btx539
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860096
