Publications by authors named "Gert Thijs"

18 Publications

Dutch genome diagnostic laboratories accelerated and improved variant interpretation and increased accuracy by sharing data.

Hum Mutat 2019 Dec 3;40(12):2230-2238. Epub 2019 Sep 3.

Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands.

Each year diagnostic laboratories in the Netherlands profile thousands of individuals for heritable disease using next-generation sequencing (NGS). This requires pathogenicity classification of millions of DNA variants on the standard 5-tier scale. To reduce time spent on data interpretation and increase data quality and reliability, the nine Dutch labs decided to publicly share their classifications. Variant classifications of nearly 100,000 unique variants were catalogued and compared in a centralized MOLGENIS database. Variants classified by more than one center were labeled as "consensus" when classifications agreed, and shared internationally with LOVD and ClinVar. When classifications opposed (LB/B vs. LP/P), they were labeled "conflicting", while other nonconsensus observations were labeled "no consensus". We assessed our classifications using the InterVar software to compare to ACMG 2015 guidelines, showing 99.7% overall consistency with only 0.3% discrepancies. Differences in classifications between Dutch labs or between Dutch labs and ACMG were mainly present in genes with low penetrance or for late onset disorders and highlight limitations of the current 5-tier classification system. The data sharing boosted the quality of DNA diagnostics in Dutch labs, an initiative we hope will be followed internationally. Recently, a positive match with a case from outside our consortium resulted in a more definite disease diagnosis.
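
The labeling rule described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier codes (`B`, `LB`, `VUS`, `LP`, `P`), the function name `label_variant`, and the extra `"single lab"` label are hypothetical, and "agreement" is read here as every lab reporting the identical tier. Grouping LB with B and LP with P would be an equally plausible reading of the abstract.

```python
from typing import Iterable

BENIGN = {"B", "LB"}        # benign, likely benign
PATHOGENIC = {"LP", "P"}    # likely pathogenic, pathogenic


def label_variant(classifications: Iterable[str]) -> str:
    """Label one variant from the classifications submitted by each lab."""
    calls = list(classifications)
    if len(calls) < 2:
        # Only one center classified the variant, so nothing can be compared.
        return "single lab"
    distinct = set(calls)
    if len(distinct) == 1:
        return "consensus"
    if distinct & BENIGN and distinct & PATHOGENIC:
        # Opposing classifications: LB/B on one side, LP/P on the other.
        return "conflicting"
    return "no consensus"


print(label_variant(["LP", "LP", "LP"]))   # consensus
print(label_variant(["LB", "VUS", "P"]))   # conflicting
print(label_variant(["VUS", "LP"]))        # no consensus
```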

Source
DOI: http://dx.doi.org/10.1002/humu.23896
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6900155
December 2019

Somatic Tumor Variant Filtration Strategies to Optimize Tumor-Only Molecular Profiling Using Targeted Next-Generation Sequencing Panels.

J Mol Diagn 2019 Mar 19;21(2):261-273. Epub 2018 Dec 19.

Advanced Molecular Diagnostics Laboratory, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada; Department of Clinical Laboratory Genetics, Laboratory Medicine Program, University Health Network, Toronto, Ontario, Canada; Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada.

A common approach in clinical diagnostic laboratories to variant assessment from tumor molecular profiling is sequencing of genomic DNA extracted from both tumor (somatic) and normal (germline) tissue, with subsequent variant comparison to identify true somatic variants with potential impact on patient treatment or prognosis. However, challenges exist in paired tumor-normal testing, including increased cost of dual sample testing and identification of germline cancer predisposing variants. Alternatively, somatic variants can be identified by in silico tumor-only variant filtration precluding the need for matched normal testing. The barrier to tumor-only variant filtration is defining a reliable approach, with high sensitivity and specificity to identify somatic variants. In this study, we used retrospective data sets from paired tumor-normal samples tested on small (48 gene) and large (555 gene) targeted next-generation sequencing panels, to model algorithms for tumor-only variants classification. The optimal algorithm required an ordinal filtering approach using information from variant population databases (1000 Genomes Phase 3, ESP6500, ExAC), clinical mutation databases (ClinVar), and information on recurring clinically relevant somatic variants. Overall the tumor-only variant filtration strategy described in this study can define clinically relevant somatic variants from tumor-only analysis with sensitivity of 97% to 99% and specificity of 87% to 94%, and with significant potential utility for clinical laboratories implementing tumor-only molecular profiling.
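
An ordinal filter of the kind mentioned in this abstract applies rules in a fixed order and stops at the first rule that fires. The sketch below shows that general shape only. The rule order, the 1% allele-frequency cutoff, the field names, and the output labels are all assumptions made for illustration; the abstract does not state the actual algorithm or thresholds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    name: str
    max_population_af: float             # highest allele frequency in any population database
    clinvar: Optional[str] = None        # e.g. "benign", "pathogenic", or None if absent
    known_somatic_hotspot: bool = False  # on a list of recurring clinically relevant somatic variants


def classify_tumor_only(v: Variant, af_cutoff: float = 0.01) -> str:
    """Apply the rules in order; the first rule that fires decides (hypothetical rules)."""
    # Rule 1: recurring clinically relevant somatic variants are kept regardless of frequency.
    if v.known_somatic_hotspot:
        return "likely somatic"
    # Rule 2: variants recorded as benign in the clinical database are treated as germline.
    if v.clinvar == "benign":
        return "likely germline"
    # Rule 3: variants common in the general population are treated as germline.
    if v.max_population_af >= af_cutoff:
        return "likely germline"
    # Anything left is retained as a somatic candidate.
    return "likely somatic"


for v in [
    Variant("hotspot_but_common", 0.05, None, True),
    Variant("common_polymorphism", 0.05),
    Variant("rare_unannotated", 0.0001),
]:
    print(v.name, "->", classify_tumor_only(v))
```

Because the rules are ordered, the hotspot variant is kept even though its population frequency exceeds the cutoff; swapping rules 1 and 3 would change that outcome.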

Source
DOI: http://dx.doi.org/10.1016/j.jmoldx.2018.09.008
March 2019

Spectrophores as one-dimensional descriptors calculated from three-dimensional atomic properties: applications ranging from scaffold hopping to multi-target virtual screening.

J Cheminform 2018 Mar 7;10(1). Epub 2018 Mar 7.

Laboratory of Medicinal Chemistry, Department of Pharmaceutical Sciences, Faculty of Pharmaceutical, Biomedical and Veterinary Sciences, Campus Drie Eiken, Building A, Universiteitsplein 1, 2610, Antwerp, Belgium.

Spectrophores are novel descriptors that are calculated from the three-dimensional atomic properties of molecules. In our current implementation, the atomic properties that were used to calculate spectrophores include atomic partial charges, atomic lipophilicity indices, atomic shape deviations and atomic softness properties. This approach can easily be widened to also include additional atomic properties. Our novel methodology finds its roots in the experimental affinity fingerprinting technology developed in the 1990s by Terrapin Technologies. Here we have translated it into a purely virtual approach using artificial affinity cages and a simplified metric to calculate the interaction between these cages and the atomic properties. A typical spectrophore consists of a vector of 48 real numbers. This makes it highly suitable for the calculation of a wide range of similarity measures for use in virtual screening and for the investigation of quantitative structure-activity relationships in combination with advanced statistical approaches such as self-organizing maps, support vector machines and neural networks. In our present report we demonstrate the applicability of our novel methodology for scaffold hopping as well as virtual screening.
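
Because a spectrophore is a fixed-length vector of 48 real numbers, standard vector similarity measures apply directly. The sketch below shows two common choices, Euclidean distance and cosine similarity. It does not compute spectrophores; the input vectors are made-up placeholders, and the function names are hypothetical.

```python
import math
from typing import Sequence

SPECTROPHORE_LENGTH = 48  # vector length stated in the abstract


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != SPECTROPHORE_LENGTH or len(b) != SPECTROPHORE_LENGTH:
        raise ValueError("both descriptors must have 48 values")


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Smaller values mean more similar descriptors."""
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Values close to 1 mean the two descriptors point in the same direction."""
    _check(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Placeholder vectors, not real spectrophores.
query = [1.0] * 48
candidate = [2.0] * 48
print(euclidean_distance(query, candidate))
print(cosine_similarity(query, candidate))
```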

Source
DOI: http://dx.doi.org/10.1186/s13321-018-0268-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5842169
March 2018

Pharao: pharmacophore alignment and optimization.

J Mol Graph Model 2008 Sep 11;27(2):161-9. Epub 2008 Apr 11.

Silicos NV, Wetenschapspark 7, B-3590 Diepenbeek, Belgium.

Within the context of early drug discovery, a new pharmacophore-based tool to score and align small molecules (Pharao) is described. The tool is built on the idea of modeling pharmacophoric features as Gaussian 3D volumes instead of the more common point or sphere representations. The smooth nature of these continuous functions has a beneficial effect on the optimization problem introduced during alignment. The usefulness of Pharao is illustrated by means of three examples: a virtual screening of trypsin-binding ligands, a virtual screening of phosphodiesterase 5-binding ligands, and an investigation of the biological relevance of an unsupervised clustering of small ligands based on Pharao.
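
The advantage of Gaussian volumes is that the overlap between two features has a smooth closed form. For two spherical Gaussians p_a·exp(-α_a·|r - R_a|²) and p_b·exp(-α_b·|r - R_b|²), the overlap integral is p_a·p_b·(π/(α_a+α_b))^(3/2)·exp(-α_a·α_b·d²/(α_a+α_b)), where d is the distance between the centers. The sketch below evaluates that standard formula. The function names, the parameter values, and the Tanimoto-style normalisation are assumptions for illustration, not a description of Pharao's internals.

```python
import math
from typing import Sequence


def gaussian_overlap(center_a: Sequence[float], alpha_a: float,
                     center_b: Sequence[float], alpha_b: float,
                     p_a: float = 1.0, p_b: float = 1.0) -> float:
    """Overlap integral of two spherical 3D Gaussians (closed form)."""
    d2 = sum((x - y) ** 2 for x, y in zip(center_a, center_b))
    s = alpha_a + alpha_b
    return p_a * p_b * (math.pi / s) ** 1.5 * math.exp(-alpha_a * alpha_b * d2 / s)


def tanimoto(v_ab: float, v_aa: float, v_bb: float) -> float:
    """Normalise an overlap to the range (0, 1] using the two self-overlaps."""
    return v_ab / (v_aa + v_bb - v_ab)


a, b = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
v_aa = gaussian_overlap(a, 1.0, a, 1.0)
v_bb = gaussian_overlap(b, 1.0, b, 1.0)
v_ab = gaussian_overlap(a, 1.0, b, 1.0)
print(tanimoto(v_ab, v_aa, v_bb))
```

The overlap decays smoothly as the centers move apart, which is what makes gradient-based alignment well behaved compared with hard spheres.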

Source
DOI: http://dx.doi.org/10.1016/j.jmgm.2008.04.003
September 2008

More robust detection of motifs in coexpressed genes by using phylogenetic information.

BMC Bioinformatics 2006 Mar 20;7:160. Epub 2006 Mar 20.

ESAT-SCD/SISTA, K.U. Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium.

Background: Several motif detection algorithms have been developed to discover overrepresented motifs in sets of coexpressed genes. However, in a noisy gene list, the number of genes containing the motif versus the number lacking the motif might not be sufficiently high to allow detection by classical motif detection tools. To still recover motifs which are not significantly enriched but still present, we developed a procedure in which we use phylogenetic footprinting to first delineate all potential motifs in each gene. Then we mutually compare all detected motifs and identify the ones that are shared by at least a few genes in the data set as potential candidates.

Results: We applied our methodology to a compiled test data set containing known regulatory motifs and to two biological data sets derived from genome wide expression studies. By executing four consecutive steps of 1) identifying conserved regions in orthologous intergenic regions, 2) aligning these conserved regions, 3) clustering the conserved regions containing similar regulatory regions followed by extraction of the regulatory motifs and 4) screening the input intergenic sequences with detected regulatory motif models, our methodology proves to be a powerful tool for detecting regulatory motifs when a low signal to noise ratio is present in the input data set. Comparing our results with two other motif detection algorithms points out the robustness of our algorithm.

Conclusion: We developed an approach that can reliably identify multiple regulatory motifs lacking a high degree of overrepresentation in a set of coexpressed genes (motifs belonging to sparsely connected hubs in the regulatory network) by exploiting the advantages of using both coexpression and phylogenetic information.

Source
DOI: http://dx.doi.org/10.1186/1471-2105-7-160
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1525208
March 2006

A novel approach to identifying regulatory motifs in distantly related genomes.

Genome Biol 2005 30;6(13):R113. Epub 2005 Dec 30.

ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium.

Although proven successful in the identification of regulatory motifs, phylogenetic footprinting methods still show some shortcomings. To assess these difficulties, most apparent when applying phylogenetic footprinting to distantly related organisms, we developed a two-step procedure that combines the advantages of sequence alignment and motif detection approaches. The results on well-studied benchmark datasets indicate that the presented method outperforms other methods when the sequences become either too long or too heterogeneous in size.

Source
DOI: http://dx.doi.org/10.1186/gb-2005-6-13-r113
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1414112
July 2006

TOUCAN 2: the all-inclusive open source workbench for regulatory sequence analysis.

Nucleic Acids Res 2005 Jul;33(Web Server issue):W393-6

Laboratory of Neurogenetics, Department of Human Genetics, Flanders Interuniversity Institute for Biotechnology and K.U.Leuven, Belgium.

We present the second and improved release of the TOUCAN workbench for cis-regulatory sequence analysis. TOUCAN implements and integrates fast state-of-the-art methods and strategies in gene regulation bioinformatics, including algorithms for comparative genomics and for the detection of cis-regulatory modules. This second release of TOUCAN has become open source and thereby carries the potential to evolve rapidly. The main goal of TOUCAN is to allow a user to come to testable hypotheses regarding the regulation of a gene or of a set of co-regulated genes. TOUCAN can be launched from this location: http://www.esat.kuleuven.ac.be/~saerts/software/toucan.php.

Source
DOI: http://dx.doi.org/10.1093/nar/gki354
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1160115
July 2005

Assessing computational tools for the discovery of transcription factor binding sites.

Nat Biotechnol 2005 Jan;23(1):137-44

Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, Washington 98195-2350, USA.

The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.

Source
DOI: http://dx.doi.org/10.1038/nbt1053
January 2005

Comprehensive analysis of the base composition around the transcription start site in Metazoa.

BMC Genomics 2004 Jun 1;5(1):34. Epub 2004 Jun 1.

Department of Electrical Engineering (ESAT-SCD), Katholieke Universiteit Leuven, Belgium.

Background: The transcription start site of a metazoan gene remains poorly understood, mostly because there is no clear signal present in all genes. Now that several sequenced metazoan genomes have been annotated, we have been able to compare the base composition around the transcription start site for all annotated genes across multiple genomes.

Results: The most prominent feature in the base compositions is a significant local variation in G+C content over a large region around the transcription start site. The change is present in all animal phyla but the extent of variation is different between distinct classes of vertebrates, and the shape of the variation is completely different between vertebrates and arthropods. Furthermore, the height of the variation correlates with CpG frequencies in vertebrates but not in invertebrates and it also correlates with gene expression, especially in mammals. We also detect GC and AT skews in all clades (where %G is not equal to %C or %A is not equal to %T respectively) but these occur in a more confined region around the transcription start site and in the coding region.

Conclusions: The dramatic changes in nucleotide composition in humans are a consequence of CpG nucleotide frequencies and of gene expression, the changes in Fugu could point to primordial CpG islands, and the changes in the fly are of a totally different kind and unrelated to dinucleotide frequencies.
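
The quantities discussed in this abstract, local G+C content and GC skew, can be computed with a sliding window. The sketch below uses the conventional skew definition (G - C)/(G + C), which is an assumption here since the abstract only describes skew as %G differing from %C. The function name, window size, and example sequence are hypothetical.

```python
from typing import List, Tuple


def gc_profile(seq: str, window: int, step: int = 1) -> List[Tuple[int, float, float]]:
    """Return (window start, G+C fraction, GC skew) for each window along seq."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        g, c = w.count("G"), w.count("C")
        gc_content = (g + c) / window
        # Skew is undefined when the window has no G or C; report 0.0 in that case.
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        profile.append((start, gc_content, gc_skew))
    return profile


# Toy sequence: a G-rich stretch followed by an AT-rich stretch.
for row in gc_profile("GGGCGGGCAATTAATT", window=8, step=8):
    print(row)
```

Aligning many such profiles on the annotated transcription start site and averaging them per position would give the kind of composition curve the abstract compares between species.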

Source
DOI: http://dx.doi.org/10.1186/1471-2164-5-34
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC436054
June 2004

In silico identification and experimental validation of PmrAB targets in Salmonella typhimurium by regulatory motif detection.

Genome Biol 2004 29;5(2):R9. Epub 2004 Jan 29.

ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium.

Background: The PmrAB (BasSR) two-component regulatory system is required for Salmonella typhimurium virulence. PmrAB-controlled modifications of the lipopolysaccharide (LPS) layer confer resistance to cationic antibiotic polypeptides, which may allow bacteria to survive within macrophages. The PmrAB system also confers resistance to Fe3+-mediated killing. New targets of the system have recently been discovered that seem not to have a role in the well-described functions of PmrAB, suggesting that the PmrAB-dependent regulon might contain additional, unidentified targets.

Results: We performed an in silico analysis of possible targets of the PmrAB system. Using a motif model of the PmrA binding site in DNA, genome-wide screening was carried out to detect PmrAB target genes. To increase confidence in the predictions, all putative targets were subjected to a cross-species comparison (phylogenetic footprinting) using a Gibbs sampling-based motif-detection procedure. As well as the known targets, we detected additional targets with unknown functions. Four of these were experimentally validated (yibD, aroQ, mig-13 and sseJ). Site-directed mutagenesis of the PmrA-binding site (PmrA box) in yibD revealed specific sequence requirements.

Conclusions: We demonstrated the efficiency of our procedure by recovering most of the known PmrAB-dependent targets and by identifying unknown targets that we were able to validate experimentally. We also pinpointed directions for further research that could help elucidate the S. typhimurium virulence pathway.
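
Genome-wide screening with a motif model is commonly done by sliding a position weight matrix along the sequence and reporting windows that score above a threshold. The sketch below illustrates that generic technique with log-odds scores against a background distribution. The toy count matrix, the threshold, and the function names are hypothetical; they are not the PmrA box model or the screening procedure used in the study.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def log_odds_matrix(counts: List[Dict[str, int]], background: Dict[str, float],
                    pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Turn per-position base counts into log2 odds against the background."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return matrix


def scan(sequence: str, matrix: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, float]]:
    """Return (position, score) for every window scoring at least threshold.

    The sequence is assumed to contain only the characters A, C, G and T.
    """
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        score = sum(matrix[j][sequence[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Toy three-position motif "TTA" built from 10 made-up sites.
counts = [{"T": 10}, {"T": 10}, {"A": 10}]
uniform = {b: 0.25 for b in BASES}
pwm = log_odds_matrix(counts, uniform)
print(scan("GGTTAGG", pwm, threshold=4.0))
```

Requiring a hit to recur in orthologous upstream regions, as the abstract does with phylogenetic footprinting, is one way to reduce the false positives that a single-genome scan produces.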

Source
DOI: http://dx.doi.org/10.1186/gb-2004-5-2-r9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC395753
March 2005

Computational detection of cis-regulatory modules.

Bioinformatics 2003 Oct;19 Suppl 2:ii5-14

Department of Electrical Engineering ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Leuven, Belgium.

Motivation: The transcriptional regulation of a metazoan gene depends on the cooperative action of multiple transcription factors that bind to cis-regulatory modules (CRMs) located in the neighborhood of the gene. By integrating multiple signals, CRMs confer an organism specific spatial and temporal rate of transcription.

Results: Based on the hypothesis that genes that are needed in exactly the same conditions might share similar regulatory switches, we have developed a novel methodology to find CRMs in a set of coexpressed or coregulated genes. The ModuleSearcher algorithm finds for a given gene set the best scoring combination of transcription factor binding sites within a sequence window using an A* procedure for tree searching. To keep the level of noise low, we use DNA sequences that are most likely to contain functional cis-regulatory information, namely conserved regions between human and mouse orthologous genes. The ModuleScanner performs genomic searches with a predicted CRM or with a user-defined CRM known from the literature to find possible target genes. The validity of a set of putative targets is checked using Gene Ontology annotations. We demonstrate the use and effectiveness of the ModuleSearcher and ModuleScanner algorithms and test their specificity and sensitivity on semi-artificial data. Next, we search for a module in a cluster of gene expression profiles of human cell cycle genes.

Availability: The ModuleSearcher is available as a web service within the TOUCAN workbench for regulatory sequence analysis, which can be downloaded from http://www.esat.kuleuven.ac.be/~dna/BioI.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btg1052
October 2003

INCLUSive: A web portal and service registry for microarray and regulatory sequence analysis.

Nucleic Acids Res 2003 Jul;31(13):3468-70

ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium.

INCLUSive is a suite of algorithms and tools for the analysis of gene expression data and the discovery of cis-regulatory sequence elements. The tools allow normalization, filtering and clustering of microarray data, functional scoring of gene clusters, sequence retrieval, and detection of known and unknown regulatory elements using probabilistic sequence models and Gibbs sampling. All tools are available via different web pages and as web services. The web pages are connected and integrated to reflect a methodology and facilitate complex analysis using different tools. The web services can be invoked using standard SOAP messaging. Example clients are available for download to invoke the services from a remote computer or to be integrated with other applications. All services are catalogued and described in a web service registry. The INCLUSive web portal is available for academic purposes at http://www.esat.kuleuven.ac.be/inclusive.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC169021
DOI: http://dx.doi.org/10.1093/nar/gkg615
July 2003

Toucan: deciphering the cis-regulatory logic of coregulated genes.

Nucleic Acids Res 2003 Mar;31(6):1753-64

Department of Electrical Engineering (ESAT-SCD), Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Heverlee, Leuven, Belgium.

TOUCAN is a Java application for the rapid discovery of significant cis-regulatory elements from sets of coexpressed or coregulated genes. Biologists can automatically (i) retrieve genes and intergenic regions, (ii) identify putative regulatory regions, (iii) score sequences for known transcription factor binding sites, (iv) identify candidate motifs for unknown binding sites, and (v) detect those statistically over-represented sites that are characteristic for a gene set. Genes or intergenic regions are retrieved from Ensembl or EMBL, together with orthologs and supporting information. Orthologs are aligned and syntenic regions are selected as candidate regulatory regions. Putative sites for known transcription factors are detected using our MotifScanner, which scores position weight matrices using a probabilistic model. New motifs are detected using our MotifSampler based on Gibbs sampling. Binding sites characteristic for a gene set (and thus statistically over-represented with respect to a reference sequence set) are found using a binomial test. We have validated Toucan by analyzing muscle-specific genes, liver-specific genes and E2F target genes; we have easily detected many known binding sites within intergenic DNA and identified new biologically plausible sites for known and unknown transcription factors. Software available at http://www.esat.kuleuven.ac.be/~dna/BioI/Software.html.
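
The binomial test for over-representation mentioned in this abstract can be illustrated with a short calculation. The sketch assumes one plausible setup that the abstract does not spell out: `p` is the fraction of reference sequences containing the site, `n` is the number of sequences in the gene set, and `k` is how many of them contain the site. The function name and the numbers are hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or more hits at random."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: a site occurs in 20% of reference sequences
# but in 8 of the 10 sequences in the gene set.
p_value = binomial_tail(k=8, n=10, p=0.2)
print(f"{p_value:.3g}")
```

A small tail probability suggests that the site is more frequent in the gene set than the reference frequency would explain; when many sites are tested at once, a multiple-testing correction would normally be applied on top.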

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC152870
DOI: http://dx.doi.org/10.1093/nar/gkg268
March 2003

Genome-specific higher-order background models to improve motif detection.

Trends Microbiol 2003 Feb;11(2):61-6

ESAT SISTA-SCD, K.U.Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium.

Motif detection based on Gibbs sampling is a common procedure used to retrieve regulatory motifs in silico. Using a species-specific background model was previously shown to increase the robustness of the algorithm. Here, we demonstrate that selecting a non-species-adapted background model can have an adverse effect on the results of motif detection. The large differences in the average nucleotide composition of prokaryotic sequences exacerbate the problem of exchanging background models. Therefore, we have developed complex background models for all prokaryotic species with available genome sequences.
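
A higher-order background model is typically a Markov chain in which the probability of each base depends on the preceding `order` bases. The sketch below trains such a model from example sequences and scores a sequence under it. It is a generic illustration: the function names, the pseudocount, and the toy training sequences are assumptions, and real models would be trained on whole-genome intergenic sequence.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable

BASES = "ACGT"


def train_background(sequences: Iterable[str], order: int = 3,
                     pseudocount: float = 1.0) -> Dict[str, Dict[str, float]]:
    """Estimate P(base | preceding `order` bases) from the training sequences."""
    counts = defaultdict(lambda: {b: pseudocount for b in BASES})
    for seq in sequences:
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            if base in BASES:
                counts[context][base] += 1
    model = {}
    for context, c in counts.items():
        total = sum(c.values())
        model[context] = {b: c[b] / total for b in BASES}
    return model


def log_likelihood(seq: str, model: Dict[str, Dict[str, float]], order: int = 3) -> float:
    """Log probability of seq under the model; unseen contexts fall back to uniform."""
    ll = 0.0
    for i in range(order, len(seq)):
        probs = model.get(seq[i - order:i])
        ll += math.log(probs[seq[i]] if probs else 0.25)
    return ll


# Toy "genomes" with very different composition.
at_rich = train_background(["ATATATATAT"], order=1)
gc_rich = train_background(["GCGCGCGCGC"], order=1)
print(log_likelihood("ATAT", at_rich, order=1))
print(log_likelihood("ATAT", gc_rich, order=1))
```

The same AT-rich sequence scores much better under the AT-rich model, which is the effect the abstract warns about: scoring candidate motifs against a background from the wrong species distorts what counts as surprising.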

Source
DOI: http://dx.doi.org/10.1016/s0966-842x(02)00030-6
February 2003

Adaptive quality-based clustering of gene expression profiles.

Bioinformatics 2002 May;18(5):735-46

ESAT-SCD (SISTA), K.U. Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium.

Motivation: Microarray experiments generate a considerable amount of data, which, analyzed properly, helps us gain a huge amount of biologically relevant information about the global cellular behaviour. Clustering (grouping genes with similar expression profiles) is one of the first steps in data analysis of high-throughput expression measurements. A number of clustering algorithms have proved useful to make sense of such data. These classical algorithms, though useful, suffer from several drawbacks (e.g. they require the predefinition of arbitrary parameters like the number of clusters; they force every gene into a cluster despite a low correlation with other cluster members). In the following we describe a novel adaptive quality-based clustering algorithm that tackles some of these drawbacks.

Results: We propose a heuristic iterative two-step algorithm: First, we find in the high-dimensional representation of the data a sphere where the "density" of expression profiles is locally maximal (based on a preliminary estimate of the radius of the cluster; this is the quality-based approach). In a second step, we derive an optimal radius of the cluster (the adaptive approach) so that only the significantly coexpressed genes are included in the cluster. This estimation is achieved by fitting a model to the data using an EM-algorithm. By inferring the radius from the data itself, the biologist is freed from finding an optimal value for this radius by trial-and-error. The computational complexity of this method is approximately linear in the number of gene expression profiles in the data set. Finally, our method is successfully validated using existing data sets.

Availability: http://www.esat.kuleuven.ac.be/~thijs/Work/Clustering.html

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/18.5.735
May 2002

A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes.

J Comput Biol 2002 ;9(2):447-64

ESAT-SCD, KULeuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium.

Microarray experiments can reveal important information about transcriptional regulation. In our case, we look for potential promoter regulatory elements in the upstream region of coexpressed genes. Here we present two modifications of the original Gibbs sampling algorithm for motif finding (Lawrence et al., 1993). First, we introduce the use of a probability distribution to estimate the number of copies of the motif in a sequence. Second, we describe the technical aspects of the incorporation of a higher-order background model whose application we discussed in Thijs et al. (2001). Our implementation is referred to as the Motif Sampler. We successfully validate our algorithm on several data sets. First, we show results for three sets of upstream sequences containing known motifs: 1) the G-box light-response element in plants, 2) elements involved in methionine response in Saccharomyces cerevisiae, and 3) the FNR O2-responsive element in bacteria. We use these data sets to explain the influence of the parameters on the performance of our algorithm. Second, we show results for upstream sequences from four clusters of coexpressed genes identified in a microarray experiment on wounding in Arabidopsis thaliana. Several motifs could be matched to regulatory elements from plant defence pathways in our database of plant cis-acting regulatory elements (PlantCARE). Some other strong motifs do not have corresponding motifs in PlantCARE but are promising candidates for further analysis.
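
For orientation, the basic Gibbs site sampler that this work builds on can be sketched as follows. This is a deliberately simplified version, not the Motif Sampler: it assumes exactly one motif copy per sequence and a uniform background, so it contains neither of the two modifications described in the abstract. The function name, parameters, and toy sequences are hypothetical.

```python
import random
from typing import List

BASES = "ACGT"


def gibbs_site_sampler(sequences: List[str], width: int, iterations: int = 200,
                       pseudocount: float = 1.0, seed: int = 0) -> List[int]:
    """Sample one motif start position per sequence (sequences must use only A, C, G, T)."""
    rng = random.Random(seed)
    positions = [rng.randrange(len(s) - width + 1) for s in sequences]
    total = len(sequences) - 1 + 4 * pseudocount
    for _ in range(iterations):
        for held_out, seq in enumerate(sequences):
            # Build the motif profile from the current sites of all other sequences.
            profile = [{b: pseudocount for b in BASES} for _ in range(width)]
            for k, (s, p) in enumerate(zip(sequences, positions)):
                if k != held_out:
                    for j in range(width):
                        profile[j][s[p + j]] += 1
            # Weight every candidate start by motif probability over uniform background.
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= (profile[j][seq[start + j]] / total) / 0.25
                weights.append(w)
            # Draw the new site for the held-out sequence in proportion to its weight.
            positions[held_out] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions


# Toy input with the made-up motif "TGACGT" planted in each sequence.
seqs = ["AAATGACGTAAA", "CCTGACGTCCCC", "GGGGGTGACGTG", "TGACGTATATAT"]
print(gibbs_site_sampler(seqs, width=6))
```

Because the sampler is stochastic, different seeds can return different positions, and a run is not guaranteed to settle on the planted motif; practical tools repeat the sampling and compare the resulting motifs.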

Source
DOI: http://dx.doi.org/10.1089/10665270252935566
October 2002

INCLUSive: integrated clustering, upstream sequence retrieval and motif sampling.

Bioinformatics 2002 Feb;18(2):331-2

ESAT-SISTA/COSIC, KULeuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium.

INCLUSive allows automatic multistep analysis of microarray data (clustering and motif finding). The clustering algorithm (adaptive quality-based clustering) groups together genes with highly similar expression profiles. The upstream sequences of the genes belonging to a cluster are automatically retrieved from GenBank and can be fed directly into Motif Sampler, a Gibbs sampling algorithm that retrieves statistically over-represented motifs in sets of sequences, in this case upstream regions of co-expressed genes.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/18.2.331
February 2002

PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences.

Nucleic Acids Res 2002 Jan;30(1):325-7

Vakgroep Moleculaire Genetica, Departement Plantengenetica, Vlaams Interuniversitair Instituut voor Biotechnologie, Universiteit Gent, K. L. Ledeganckstraat 35, B-9000 Gent, Belgium.

PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC and MEDLINE databases are provided when available. Data about the transcription sites are extracted mainly from the literature, supplemented with an increasing number of in silico predicted data. Apart from a general description for specific transcription factor sites, levels of confidence for the experimental evidence, functional information and the position on the promoter are given as well. New features have been implemented to search for plant cis-acting regulatory elements in a query sequence. Furthermore, links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes. New regulatory elements can be sent automatically and will be added to the database after curation. The PlantCARE relational database is available via the World Wide Web at http://sphinx.rug.ac.be:8080/PlantCARE/.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC99092
DOI: http://dx.doi.org/10.1093/nar/30.1.325
January 2002