Publications by authors named "Jerry Hoogenboom"

10 Publications

Application of a probabilistic genotyping software to MPS mixture STR data is supported by similar trends in LRs compared with CE data.

Forensic Sci Int Genet 2021 May 26;52:102489. Epub 2021 Feb 26.

Division of Biological Traces, Netherlands Forensic Institute, The Hague, The Netherlands; University of Amsterdam, Swammerdam Institute for Life Sciences, Amsterdam, The Netherlands.

The interpretation of short tandem repeat (STR) profiles can be challenging when, for example, alleles are masked due to allele sharing among contributors and/or when they are subject to drop-out, for instance from sample degradation. Mixture interpretation can be improved by increasing the number of STRs and/or loci with a higher discriminatory power. Both capillary electrophoresis (CE, 6-dye) and massively parallel sequencing (MPS) provide a platform for analysing relatively large numbers of autosomal STRs. In addition, MPS enables distinguishing between sequence variants, resulting in enlarged discriminatory power. Also, MPS allows for small amplicon sizes for all loci as spacing is not an issue, which is beneficial with degraded DNA. Altogether, MPS has the potential to increase the weights of evidence for true contributors to (complex) DNA profiles. In this study, likelihood ratio (LR) calculations were performed using STR profiles obtained with two different MPS systems and analysed using different settings: 1) MPS PowerSeq™ Auto System profiles analysed using FDSTools equipped with optimized settings such as noise correction, 2) ForenSeq™ DNA Signature Prep Kit profiles analysed using the default settings in the Universal Analysis Software (UAS), and 3) ForenSeq™ DNA Signature Prep Kit profiles analysed using FDSTools empirically adapted to cope with one-directional reads and provisional, basic settings. The LR calculations used genotyping data for two- to four-person mixtures varying in mixture proportion, level of drop-out and allele sharing and were generated with the continuous model EuroForMix. The LR results for the over 2000 sets of propositions were affected by the variation in the number of markers and analysis settings used in the three approaches. Nevertheless, trends for true and non-contributors, effects of replicates, assigned number of contributors, and model validation results were comparable for the three MPS approaches and similar to the trends known for CE data. Based on this analogy, we regard the probabilistic interpretation of MPS STR data fit for forensic DNA casework. In addition, guidelines were derived on when to apply LR calculations to MPS autosomal STR data and report the corresponding results.

Source
http://dx.doi.org/10.1016/j.fsigen.2021.102489
May 2021

STRNaming: Generating simple, informative names for sequenced STR alleles in a standardised and automated manner.

Forensic Sci Int Genet 2021 May 29;52:102473. Epub 2021 Jan 29.

Division of Biological Traces, Netherlands Forensic Institute, The Hague, The Netherlands.

The introduction of Massively Parallel Sequencing in the forensic domain has exposed the need for comprehensive nomenclature of sequenced Short Tandem Repeat (STR) alleles. In general, three strategies are at hand: 1) the full sequence mapped to the human genome reference sequence, which ensures exact data exchange; 2) shortened, human-readable formats for forensic reporting and data presentation; and 3) very short codes that enable compact figures and tables but do not convey any sequence information. Here, we describe an algorithm of the second type: STRNaming, which generates human-readable names for sequenced STR alleles. STRNaming is guided by a reference sequence at each locus and then functions independently to automatically assign a unique, sequence-descriptive name that also includes the capillary electrophoresis allele number. STRNaming settings were established based on preferences that were surveyed internationally in the forensic community. These settings ensure that a small change in the sequence corresponds to a small change in the allele name, which is helpful for recognising, for instance, stutter products. Sequence variants outside of the repeat units are indicated as simple variant calls. Since the STR name is sequence-descriptive, the sequence can be traced back from the allele name. Because STRNaming is fully guided by an assignable reference sequence, no central coordination or configuration is required and the method will work for any STR locus, be it autosomal, Y- or X-chromosomal, in current or future use. The algorithm is publicly available online and offline.
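
The idea of a sequence-descriptive allele name can be illustrated with a minimal sketch. This is not the STRNaming algorithm: it assumes a fixed 4-nucleotide repeat unit read in frame from the first base, and the function name `collapse_repeats` and the `UNIT[count]` output format are hypothetical choices made for illustration only.

```python
def collapse_repeats(sequence, unit_len=4):
    """Collapse consecutive identical units into 'UNIT[count]' blocks.

    Illustrative only: assumes a fixed unit length and reading frame,
    which a real naming algorithm would derive from a reference sequence.
    """
    blocks = []
    i = 0
    while i + unit_len <= len(sequence):
        unit = sequence[i:i + unit_len]
        count = 1
        # Extend the block while the next unit is identical to the current one.
        while sequence[i + count * unit_len:i + (count + 1) * unit_len] == unit:
            count += 1
        blocks.append(f"{unit}[{count}]")
        i += count * unit_len
    remainder = sequence[i:]  # trailing bases that do not fill a whole unit
    if remainder:
        blocks.append(remainder)
    return " ".join(blocks)


if __name__ == "__main__":
    allele = "TCTATCTATCTGTCTATCTA"
    one_repeat_shorter = "TCTATCTGTCTATCTA"  # e.g. a stutter-like product
    print(collapse_repeats(allele))
    print(collapse_repeats(one_repeat_shorter))
```

In this toy notation the two sequences differ only in the count of the first block, which mirrors the property described in the abstract: a small change in the sequence gives a small change in the name.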

Source
http://dx.doi.org/10.1016/j.fsigen.2021.102473
May 2021

Multi-laboratory validation of DNAxs including the statistical library DNAStatistX.

Forensic Sci Int Genet 2020 Nov 7;49:102390. Epub 2020 Sep 7.

Netherlands Forensic Institute, Division of Biological Traces, Laan van Ypenburg 6, 2497GB, The Hague, The Netherlands.

This study describes a multi-laboratory validation of DNAxs, a DNA eXpert System for the data management and probabilistic interpretation of DNA profiles [1], and its statistical library DNAStatistX, in which four laboratories participated besides the organising laboratory. The software was modified to read multiple data formats and the study was performed prior to the release of the software to the forensic community. The first exercise explored all main functionalities of DNAxs with feedback on user-friendliness, installation and general performance. Next, every laboratory performed likelihood ratio (LR) calculations using their own dataset and a dataset provided by the organising laboratory. The organising laboratory performed LR calculations using all datasets. The datasets were generated with different STR typing kits or analysis systems and consisted of samples varying in DNA amounts, mixture ratios, number of contributors and drop-out level. Hypothesis sets had the correct, under- and over-assigned number of contributors and true and false donors as person of interest. When comparing the results between laboratories, the LRs were mostly within one unit on the log10 scale. The few LR results that deviated more showed differences in the parameters estimated by the optimizer within DNAStatistX. Some of these were indicated by failed iteration results, others by a failed model validation, since unrealistic hypotheses were included. When the results that did not meet the quality criteria were excluded, in accordance with interpretation guidelines, none of the analyses in the different laboratories yielded a different statement in the casework report. Nonetheless, changes in software parameters were sought that minimized differences in outcomes, which made the DNAStatistX module more robust. Overall, the software was found intuitive, user-friendly and valid for use in multiple laboratories.
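
The "within one unit on log10 scale" comparison can be sketched as follows. The function names and the example LR values are hypothetical and are not taken from the study or from DNAStatistX; only the one-unit criterion comes from the abstract.

```python
from math import log10


def log10_lr_difference(lr_a, lr_b):
    """Absolute difference between two likelihood ratios on the log10 scale."""
    if lr_a <= 0 or lr_b <= 0:
        raise ValueError("likelihood ratios must be positive")
    return abs(log10(lr_a) - log10(lr_b))


def within_log10_units(lr_a, lr_b, max_units=1.0):
    """True if the two LRs differ by at most `max_units` on the log10 scale."""
    return log10_lr_difference(lr_a, lr_b) <= max_units


if __name__ == "__main__":
    # Hypothetical LRs reported by two laboratories for the same hypothesis set.
    print(within_log10_units(1.0e6, 4.0e6))  # difference is log10(4), about 0.6
    print(within_log10_units(1.0e6, 5.0e7))  # difference is log10(50), about 1.7
```

Comparing on the log10 scale treats an LR of one million versus four million as a small discrepancy, which matches how LRs are usually reported as orders of magnitude.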

Source
http://dx.doi.org/10.1016/j.fsigen.2020.102390
November 2020

Automated estimation of the number of contributors in autosomal short tandem repeat profiles using a machine learning approach.

Forensic Sci Int Genet 2019 Nov 23;43:102150. Epub 2019 Aug 23.

University of Amsterdam, the Netherlands; Ahold Delhaize, the Netherlands.

The number of contributors (NOC) to (complex) autosomal STR profiles cannot be determined with absolute certainty due to complicating factors such as allele sharing and allelic drop-out. The precision of NOC estimations can be improved by increasing the number of (highly polymorphic) markers, the use of massively parallel sequencing instead of capillary electrophoresis, and/or using more profile information than only the allele counts. In this study, we focussed on machine learning approaches in order to make maximum use of the profile information. To this end, a set of 590 PowerPlex® Fusion 6C profiles with one to five contributors was generated from a total of 1174 different donors. This set varied in the template amount of DNA, mixture proportion, levels of allele sharing, allelic drop-out and degradation. The dataset contained labels with known NOC and was split into a training, test and hold-out set. The training set was used to optimize ten different algorithms with selection of profile characteristics. Per profile, over 250 characteristics, denoted 'features', were calculated. These features were based on allele counts, peak heights and allele frequencies. The features that were most related to the NOC were selected based on partial correlation using the training set. Next, the performance of each model (=combination of features plus algorithm) was examined using the test set. A random forest classifier with 19 features, denoted the 'RFC19-model', showed the best performance and was selected for further validation. Results showed improved accuracy compared to the conventional maximum allele count approach and an in-house nC-tool based on the total allele count. The method is extremely fast and regarded as useful for application in forensic casework.
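
The conventional maximum allele count baseline mentioned above can be sketched in a few lines. This shows the baseline only, not the RFC19 model. It assumes autosomal loci where each contributor carries at most two alleles, and the example profile is invented for illustration.

```python
from math import ceil


def max_allele_count_noc(profile):
    """Minimum number of contributors implied by the locus with the most alleles.

    `profile` maps locus name -> list of observed alleles. Each contributor
    can donate at most two alleles per autosomal locus, so a locus with n
    distinct alleles needs at least ceil(n / 2) contributors.
    """
    if not profile:
        raise ValueError("profile must contain at least one locus")
    max_alleles = max(len(set(alleles)) for alleles in profile.values())
    return max(1, ceil(max_alleles / 2))


if __name__ == "__main__":
    # Invented example profile; the allele calls are not real data.
    example_profile = {
        "D3S1358": ["14", "15", "16", "17"],
        "vWA": ["16", "17", "18", "19", "20"],
        "FGA": ["21", "24"],
    }
    print(max_allele_count_noc(example_profile))
```

Because allele sharing and drop-out both reduce the number of observed alleles, this estimate is a lower bound, which is the weakness the abstract's feature-based approach aims to address.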

Source
http://dx.doi.org/10.1016/j.fsigen.2019.102150
November 2019

DNAxs/DNAStatistX: Development and validation of a software suite for the data management and probabilistic interpretation of DNA profiles.

Forensic Sci Int Genet 2019 Sep 21;42:81-89. Epub 2019 Jun 21.

Netherlands Forensic Institute, Division of Biological Traces, Laan van Ypenburg 6, 2497GB, The Hague, the Netherlands.

The data management, interpretation and comparison of sets of DNA profiles can be complex, time-consuming and error-prone when performed manually. This, combined with the growing numbers of genetic markers in forensic identification systems, calls for expert systems that can automatically compare genotyping results within (large) sets of DNA profiles and assist in profile interpretation. To that aim, we developed a user-friendly software program, or DNA eXpert System, that is denoted DNAxs. This software includes features to view, infer and match autosomal short tandem repeat profiles with connectivity to upstream and downstream software programs. Furthermore, DNAxs has embedded the 'DNAStatistX' module, a statistical library that contains a probabilistic algorithm to calculate likelihood ratios (LRs). This algorithm is largely based on the source code of the quantitative probabilistic genotyping system EuroForMix [1]. The statistical library, DNAStatistX, supports parallel computing, which can be delegated to a computer cluster, and enables automated queuing of requested LR calculations. DNAStatistX is written in Java and is accessible separately or via DNAxs. Using true and non-contributors to DNA profiles with up to four contributors, the DNAStatistX accuracy and precision were assessed by comparing the DNAStatistX results to those of EuroForMix. Results were the same except for rare differences that could be attributed to the different optimizers used in the two software programs. Implementation of dye-specific detection thresholds resulted in larger likelihood values and thus a better explanation of the data used in this study. Furthermore, processing time, robustness of DNAStatistX results and the circumstances under which model validations failed were examined. Finally, guidelines for application of the software are shared as an example. The DNAxs software is future-proof as it applies a modular approach by which novel functionalities can be incorporated.

Source
http://dx.doi.org/10.1016/j.fsigen.2019.06.015
September 2019

Human-associated microbial populations as evidence in forensic casework.

Forensic Sci Int Genet 2018 Sep 30;36:176-185. Epub 2018 Jun 30.

Netherlands Forensic Institute, P.O. Box 24044, 2490 AA, The Hague, The Netherlands.

In forensic investigations involving human biological traces, cell type identification is often required. Identifying the cell type from which a human STR profile has originated can assist in verifying scenarios. Several techniques have been developed for this purpose, most of which focus on molecular characteristics of human cells. Here we present a microarray method focusing on the microbial populations that are associated with human cell material. A microarray with 863 probes targeting (sets of) species, specific genera, groups of genera or families was designed for this study and evaluated with samples from different body sites: hand, foot, groin, penis, vagina, mouth and faeces. In total 175 samples from healthy individuals were analysed. Next to human faeces, 15 feline and 15 canine faeces samples were also included. Both clustering and classification analysis were used for data analysis. Faecal and oral samples could clearly be distinguished from vaginal and skin samples, and also canine and feline faeces could be differentiated from human faeces. Some penis samples showed high similarity to vaginal samples, others to skin samples. Discriminating between skin samples from different skin sites proved to be challenging. As a proof of principle, twenty-one mock case samples were analysed with the microarray method. All mock case samples were clustered or classified within the correct main cluster/group. Only two of the mock case samples were assigned to the wrong sub-cluster/class; with classification one additional sample was classified within the wrong sub-class. Overall, the microarray method is a valuable addition to already existing cell typing techniques. Combining the results of microbial population analysis with for instance mRNA typing can increase the evidential value of a trace, since both techniques focus on independent targets within a sample.

Source
http://dx.doi.org/10.1016/j.fsigen.2018.06.020
September 2018

An image-processing methodology for extracting bloodstain pattern features.

Forensic Sci Int 2017 Aug 3;277:122-132. Epub 2017 Jun 3.

Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, Netherlands.

There is a growing trend in forensic science to develop methods to make forensic pattern comparison tasks more objective. This has generally involved the application of suitable image-processing methods to provide numerical data for identification or comparison. This paper outlines a unique image-processing methodology that can be utilised by analysts to generate reliable pattern data that will assist them in forming objective conclusions about a pattern. A range of features were defined and extracted from a laboratory-generated impact spatter pattern. These features were based in part on bloodstain properties commonly used in the analysis of spatter bloodstain patterns. The values of these features were consistent with properties reported qualitatively for such patterns. The image-processing method developed shows considerable promise as a way to establish measurable discriminating pattern criteria that are lacking in current bloodstain pattern taxonomies.

Source
http://dx.doi.org/10.1016/j.forsciint.2017.05.022
August 2017

Length and repeat-sequence variation in 58 STRs and 94 SNPs in two Spanish populations.

Forensic Sci Int Genet 2017 Sep 16;30:66-70. Epub 2017 Jun 16.

Institut de Biologia Evolutiva (UPF-CSIC), Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain.

We have genotyped the 58 STRs (27 autosomal, 24 Y-STRs and 7 X-STRs) and 94 autosomal SNPs in Illumina ForenSeq™ Primer Mix A in 88 Spanish Roma (Gypsy) samples and 143 Catalans. Since this platform is based on massively parallel sequencing, we have used simple R scripts to uncover the sequence variation in the repeat region. Thus, we have found, across 58 STRs, 541 length-based alleles, which, after considering repeat-sequence variation, became 804 different alleles. All loci in both populations were in Hardy-Weinberg equilibrium. F_ST between both populations was 0.0178 for autosomal SNPs, 0.0146 for autosomal STRs, 0.0101 for X-STRs and 0.1866 for Y-STRs. Combined a priori statistics were quite high; for instance, pooling all the autosomal loci, the a priori probabilities of discriminating a suspect become 1-(2.3×10) and 1-(5.9×10), for Roma and Catalans respectively, and the chances of excluding a false father in a trio are 1-(2.6×10) and 1-(2.0×10).
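
The difference between length-based and sequence-based allele counts can be shown with a short sketch. The study itself reports using R scripts; the Python below is an independent illustration. The function name, locus labels and sequences are invented, and allele length is approximated by sequence length.

```python
def count_alleles(sequences_by_locus):
    """Return (length_based, sequence_based) allele counts summed over loci.

    `sequences_by_locus` maps locus name -> list of observed repeat-region
    sequences. Alleles of equal length count once by length but separately
    by sequence when their repeat structure differs.
    """
    length_based = 0
    sequence_based = 0
    for sequences in sequences_by_locus.values():
        length_based += len({len(seq) for seq in sequences})
        sequence_based += len(set(sequences))
    return length_based, sequence_based


if __name__ == "__main__":
    # Invented sequences; "LocusA" has two same-length alleles that differ
    # by one base in the repeat region.
    observed = {
        "LocusA": ["TCTATCTATCTA", "TCTATCTGTCTA", "TCTATCTA"],
        "LocusB": ["AGATAGAT", "AGATAGAT", "AGATAGATAGAT"],
    }
    print(count_alleles(observed))
```

The same mechanism, applied across 58 STRs, is what turns the 541 length-based alleles in the abstract into 804 sequence-based alleles.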

Source
http://dx.doi.org/10.1016/j.fsigen.2017.06.006
September 2017

FDSTools: A software package for analysis of massively parallel sequencing data with the ability to recognise and correct STR stutter and other PCR or sequencing noise.

Forensic Sci Int Genet 2017 Mar 27;27:27-40. Epub 2016 Nov 27.

Department of Human Genetics, Leiden University Medical Center, Leiden, 2300 RC, The Netherlands.

Massively parallel sequencing (MPS) is on the verge of broad-scale application in forensic research and casework. The improved capability to analyse evidentiary traces representing unbalanced mixtures is often mentioned as one of the major advantages of this technique. However, most of the available software packages that analyse forensic short tandem repeat (STR) sequencing data are not well suited for high-throughput analysis of such mixed traces. The largest challenge is the presence of stutter artefacts in STR amplifications, which are not readily discerned from minor contributions. FDSTools is an open-source software solution developed for this purpose. The level of stutter formation is influenced by various aspects of the sequence, such as the length of the longest uninterrupted stretch occurring in an STR. When MPS is used, STRs are evaluated as sequence variants that each have particular stutter characteristics which can be precisely determined. FDSTools uses a database of reference samples to determine stutter and other systemic PCR or sequencing artefacts for each individual allele. In addition, stutter models are created for each repeating element in order to predict stutter artefacts for alleles that are not included in the reference set. This information is subsequently used to recognise and compensate for the noise in a sequence profile. The result is a better representation of the true composition of a sample. Using Promega PowerSeq™ Auto System data from 450 reference samples and 31 two-person mixtures, we show that the FDSTools correction module decreases stutter ratios above 20% to below 3%. Consequently, much lower levels of contributions in the mixed traces are detected. FDSTools contains modules to visualise the data in an interactive format, allowing users to filter data with their own preferred thresholds.
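
A stutter ratio and a very simplified correction step can be sketched as below. This is not the FDSTools implementation or API: the function names, the single expected ratio per allele and the read counts are assumptions made for illustration, whereas the abstract describes per-allele reference data and per-repeat-element stutter models.

```python
def stutter_ratio(stutter_reads, parent_reads):
    """Reads of the stutter product divided by reads of its parent allele."""
    if parent_reads <= 0:
        raise ValueError("parent allele must have a positive read count")
    return stutter_reads / parent_reads


def correct_stutter(observed_reads, parent_reads, expected_ratio):
    """Subtract the stutter expected from the parent allele (floored at zero).

    `expected_ratio` stands in for a value that would be learned from
    reference samples; here it is simply passed in by the caller.
    """
    expected_noise = parent_reads * expected_ratio
    return max(0.0, observed_reads - expected_noise)


if __name__ == "__main__":
    # Invented read counts for one parent allele and its stutter position.
    parent, observed = 1000, 220
    before = stutter_ratio(observed, parent)
    residual = correct_stutter(observed, parent, expected_ratio=0.20)
    after = stutter_ratio(residual, parent)
    print(f"ratio before correction: {before:.2f}, after: {after:.2f}")
```

Reads that remain at the stutter position after subtraction are candidates for a genuine minor contributor, which is why lowering the residual ratio makes lower contribution levels detectable.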

Source
http://dx.doi.org/10.1016/j.fsigen.2016.11.007
March 2017

Massively parallel sequencing of short tandem repeats-Population data and mixture analysis results for the PowerSeq™ system.

Forensic Sci Int Genet 2016 Sep 7;24:86-96. Epub 2016 Jun 7.

Forensic Laboratory for DNA Research, Department of Human Genetics, Leiden University Medical Centre, Postzone S 05 P, P.O. Box 9600, 2300 RC Leiden, The Netherlands.

Current forensic DNA analysis predominantly involves identification of human donors by analysis of short tandem repeats (STRs) using Capillary Electrophoresis (CE). Recent developments in Massively Parallel Sequencing (MPS) technologies offer new possibilities in analysis of STRs since they might overcome some of the limitations of CE analysis. In this study, 17 STRs and Amelogenin were sequenced at high coverage using a prototype version of the Promega PowerSeq™ system for 297 population samples from the Netherlands, Nepal, Bhutan and Central African Pygmies. In addition, 45 two-person mixtures with different minor contributions down to 1% were analysed to investigate the performance of this system for mixed samples. Regarding fragment length, complete concordance between the MPS and CE-based data was found, demonstrating the reliability of the MPS PowerSeq™ system. As expected, MPS presented a broader allele range and higher power of discrimination and exclusion rate. The high-coverage sequencing data were used to determine stutter characteristics for all loci, and stutter ratios were compared to CE data. The separation of alleles with the same length but exhibiting different stutter ratios lowers the overall variation in stutter ratio and helps in differentiating stutters from genuine alleles in mixed samples. All alleles of the minor contributors were detected in the sequence reads even for the 1% contributions, but analysis of mixtures below 5% without prior information on the mixture ratio is complicated by PCR and sequencing artefacts.

Source
http://dx.doi.org/10.1016/j.fsigen.2016.05.016
September 2016