Publications by authors named "Lawrence A Adutwum"

6 Publications

Unique ion filter-A data reduction tool for chemometric analysis of raw comprehensive two-dimensional gas chromatography-mass spectrometry data.

J Sep Sci 2021 May 1. Epub 2021 May 1.

Department of Chemistry, University of Alberta, Edmonton, Alberta, Canada.

Comprehensive gas chromatography with time of flight mass spectrometry is a powerful tool in the analysis of complex samples. Chemometric analysis of raw chromatographic data is more useful in one- and two-dimensional separations relative to peak tables. The data volume from such experiments generally necessitates the use of data reduction tools. Such tools often sacrifice some of the multivariate information in the mass to charge ratio dimension. The unique ion filter reduces the redundancy in two-dimensional gas chromatography-mass spectrometry data by limiting the data to a few unique/pseudo-unique ions, sub-peaks/slices in the first dimension, and spectra in the second dimension. We explore the performance of this algorithm through careful inspection of two-dimensional gas chromatography-mass spectrometry data before and after application of the filter. A 99% reduction in the number of variables in a two-dimensional gas chromatography-mass spectrometry chromatogram passed on to subsequent analysis was observed. Feature selection times for model optimization fell from 229 (±13) to 6.8 (±0.5) min when the filter was applied. An estimate of two unique/pseudo-unique ions, one sub-peak in the first dimension, and five spectra in the second dimension was considered to provide a true representation of each chromatogram and provided enough information to achieve 100% model prediction accuracy.

Source
http://dx.doi.org/10.1002/jssc.202001127
May 2021

Solving the Coloring Problem in Half-Heusler Structures: Machine-Learning Predictions and Experimental Validation.

Inorg Chem 2019 Jul 25;58(14):9280-9289. Epub 2019 Jun 25.

Department of Chemistry, University of Alberta, Edmonton, Alberta T6G 2G2, Canada.

The site preferences within the structures of half-Heusler compounds have been evaluated through a machine-learning approach. A support-vector machine algorithm was applied to develop a model which was trained on 179 experimentally reported structures and 23 descriptors based solely on the chemical composition. The model gave excellent performance, with sensitivity of 93%, selectivity of 96%, and accuracy of 95%. As an illustration of data sanitization, two compounds (GdPtSb, HoPdBi) flagged by the model to have potentially incorrect site assignments were resynthesized and structurally characterized. The predictions of the correct site assignments from the machine-learning model were confirmed by single-crystal and powder X-ray diffraction analysis. These site assignments also corresponded to the lowest total energy configurations as revealed from first-principles calculations.
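The three figures of merit quoted in this abstract follow directly from a confusion matrix. The Python sketch below is illustrative only: the counts are made-up toy numbers, not the paper's data, and "selectivity" is treated as specificity (the true-negative rate), which is an assumption about the authors' usage.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, and accuracy from confusion-matrix counts."""
    positives = tp + fn  # samples that truly belong to the positive class
    negatives = tn + fp  # samples that truly belong to the negative class
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "sensitivity": tp / positives,   # true-positive rate
        "specificity": tn / negatives,   # true-negative rate
        "accuracy": (tp + tn) / (positives + negatives),
    }


# Toy counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=45, fn=5, tn=40, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```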

Source
http://dx.doi.org/10.1021/acs.inorgchem.9b00987
July 2019

How To Optimize Materials and Devices via Design of Experiments and Machine Learning: Demonstration Using Organic Photovoltaics.

ACS Nano 2018 Aug 20;12(8):7434-7444. Epub 2018 Jul 20.

Department of Chemistry, University of Alberta, 11227 Saskatchewan Drive, Edmonton, AB T6G 2G2, Canada.

Most discoveries in materials science have been made empirically, typically through one-variable-at-a-time (Edisonian) experimentation. The characteristics of materials-based systems are, however, neither simple nor uncorrelated. In a device such as an organic photovoltaic, for example, the level of complexity is high due to the sheer number of components and processing conditions, and thus, changing one variable can have multiple unforeseen effects due to their interconnectivity. Design of Experiments (DoE) is ideally suited for such multivariable analyses: by planning one's experiments as per the principles of DoE, one can test and optimize several variables simultaneously, thus accelerating the process of discovery and optimization while saving time and precious laboratory resources. When combined with machine learning, the consideration of one's data in this manner provides a different perspective for optimization and discovery, akin to climbing out of a narrow valley of serial (one-variable-at-a-time) experimentation, to a mountain ridge with a 360° view in all directions.
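To make the contrast with one-variable-at-a-time experimentation concrete, here is a minimal Python sketch of a two-level full factorial design, one of the simplest DoE layouts. The factor names and levels are hypothetical placeholders and are not taken from the article, which may well rely on other (for example fractional) designs.

```python
from itertools import product


def full_factorial(factors):
    """Return every combination of factor levels as a list of run dictionaries."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, levels)) for levels in product(*level_lists)]


# Hypothetical factors for a thin-film device; names and values are placeholders.
factors = {
    "donor_acceptor_ratio": ["1:1", "1:2"],
    "anneal_temp_C": [100, 140],
    "spin_speed_rpm": [1000, 2000],
}

design = full_factorial(factors)
for run_number, run in enumerate(design, start=1):
    print(run_number, run)
```

Three factors at two levels give 2^3 = 8 runs, each of which varies several factors at once, so interactions between factors can be estimated from the same set of experiments.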

Source
http://dx.doi.org/10.1021/acsnano.8b04726
August 2018

Disentangling Structural Confusion through Machine Learning: Structure Prediction and Polymorphism of Equiatomic Ternary Phases ABC.

J Am Chem Soc 2017 Dec 28;139(49):17870-17881. Epub 2017 Nov 28.

Department of Chemistry, University of Houston, Houston, Texas 77204, United States.

A method to predict the crystal structure of equiatomic ternary compositions based only on the constituent elements was developed using cluster resolution feature selection (CR-FS) and support vector machine (SVM) classification. The supervised machine-learning model was first trained with 1037 individual compounds that adopt the most populated ternary 1:1:1 structure types (TiNiSi-, ZrNiAl-, PbFCl-, LiGaGe-, YPtAs-, UGeTe-, and LaPtSi-type) and then validated using an additional 519 compounds. The CR-FS algorithm improves class discrimination and indicates that 113 variables including size, electronegativity, number of valence electrons, and position on the periodic table (group number) influence the structure preference. The final model prediction sensitivity, specificity, and accuracy were 97.3%, 93.9%, and 96.9%, respectively, establishing that this method is capable of reliably predicting the crystal structure given only its composition. The power of CR-FS and SVM classification is further demonstrated by segregating the crystal structure of polymorphs, specifically to examine polymorphism in TiNiSi- and ZrNiAl-type structures. Analyzing 19 compositions that are experimentally reported in both structure types, this machine-learning model correctly identifies, with high confidence (>0.7), the low-temperature polymorph from its high-temperature form. Interestingly, machine learning also reveals that certain compositions cannot be clearly differentiated and lie in a "confused" region (0.3-0.7 confidence), suggesting that both polymorphs may be observed in a single sample at certain experimental conditions. The ensuing synthesis and characterization of TiFeP adopting both TiNiSi- and ZrNiAl-type structures in a single sample, even after long annealing times (3 months), validate the occurrence of the region of structural uncertainty predicted by machine learning.
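The confidence bands mentioned in this abstract (above 0.7 for a confident call, 0.3 to 0.7 for the "confused" region) can be written as a small decision rule. The Python sketch below assumes a two-class setting in which the classifier reports the probability of the first structure type, so that values below 0.3 are read as a confident call for the second type; that reading, the boundary handling, and the example probabilities are all assumptions, not details from the paper.

```python
def polymorph_call(p_first, low=0.3, high=0.7):
    """Map the probability of the first structure type to a qualitative call."""
    if not 0.0 <= p_first <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if p_first > high:
        return "first structure type"
    if p_first < low:
        return "second structure type"
    return "confused"  # both polymorphs may plausibly be observed


# Made-up probabilities for three hypothetical compositions.
examples = {"composition 1": 0.92, "composition 2": 0.55, "composition 3": 0.12}
calls = {name: polymorph_call(p) for name, p in examples.items()}
for name, call in calls.items():
    print(name, "->", call)
```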

Source
http://dx.doi.org/10.1021/jacs.7b08460
December 2017

Total Ion Spectra versus Segmented Total Ion Spectra as Preprocessing Tools for Gas Chromatography - Mass Spectrometry Data.

J Forensic Sci 2018 Jul 10;63(4):1059-1068. Epub 2017 Oct 10.

Department of Chemistry, University of Alberta, Edmonton, Alberta, Canada.

Alignment of fire debris data from GC-MS for chemometric analysis is challenged by highly variable, uncontrolled sample and matrix composition. The total ion spectrum (TIS) obviates the need for alignment but loses all separation information. We introduce the segmented total ion spectrum (STIS), which keeps the advantages of TIS while preserving some retention information. We compare the performance of STIS with TIS for the classification of casework fire debris samples. TIS and STIS achieve good model prediction accuracies of 96% and 98%, respectively. Baseline removal improved these accuracies to 97% and 99%, respectively. The importance of maintaining some chromatographic information to aid in deciphering the underlying chemistry of the results and reasons for false positive/negative results was also examined.
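The difference between the two representations can be sketched in a few lines of Python. The sketch assumes the GC-MS run is stored as a list of scans, each a list of intensities on a shared m/z axis, and it uses equal-width retention-time segments; the abstract does not say how segments are chosen, so the segmentation scheme, the function names, and the toy data are all assumptions.

```python
def total_ion_spectrum(scans):
    """Sum intensities over all scans, leaving one value per m/z channel."""
    return [sum(channel) for channel in zip(*scans)]


def segmented_total_ion_spectrum(scans, n_segments):
    """Concatenate the total ion spectra of contiguous retention-time segments."""
    if not 1 <= n_segments <= len(scans):
        raise ValueError("n_segments must be between 1 and the number of scans")
    # Integer boundaries that split the scan index range into near-equal parts.
    bounds = [i * len(scans) // n_segments for i in range(n_segments + 1)]
    stis = []
    for start, stop in zip(bounds, bounds[1:]):
        stis.extend(total_ion_spectrum(scans[start:stop]))
    return stis


# Toy run: 4 scans (rows, in retention-time order) by 3 m/z channels (columns).
scans = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
    [4.0, 0.0, 0.0],
    [0.0, 1.0, 5.0],
]

tis = total_ion_spectrum(scans)
stis = segmented_total_ion_spectrum(scans, n_segments=2)
print(tis)
print(stis)
```

With a single segment the STIS reduces to the TIS; more segments keep coarser or finer retention information at the cost of a longer feature vector.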

Source
http://dx.doi.org/10.1111/1556-4029.13657
July 2018

Estimation of start and stop numbers for cluster resolution feature selection algorithm: an empirical approach using null distribution analysis of Fisher ratios.

Anal Bioanal Chem 2017 Nov 29;409(28):6699-6708. Epub 2017 Sep 29.

Department of Chemistry, University of Alberta, 11227 Saskatchewan Drive NW, Edmonton, Alberta, T6G 2G2, Canada.

Cluster resolution feature selection (CR-FS) is a hybrid feature selection algorithm which involves the evaluation of ranked variables via sequential backward elimination (SBE) and sequential forward selection (SFS). The implementation of CR-FS requires two main inputs, namely, the start and stop numbers. The start number is the number of the highly ranked variables for the SBE while the stop number is the point at which the search for additional features during the SFS stage is halted. The setting of these critical parameters has always relied on trial and error, which introduces subjectivity in the results obtained. The start and stop numbers are known to vary with each dataset. Drawing inspiration from overlapping coefficients, a method for comparing two probability density functions, empirical equations toward the estimation of the start and stop numbers for a dataset were developed. All of the parameters in the empirical equations are obtained from the comparisons of the two probability density functions except the constant termed d. The equations were optimized using three real-world datasets. The optimum range of d was determined to be 0.48 to 0.57. An implementation of CR-FS using two new datasets demonstrated the validity of this approach. Partial least squares discriminant analysis (PLS-DA) model prediction accuracies increased from 90 and 96 to 100% for both datasets using start and stop numbers calculated with this approach. Additionally, there was a twofold increase in the explained variance captured in the first two principal components.

Graphical abstract: Here, we describe how to determine the start and stop numbers for an automated feature selection routine, ensuring that you get the best model you can for your data with minimal effort.
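The abstract does not reproduce the empirical equations, so they are not attempted here. As background only, the overlapping coefficient it refers to can be estimated from two samples by binning both on a shared axis and summing the smaller of the two bin fractions. The Python sketch below is a simple histogram estimate under that assumption; the function names, bin count, and toy values are illustrative and are not the authors' procedure.

```python
def histogram_fractions(values, lo, hi, n_bins):
    """Return the fraction of values falling in each of n_bins equal-width bins."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - lo) / width), n_bins - 1)  # put v == hi in the last bin
        counts[index] += 1
    return [c / len(values) for c in counts]


def overlapping_coefficient(sample_a, sample_b, n_bins=10):
    """Histogram estimate of the overlap between two distributions (0 to 1)."""
    lo = min(min(sample_a), min(sample_b))
    hi = max(max(sample_a), max(sample_b))
    if hi == lo:
        return 1.0  # every value is identical, so the samples coincide
    p = histogram_fractions(sample_a, lo, hi, n_bins)
    q = histogram_fractions(sample_b, lo, hi, n_bins)
    return sum(min(pi, qi) for pi, qi in zip(p, q))


# Toy samples: half of each sample shares bins with the other.
ovl = overlapping_coefficient([0, 1, 2, 3], [2, 3, 4, 5], n_bins=6)
print(ovl)
```

A value near 1 means the two distributions are nearly indistinguishable, and a value near 0 means they barely overlap. In the setting of the abstract the two distributions would presumably be the observed and null Fisher ratios, though that pairing is an inference from the title rather than something the abstract spells out.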

Source
http://dx.doi.org/10.1007/s00216-017-0628-8
November 2017