240 results match your criteria: Applied Psychological Measurement [Journal]


An Item Response Model for True-False Exams Based on Signal Detection Theory.

Appl Psychol Meas 2020 May 23;44(3):234-248. Epub 2019 Apr 23.

Columbia University, New York, NY, USA.

A true-false exam can be viewed as a signal detection task: the task is to detect whether an item is true (signal) or false (noise). In terms of signal detection theory (SDT), examinees can be viewed as performing the task by comparing the perceived plausibility of an item (a perceptual component) to a threshold that delineates true from false (a decision component). The resulting model is distinct from, yet related to, item response theory (IRT) models and grade of membership models, with the difference that SDT explicitly recognizes the role of examinees' perceptions in determining their responses to an item. Read More
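The comparison described in this abstract, perceived plausibility against a true/false threshold, is the standard equal-variance Gaussian SDT setup. A minimal sketch of that generic setup (not the authors' item response model, whose item and person parameterization is in the paper; the function name and values are illustrative):

```python
from math import erf, sqrt

def p_respond_true(plausibility_mean: float, threshold: float, sd: float = 1.0) -> float:
    """Probability of a 'true' response under equal-variance Gaussian SDT:
    the examinee responds 'true' when perceived plausibility exceeds the threshold."""
    z = (plausibility_mean - threshold) / sd
    # Standard normal CDF computed via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# An item whose mean plausibility sits well above the threshold is usually called 'true'
print(round(p_respond_true(1.5, 0.0), 3))
```

Items whose mean plausibility sits near the threshold produce response probabilities near .5, which is where the decision component dominates.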

DOI: http://dx.doi.org/10.1177/0146621619843823
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7174808

Approximating Bifactor IRT True-Score Equating With a Projective Item Response Model.

Appl Psychol Meas 2020 May 13;44(3):215-218. Epub 2019 Nov 13.

The University of North Carolina at Greensboro, USA.

Item response theory (IRT) true-score equating for the bifactor model is often conducted by first numerically integrating out specific factors from the item response function and then applying the unidimensional IRT true-score equating method to the marginalized bifactor model. However, an alternative procedure for obtaining the marginalized bifactor model is through projecting the nuisance dimensions of the bifactor model onto the dominant dimension. Projection, which can be viewed as an approximation to numerical integration, has an advantage over numerical integration in providing item parameters for the marginalized bifactor model; therefore, projection could be used with existing equating software packages that require item parameters. Read More

DOI: http://dx.doi.org/10.1177/0146621619885903
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7174803

Bias of Two-Level Scalability Coefficients and Their Standard Errors.

Appl Psychol Meas 2020 May 14;44(3):197-214. Epub 2019 May 14.

University of Amsterdam, The Netherlands.

Two-level Mokken scale analysis is a generalization of Mokken scale analysis for multi-rater data. The bias of the estimated scalability coefficients for two-level Mokken scale analysis, the bias of their estimated standard errors, and the coverage of the confidence intervals were investigated under various testing conditions. The estimated scalability coefficients were found to be unbiased in all tested conditions. Read More

DOI: http://dx.doi.org/10.1177/0146621619843821
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7174805

A Dynamic Stratification Method for Improving Trait Estimation in Computerized Adaptive Testing Under Item Exposure Control.

Appl Psychol Meas 2020 May 23;44(3):182-196. Epub 2019 Apr 23.

National Chung Cheng University, Chiayi, Taiwan.

When computerized adaptive testing (CAT) is under stringent item exposure control, the precision of trait estimation will substantially decrease. A new item selection method, the dynamic Stratification method based on Dominance Curves (SDC), which is aimed at improving trait estimation, is proposed to mitigate this problem. The objective function of the SDC in item selection is to maximize the sum of test information for all examinees rather than maximizing item information for individual examinees at a single-item administration, as in conventional CAT. Read More

DOI: http://dx.doi.org/10.1177/0146621619843820
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7174806

Evaluating the Fit of Sequential G-DINA Model Using Limited-Information Measures.

Authors:
Wenchao Ma

Appl Psychol Meas 2020 May 14;44(3):167-181. Epub 2019 May 14.

The University of Alabama, Tuscaloosa, AL, USA.

Limited-information fit measures appear to be promising in assessing the goodness-of-fit of dichotomous response cognitive diagnosis models (CDMs), but their performance has not been examined for polytomous response CDMs. This study investigates the performance of the statistic and standardized root mean square residual (SRMSR) for an ordinal response CDM-the sequential generalized deterministic inputs, noisy "and" gate model. Simulation studies showed that the statistic had well-calibrated Type I error rates, but the correct detection rates were influenced by various factors such as item quality, sample size, and the number of response categories. Read More

DOI: http://dx.doi.org/10.1177/0146621619843829
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7174807

Reliability for Tests With Items Having Different Numbers of Ordered Categories.

Appl Psychol Meas 2020 Mar 20;44(2):137-149. Epub 2019 Mar 20.

University of Georgia, Athens, USA.

This study describes a structural equation modeling (SEM) approach to reliability for tests with items having different numbers of ordered categories. A simulation study compares the performance of this reliability coefficient, coefficient alpha, and the population reliability for tests having items with different numbers of ordered categories, one-factor and bifactor structures, and differently skewed distributions of test scores. Results indicated that the proposed reliability coefficient was close to the population reliability in most conditions. Read More
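As a generic illustration of SEM-based reliability for a one-factor model (not necessarily the coefficient proposed in this article), coefficient omega is computed from standardized loadings and error variances; the loading values below are hypothetical:

```python
def coefficient_omega(loadings, error_variances):
    """Coefficient omega for a one-factor model:
    omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    true_var = sum(loadings) ** 2
    return true_var / (true_var + sum(error_variances))

# Hypothetical standardized loadings; with standardized items,
# each error variance is 1 - loading^2
lam = [0.7, 0.6, 0.8, 0.5]
errs = [1 - l ** 2 for l in lam]
print(round(coefficient_omega(lam, errs), 3))
```

Coefficient alpha can understate reliability when loadings are unequal, which is one motivation for model-based coefficients of this kind.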

DOI: http://dx.doi.org/10.1177/0146621619835498
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7003185

MIMIC Models for Uniform and Nonuniform DIF as Moderated Mediation Models.

Appl Psychol Meas 2020 Mar 12;44(2):118-136. Epub 2019 Apr 12.

University of California, Los Angeles, USA.

In this article, the authors describe how multiple indicators multiple cause (MIMIC) models for studying uniform and nonuniform differential item functioning (DIF) can be conceptualized as mediation and moderated mediation models. Conceptualizing DIF within the context of a moderated mediation model helps to understand DIF as the effect of some variable on measurements that is not accounted for by the latent variable of interest. In addition, useful concepts and ideas from the mediation and moderation literature can be applied to DIF analysis: (a) improving the understanding of uniform and nonuniform DIF as direct effects and interactions, (b) understanding the implication of indirect effects in DIF analysis, (c) clarifying the interpretation of the "uniform DIF parameter" in the presence of nonuniform DIF, and (d) probing interactions and using the concept of "conditional effects" to better understand the patterns of DIF across the range of the latent variable. Read More

DOI: http://dx.doi.org/10.1177/0146621619835496
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7003182

Testing the Local Independence Assumption of the Rasch Model With -Based Nonparametric Model Tests.

Appl Psychol Meas 2020 Mar 31;44(2):103-117. Epub 2019 Mar 31.

Alpen-Adria-Universität Klagenfurt, Austria.

Local independence is a central assumption of commonly used item response theory models. Violations of this assumption are usually tested using test statistics based on item pairs. This study presents two quasi-exact tests based on the statistic for testing the hypothesis of local independence in the Rasch model. Read More

DOI: http://dx.doi.org/10.1177/0146621619835501
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7003184

Framework for Developing Multistage Testing With Intersectional Routing for Short-Length Tests.

Appl Psychol Meas 2020 Mar 20;44(2):87-102. Epub 2019 Mar 20.

Graduate Management Admission Council, Reston, VA, USA.

Multistage testing (MST) has many practical advantages over typical item-level computerized adaptive testing (CAT), but there is a substantial tradeoff when using MST because of its reduced level of adaptability. In typical MST, the first stage almost always serves as a routing stage in which all test takers see a linear test form. If multiple test sections measure different but moderately or highly correlated traits, then a score estimate for one section might be capable of adaptively selecting item modules for the following sections without having to administer a routing stage repeatedly for each section. Read More

DOI: http://dx.doi.org/10.1177/0146621619837226
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7003183

A Sequential Higher Order Latent Structural Model for Hierarchical Attributes in Cognitive Diagnostic Assessments.

Appl Psychol Meas 2020 Jan 4;44(1):65-83. Epub 2019 Mar 4.

Jiangxi Normal University, Nanchang, China.

The higher-order structure and the attribute hierarchical structure are two popular approaches to defining the latent attribute space in cognitive diagnosis models. However, to our knowledge, no existing approach integrates them to accommodate a higher-order latent trait and hierarchical attributes simultaneously. To address this issue, this article proposes a sequential higher-order latent structural model (LSM) that incorporates various hierarchical structures into a higher-order latent structure. Read More

DOI: http://dx.doi.org/10.1177/0146621619832935
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6906392

A Blocked-CAT Procedure for CD-CAT.

Appl Psychol Meas 2020 Jan 19;44(1):49-64. Epub 2019 Mar 19.

The University of Hong Kong, Hong Kong.

This article introduces a blocked-design procedure for cognitive diagnosis computerized adaptive testing (CD-CAT), which allows examinees to review items and change their answers during test administration. Four blocking versions of the new procedure were proposed. In addition, the impact of several factors, namely, item quality, generating model, block size, and test length, on the classification rates was investigated. Read More

DOI: http://dx.doi.org/10.1177/0146621619835500
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6906391

A Psychometric Model for Discrete-Option Multiple-Choice Items.

Appl Psychol Meas 2020 Jan 19;44(1):33-48. Epub 2019 Mar 19.

Ericsson, Inc., Santa Clara, CA, USA.

Discrete-option multiple-choice (DOMC) items differ from traditional multiple-choice (MC) items in the sequential administration of response options (up to display of the correct option). DOMC can be appealing in computer-based test administrations due to its protection of item security and its potential to reduce testwiseness effects. A psychometric model for DOMC items that attends to the random positioning of key location across different administrations of the same item is proposed, a feature that has been shown to affect DOMC item difficulty. Read More

DOI: http://dx.doi.org/10.1177/0146621619835499
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6906389

Multidimensional Test Assembly Using Mixed-Integer Linear Programming: An Application of Kullback-Leibler Information.

Appl Psychol Meas 2020 Jan 25;44(1):17-32. Epub 2019 Feb 25.

Educational Testing Service, Princeton, NJ, USA.

Many educational testing programs require different test forms with minimal or no item overlap. At the same time, the test forms should be parallel in terms of their statistical and content-related properties. A well-established method to assemble parallel test forms is to apply combinatorial optimization using mixed-integer linear programming (MILP). Read More

DOI: http://dx.doi.org/10.1177/0146621619827586
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6906390

New Efficient and Practicable Adaptive Designs for Calibrating Items Online.

Appl Psychol Meas 2020 Jan 30;44(1):3-16. Epub 2019 Jan 30.

Beijing Normal University, China.

When calibrating new items online, it is practicable to first compare all new items according to some criterion and then assign the most suitable one to the current examinee who reaches a seeding location. The modified D-optimal design proposed by van der Linden and Ren (denoted as D-VR design) works within this practicable framework with the aim of directly optimizing the estimation of item parameters. However, the optimal design point for a given new item should be obtained by comparing all examinees in a static examinee pool. Read More

DOI: http://dx.doi.org/10.1177/0146621618824854
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6906388

Joint Modeling of Compensatory Multidimensional Item Responses and Response Times.

Appl Psychol Meas 2019 Nov 22;43(8):639-654. Epub 2019 Feb 22.

Zhejiang Normal University, Jinhua, China.

Computer-based testing (CBT) is becoming increasingly popular in assessing test-takers' latent abilities and making inferences regarding their cognitive processes. In addition to collecting item responses, an important benefit of using CBT is that response times (RTs) can also be recorded and used in subsequent analyses. To better understand the structural relations between multidimensional cognitive attributes and the working speed of test-takers, this research proposes a joint-modeling approach that integrates compensatory multidimensional latent traits and response speediness using item responses and RTs. Read More

DOI: http://dx.doi.org/10.1177/0146621618824853
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6745633

An Investigation of Exposure Control Methods With Variable-Length CAT Using the Partial Credit Model.

Appl Psychol Meas 2019 Nov 23;43(8):624-638. Epub 2019 Jan 23.

Veterans Administration New Jersey Health Care System, East Orange, NJ, USA.

The purpose of this simulation study was to investigate the effect of several different item exposure control procedures in computerized adaptive testing (CAT) with variable-length stopping rules using the partial credit model. Previous simulation studies on CAT exposure control methods with polytomous items rarely considered variable-length tests. The four exposure control techniques examined were the randomesque with a group of three items, randomesque with a group of six items, progressive-restricted standard error (PR-SE), and no exposure control. Read More

DOI: http://dx.doi.org/10.1177/0146621618824856
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6745632

Clarifying the Effect of Test Speededness.

Appl Psychol Meas 2019 Nov 19;43(8):611-623. Epub 2018 Dec 19.

University of Notre Dame, IN, USA.

In the context of high-stakes tests, test takers who do not have enough time to complete a test rush toward the end and may engage in speeded behavior when tests do not penalize guessing. Using mathematical derivations and simulations, previous research showed that random guessing responses should attenuate interitem correlations and, therefore, decrease estimates of reliability. Meanwhile, using real data, other researchers showed that random guessing could in fact inflate reliability estimates. Read More

DOI: http://dx.doi.org/10.1177/0146621618817783
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6745631

Linking With External Covariates: Examining Accuracy by Anchor Type, Test Length, Ability Difference, and Sample Size.

Appl Psychol Meas 2019 Nov 14;43(8):597-610. Epub 2019 Feb 14.

Umeå University, Sweden.

Research has recently demonstrated the use of multiple anchor tests and external covariates to supplement or substitute for common anchor items when linking and equating with nonequivalent groups. This study examines the conditions under which external covariates improve linking and equating accuracy, with internal and external anchor tests of varying lengths and groups of differing abilities. Pseudo forms of a state science test were equated within a resampling study where sample size ranged from 1,000 to 10,000 examinees and anchor tests ranged in length from eight to 20 items, with reading and math scores included as covariates. Read More

DOI: http://dx.doi.org/10.1177/0146621618824855
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6745630

Diagnostic Test Score Validation With a Fallible Criterion.

Authors:
Paul A Jewsbury

Appl Psychol Meas 2019 Nov 13;43(8):579-596. Epub 2018 Dec 13.

Educational Testing Service, Princeton, NJ, USA.

Criterion-related validation of diagnostic test scores for a construct of interest is complicated by the unavailability of the construct directly. The standard method, Known Group Validation, assumes an infallible reference test in place of the construct, but infallible reference tests are rare. In contrast, Mixed Group Validation allows for a fallible reference test, but has been found to make strong assumptions not appropriate for the majority of diagnostic test validation studies. Read More

DOI: http://dx.doi.org/10.1177/0146621618817785
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6745629

Reporting Valid and Reliable Overall Scores and Domain Scores Using Bi-Factor Model.

Appl Psychol Meas 2019 Oct 10;43(7):562-576. Epub 2018 Dec 10.

Beijing Normal University, China.

Large-scale testing programs have shown increasing interest in providing examinees with more accurate diagnostic information by reporting overall and domain scores simultaneously. However, few studies have focused on how to report and interpret reliable overall scores and domain scores based on bi-factor models. In this study, the authors introduced six methods of reporting overall and domain scores as weighted composite scores of the general and specific factors in a bi-factor model, and compared their performance with Yao's MIRT (multidimensional item response theory) method using both simulated and empirical data. Read More

DOI: http://dx.doi.org/10.1177/0146621618813093
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6739746

Nonparametric CAT for CD in Educational Settings With Small Samples.

Appl Psychol Meas 2019 Oct 10;43(7):543-561. Epub 2018 Dec 10.

National Taiwan Normal University, Taipei, Taiwan.

Cognitive diagnostic computerized adaptive testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation. Although model-based CD-CAT is relatively well researched in the context of large-scale assessment systems, it has not received the same degree of research and development in small-scale settings, such as the course-based level, where such a system would be most useful. The main obstacle is that the statistical estimation techniques successfully applied in large-scale assessment require large samples to guarantee reliable calibration of the item parameters and accurate estimation of the examinees' proficiency class membership. Read More

DOI: http://dx.doi.org/10.1177/0146621618813113
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6739744

Q-Matrix Refinement Based on Item Fit Statistic RMSEA.

Appl Psychol Meas 2019 Oct 4;43(7):527-542. Epub 2018 Dec 4.

Zhejiang Normal University, Jinhua City, PR China.

A Q-matrix, which reflects how attributes are measured for each item, is necessary when applying a cognitive diagnosis model to an assessment. In most cases, the Q-matrix is constructed by experts in the field and may be subjective and incorrect. One efficient method to refine the Q-matrix is to employ a suitable statistic that is calculated using response data. Read More

DOI: http://dx.doi.org/10.1177/0146621618813104
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6739743

A Comparison of the Separate and Concurrent Calibration Methods for the Full-Information Bifactor model.

Authors:
Kyung Yong Kim

Appl Psychol Meas 2019 Oct 30;43(7):512-526. Epub 2018 Nov 30.

University of North Carolina at Greensboro, USA.

When calibrating items using multidimensional item response theory (MIRT) models, item response theory (IRT) calibration programs typically set the probability density of latent variables to a multivariate standard normal distribution to handle three types of indeterminacies: (a) the location of the origin, (b) the unit of measurement along each coordinate axis, and (c) the orientation of the coordinate axes. However, by doing so, item parameter estimates obtained from two independent calibration runs on nonequivalent groups are on two different coordinate systems. To handle this issue and place all the item parameter estimates on a common coordinate system, a process called linking is necessary. Read More

DOI: http://dx.doi.org/10.1177/0146621618813095
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6739745

A Sequential Process Model for Cognitive Diagnostic Assessment With Repeated Attempts.

Appl Psychol Meas 2019 Oct 12;43(7):495-511. Epub 2018 Dec 12.

University of Taipei, Taipei, Taiwan.

When diagnostic assessments are administered to examinees, the mastery status of each examinee on a set of specified cognitive skills or attributes can be directly evaluated using cognitive diagnosis models (CDMs). Under certain circumstances, allowing examinees more than one opportunity to answer an item correctly, through repeated attempts, provides many potential benefits. A sequential process model can be extended to model repeated attempts in diagnostic assessments. Read More

DOI: http://dx.doi.org/10.1177/0146621618813111
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6739742

Some Problems With the Analytical Argument in Support of RP67 in the Context of the Bookmark Standard Setting Method.

Authors:
Peter Baldwin

Appl Psychol Meas 2019 Sep 3;43(6):481-492. Epub 2018 Oct 3.

National Board of Medical Examiners, Philadelphia, PA, USA.

The choice of response probability in the bookmark method has been shown to affect outcomes in important ways. These findings have implications for the validity of the bookmark method because panelists' inability to internally adjust when given different response probabilities suggests that they are not performing the intended judgment task. In response to the concerns these findings raise, proponents of the bookmark method argue that such concerns can be addressed by using a response probability of . Read More

DOI: http://dx.doi.org/10.1177/0146621618800272
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6696871

Multidimensional Computerized Adaptive Testing Using Non-Compensatory Item Response Theory Models.

Appl Psychol Meas 2019 Sep 26;43(6):464-480. Epub 2018 Oct 26.

The Education University of Hong Kong, Tai Po, Hong Kong.

Current use of multidimensional computerized adaptive testing (MCAT) has been developed in conjunction with compensatory multidimensional item response theory (MIRT) models rather than with non-compensatory ones. In recognition of the usefulness of MCAT and the complications associated with non-compensatory data, this study aimed to develop MCAT algorithms using non-compensatory MIRT models and to evaluate their performance. For the purpose of the study, three item selection methods were adapted and compared, namely, the Fisher information method, the mutual information method, and the Kullback-Leibler information method. Read More

DOI: http://dx.doi.org/10.1177/0146621618800280
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6696872

A Two-Parameter Logistic Extension Model: .

Appl Psychol Meas 2019 Sep 29;43(6):449-463. Epub 2018 Sep 29.

Northeast Normal University, Changchun, China.

A three-parameter logistic (3PL) model variant, named the two-parameter logistic extension (2PLE) model, was developed. Instead of the fixed guessing parameter used in the 3PL model, the new model quantifies guessing behavior with a function that integrates item features with an examinee's ability level. Correct response probabilities from both solution behavior and guessing behavior increase as ability increases. Read More
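For context, the baseline 3PL model that the 2PLE extends uses a fixed guessing parameter c. A sketch of that standard baseline (the 2PLE itself replaces the constant c with an ability-dependent function, detailed in the paper; parameter values here are illustrative):

```python
from math import exp

def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter logistic model: a fixed guessing floor c
    plus a (1 - c)-scaled two-parameter logistic curve."""
    return c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))

# With a fixed guess of 0.2, the probability rises from c toward 1 as ability grows
print(round(p_3pl(0.0, 1.0, 0.0, 0.2), 2))
```

At very low ability the 3PL curve flattens at c regardless of the item, which is exactly the behavior the abstract says the 2PLE replaces with an ability-sensitive guessing component.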

DOI: http://dx.doi.org/10.1177/0146621618800273
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6696869

Item Response Theory Modeling for Examinee-selected Items with Rater Effect.

Appl Psychol Meas 2019 Sep 8;43(6):435-448. Epub 2018 Oct 8.

The Education University of Hong Kong, Tai Po, Hong Kong.

Some large-scale tests require examinees to select and answer a fixed number of items from a given set (e.g., select one out of three items). Read More

DOI: http://dx.doi.org/10.1177/0146621618798667
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6696873

Application of Dimension Reduction to CAT Item Selection Under the Bifactor Model.

Appl Psychol Meas 2019 Sep 27;43(6):419-434. Epub 2018 Nov 27.

Beijing Normal University, Beijing, China.

Multidimensional computerized adaptive testing (MCAT) based on the bifactor model is suitable for tests with multidimensional bifactor measurement structures. Several item selection methods that proved to be more advantageous than the maximum Fisher information method are not practical for bifactor MCAT due to time-consuming computations resulting from high dimensionality. To make them applicable in bifactor MCAT, dimension reduction is applied to four item selection methods, which are the posterior-weighted Fisher D-optimality (PDO) and three non-Fisher information-based methods-posterior expected Kullback-Leibler information (PKL), continuous entropy (CE), and mutual information (MI). Read More

DOI: http://dx.doi.org/10.1177/0146621618813086
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6696870

: An R Package for Psychometric Meta-Analysis.

Appl Psychol Meas 2019 Jul 5;43(5):415-416. Epub 2018 Sep 5.

University of South Florida, Tampa, USA.

Over the past four decades, psychometric meta-analysis (PMA) has emerged as a key way that psychological disciplines build cumulative scientific knowledge. Despite the importance and popularity of PMA, software implementing the method has tended to be closed-source, inflexible, limited in terms of the psychometric corrections available, cumbersome to use for complex analyses, and/or costly. To overcome these limitations, we created the R package: a free, open-source, comprehensive program for PMA. Read More

DOI: http://dx.doi.org/10.1177/0146621618795933
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6572911

Improved Wald Statistics for Item-Level Model Comparison in Diagnostic Classification Models.

Appl Psychol Meas 2019 Jul 18;43(5):402-414. Epub 2018 Sep 18.

Beijing Normal University, China.

Diagnostic classification models (DCMs) have been widely used in education, psychology, and many other disciplines. To select the most appropriate DCM for each item, the Wald test has been recommended. However, prior research has revealed that this test provides inflated Type I error rates. Read More

DOI: http://dx.doi.org/10.1177/0146621618798664
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6572908

Computerized Adaptive Testing for Cognitively Based Multiple-Choice Data.

Appl Psychol Meas 2019 Jul 18;43(5):388-401. Epub 2018 Sep 18.

The University of Hong Kong, Hong Kong.

Cognitive diagnosis models (CDMs) are latent class models that hold great promise for providing diagnostic information about student knowledge profiles. The increasing use of computers in classrooms enhances the advantages of CDMs for more efficient diagnostic testing by using adaptive algorithms, referred to as cognitive diagnosis computerized adaptive testing (CD-CAT). When multiple-choice items are involved, CD-CAT can be further improved by using polytomous scoring. Read More

DOI: http://dx.doi.org/10.1177/0146621618798665
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6572910

Person-Fit as an Index of Inattentive Responding: A Comparison of Methods Using Polytomous Survey Data.

Appl Psychol Meas 2019 Jul 14;43(5):374-387. Epub 2018 Sep 14.

University of Nebraska-Lincoln, NE, USA.

Self-report measures are vulnerable to response biases that can degrade the accuracy of conclusions drawn from results. In low-stakes measures, inattentive or careless responding can be especially problematic. A variety of a priori and post hoc methods exist for detecting these aberrant response patterns. Read More

Publisher: http://journals.sagepub.com/doi/10.1177/0146621618798666
DOI: http://dx.doi.org/10.1177/0146621618798666
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6572906

A Sandwich Standard Error Estimator for Exploratory Factor Analysis With Nonnormal Data and Imperfect Models.

Appl Psychol Meas 2019 Jul 14;43(5):360-373. Epub 2018 Sep 14.

University of Notre Dame, IN, USA.

This article is concerned with standard errors (SEs) and confidence intervals (CIs) for exploratory factor analysis (EFA) in different situations. The authors adapt a sandwich estimator for EFA parameters to accommodate nonnormal data and imperfect models, factor extraction with maximum likelihood and ordinary least squares, and factor rotation with CF-varimax, CF-quartimax, geomin, or target rotation. They illustrate the sandwich SEs and CIs using nonnormal continuous data and ordinal data. Read More

DOI: http://dx.doi.org/10.1177/0146621618798669
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6572907

A Comprehensive IRT Approach for Modeling Binary, Graded, and Continuous Responses With Error in Persons and Items.

Authors:
Pere J Ferrando

Appl Psychol Meas 2019 Jul 12;43(5):339-359. Epub 2018 Dec 12.

Universitat Rovira i Virgili, Tarragona, Spain.

Dual item response theory (IRT) models, in which items and individuals have different amounts of measurement error, have been proposed in the literature. Developments in these models, however, have so far been feasible only for continuous responses. This article discusses a comprehensive dual modeling approach, based on underlying latent response variables, from which specific models for continuous, graded, and binary responses are obtained. Read More

DOI: http://dx.doi.org/10.1177/0146621618817779
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6572909

Extreme Response Style: A Simulation Study Comparison of Three Multidimensional Item Response Models.

Appl Psychol Meas 2019 Jun 1;43(4):322-335. Epub 2018 Aug 1.

James Madison University, Harrisonburg, VA, USA.

Several multidimensional item response models have been proposed for survey responses affected by response styles. Through simulation, this study compares three models designed to account for extreme response tendencies: the IRTree Model, the multidimensional nominal response model, and the modified generalized partial credit model. The modified generalized partial credit model results in the lowest item mean squared error (MSE) across simulation conditions of sample size (500, 1,000), survey length (10, 20), and number of response options (4, 6). Read More
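For context, the (unmodified) generalized partial credit model underlying one of the compared models can be sketched as below; the response-style extensions studied in the article are not reproduced here, and the parameter values are illustrative.

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """Category probabilities under the generalized partial credit model.

    theta: person location; a: item discrimination; b: step parameters.
    Returns probabilities for categories 0..len(b).
    """
    # Cumulative sums of a*(theta - b_k); category 0 carries a zero term
    steps = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(b)))))
    expz = np.exp(steps - steps.max())  # stabilized softmax
    return expz / expz.sum()

p = gpcm_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.0])  # four categories
```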

Source: http://dx.doi.org/10.1177/0146621618789392 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6512164 (PMC)
June 2019

Looking at DIF From a New Perspective: A Structure-Based Approach Acknowledging Inherent Indefinability.

Authors:
Anna Doebler

Appl Psychol Meas 2019 Jun 11;43(4):303-321. Epub 2018 Sep 11.

University of Mannheim, Germany.

Differential item functioning (DIF), although highly relevant for psychometric assessment in various fields of psychology, is not mathematically well defined. In particular, the impact, that is, the difference between the means of the person parameters in the focal and reference groups, is a parameter that is not identified without further assumptions. Common DIF detection methods necessarily impose such assumptions; in most cases, however, the specific constraints remain vague and are implicit in the mathematical algorithms. Read More

Source: http://dx.doi.org/10.1177/0146621618795727 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6512162 (PMC)
June 2019

A Graded Response Model Framework for Questionnaires With Uniform Response Formats.

Appl Psychol Meas 2019 Jun 1;43(4):290-302. Epub 2018 Aug 1.

Justus-Liebig-Universität Giessen, Germany.

Questionnaires with uniform-ordered categorical response formats are widely applied in psychology. Muraki proposed a modified graded response model accounting for the items' uniform response formats by assuming identical threshold parameters defining the category boundaries for all items. What is not well known is that there is a set of closely related models, which similarly assume identical thresholds. Read More
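A minimal sketch of a graded response model with item-invariant thresholds, the constraint the abstract describes, assuming boundary curves of 2PL form; the parameter values are illustrative.

```python
import numpy as np

def grm_probs(theta, a, b, taus):
    """Graded response model with item location b and shared thresholds taus.

    Category boundaries sit at b + tau_k, with the same taus for all items
    (the identical-threshold constraint described in the abstract).
    """
    taus = np.asarray(taus)
    # P(X >= k) at each boundary, via 2PL cumulative curves
    p_ge = 1.0 / (1.0 + np.exp(-a * (theta - (b + taus))))
    cum = np.concatenate(([1.0], p_ge, [0.0]))
    return cum[:-1] - cum[1:]  # adjacent differences give category probs

p = grm_probs(theta=0.0, a=1.5, b=0.2, taus=[-1.0, 0.0, 1.0])
```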

Source: http://dx.doi.org/10.1177/0146621618789394 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6512163 (PMC)
June 2019

Model Selection for Multilevel Mixture Rasch Models.

Appl Psychol Meas 2019 Jun 7;43(4):272-289. Epub 2018 Jun 7.

University of Georgia, Athens, USA.

Mixture item response theory (MixIRT) models can be used to model heterogeneity among individuals from different subpopulations, but these models do not account for the multilevel structure that is common in educational and psychological data. Multilevel extensions of MixIRT models have been proposed to address this shortcoming. Successful applications of multilevel MixIRT models depend in part on detection of the best-fitting model. Read More
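Model detection of this kind is typically done with information criteria; the following is a generic sketch with invented log-likelihoods and parameter counts, not results from the article.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for a fitted model."""
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    """Bayesian information criterion; penalizes parameters by log(n)."""
    return -2 * loglik + k * math.log(n)

# Hypothetical fits: class count -> (log-likelihood, number of parameters)
fits = {1: (-5210.4, 20), 2: (-5100.2, 41), 3: (-5098.9, 62)}
n = 1000
best = min(fits, key=lambda c: bic(*fits[c], n))  # -> the 2-class model
```

Here BIC picks the 2-class solution: the tiny log-likelihood gain of the 3-class model does not offset its extra 21 parameters.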

Source: http://dx.doi.org/10.1177/0146621618779990 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6512165 (PMC)
June 2019

Cognitive Diagnostic Models With Attribute Hierarchies: Model Estimation With a Restricted Q-Matrix Design.

Appl Psychol Meas 2019 Jun 16;43(4):255-271. Epub 2018 Apr 16.

University of Illinois at Urbana-Champaign, USA.

Attribute hierarchy is a common assumption in the educational context, where the mastery of one attribute is assumed to be a prerequisite to the mastery of another one. The attribute hierarchy can be incorporated through a restricted Q matrix that implies the specified structure. The latent class-based cognitive diagnostic models (CDMs) usually do not assume a hierarchical structure among attributes, which means all profiles of attributes are possible in a population of interest. Read More
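A restricted attribute space of this kind can be sketched by filtering the full set of profiles against the prerequisite relation; the linear hierarchy below is hypothetical.

```python
from itertools import product

def permissible_profiles(n_attrs, prereqs):
    """Enumerate attribute profiles consistent with a hierarchy.

    prereqs: (a, b) pairs meaning attribute a is a prerequisite for
    attribute b (mastering b requires having mastered a).
    """
    profiles = []
    for p in product([0, 1], repeat=n_attrs):
        if all(p[a] >= p[b] for a, b in prereqs):
            profiles.append(p)
    return profiles

# Linear hierarchy A0 -> A1 -> A2: only 4 of the 2^3 = 8 profiles survive
profs = permissible_profiles(3, prereqs=[(0, 1), (1, 2)])
# (0,0,0), (1,0,0), (1,1,0), (1,1,1)
```

Restricting the latent classes to these profiles is what a hierarchy-respecting Q-matrix design implies for the CDM's structural model.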

Source: http://dx.doi.org/10.1177/0146621618765721 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6512166 (PMC)
June 2019

Comparing Attitudes Across Groups: An IRT-Based Item-Fit Statistic for the Analysis of Measurement Invariance.

Appl Psychol Meas 2019 May 27;43(3):241-250. Epub 2017 Dec 27.

Deutsches Institut für Internationale Pädagogische Forschung, Frankfurt, Germany.

Questionnaires for the assessment of attitudes and other psychological traits are crucial in educational and psychological research, and item response theory (IRT) has become a viable tool for scaling such data. Many international large-scale assessments aim at comparing these constructs across countries, and the invariance of measures across countries is thus required. In its most recent cycle, the Programme for International Student Assessment (PISA 2015) implemented an innovative approach for testing the invariance of IRT-scaled constructs in the context questionnaires administered to students, parents, school principals, and teachers. Read More

Source: http://dx.doi.org/10.1177/0146621617748323 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6463271 (PMC)
May 2019

GGUM-RANK Statement and Person Parameter Estimation With Multidimensional Forced Choice Triplets.

Appl Psychol Meas 2019 May 23;43(3):226-240. Epub 2018 Apr 23.

Nanyang Technological University, Singapore.

Historically, multidimensional forced choice (MFC) measures have been criticized because conventional scoring methods can lead to ipsativity problems that render scores unsuitable for interindividual comparisons. However, with the recent advent of item response theory (IRT) scoring methods that yield normative information, MFC measures are surging in popularity and becoming important components in high-stakes evaluation settings. This article aims to add to burgeoning methodological advances in MFC measurement by focusing on statement and person parameter recovery for the GGUM-RANK (generalized graded unfolding-RANK) IRT model. Read More

Source: http://dx.doi.org/10.1177/0146621618768294 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6463341 (PMC)
May 2019

The Bipolarity of Attitudes: Unfolding the Implications of Ambivalence.

Authors:
Joshua A McGrane

Appl Psychol Meas 2019 May 26;43(3):211-225. Epub 2018 Mar 26.

University of Oxford, UK.

Recently, some attitude researchers have argued that the traditional bipolar model of attitudes should be replaced, claiming that a bivariate model is superior in several ways, foremost of which is its ability to account for ambivalent attitudes. This study argues that ambivalence is not at odds with bipolarity per se, but rather the conventional view of bipolarity, and that the psychometric evidence supporting a bivariate interpretation has been flawed. To demonstrate this, a scale developed out of the bivariate approach was examined using a unidimensional unfolding item response theory model: general hyperbolic cosine model for polytomous responses. Read More
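The dichotomous hyperbolic cosine model, a simpler relative of the polytomous model used in the study, can be sketched as follows; this is a sketch from the model's published form, with illustrative parameter values, not code from the article.

```python
import math

def hcm_prob(theta, delta, lam):
    """Dichotomous hyperbolic cosine (unfolding) model.

    Endorsement probability is single-peaked: it is maximal when the
    person location theta coincides with the item location delta, and
    falls off symmetrically on both sides.
    """
    return math.exp(lam) / (math.exp(lam) + 2 * math.cosh(theta - delta))

p_at = hcm_prob(0.0, 0.0, 1.0)   # person at the item's location
p_off = hcm_prob(2.0, 0.0, 1.0)  # person two units away: lower probability
```

The single-peaked, symmetric response curve is what lets an unfolding model represent ambivalent middle positions that a monotone (cumulative) IRT model cannot.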

Source: http://dx.doi.org/10.1177/0146621618762741 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6463342 (PMC)
May 2019

A General Unfolding IRT Model for Multiple Response Styles.

Appl Psychol Meas 2019 May 16;43(3):195-210. Epub 2018 Apr 16.

The Education University of Hong Kong, Tai Po, New Territories, Hong Kong.

It is commonly known that respondents exhibit different response styles when responding to Likert-type items. For example, some respondents tend to select the extreme categories (e.g. Read More

Source: http://dx.doi.org/10.1177/0146621618762743 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6463344 (PMC)
May 2019

A Law of Comparative Preference: Distinctions Between Models of Personal Preference and Impersonal Judgment in Pair Comparison Designs.

Appl Psychol Meas 2019 May 2;43(3):181-194. Epub 2017 Nov 2.

Hong Kong Examinations and Assessment Authority, Wan Chai, Hong Kong.

The pair comparison design for distinguishing between stimuli located on the same natural or hypothesized linear continuum is used both when the response is a personal preference and when it is an impersonal judgment. Appropriate models which complement the different responses have been proposed. However, the models most appropriate for impersonal judgments have also been described as modeling choice, which may imply personal preference. Read More

Source: http://dx.doi.org/10.1177/0146621617738014 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6463346 (PMC)
May 2019

GGUM: An R Package for Fitting the Generalized Graded Unfolding Model.

Appl Psychol Meas 2019 Mar 7;43(2):172-173. Epub 2018 May 7.

University of Groningen, The Netherlands.

In this article, the newly created GGUM R package is presented. This package finally brings the generalized graded unfolding model (GGUM) to the front stage for practitioners and researchers. It expands the possibilities of fitting this type of item response theory (IRT) model to settings that, up to now, were not possible (thus, beyond the limitations imposed by the widespread GGUM2004 software). Read More
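For orientation, the GGUM item response function that the package fits can be sketched in Python; this is a sketch of the model's published form, not code from the package itself (which is written in R), and the parameter values are illustrative.

```python
import numpy as np

def ggum_probs(theta, alpha, delta, taus):
    """GGUM category probabilities for one item.

    alpha: discrimination; delta: item location; taus: thresholds
    tau_1..tau_C (tau_0 = 0 is implicit). Returns probs for z = 0..C.
    """
    taus = np.concatenate(([0.0], np.asarray(taus)))
    C = len(taus) - 1
    M = 2 * C + 1
    z = np.arange(C + 1)
    tau_cum = np.cumsum(taus)  # sum_{k=0}^{z} tau_k
    d = theta - delta
    # Each category's numerator pairs the terms for z and M - z
    num = (np.exp(alpha * (z * d - tau_cum))
           + np.exp(alpha * ((M - z) * d - tau_cum)))
    return num / num.sum()

p = ggum_probs(theta=0.0, alpha=1.0, delta=1.0, taus=[-2.0, -1.0])
```

Pairing the z and M - z terms is what makes the curves symmetric around delta, giving the model its unfolding (proximity-based) character.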

Source: http://dx.doi.org/10.1177/0146621618772290 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6376534 (PMC)
March 2019

Examining the Impacts of Rater Effects in Performance Assessments.

Authors:
Stefanie A Wind

Appl Psychol Meas 2019 Mar 5;43(2):159-171. Epub 2018 Aug 5.

The University of Alabama, Tuscaloosa, USA.

Rater effects such as severity, centrality, and misfit are recurrent concerns in performance assessments. Despite their persistence in operational assessment settings and frequent discussion in research, researchers have not fully explored the impacts of rater effects as they relate to estimates of student achievement. The purpose of this study is to explore the impacts of rater severity, centrality, and misfit on student achievement estimates and on classification decisions. Read More

Source: http://dx.doi.org/10.1177/0146621618789391 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6376535 (PMC)
March 2019

Bayesian DINA Modeling Incorporating Within-Item Characteristic Dependency.

Appl Psychol Meas 2019 Mar 22;43(2):143-158. Epub 2018 Jun 22.

Zhejiang Normal University, Zhejiang, China.

The within-item characteristic dependency (WICD) means that dependencies exist among different types of item characteristics/parameters within an item. The potential WICD has been ignored by current modeling approaches and estimation algorithms for the deterministic inputs noisy "and" gate (DINA) model. To explicitly model WICD, this study proposed a modified Bayesian DINA modeling approach where a bivariate normal distribution was employed as a joint prior distribution for correlated item parameters. Read More
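A minimal sketch of the DINA response rule together with a correlated bivariate-normal prior on transformed guess and slip parameters, in the spirit of the joint prior the abstract describes; all numbers are illustrative, not the article's specification.

```python
import numpy as np

rng = np.random.default_rng(7)

def dina_prob(profile, q_row, guess, slip):
    """DINA model: P(correct) = 1 - slip if all attributes required by
    the item (q_row) are mastered (profile), else guess."""
    eta = all(a >= q for a, q in zip(profile, q_row))
    return 1 - slip if eta else guess

# Correlated prior on (logit guess, logit slip): one draw from a
# bivariate normal, then mapped back through the inverse logit
mean = np.array([-1.5, -1.5])
cov = np.array([[0.5, 0.3], [0.3, 0.5]])
logit_g, logit_s = rng.multivariate_normal(mean, cov)
guess = 1 / (1 + np.exp(-logit_g))
slip = 1 / (1 + np.exp(-logit_s))

p_master = dina_prob((1, 1), q_row=(1, 1), guess=guess, slip=slip)
p_nonmaster = dina_prob((1, 0), q_row=(1, 1), guess=guess, slip=slip)
```

The off-diagonal 0.3 in the prior covariance is what encodes the within-item dependency between the two item parameters; setting it to zero recovers the usual independent-prior DINA specification.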

Source: http://dx.doi.org/10.1177/0146621618781594 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6376533 (PMC)
March 2019

A Posterior Predictive Model Checking Method Assuming Posterior Normality for Item Response Theory.

Authors:
Megan Kuhfeld

Appl Psychol Meas 2019 Mar 29;43(2):125-142. Epub 2018 Jun 29.

The University of Texas at Austin, USA.

This study investigated the violation of local independence assumptions within unidimensional item response theory (IRT) models. Bayesian posterior predictive model checking (PPMC) methods are increasingly being used to investigate multidimensionality in IRT models. The current work proposes a PPMC method for evaluating local dependence in IRT models that are estimated using full-information maximum likelihood. Read More
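The core of a PPMC check is a posterior predictive p-value for a discrepancy statistic; below is a generic sketch with simulated replicated statistics, not the discrepancy measures or estimation details from the article.

```python
import numpy as np

rng = np.random.default_rng(42)

def ppp_value(observed_stat, replicated_stats):
    """Posterior predictive p-value: share of replicated discrepancy
    statistics at least as large as the observed one."""
    replicated_stats = np.asarray(replicated_stats)
    return (replicated_stats >= observed_stat).mean()

# Hypothetical discrepancy: an observed item-pair residual correlation
# of 0.12, compared against statistics replicated under the fitted model
reps = rng.normal(loc=0.0, scale=0.05, size=2000)
ppp = ppp_value(0.12, reps)
# A ppp near 0 or 1 flags local dependence the model fails to reproduce
```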

Source: http://dx.doi.org/10.1177/0146621618779985 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6376537 (PMC)
March 2019

The Effect of Option Homogeneity in Multiple-Choice Items.

Appl Psychol Meas 2019 Mar 7;43(2):113-124. Epub 2018 May 7.

National Council of State Boards of Nursing, Chicago, IL, USA.

Previous research has found that option homogeneity in multiple-choice items affects item difficulty when items with homogeneous options are compared to the same items with heterogeneous options. This study conducted an empirical test of the effect of option homogeneity in multiple-choice items on a professional licensure examination to determine the predictability and magnitude of the change. Similarity of options to the key was determined by using subject matter experts and a natural language processing algorithm. Read More
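One simple stand-in for scoring the similarity of options against the key is a bag-of-words cosine similarity; the article does not specify its NLP algorithm, and the item text below is invented.

```python
from collections import Counter
import math

def cosine_sim(a, b):
    """Bag-of-words cosine similarity between two short texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb)

key = "administer the medication with food"
distractor = "administer the medication before meals"
sim = cosine_sim(key, distractor)  # higher -> more homogeneous option
```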

Source: http://journals.sagepub.com/doi/10.1177/0146621618770803 (Publisher Site)
http://dx.doi.org/10.1177/0146621618770803 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6376538 (PMC)
March 2019