195 results match your criteria: Applied Psychological Measurement [Journal]


GGUM: An R Package for Fitting the Generalized Graded Unfolding Model.

Appl Psychol Meas 2019 Mar 7;43(2):172-173. Epub 2018 May 7.

University of Groningen, The Netherlands.

In this article, the newly created GGUM R package is presented. This package brings the generalized graded unfolding model (GGUM) to the front stage for practitioners and researchers. It extends the fitting of this type of item response theory (IRT) model to settings that were previously out of reach, beyond the limitations imposed by the widely used GGUM2004 software.

http://dx.doi.org/10.1177/0146621618772290
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6376534

Examining the Impacts of Rater Effects in Performance Assessments.

Authors:
Stefanie A Wind

Appl Psychol Meas 2019 Mar 5;43(2):159-171. Epub 2018 Aug 5.

The University of Alabama, Tuscaloosa, USA.

Rater effects such as severity, centrality, and misfit are recurrent concerns in performance assessments. Despite their persistence in operational assessment settings and frequent discussion in the literature, the impacts of rater effects on estimates of student achievement have not been fully explored. The purpose of this study is to explore the impacts of rater severity, centrality, and misfit on student achievement estimates and on classification decisions.

http://dx.doi.org/10.1177/0146621618789391
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6376535

Bayesian DINA Modeling Incorporating Within-Item Characteristic Dependency.

Appl Psychol Meas 2019 Mar 22;43(2):143-158. Epub 2018 Jun 22.

Zhejiang Normal University, Zhejiang, China.

Within-item characteristic dependency (WICD) means that dependencies exist among different types of item characteristics/parameters within an item. Potential WICD has been ignored by current modeling approaches and estimation algorithms for the deterministic inputs, noisy "and" gate (DINA) model. To explicitly model WICD, this study proposed a modified Bayesian DINA modeling approach in which a bivariate normal distribution serves as a joint prior distribution for correlated item parameters.
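The DINA response rule referred to above is standard: an examinee answers correctly with probability 1 - s when mastering every attribute the item requires, and with the guessing probability g otherwise. A minimal illustrative sketch of that rule (the paper's actual contribution, the bivariate normal prior linking item parameters, is not shown here):

```python
def dina_prob(alpha, q_row, guess, slip):
    """P(correct answer) for one item under the DINA model.

    alpha : examinee's binary attribute-mastery profile
    q_row : the item's binary Q-matrix row (required attributes)
    guess, slip : the item's g and s parameters
    """
    # eta = 1 iff the examinee has mastered every attribute the item requires
    eta = int(all(a >= q for a, q in zip(alpha, q_row)))
    return (1 - slip) ** eta * guess ** (1 - eta)

# a master of both required attributes succeeds with probability 1 - s
p_master = dina_prob([1, 1], [1, 1], guess=0.2, slip=0.1)
# a non-master succeeds only by guessing, with probability g
p_nonmaster = dina_prob([1, 0], [1, 1], guess=0.2, slip=0.1)
```

The attribute profiles, Q-matrix row, and parameter values are hypothetical.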

http://dx.doi.org/10.1177/0146621618781594
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6376533

A Posterior Predictive Model Checking Method Assuming Posterior Normality for Item Response Theory.

Authors:
Megan Kuhfeld

Appl Psychol Meas 2019 Mar 29;43(2):125-142. Epub 2018 Jun 29.

The University of Texas at Austin, USA.

This study investigated the violation of local independence assumptions within unidimensional item response theory (IRT) models. Bayesian posterior predictive model checking (PPMC) methods are increasingly being used to investigate multidimensionality in IRT models. The current work proposes a PPMC method for evaluating local dependence in IRT models that are estimated using full-information maximum likelihood.

http://dx.doi.org/10.1177/0146621618779985
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6376537

The Effect of Option Homogeneity in Multiple-Choice Items.

Appl Psychol Meas 2019 Mar 7;43(2):113-124. Epub 2018 May 7.

National Council of State Boards of Nursing, Chicago, IL, USA.

Previous research has found that option homogeneity in multiple-choice items affects item difficulty when items with homogeneous options are compared to the same items with heterogeneous options. This study conducted an empirical test of the effect of option homogeneity in multiple-choice items on a professional licensure examination to determine the predictability and magnitude of the change. Similarity of options to the key was determined by using subject matter experts and a natural language processing algorithm.

http://dx.doi.org/10.1177/0146621618770803
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6376538

Use of Information Criteria in the Study of Group Differences in Trace Lines.

Appl Psychol Meas 2019 Mar 15;43(2):95-112. Epub 2018 May 15.

Korea University, Seoul, Korea.

A brief review of various information criteria is presented for the detection of differential item functioning (DIF) under item response theory (IRT). An illustration of using information criteria for model selection as well as results with simulated data are presented and contrasted with the IRT likelihood ratio (LR) DIF detection method. Use of information criteria for general IRT model selection is discussed.
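Model selection with information criteria, as reviewed here, compares penalized log-likelihoods across competing models, for example a no-DIF model against one with group-specific item parameters. A minimal sketch using hypothetical log-likelihoods and parameter counts:

```python
import math

def aic(log_lik, n_params):
    # Akaike information criterion: 2k - 2 ln L (smaller is better)
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    # Bayesian information criterion: k ln n - 2 ln L (smaller is better)
    return n_params * math.log(n_obs) - 2 * log_lik

# hypothetical fits: a no-DIF model vs. one adding group-specific item parameters
ll_nodif, k_nodif = -5210.4, 20
ll_dif, k_dif = -5198.7, 24
n = 1000

prefer_dif_by_aic = aic(ll_dif, k_dif) < aic(ll_nodif, k_nodif)
prefer_dif_by_bic = bic(ll_dif, k_dif, n) < bic(ll_nodif, k_nodif, n)
# AIC's lighter penalty can favor the richer DIF model while BIC's
# heavier penalty favors the simpler one, so the criteria may disagree
```

With these made-up numbers AIC prefers the DIF model and BIC does not, which illustrates why the choice of criterion matters.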

http://dx.doi.org/10.1177/0146621618772292
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6376536

ShortForm: An R Package to Select Scale Short Forms With the Ant Colony Optimization Algorithm.

Appl Psychol Meas 2018 Sep 22;42(6):516-517. Epub 2018 Jan 22.

University of Florida, Gainesville, USA.

http://dx.doi.org/10.1177/0146621617752993
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6373857

What Information Works Best? A Comparison of Routing Methods.

Appl Psychol Meas 2018 Sep 4;42(6):499-515. Epub 2018 Feb 4.

University of Florida, Gainesville, USA.

Many item selection methods have been proposed for computerized adaptive testing (CAT) applications. However, not all of them have been used in computerized multistage testing (ca-MST). This study evaluates several of these item selection methods as routing methods within the ca-MST framework.

http://dx.doi.org/10.1177/0146621617752990
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6373851

Latent Class Analysis of Recurrent Events in Problem-Solving Items.

Appl Psychol Meas 2018 Sep 9;42(6):478-498. Epub 2018 Apr 9.

Columbia University, New York, NY, USA.

Computer-based assessment of complex problem-solving abilities is becoming increasingly popular. In such an assessment, the entire problem-solving process of an examinee is recorded, providing detailed information about the individual, such as behavioral patterns, speed, and learning trajectory. The problem-solving processes are recorded in a computer log file, a time-stamped documentation of events related to task completion.

http://dx.doi.org/10.1177/0146621617748325
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6373852

Mutual Information Reliability for Latent Class Analysis.

Appl Psychol Meas 2018 Sep 15;42(6):460-477. Epub 2018 Jan 15.

University of Maryland, College Park, USA.

Latent class models are powerful tools in psychological and educational measurement. These models classify individuals into subgroups based on a set of manifest variables, assisting decision making in a diagnostic system. In this article, based on information theory, the authors propose a mutual information reliability (MIR) coefficient that summarizes the measurement quality of latent class models, where the latent variables being measured are categorical.

http://dx.doi.org/10.1177/0146621617748324
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6373856

An EM-Based Method for Q-Matrix Validation.

Appl Psychol Meas 2018 Sep 20;42(6):446-459. Epub 2018 Feb 20.

University of Illinois at Urbana-Champaign, IL, USA.

To assist subject matter experts in specifying their Q-matrices, the authors used an expectation-maximization (EM)-based algorithm to investigate three alternative Q-matrix validation methods: maximum likelihood estimation (MLE), marginal maximum likelihood estimation (MMLE), and the intersection and difference (ID) method. Their efficiency was compared, respectively, with that of the sequential EM-based δ method and its extension (ς), the γ method, and the nonparametric method in terms of correct recovery rate, true negative rate, and true positive rate under the deterministic-inputs, noisy "and" gate (DINA) model and the reduced reparameterized unified model (rRUM). Simulation results showed that for the rRUM, the MLE performed better for low-quality tests, whereas the MMLE worked better for high-quality tests.

http://dx.doi.org/10.1177/0146621617752991
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6373855

Scale Separation Reliability: What Does It Mean in the Context of Comparative Judgment?

Appl Psychol Meas 2018 Sep 31;42(6):428-445. Epub 2017 Dec 31.

Université Catholique de Louvain, Louvain-la-Neuve, Belgium.

Comparative judgment (CJ) is an alternative method for assessing competences based on Thurstone's law of comparative judgment. Assessors are asked to compare pairs of students' work (representations) and judge which one is better on a certain competence. These judgments are analyzed using the Bradley-Terry-Luce model, resulting in logit estimates for the representations.
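Under the Bradley-Terry-Luce model mentioned above, the probability that one representation beats another is a logistic function of the difference in their logit estimates. A minimal sketch with hypothetical logit values:

```python
import math

def btl_prob(logit_i, logit_j):
    """Probability that representation i beats representation j
    under the Bradley-Terry-Luce model, given their logit estimates."""
    return 1.0 / (1.0 + math.exp(-(logit_i - logit_j)))

p_equal = btl_prob(0.0, 0.0)    # equally good work: a coin flip
p_better = btl_prob(1.0, 0.0)   # one logit stronger: wins ~73% of comparisons
```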

http://dx.doi.org/10.1177/0146621617748321
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6373854

Response Styles in the Partial Credit Model.

Appl Psychol Meas 2018 Sep 12;42(6):407-427. Epub 2018 Jan 12.

Institut für Medizinische Biometrie, Informatik und Epidemiologie, Universitätsklinikum Bonn, München, Germany.

In the modeling of ordinal responses in psychological measurement and survey-based research, response styles that represent specific answering patterns of respondents are typically ignored. One consequence is that estimates of item parameters can be poor and considerably biased. The focus here is on the modeling of a tendency to extreme or middle categories.
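The partial credit model named in the title assigns category probabilities from cumulative sums of theta minus step parameters. A minimal sketch of the standard PCM (the article's response-style extensions add style parameters, which are not shown here; theta and the step values are hypothetical):

```python
import math

def pcm_probs(theta, deltas):
    """Category probabilities for one item under the partial credit model.

    deltas: step parameters delta_1..delta_m; response categories are 0..m.
    """
    # cumulative sums sum_{k<=x}(theta - delta_k), with the empty sum = 0 for x = 0
    cums = [0.0]
    for d in deltas:
        cums.append(cums[-1] + (theta - d))
    exps = [math.exp(c) for c in cums]
    total = sum(exps)
    return [e / total for e in exps]

# a person at theta = 0 on a 3-category item with symmetric steps
probs = pcm_probs(theta=0.0, deltas=[-1.0, 1.0])
```

With symmetric steps the middle category is most likely at theta = 0, which is the kind of pattern a middle-category response style would exaggerate.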

http://dx.doi.org/10.1177/0146621617748322
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6373853

Detection Rates of the M Test for Nonzero Lower Asymptotes Under Normal and Nonnormal Ability Distributions in the Applications of IRT.

Appl Psychol Meas 2019 Jan 18;43(1):84-88. Epub 2018 Apr 18.

Florida State University, Tallahassee, USA.

When considering the two-parameter or the three-parameter logistic model for item responses from a multiple-choice test, one may want to assess the need for lower asymptote parameters in the item response function before committing to the three-parameter model. This study reports the sensitivity of an overall model test, M, in detecting the presence of nonzero asymptotes in the item response function under normal and nonnormal ability distribution conditions.
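The lower asymptote at issue is the c parameter of the three-parameter logistic item response function; fixing c = 0 recovers the 2PL. A minimal sketch with hypothetical item parameters:

```python
import math

def irf_3pl(theta, a, b, c=0.0):
    """Three-parameter logistic item response function.
    c is the lower asymptote; with c = 0 this reduces to the 2PL."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

p_2pl = irf_3pl(-3.0, a=1.5, b=0.0)          # 2PL: success probability near zero
p_3pl = irf_3pl(-3.0, a=1.5, b=0.0, c=0.2)   # 3PL: floored near c by guessing
```

For a low-ability examinee the 2PL probability approaches 0 while the 3PL probability stays near c, which is exactly the discrepancy a test for nonzero lower asymptotes must detect.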

http://dx.doi.org/10.1177/0146621618768291
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6297911

Measurement Efficiency for Fixed-Precision Multidimensional Computerized Adaptive Tests: Comparing Health Measurement and Educational Testing Using Example Banks.

Appl Psychol Meas 2019 Jan 23;43(1):68-83. Epub 2018 Apr 23.

University of Oslo, Norway.

It is currently not entirely clear to what degree the research on multidimensional computerized adaptive testing (CAT) conducted in the field of educational testing can be generalized to fields such as health assessment, where CAT design factors differ considerably from those typically used in educational testing. In this study, the impact of a number of important design factors on CAT performance is systematically evaluated, using realistic example item banks for two main scenarios: health assessment (polytomous items, small to medium item bank sizes, high discrimination parameters) and educational testing (dichotomous items, large item banks, small- to medium-sized discrimination parameters). Measurement efficiency is evaluated for both between-item multidimensional CATs and separate unidimensional CATs for each latent dimension.

http://dx.doi.org/10.1177/0146621618765719
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6295884

Adaptive Testing With a Hierarchical Item Response Theory Model.

Appl Psychol Meas 2019 Jan 18;43(1):51-67. Epub 2018 Apr 18.

University of Kansas, Lawrence, USA.

The hierarchical item response theory (H-IRT) model is very flexible and allows a general factor and subfactors within an overall structure of two or more levels. When an H-IRT model with a large number of dimensions is used for an adaptive test, the computational burden associated with interim scoring and selection of subsequent items is heavy. An alternative approach for any high-dimension adaptive test is to reduce dimensionality for interim scoring and item selection and then revert to full dimensionality for final score reporting, thereby significantly reducing the computational burden.

http://dx.doi.org/10.1177/0146621618765714
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6297916

Multilevel Modeling of Cognitive Diagnostic Assessment: The Multilevel DINA Example.

Appl Psychol Meas 2019 Jan 3;43(1):34-50. Epub 2018 Apr 3.

The Education University of Hong Kong, Tai Po, Hong Kong.

Many multilevel linear and item response theory models have been developed to account for multilevel data structures. However, most existing cognitive diagnostic models (CDMs) are unilevel in nature and become inapplicable when data have a multilevel structure. In this study, using the log-linear CDM as the item-level model, multilevel CDMs were developed based on the latent continuous variable approach and the multivariate Bernoulli distribution approach.

http://dx.doi.org/10.1177/0146621618765713
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6297912

Item Parameter Estimation With the General Hyperbolic Cosine Ideal Point IRT Model.

Appl Psychol Meas 2019 Jan 26;43(1):18-33. Epub 2018 Apr 26.

Nanyang Technological University, Singapore.

Over the last decade, researchers have come to recognize the benefits of ideal point item response theory (IRT) models for noncognitive measurement. Although most applied studies have utilized the Generalized Graded Unfolding Model (GGUM), many other ideal point models have been developed. Most notably, David Andrich and colleagues published a series of papers comparing dominance and ideal point measurement perspectives, and they proposed ideal point models for dichotomous and polytomous single-stimulus responses, known as the Hyperbolic Cosine Model (HCM) and the General Hyperbolic Cosine Model (GHCM), respectively.

http://dx.doi.org/10.1177/0146621618758697
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6297913

Modeling Response Style Using Vignettes and Person-Specific Item Response Theory.

Appl Psychol Meas 2019 Jan 14;43(1):3-17. Epub 2018 Sep 14.

The University of Iowa, Iowa City, USA.

Responses to survey data are determined not only by item characteristics and respondents' trait standings but also by response styles. Recently, methods for modeling response style with personality and attitudinal data have turned toward the use of anchoring vignettes, which provide fixed rating targets. Although existing research is promising, a few outstanding questions remain.

http://dx.doi.org/10.1177/0146621618798663
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6297915

Item Selection Methods in Multidimensional Computerized Adaptive Testing With Polytomously Scored Items.

Appl Psychol Meas 2018 Nov 23;42(8):677-694. Epub 2018 Apr 23.

School of Psychology, Jiangxi Normal University, Nanchang, China.

Multidimensional computerized adaptive testing (MCAT) has been developed over the past decades, but most MCAT procedures can deal only with dichotomously scored items. However, polytomously scored items have been broadly used in a variety of tests for their advantages of providing more information and testing complicated abilities and skills. The purpose of this study is to discuss the item selection algorithms used in MCAT with polytomously scored items (PMCAT).

http://dx.doi.org/10.1177/0146621618762748
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6291894

Investigation of Missing Responses in Q-Matrix Validation.

Appl Psychol Meas 2018 Nov 26;42(8):660-676. Epub 2018 Mar 26.

University of Illinois at Urbana-Champaign, USA.

Missing data can be a serious issue for practitioners and researchers who are tasked with Q-matrix validation analysis in implementation of cognitive diagnostic models. The article investigates the impact of missing responses, and four common approaches (treat as incorrect, logistic regression, listwise deletion, and expectation-maximization [EM] imputation) for dealing with them, on the performance of two major Q-matrix validation methods (the EM-based δ-method and the nonparametric Q-matrix refinement method) across multiple factors. Results of the simulation study show that both validation methods perform better when missing responses are imputed using EM imputation or logistic regression instead of being treated as incorrect and using listwise deletion.

http://dx.doi.org/10.1177/0146621618762742
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6291893

Assessing Item-Level Fit for Higher Order Item Response Theory Models.

Appl Psychol Meas 2018 Nov 21;42(8):644-659. Epub 2018 Mar 21.

Northeast Normal University, Changchun, Jilin, China.

Testing item-level fit is important in scale development to guide item revision/deletion. Many item-level fit indices have been proposed in the literature, yet none of them were directly applicable to an important family of models, namely, the higher order item response theory (HO-IRT) models. In this study, chi-square-based fit indices are investigated for HO-IRT models.

http://dx.doi.org/10.1177/0146621618762740
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6291895

A Hybrid Strategy to Construct Multistage Adaptive Tests.

Authors:
Xinhui Xiong

Appl Psychol Meas 2018 Nov 26;42(8):630-643. Epub 2018 Mar 26.

American Institute of Certified Public Accountants, Ewing, NJ, USA.

How to effectively construct multistage adaptive test (MST) panels is a topic that has spurred recent advances. The most commonly used approaches for MST assembly follow one of two strategies: bottom-up and top-down. The bottom-up approach splits the whole test into several modules; each module is built first, and all modules are then compiled into the whole test. The top-down approach proceeds in the opposite direction.

http://dx.doi.org/10.1177/0146621618762739
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6291896

Using Odds Ratios to Detect Differential Item Functioning.

Appl Psychol Meas 2018 Nov 21;42(8):613-629. Epub 2018 Mar 21.

The Education University of Hong Kong, New Territories, Hong Kong.

Differential item functioning (DIF) makes test scores incomparable and substantially threatens test validity. Although conventional approaches, such as the logistic regression (LR) and the Mantel-Haenszel (MH) methods, have worked well, they are vulnerable to high percentages of DIF items in a test and missing data. This study developed a simple but effective method to detect DIF using the odds ratio (OR) of two groups' responses to a studied item.
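The odds ratio of two groups' responses to a studied item comes from a 2x2 correct/incorrect contingency table. A simplified sketch with hypothetical counts (operational DIF procedures additionally condition on ability, for example by matching examinees on total score, which is omitted here):

```python
import math

def odds_ratio(ref_correct, ref_wrong, focal_correct, focal_wrong):
    """Odds ratio of a correct response, reference group vs. focal group,
    from a 2x2 correct/incorrect contingency table."""
    return (ref_correct * focal_wrong) / (ref_wrong * focal_correct)

# hypothetical response counts for one studied item
or_hat = odds_ratio(ref_correct=80, ref_wrong=20, focal_correct=60, focal_wrong=40)
log_or = math.log(or_hat)  # a log-odds ratio near 0 suggests no DIF
```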

http://dx.doi.org/10.1177/0146621618762738
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6291892

The Use of Multivariate Generalizability Theory to Evaluate the Quality of Subscores.

Appl Psychol Meas 2018 Nov 3;42(8):595-612. Epub 2018 Apr 3.

National Board of Medical Examiners, Philadelphia, PA, USA.

Conventional methods for evaluating the utility of subscores rely on reliability and correlation coefficients. However, correlations can overlook a notable source of variability: variation in subtest means/difficulties. Brennan introduced a reliability index for score profiles based on multivariate generalizability theory that is sensitive to variation in subtest difficulty.

http://dx.doi.org/10.1177/0146621618758698
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6291891

A Zero-Inflated Box-Cox Normal Unipolar Item Response Model for Measuring Constructs of Psychopathology.

Appl Psychol Meas 2018 Oct 14;42(7):571-589. Epub 2018 Jun 14.

University of Maryland, College Park, USA.

This research introduces a latent class item response theory (IRT) approach for modeling item response data from zero-inflated, positively skewed, and arguably unipolar constructs of psychopathology. As motivating data, the authors use 4,925 responses to the Patient Health Questionnaire (PHQ-9), a nine-item Likert-type depression screener that inquires about a variety of depressive symptoms. First, Lucke's log-logistic unipolar item response model is extended to accommodate polytomous responses.

http://dx.doi.org/10.1177/0146621618758291
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6140303

Methods for Estimating Item-Score Reliability.

Appl Psychol Meas 2018 Oct 9;42(7):553-570. Epub 2018 Apr 9.

Tilburg University, Tilburg, Netherlands.

Reliability is usually estimated for a test score, but it can also be estimated for item scores. Item-score reliability can be useful to assess the item's contribution to the test score's reliability, for identifying unreliable scores in aberrant item-score patterns in person-fit analysis, and for selecting the most reliable item from a test to use as a single-item measure. Four methods were discussed for estimating item-score reliability: the Molenaar-Sijtsma method (method MS), Guttman's method, the latent class reliability coefficient (method LCRC), and the correction for attenuation (method CA).

http://dx.doi.org/10.1177/0146621618758290
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6140096

Constructing Shadow Tests in Variable-Length Adaptive Testing.

Authors:
Qi Diao Hao Ren

Appl Psychol Meas 2018 Oct 20;42(7):538-552. Epub 2018 Feb 20.

ACT, Inc., Monterey, CA, USA.

Imposing content constraints is very important in most operational computerized adaptive testing (CAT) programs in educational measurement. The shadow test approach to CAT (Shadow CAT) offers an elegant solution to imposing statistical and nonstatistical constraints by projecting future consequences of item selection. The original form of Shadow CAT presumes a fixed test length.

http://dx.doi.org/10.1177/0146621617753736
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6140307

A Continuous a-Stratification Index for Item Exposure Control in Computerized Adaptive Testing.

Appl Psychol Meas 2018 Oct 21;42(7):523-537. Epub 2018 Mar 21.

Amazon.com, Inc., Seattle, WA, USA.

The method of a-stratification aims to reduce item overexposure in computerized adaptive testing, as items that are administered at very high rates may threaten the validity of test scores. In existing methods of a-stratification, the item bank is partitioned into a fixed number of nonoverlapping strata according to the items' a, or discrimination, parameters. This article introduces a continuous a-stratification index which incorporates exposure control into the item selection index itself and thus eliminates the need for fixed discrete strata.
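In a-stratification, the bank is sorted by discrimination so that low-a items are administered early and the most discriminating items are saved for later stages. A minimal sketch of the fixed-strata partition that the continuous index is meant to replace (the item bank is hypothetical; this sketch assumes the bank size is divisible by the number of strata):

```python
def stratify_by_a(items, n_strata):
    """Partition an item bank into strata of ascending discrimination (a).

    items: list of (item_id, a_parameter) pairs; assumes the bank size is
    divisible by n_strata. Early in the test, items are drawn from low-a
    strata, saving the most discriminating items for later stages.
    """
    ranked = sorted(items, key=lambda item: item[1])
    size = len(ranked) // n_strata
    return [ranked[i * size:(i + 1) * size] for i in range(n_strata)]

bank = [("i1", 0.5), ("i2", 1.8), ("i3", 0.9), ("i4", 1.2), ("i5", 0.7), ("i6", 2.1)]
strata = stratify_by_a(bank, n_strata=3)
```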

http://dx.doi.org/10.1177/0146621618758289
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6140306

birtr: A Package for "The Basics of Item Response Theory Using R".

Appl Psychol Meas 2018 Jul 12;42(5):403-404. Epub 2018 Apr 12.

University of Wisconsin-Madison, USA.

http://dx.doi.org/10.1177/0146621617748327
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6023097

IRT Scoring and Test Blueprint Fidelity.

Authors:
Gregory Camilli

Appl Psychol Meas 2018 Jul 20;42(5):393-400. Epub 2018 Feb 20.

Rutgers, The State University of New Jersey, New Brunswick, USA.

This article focuses on the topic of how item response theory (IRT) scoring models reflect the intended content allocation in a set of test specifications or test blueprint. Although either an adaptive or linear assessment can be built to reflect a set of design specifications, the method of scoring is also a critical step. Standard IRT models employ a set of optimal scoring weights, and these weights depend on item parameters in the two-parameter logistic (2PL) and three-parameter logistic (3PL) models.

http://dx.doi.org/10.1177/0146621618754897
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6023092

Explanatory Cognitive Diagnostic Models: Incorporating Latent and Observed Predictors.

Appl Psychol Meas 2018 Jul 16;42(5):376-392. Epub 2017 Nov 16.

Columbia University, New York City, NY, USA.

Large-scale educational testing data often contain vast amounts of variables associated with information pertaining to test takers, schools, or access to educational resources, information that can help explain relationships between test taker performance and the learning environment. This study examines approaches to incorporate latent and observed explanatory variables as predictors for cognitive diagnostic models (CDMs). Methods to specify and simultaneously estimate observed and latent variables (estimated using item response theory) as predictors affecting attribute mastery were examined.

http://dx.doi.org/10.1177/0146621617738012
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6023094

Sources of Error in IRT Trait Estimation.

Appl Psychol Meas 2018 Jul 6;42(5):359-375. Epub 2017 Oct 6.

University of California, Berkeley, USA.

In item response theory (IRT), item response probabilities are a function of item characteristics and latent trait scores. Within an IRT framework, trait score misestimation results from (a) random error, (b) the trait score estimation method, (c) errors in item parameter estimation, and (d) model misspecification. This study investigated the relative effects of these error sources on the bias and confidence interval coverage rates for trait scores.

http://dx.doi.org/10.1177/0146621617733955
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6023095

Asymptotically Normally Distributed Person Fit Indices for Detecting Spuriously High Scores on Difficult Items.

Authors:
Yan Xia Yi Zheng

Appl Psychol Meas 2018 Jul 13;42(5):343-358. Epub 2017 Sep 13.

Arizona State University, Tempe, USA.

Snijders developed a family of person fit indices that asymptotically follow the standard normal distribution when the ability parameter is estimated. Several indices from this family have been proposed in previous literature. A common property shared by several of these indices is that they employ symmetric weight functions and thus identify spurious scores on both easy and difficult items in the same manner.

http://dx.doi.org/10.1177/0146621617730391
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6023093

Measuring Patient-Reported Outcomes Adaptively: Multidimensionality Matters!

Appl Psychol Meas 2018 Jul 24;42(5):327-342. Epub 2017 Oct 24.

University of Twente, Enschede, The Netherlands.

As there is currently a marked increase in the use of both unidimensional (UCAT) and multidimensional computerized adaptive testing (MCAT) in psychological and health measurement, the main aim of the present study is to assess the incremental value of using MCAT rather than separate UCATs for each dimension. Simulations are based on empirical data that could be considered typical for health measurement: a large number of dimensions (4) and strong correlations among dimensions (.77-…).

http://dx.doi.org/10.1177/0146621617733954
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6009175

Plausible-Value Imputation Statistics for Detecting Item Misfit.

Appl Psychol Meas 2017 Jul 1;41(5):372-387. Epub 2017 Feb 1.

York University, Toronto, Ontario, Canada.

When tests consist of a small number of items, the use of latent trait estimates for secondary analyses is problematic. One area in particular where latent trait estimates have been problematic is when testing for item misfit. This article explores the use of plausible-value imputations to lessen the severity of the inherent measurement unreliability in shorter tests, and proposes a parametric bootstrap procedure to generate empirical sampling characteristics for null-hypothesis tests of item fit.

http://dx.doi.org/10.1177/0146621617692079
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5978675

Ignoring a Multilevel Structure in Mixture Item Response Models: Impact on Parameter Recovery and Model Selection.

Appl Psychol Meas 2018 Mar 19;42(2):136-154. Epub 2017 Jun 19.

Vanderbilt University, Nashville, TN, USA.

The current study investigated the consequences of ignoring a multilevel structure for a mixture item response model to show when a multilevel mixture item response model is needed. Study 1 focused on examining the consequence of ignoring dependency for within-level latent classes. Simulation conditions that may affect model selection and parameter recovery in the context of a multilevel data structure were manipulated: class-specific ICC, cluster size, and number of clusters.

http://dx.doi.org/10.1177/0146621617711999
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5978650

The Information Product Methods: A Unified Approach to Dual-Purpose Computerized Adaptive Testing.

Appl Psychol Meas 2018 Jun 27;42(4):321-324. Epub 2017 Sep 27.

Jiangxi Normal University, Nanchang, China.

This article gives a brief summary of major approaches in dual-purpose computerized adaptive testing (CAT), in which the test is tailored interactively to both an examinee's overall ability level and attribute mastery level. It also proposes an information product approach whose connections to the current methods are revealed. An updated comprehensive empirical study demonstrated that the information product approach not only offers a unified framework connecting the other approaches but also mitigates the weighting issue in the dual-information approach.

http://dx.doi.org/10.1177/0146621617730392
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5978604

Inferential Item-Fit Evaluation in Cognitive Diagnosis Modeling.

Appl Psychol Meas 2017 Nov 19;41(8):614-631. Epub 2017 May 19.

Universidad de Zaragoza, Spain.

Research on item-level fit evaluation for cognitive diagnosis models (CDMs) has been scarce. According to the parsimony principle, balancing goodness of fit against model complexity is necessary. General CDMs require a larger sample size to be estimated reliably, and can lead to worse attribute classification accuracy than the appropriate reduced models when the sample size is small and the item quality is poor, which is typically the case in many empirical applications.

DOI: http://dx.doi.org/10.1177/0146621617707510
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5978477
November 2017

A New Online Calibration Method Based on Lord's Bias-Correction.

Appl Psychol Meas 2017 Sep 26;41(6):456-471. Epub 2017 Mar 26.

Beijing Normal University, China.

Online calibration techniques have been widely employed to calibrate new items because of their advantages. Method A is the simplest online calibration method and has recently attracted much attention from researchers. However, a key assumption of Method A is that it treats person-parameter estimates (obtained by maximum likelihood estimation [MLE]) as their true values; thus, the deviation of the estimates from the true values might yield inaccurate item calibration when that deviation is nonignorable.

DOI: http://dx.doi.org/10.1177/0146621617697958
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5978521
September 2017

On the Performance of the Marginal Homogeneity Test to Detect Rater Drift.

Appl Psychol Meas 2018 Jun 16;42(4):307-320. Epub 2017 Sep 16.

Educational Testing Service, Princeton, NJ, USA.

When constructed response items are administered repeatedly, "trend scoring" can be used to test for rater drift. In trend scoring, raters rescore responses from the previous administration. Two simulation studies evaluated the utility of Stuart's measure of marginal homogeneity as a way of evaluating rater drift when monitoring trend scoring.

DOI: http://dx.doi.org/10.1177/0146621617730390
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5978607
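Stuart's test of marginal homogeneity mentioned above can be sketched as follows (a generic Stuart-Maxwell statistic, not the authors' simulation code; the example table and function name are hypothetical): for a square table cross-classifying original scores against trend rescores, it tests whether the two marginal score distributions coincide.

```python
import numpy as np
from scipy.stats import chi2

def stuart_maxwell(table):
    """Stuart-Maxwell test of marginal homogeneity for a square k x k
    table (rows: original scores, columns: trend rescores).
    Returns (chi-square statistic, degrees of freedom, p-value)."""
    t = np.asarray(table, dtype=float)
    k = t.shape[0]
    d = t.sum(axis=1) - t.sum(axis=0)          # row minus column marginals
    S = -(t + t.T)                             # off-diagonal covariances under H0
    np.fill_diagonal(S, t.sum(axis=1) + t.sum(axis=0) - 2.0 * np.diag(t))
    d, S = d[:-1], S[:-1, :-1]                 # drop the redundant last category
    stat = float(d @ np.linalg.solve(S, d))
    return stat, k - 1, float(chi2.sf(stat, k - 1))

# Hypothetical 3-category rating table: original scores vs. rescores
stat, df, p = stuart_maxwell([[20, 5, 2],
                              [3, 30, 4],
                              [1, 2, 25]])
```

A significant statistic indicates that the rescore marginal distribution has drifted away from the original scoring, which is the drift signal monitored in trend scoring.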

The Effects of Vignette Scoring on Reliability and Validity of Self-Reports.

Appl Psychol Meas 2018 Jun 27;42(4):291-306. Epub 2017 Sep 27.

Australian Catholic University, North Sydney, New South Wales, Australia.

The research presented in this article combines mathematical derivations and empirical results to investigate effects of the nonparametric anchoring vignette approach proposed by King, Murray, Salomon, and Tandon on the reliability and validity of rating data. The anchoring vignette approach aims to correct rating data for response styles to improve comparability across individuals and groups. Vignettes are used to adjust self-assessment responses on the respondent level but entail significant assumptions: They are supposed to be invariant across respondents, and the responses to vignette prompts are supposed to be without error and strictly ordered.

DOI: http://dx.doi.org/10.1177/0146621617730389
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5978608
June 2018

Projection-Based Stopping Rules for Computerized Adaptive Testing in Licensure Testing.

Appl Psychol Meas 2018 Jun 27;42(4):275-290. Epub 2017 Aug 27.

National Council of State Boards of Nursing, Chicago, IL, USA.

The confidence interval (CI) stopping rule is commonly used in licensure settings to make classification decisions with fewer items in computerized adaptive testing (CAT). However, it tends to be less efficient in the near-cut regions of the θ scale, as the CI often fails to be narrow enough for an early termination decision prior to reaching the maximum test length. To solve this problem, this study proposed projection-based stopping rules that base termination decisions on the algorithmically projected range of the final θ estimate at the hypothetical completion of the CAT.

DOI: http://dx.doi.org/10.1177/0146621617726790
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5978606
June 2018
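For context, the baseline CI rule that the projection-based rules aim to improve can be sketched in a few lines (a generic illustration, not the authors' algorithm; the function name, labels, and cut score are hypothetical): terminate early as soon as the interval around the interim θ estimate falls entirely above or below the cut.

```python
from statistics import NormalDist

def ci_stopping_decision(theta_hat, se, cut, conf=0.95):
    """Confidence-interval stopping rule for classification CAT.
    Stop early once the CI around the interim theta estimate
    excludes the cut score; otherwise keep administering items."""
    z = NormalDist().inv_cdf(0.5 + conf / 2.0)
    lo, hi = theta_hat - z * se, theta_hat + z * se
    if lo > cut:
        return "stop: classify as pass"
    if hi < cut:
        return "stop: classify as fail"
    return "continue testing"
```

When θ̂ sits near the cut, neither bound clears it until the maximum test length is reached, which is exactly the near-cut inefficiency the projection-based rules address.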

Investigating the Effects of Differential Item Functioning on Proficiency Classification.

Authors:
Logan Rome, Bo Zhang

Appl Psychol Meas 2018 Jun 29;42(4):259-274. Epub 2017 Aug 29.

University of Wisconsin-Milwaukee, WI, USA.

This study provides a comprehensive evaluation of the effects of differential item functioning (DIF) on proficiency classification. Using Monte Carlo simulation, item- and test-level DIF magnitudes were varied systematically to investigate their impact on proficiency classification at multiple decision points. Findings from this study clearly show that the presence of DIF affects proficiency classification not by lowering the overall correct classification rates but by affecting classification error rates differently for reference and focal group members.

DOI: http://dx.doi.org/10.1177/0146621617726789
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5978605
June 2018

Multivariate Hypothesis Testing Methods for Evaluating Significant Individual Change.

Appl Psychol Meas 2018 May 13;42(3):221-239. Epub 2017 Oct 13.

University of Minnesota, Minneapolis, MN, USA.

The measurement of individual change has been an important topic in both education and psychology. For instance, teachers are interested in whether students have significantly improved.

DOI: http://dx.doi.org/10.1177/0146621617726787
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5985704
May 2018

Improving the Assessment of Differential Item Functioning in Large-Scale Programs With Dual-Scale Purification of Rasch Models: The PISA Example.

Appl Psychol Meas 2018 May 29;42(3):206-220. Epub 2017 Aug 29.

National Sun Yat-sen University, Kaohsiung, Taiwan.

By design, large-scale educational testing programs often have a large proportion of missing data. Because the effect of missing data on differential item functioning (DIF) assessment has been investigated in recent years, and Type I error rates have been found to be inflated, it is important to adapt existing DIF assessment methods to this inflation. The DIF-free-then-DIF (DFTD) strategy, which originally involved a single scale purification procedure to identify DIF-free items, is extended in this study with a second scale purification procedure for the DIF assessment; this new method is called the dual-scale purification (DSP) procedure.

DOI: http://dx.doi.org/10.1177/0146621617726786
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5985702

Asymptotic Variance of Linking Coefficient Estimators for Polytomous IRT Models.

Authors:
Björn Andersson

Appl Psychol Meas 2018 May 24;42(3):192-205. Epub 2017 Aug 24.

Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, China.

In item response theory (IRT), when two groups from different populations take two separate tests, there is a need to link the two ability scales so that the item parameters of the tests are comparable across the groups. To link the two scales, information from common items is utilized to estimate linking coefficients that place the item parameters on the same scale. For polytomous IRT models, the Haebara and Stocking-Lord methods for estimating the linking coefficients have commonly been recommended.

DOI: http://dx.doi.org/10.1177/0146621617721249
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5985705
May 2018

A Cognitive Diagnosis Model for Identifying Coexisting Skills and Misconceptions.

Appl Psychol Meas 2018 May 7;42(3):179-191. Epub 2017 Oct 7.

The University of Hong Kong.

At present, most existing cognitive diagnosis models (CDMs) are designed to identify the presence or absence of either skills or misconceptions, but not both. This article proposes a CDM that can simultaneously identify the skills and the misconceptions that students possess. In addition, it proposes the use of the expectation-maximization algorithm to estimate the model parameters.

DOI: http://dx.doi.org/10.1177/0146621617722791
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5985701
May 2018

IRT in SPSS Using the SPIRIT Macro.

Appl Psychol Meas 2018 Mar 6;42(2):173-174. Epub 2017 Oct 6.

University of California, Los Angeles, CA, USA.

DOI: http://dx.doi.org/10.1177/0146621617733956
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5978649
March 2018