Publications by authors named "Julio Olea"

18 Publications

Cheating on Unproctored Internet Test Applications: An Analysis of a Verification Test in a Real Personnel Selection Context.

Span J Psychol 2018 Dec 3;21:E62. Epub 2018 Dec 3.

Universidad Autónoma de Madrid (Spain).

This study analyses the extent to which cheating occurs in a real selection setting. A two-stage, unproctored and proctored, test administration was considered. Test score inconsistencies were detected by applying a verification test (the Guo and Drasgow Z-test). An initial simulation study showed that the Z-test has adequate Type I error and power rates in the specific selection settings explored. A second study applied the Z-test verification procedure to a sample of 954 employment candidates. Additional external evidence based on response times to the verification items was gathered. The results revealed good performance of the Z-test and a relatively low, but non-negligible, number of suspected cheaters who showed inflated ability estimates. The study with real data provided additional information on the presence of suspected cheating in unproctored applications and on the viability of using item response times as additional evidence of cheating. In the verification test, suspected cheaters spent 5.78 seconds per item more than expected given the item difficulty and their assumed ability from the unproctored stage. The percentage of suspected cheaters in the empirical study was estimated at 13.84%. In summary, the study provides evidence of the usefulness of the Z-test in detecting cheating in a specific setting, in which a computerized adaptive test of English grammar knowledge was used for personnel selection.
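The verification logic described above can be sketched in a few lines. This is a minimal sketch of a Guo-Drasgow-style Z statistic under the 3PL model; the item parameters, response patterns, and flagging threshold below are illustrative assumptions, not values from the study:

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def z_verification(responses, theta_hat, items):
    """Z statistic in the spirit of Guo and Drasgow: compare the observed
    number-correct on the proctored verification test with the number
    expected under the ability estimated in the unproctored stage."""
    probs = [p_3pl(theta_hat, a, b, c) for (a, b, c) in items]
    expected = sum(probs)
    variance = sum(p * (1 - p) for p in probs)
    return (sum(responses) - expected) / math.sqrt(variance)

# Hypothetical 10-item verification test; unproctored estimate theta = 2.0
items = [(1.2, 0.5 * k - 2.0, 0.2) for k in range(10)]
honest = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]   # consistent with theta = 2.0
cheater = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]  # true ability much lower
print(round(z_verification(honest, 2.0, items), 2))   # small
print(round(z_verification(cheater, 2.0, items), 2))  # strongly negative
```

Strongly negative values indicate that verification-test performance falls short of what the unproctored ability estimate predicts, flagging a suspected cheater.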
http://dx.doi.org/10.1017/sjp.2018.50

Assessing the Big Five with bifactor computerized adaptive testing.

Psychol Assess 2018 Dec 30;30(12):1678-1690. Epub 2018 Aug 30.

Department of Social Psychology and Methodology, Autonomous University of Madrid.

Multidimensional computerized adaptive testing based on the bifactor model (MCAT-B) can provide efficient assessments of multifaceted constructs. In this study, MCAT-B was compared with a short fixed-length scale and with computerized adaptive testing based on unidimensional (UCAT) and multidimensional correlated-factors (MCAT) models to measure the Big Five model of personality. The sample comprised 826 respondents who completed a pool of 360 personality items measuring the Big Five domains and facets. The dimensionality of the Big Five domains was also tested. With only 12 items per domain, the MCAT and MCAT-B procedures were more efficient for assessing highly multidimensional constructs (e.g., Agreeableness), whereas no differences were found with UCAT and the short scale for traits that were essentially unidimensional (e.g., Extraversion). Furthermore, the study showed that MCAT and MCAT-B provide better content balance of the pool because, for each Big Five domain, items from all the facets are selected in similar proportions.
http://dx.doi.org/10.1037/pas0000631

Inferential Item-Fit Evaluation in Cognitive Diagnosis Modeling.

Appl Psychol Meas 2017 Nov 19;41(8):614-631. Epub 2017 May 19.

Universidad de Zaragoza, Spain.

Research on fit evaluation at the item level involving cognitive diagnosis models (CDMs) has been scarce. According to the parsimony principle, balancing goodness of fit against model complexity is necessary. General CDMs require a larger sample size to be estimated reliably, and can lead to worse attribute classification accuracy than appropriate reduced models when the sample size is small and item quality is poor, which is typically the case in many empirical applications. The main purpose of this study was to systematically examine the statistical properties of four inferential item-fit statistics: the S-X2 statistic, the likelihood ratio (LR) test, the Wald (W) test, and the Lagrange multiplier (LM) test. To evaluate the performance of the statistics, a comprehensive set of factors, namely, sample size, correlational structure, test length, item quality, and generating model, was systematically manipulated using Monte Carlo methods. Results show that the S-X2 statistic has unacceptable power. Type I error and power comparisons favor the LR and W tests over the LM test. However, all the statistics are highly affected by item quality. With a few exceptions, their performance is only acceptable when item quality is high. In some cases, this effect can be ameliorated by an increase in sample size and test length. This implies that using the above statistics to assess item fit in practical settings where item quality is low remains a challenge.
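As a sketch of how one of these inferential tests operates, the likelihood-ratio item-fit test reduces to a chi-square comparison of nested models for a single item (general CDM versus reduced model); the log-likelihood values and degrees of freedom below are hypothetical:

```python
def lr_item_fit(loglik_general, loglik_reduced, crit):
    """Likelihood-ratio statistic for comparing a general CDM (e.g. G-DINA)
    against a reduced model (e.g. DINA) for one item. Under the null
    hypothesis that the reduced model fits, LR is asymptotically chi-square
    with df equal to the difference in item-parameter counts."""
    lr = 2.0 * (loglik_general - loglik_reduced)
    return lr, lr > crit

# Hypothetical log-likelihoods for one item; the reduced model drops 2
# parameters, and the chi-square .95 critical value with df = 2 is 5.991
lr, reject = lr_item_fit(-1510.4, -1514.9, crit=5.991)
print(round(lr, 1), reject)  # 9.0 True: the reduced model is rejected
```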
http://dx.doi.org/10.1177/0146621617707510
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5978477

Calibrating a new item pool to adaptively assess the Big Five.

Psicothema 2017 Aug;29(3):390-395

Universidad Autónoma de Madrid.

Background: Even though the Five Factor Model (FFM) has been the dominant paradigm in personality research for the past two decades, very few studies have measured the FFM adaptively. Thus, the purpose of this research was to build a new item pool with which to develop a computerized adaptive test (CAT) for personality assessment.

Method: A pool of 480 items measuring the FFM facets was developed and administered to 826 participants. Facets were calibrated separately, and item selection was carried out so as to preserve the unidimensionality of each facet. Then, a post-hoc simulation study was conducted to test the performance of separate CATs measuring the facets.

Results: The final item pool was composed of 360 items with good psychometric properties. Findings reveal that a CAT administration of four items per facet (total length of 120 items) provides accurate facets scores, while maintaining the factor structure of the FFM.

Conclusions: An item pool with good psychometric properties was obtained and a CAT simulation study demonstrated that the FFM facets could be measured with precision using a third of the items in the pool.
http://dx.doi.org/10.7334/psicothema2016.391

Structural brain connectivity and cognitive ability differences: A multivariate distance matrix regression analysis.

Hum Brain Mapp 2017 02 11;38(2):803-816. Epub 2016 Oct 11.

Facultad de Psicología, Universidad Autónoma de Madrid, Madrid, Spain.

Neuroimaging research involves analyses of huge amounts of biological data that might or might not be related to cognition. This relationship is usually approached using univariate methods, so correction methods are mandatory for reducing false positives; nevertheless, the probability of false negatives is thereby increased. Multivariate frameworks have been proposed to help alleviate this trade-off. Here we apply multivariate distance matrix regression to the simultaneous analysis of biological and cognitive data, namely, structural connections among 82 brain regions and several latent factors estimating cognitive performance. We tested whether cognitive differences predict distances among individuals regarding their connectivity pattern. Beginning with 3,321 connections among regions, the 36 edges best predicted by the individuals' cognitive scores were selected. Cognitive scores were related to connectivity distances in both the full (3,321-edge) and reduced (36-edge) connectivity patterns. The selected edges connect regions distributed across the entire brain, and the network defined by these edges supports high-order cognitive processes such as (a) (fluid) executive control, (b) (crystallized) recognition, learning, and language processing, and (c) visuospatial processing. This multivariate study suggests that a widespread but limited set of regions in the human brain supports high-level cognitive ability differences.
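The core computation of multivariate distance matrix regression can be sketched as a pseudo-F statistic in the McArdle-Anderson form; the data below are simulated stand-ins for connectivity profiles and cognitive factor scores, not the study's data:

```python
import numpy as np

def mdmr_pseudo_f(D, X):
    """Pseudo-F for multivariate distance matrix regression: do the
    predictors in X (n x m, e.g. cognitive factor scores) explain the
    pairwise distances in D (n x n, e.g. between connectivity patterns)?"""
    n, m = X.shape
    Xd = np.column_stack([np.ones(n), X])        # add intercept column
    H = Xd @ np.linalg.pinv(Xd)                  # hat (projection) matrix
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J                  # Gower-centred squared distances
    num = np.trace(H @ G @ H) / m
    den = np.trace((np.eye(n) - H) @ G @ (np.eye(n) - H)) / (n - m - 1)
    return num / den

rng = np.random.default_rng(0)
scores = rng.normal(size=(40, 2))                # hypothetical cognitive factors
profile = scores @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(40, 5))
D = np.sqrt(((profile[:, None, :] - profile[None, :, :]) ** 2).sum(-1))
print(mdmr_pseudo_f(D, scores) > 10)             # strong association
```

In practice, significance is assessed by permuting the rows of X and recomputing the pseudo-F, since its null distribution is not a standard F.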
http://dx.doi.org/10.1002/hbm.23419
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6866971

A multistage adaptive test of fluid intelligence.

Psicothema 2016 Aug;28(3):346-52

Universidad Autónoma de Madrid.

Background: Multistage adaptive testing has recently emerged as an alternative to the computerized adaptive test. The current study details a new multistage test to assess fluid intelligence.

Method: An item pool of progressive matrices with constructed response format was developed, and divided into six subtests. The subtests were applied to a sample of 724 college students and their psychometric properties were studied (i.e., reliability, dimensionality and validity evidence). The item pool was calibrated under the graded response model, and two multistage structures were developed, based on the automatic test assembly principles. Finally, the test information provided by each structure was compared in order to select the most appropriate one.

Results: The item pool showed adequate psychometric properties. Of the two multistage structures compared, the simpler one (i.e., a routing test and two modules in each subsequent stage) was more informative across the latent trait continuum and was therefore retained.

Discussion: Taken together, the results of the two studies support the application of the FIMT (Fluid Intelligence Multistage Test), a new multistage test for the accurate assessment of fluid intelligence.
http://dx.doi.org/10.7334/psicothema2015.287

Exploratory factor analysis in validation studies: uses and recommendations.

Psicothema 2014 ;26(3):395-400

Universidad Autónoma de Madrid.

Background: Exploratory Factor Analysis (EFA) is one of the most commonly used procedures in the social and behavioral sciences. However, it is also one of the most criticized, owing to the questionable decisions researchers often make when applying it. The main goal of this study is to examine the relationship between the practices usually considered most appropriate and the actual decisions made by researchers.

Method: The use of exploratory factor analysis is examined in 117 papers published between 2011 and 2012 in 3 Spanish psychological journals with the highest impact within the previous five years.

Results: The review shows substantial rates of questionable decisions in conducting EFA, based on unjustified or mistaken choices regarding the methods of factor extraction, retention, and rotation.

Conclusions: Overall, the current review provides support for some improvement guidelines regarding how to apply and report an EFA.
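One of the factor-retention practices such reviews typically recommend over ad hoc rules is Horn's parallel analysis. A minimal sketch, using simulated data rather than any of the reviewed studies' data:

```python
import numpy as np

def parallel_analysis(data, n_sims=200, seed=0):
    """Horn's parallel analysis: retain as many factors as there are
    observed eigenvalues exceeding the mean eigenvalues obtained from
    random data of the same dimensions, a principled alternative to the
    arbitrary retention rules criticized in EFA reviews."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    sims = np.empty((n_sims, p))
    for s in range(n_sims):
        r = rng.normal(size=(n, p))
        sims[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(r, rowvar=False)))[::-1]
    return int((obs > sims.mean(axis=0)).sum())

# Hypothetical data: 300 respondents, 6 items driven by 2 latent factors
rng = np.random.default_rng(1)
f = rng.normal(size=(300, 2))
loadings = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
items = f @ loadings.T + 0.5 * rng.normal(size=(300, 6))
print(parallel_analysis(items))  # 2
```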
http://dx.doi.org/10.7334/psicothema2013.349

Application of cognitive diagnosis models to competency-based situational judgment tests.

Psicothema 2014 ;26(3):372-7

Instituto de Ingeniería del Conocimiento (IIC-UAM).

Background: Profiling of jobs in terms of competency requirements has increasingly been applied in many organizational settings. Testing these competencies through situational judgment tests (SJTs) leads to validity problems because it is not usually clear which constructs SJTs measure. The primary purpose of this paper is to evaluate whether the application of cognitive diagnosis models (CDM) to competency-based SJTs can ascertain the underlying competencies measured by the items, and whether these competencies can be estimated precisely.

Method: The generalized deterministic inputs, noisy "and" gate (G-DINA) model was applied to 26 situational judgment items measuring professional competencies based on the Great Eight model. The items were administered to 485 employees of a Spanish financial company. The fit of the model to the data and the convergent validity between the estimated competencies and personality dimensions were examined.

Results: The G-DINA model showed a good fit to the data, and the estimated competency factors "adapting and coping" and "interacting and presenting" were positively related to emotional stability and extraversion, respectively.

Conclusions: This work indicates that CDM can be a useful tool when measuring professional competencies through SJTs. CDM can clarify the competencies being measured and provide precise estimates of these competencies.
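As a sketch of the model family involved, the DINA model (the simplest reduced case subsumed by the G-DINA) scores an item from a Q-matrix row plus guessing and slipping parameters; the item, attribute patterns, and parameter values below are hypothetical:

```python
def p_dina(alpha, q_row, guess, slip):
    """DINA model, a reduced case of the G-DINA applied in the paper:
    an examinee answers correctly with probability 1 - slip if they master
    every competency the item requires (per the Q-matrix row), and with
    probability guess otherwise."""
    mastered_all = all(a >= q for a, q in zip(alpha, q_row))
    return 1 - slip if mastered_all else guess

# Hypothetical SJT item requiring competencies 1 and 3 (of 4)
q_row = [1, 0, 1, 0]
master = [1, 1, 1, 0]       # has both required competencies
non_master = [1, 1, 0, 0]   # lacks competency 3
print(p_dina(master, q_row, guess=0.2, slip=0.1))      # 0.9
print(p_dina(non_master, q_row, guess=0.2, slip=0.1))  # 0.2
```

The G-DINA generalizes this by giving each required-attribute combination its own success probability instead of collapsing them into two.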
http://dx.doi.org/10.7334/psicothema2013.322

Optimal number of strata for the stratified methods in computerized adaptive testing.

Span J Psychol 2014 ;17:E48

Universidad Autónoma de Madrid (Spain).

Test security can be a major problem in computerized adaptive testing, as examinees can share information about the items they receive. Of the different item selection rules proposed to alleviate this risk, stratified methods are among those that have received the most attention. In these methods, only low-discrimination items can be presented at the beginning of the test, and the mean information of the items increases as the test goes on. To this end, the item bank must be divided into several strata according to the information of the items. To date, there is no clear guidance about the optimal number of strata into which the item bank should be split. In this study, we simulate conditions with different numbers of strata, from 1 (no stratification) to a number of strata equal to the test length (maximum stratification), while manipulating the maximum exposure rate that no item should surpass (r_max) in its whole domain. In this way, we can plot the relation between test security and accuracy, making it possible to determine the number of strata that leads to better security while holding measurement accuracy constant. Our data indicate that the best option is to stratify into as many strata as possible.
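The stratified selection rule described above can be sketched as follows; the bank, the within-stratum matching rule, and the stratum schedule are illustrative simplifications of the a-stratified family of methods:

```python
import random

def a_stratified_select(bank, theta_hat, n_strata, item_pos, test_len):
    """a-stratified item selection: the bank is sorted by discrimination (a)
    and split into n_strata strata; early in the test items come from the
    low-a stratum, later from the high-a strata, and within the active
    stratum the item whose difficulty b is closest to the current ability
    estimate is chosen."""
    ranked = sorted(bank, key=lambda it: it["a"])
    size = len(ranked) // n_strata
    stratum_idx = min(item_pos * n_strata // test_len, n_strata - 1)
    stratum = ranked[stratum_idx * size:(stratum_idx + 1) * size]
    return min(stratum, key=lambda it: abs(it["b"] - theta_hat))

random.seed(3)
bank = [{"id": i, "a": random.uniform(0.5, 2.5), "b": random.uniform(-3, 3)}
        for i in range(120)]
first = a_stratified_select(bank, 0.0, n_strata=4, item_pos=0, test_len=20)
last = a_stratified_select(bank, 0.0, n_strata=4, item_pos=19, test_len=20)
print(first["a"] < last["a"])  # True: early items are less discriminative
```

Setting n_strata equal to test_len reproduces the maximum-stratification condition the abstract finds most favorable.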
http://dx.doi.org/10.1017/sjp.2014.50

A new IRT-based standard setting method: application to eCat-listening.

Psicothema 2013 ;25(2):238-44

Universidad Autónoma de Madrid, Madrid, Spain.

Background: Criterion-referenced interpretations of tests are highly necessary, and usually involve the difficult task of establishing cut scores. In contrast with other Item Response Theory (IRT)-based standard setting methods, a non-judgmental approach is proposed in this study, in which Item Characteristic Curve (ICC) transformations lead to the final cut scores.

Method: eCat-Listening, a computerized adaptive test for the evaluation of English Listening, was administered to 1,576 participants, and the proposed standard setting method was applied to classify them into the performance standards of the Common European Framework of Reference for Languages (CEFR).

Results: The results showed a classification closely related to relevant external measures of the English language domain, according to the CEFR.

Conclusions: It is concluded that the proposed method is a practical and valid standard setting alternative for IRT-based test interpretations.
http://dx.doi.org/10.7334/psicothema2012.147

Computerized adaptive testing: the capitalization on chance problem.

Span J Psychol 2012 Mar;15(1):424-41

Facultad de Psicología, Universidad Autónoma de Madrid, 28049-Madrid, Spain.

This paper describes several simulation studies that examine the effects of capitalization on chance in item selection and ability estimation in CAT, employing the 3-parameter logistic model. In order to generate different estimation errors for the item parameters, the calibration sample size was manipulated (N = 500, 1,000 and 2,000 subjects), as was the ratio of item bank size to test length (banks of 197 and 788 items, test lengths of 20 and 40 items), both in a CAT and in a random test. Results show that capitalization on chance is particularly serious in CAT, as revealed by the large positive bias found in the small-sample calibration conditions. For broad ranges of theta, the overestimation of precision (asymptotic SE) reaches levels of 40%, something that does not occur with the RMSE of theta. The problem grows as the ratio of item bank size to test length increases. Potential solutions were tested in a second study, in which two exposure control methods were incorporated into the item selection algorithm. Some alternative solutions are discussed.
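The selection rule at issue is maximum Fisher information under the 3PL model, which is precisely why overestimated discrimination parameters are so attractive to the algorithm. A minimal sketch with a hypothetical three-item bank:

```python
import math

def info_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at theta:
    I = a^2 * ((1 - p) / p) * ((p - c) / (1 - c))^2, with p the 3PL
    probability of a correct response."""
    p = c + (1 - c) / (1 + math.exp(-a * (theta - b)))
    return a ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

def max_info_select(bank, theta_hat, used):
    """Maximum-information rule: the rule under which capitalization on
    chance is most severe, because items whose a was overestimated in a
    small calibration sample look spuriously informative."""
    candidates = [it for it in bank if it["id"] not in used]
    return max(candidates,
               key=lambda it: info_3pl(theta_hat, it["a"], it["b"], it["c"]))

bank = [{"id": 0, "a": 0.8, "b": 0.0, "c": 0.2},
        {"id": 1, "a": 1.6, "b": 0.1, "c": 0.2},   # high a: wins near theta = 0
        {"id": 2, "a": 1.0, "b": 2.0, "c": 0.2}]
print(max_info_select(bank, 0.0, used=set())["id"])  # 1
```

If item 1's true a were really 0.8 but was estimated at 1.6, the rule would still pick it, which is the capitalization-on-chance mechanism the paper documents.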
http://dx.doi.org/10.5209/rev_sjop.2012.v15.n1.37348

eCAT-Listening: design and psychometric properties of a computerized adaptive test on English Listening.

Psicothema 2011 Nov;23(4):802-7

Facultad de Psicología, Universidad Autónoma de Madrid, 28049 Madrid, Spain.

In this study, eCAT-Listening, a new computerized adaptive test for the evaluation of English Listening, is described. Item bank development, the anchor design for data collection, and the psychometric properties of the item bank and the adaptive test are described. The calibration sample comprised 1,576 participants. The results showed good psychometric guarantees: the bank is unidimensional, the items fit the 3-parameter logistic model satisfactorily, and the trait level is estimated accurately. As validity evidence, a high correlation was obtained between the estimated trait level and a latent factor made up of the diverse criteria selected. The analysis of trait level estimation by means of simulation led us to fix the test length at 20 items, with a maximum exposure rate of .40.

Varying the valuating function and the presentable bank in computerized adaptive testing.

Span J Psychol 2011 May;14(1):500-8

Facultad de Psicología, Universidad Autónoma de Barcelona, 08193 Bellaterra, Barcelona, Spain.

In computerized adaptive testing, the most commonly used valuating function is the Fisher information function. When the goal is to keep item bank security at a maximum, the valuating function that seems most convenient is the matching criterion, which values the distance between the estimated trait level and the point where the maximum of the information function is located. Recently, it has been proposed not to keep the same valuating function for all the items in the test. In this study we expand the idea of combining the matching criterion with the Fisher information function. We also manipulate the number of strata into which the bank is divided. We find that manipulating the number of items administered with each function makes it possible to move from the pole of high accuracy and low security to the opposite pole. It is possible to greatly improve item bank security with much smaller losses in accuracy by selecting several items with the matching criterion. In general, it seems more appropriate not to stratify the bank.
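A sketch of combining the two valuating functions, with the matching criterion for the first items and Fisher information afterwards. The 2PL information (whose maximum-information point is simply b), the three-item bank, and the switch point are illustrative assumptions:

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta: I = a^2 * p * (1 - p)."""
    p = 1 / (1 + math.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)

def combined_select(bank, theta_hat, used, pos, n_matching):
    """Combined valuating functions: the first n_matching items are chosen
    with the matching criterion (b closest to theta_hat, which protects
    high-a items from overexposure), the remaining items with maximum
    Fisher information."""
    cands = [it for it in bank if it["id"] not in used]
    if pos < n_matching:
        return min(cands, key=lambda it: abs(it["b"] - theta_hat))
    return max(cands, key=lambda it: info_2pl(theta_hat, it["a"], it["b"]))

bank = [{"id": 0, "a": 2.0, "b": 1.0},
        {"id": 1, "a": 0.7, "b": 0.1},   # closest b: matching picks it
        {"id": 2, "a": 1.5, "b": -0.8}]
print(combined_select(bank, 0.0, set(), pos=0, n_matching=5)["id"])   # 1
print(combined_select(bank, 0.0, set(), pos=10, n_matching=5)["id"])  # 0
```

Raising n_matching moves the test toward the security pole; lowering it moves toward the accuracy pole, which is the trade-off the study maps out.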
http://dx.doi.org/10.5209/rev_sjop.2011.v14.n1.45

[Item parameter drift in computerized adaptive testing: Study with eCAT].

Psicothema 2010 May;22(2):340-7

Facultad de Psicología, Universidad Autónoma de Madrid, Madrid, Spain.

This study describes the parameter drift analysis conducted on eCAT (a computerized adaptive test to assess the written English level of Spanish speakers). The original calibration of the item bank (N = 3,224) was compared to a new calibration obtained from the data provided by most eCAT operative administrations (N = 7,254). A Differential Item Functioning (DIF) study was conducted between the original and the new calibrations. The impact of the new parameters on the trait level estimates was obtained by simulation. Results show parameter drift, especially for the a and c parameters; a considerable number of bank items show DIF; and the parameter change has a moderate impact on θ estimates for high English levels. It is therefore recommended to replace the original estimates with the new set.

Rotating item banks versus restriction of maximum exposure rates in computerized adaptive testing.

Span J Psychol 2008 Nov;11(2):618-25

Facultad de Psicología, Universidad Autonoma de Barcelona, 08193 Bellaterra, Spain.

If examinees were to know, beforehand, part of the content of a computerized adaptive test, their estimated trait levels would show a marked positive bias. One strategy to avoid this consists of dividing a large item bank into several sub-banks and rotating the sub-bank employed (Ariel, Veldkamp & van der Linden, 2004). This strategy permits substantial improvements in exposure control at little cost to measurement accuracy. However, we do not know whether this option provides better results than using the master bank with a greater restriction on the maximum exposure rates (Sympson & Hetter, 1985). In order to investigate this issue, we worked with several simulated banks of 2,100 items, comparing them, in terms of RMSE and overlap rate, with the same banks divided into two, three, and up to seven sub-banks. By means of extensive manipulation of the maximum exposure rate in each bank, we found that the option of rotating banks slightly outperformed the option of restricting the maximum exposure rate of the master bank by means of the Sympson-Hetter method.
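The rotating sub-bank strategy can be sketched as a fixed random partition of the master bank; the bank size matches the abstract, but the rotation rule and partition seed are illustrative assumptions:

```python
import random

def rotating_subbanks(bank, n_sub, examinee_idx):
    """Rotating sub-bank strategy (after Ariel, Veldkamp & van der Linden):
    the master bank is split into n_sub disjoint sub-banks (via a fixed
    random partition) and examinees are served from the sub-banks in
    rotation, so no item can be exposed to more than 1/n_sub of them."""
    rng = random.Random(0)            # fixed seed: same partition every call
    shuffled = bank[:]
    rng.shuffle(shuffled)
    size = len(shuffled) // n_sub
    k = examinee_idx % n_sub
    return shuffled[k * size:(k + 1) * size]

bank = list(range(2100))              # item ids, as in the simulated banks
sub_a = rotating_subbanks(bank, 3, examinee_idx=0)
sub_b = rotating_subbanks(bank, 3, examinee_idx=1)
print(len(sub_a), set(sub_a).isdisjoint(sub_b))  # 700 True
```

The competing option in the abstract keeps the full 2,100-item bank and instead tightens the Sympson-Hetter maximum exposure rate.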

Incorporating randomness in the Fisher information for improving item-exposure control in CATs.

Br J Math Stat Psychol 2008 Nov 4;61(Pt 2):493-513. Epub 2007 Aug 4.

Facultad de Psicología, Universidad Autónoma de Barcelona, Barcelona, Spain.

The most commonly employed item selection rule in a computerized adaptive test (CAT) is that of selecting the item with the maximum Fisher information for the estimated trait level. This means a highly unbalanced distribution of item-exposure rates, a high overlap rate among examinees and, for item bank management, strong pressure to replace items with a high discrimination parameter in the bank. An alternative for mitigating these problems involves, at the beginning of the test, basing item selection mainly on randomness. As the test progresses, the weight of information in the selection increases. In the present work we study, for two selection rules, the progressive methods (Revuelta & Ponsoda, 1998) and the proportional method (Segall, 2004a), different functions that define the weight of the random component according to the position in the test of the item to be administered. The functions were tested in simulated item banks and in an operative bank. We found that both the progressive and the proportional methods tolerate a high weight of the random component with minimal or zero loss of accuracy, while bank security and maintenance are improved.
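The progressive method's weighting scheme can be sketched as follows; the linear weight function and the 2PL information are illustrative choices (the study compares several such weight functions):

```python
import math
import random

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = 1 / (1 + math.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)

def progressive_select(bank, theta_hat, used, pos, test_len, rng):
    """Progressive method (after Revuelta & Ponsoda): each candidate is
    valued as (1 - w) * random + w * information, where the weight w grows
    with the item's position in the test. Early items are chosen almost at
    random (protecting exposure); late items almost purely by information
    (protecting accuracy)."""
    w = pos / (test_len - 1)                     # 0 at the start, 1 at the end
    cands = [it for it in bank if it["id"] not in used]
    max_info = max(info_2pl(theta_hat, it["a"], it["b"]) for it in cands)
    return max(cands,
               key=lambda it: (1 - w) * rng.random() * max_info
                              + w * info_2pl(theta_hat, it["a"], it["b"]))

rng = random.Random(7)
bank = [{"id": i, "a": 0.6 + 0.1 * i, "b": 0.0} for i in range(20)]
last = progressive_select(bank, 0.0, set(), pos=19, test_len=20, rng=rng)
print(last["id"])  # 19: with w = 1 the most informative item always wins
```

The functions compared in the paper differ in how fast w rises with position; the finding is that a large random component early on costs little or no accuracy.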
http://dx.doi.org/10.1348/000711007X230937

[Item selection rules in a Computerized Adaptive Test for the assessment of written English].

Psicothema 2006 Nov;18(4):828-34

Faculty of Psychology, Universidad Autónoma de Madrid, 28049 Madrid, Spain.

e-CAT is a Computerized Adaptive Test for the evaluation of written English knowledge that uses the most commonly employed item selection rule: the maximum Fisher information criterion. Some of the problems of this criterion have a negative impact on estimation accuracy and on item bank security. In this study, the performance of this item selection rule is compared, by means of simulation, with two other rules: selecting the item with maximum Fisher information in an interval (Veerkamp & Berger, 1997) and a new criterion, called "maximum Fisher information in an interval with geometric mean". In general, the new rule shows smaller measurement error and smaller item overlap rates. It therefore seems recommendable, as it simultaneously improves estimation accuracy and maintains the item bank security of e-CAT.

Maximum information stratification method for controlling item exposure in computerized adaptive testing.

Psicothema 2006 Feb;18(1):156-9

Facultad de Psicología, Universidad Autónoma de Madrid, Spain.

The proposal for increasing security in Computerized Adaptive Tests that has received the most attention in recent years is the a-stratified method (AS; Chang & Ying, 1999): at the beginning of the test only items with low discrimination parameters (a) can be administered, with the values of the a parameters increasing as the test goes on. With this method, the distribution of item exposure rates is less skewed, while efficiency is maintained in trait-level estimation. The pseudo-guessing parameter (c), present in the three-parameter logistic model, is considered irrelevant and is not used in the AS method. The Maximum Information Stratified (MIS) method incorporates the c parameter in the stratification of the bank and in the item-selection rule, improving accuracy in comparison with AS for item banks with both correlated and uncorrelated a and b parameters. For both kinds of banks, the blocking-b methods (Chang, Qian and Ying, 2001) improve the security of the item bank.