%0 Journal Article %J Applied Psychological Measurement %D 2020 %T A Blocked-CAT Procedure for CD-CAT %A Mehmet Kaplan %A Jimmy de la Torre %X This article introduces a blocked-design procedure for cognitive diagnosis computerized adaptive testing (CD-CAT), which allows examinees to review items and change their answers during test administration. Four blocking versions of the new procedure were proposed. In addition, the impact of several factors, namely, item quality, generating model, block size, and test length, on the classification rates was investigated. Three popular item selection indices in CD-CAT were used and their efficiency compared using the new procedure. An additional study was carried out to examine the potential benefit of item review. The results showed that the new procedure is promising in that allowing item review resulted only in a small loss in attribute classification accuracy under some conditions. Moreover, using a blocked-design CD-CAT is beneficial to the extent that it alleviates the negative impact of test anxiety on examinees’ true performance. %B Applied Psychological Measurement %V 44 %P 49-64 %U https://doi.org/10.1177/0146621619835500 %R 10.1177/0146621619835500 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Bayesian Perspectives on Adaptive Testing %A Wim J. van der Linden %A Bingnan Jiang %A Hao Ren %A Seung W. Choi %A Qi Diao %K Bayesian Perspective %K CAT %X

Although adaptive testing is usually treated from the perspective of maximum-likelihood parameter estimation and maximum-information item selection, a Bayesian perspective is more natural, statistically efficient, and computationally tractable. This observation holds not only for the core process of ability estimation but also for such processes as item calibration and real-time monitoring of item security. Key elements of the approach are parametric modeling of each relevant process, updating of the parameter estimates after the arrival of each new response, and optimal design of the next step.

The purpose of the symposium is to illustrate the role of Bayesian statistics in this approach. The first presentation discusses a basic Bayesian algorithm for the sequential update of any parameter in adaptive testing and illustrates the idea of Bayesian optimal design for the two processes of ability estimation and online item calibration. The second presentation generalizes the ideas to the case of adaptive testing with polytomous items. The third presentation uses the fundamental Bayesian idea of sampling from updated posterior predictive distributions (“multiple imputations”) to deal with the problem of scoring incomplete adaptive tests. %B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng
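
The sequential-updating idea in this abstract is easy to make concrete. Below is a minimal sketch, not the presenters' implementation: a discretized posterior over ability is reweighted by a two-parameter logistic (2PL) likelihood after each response, and the EAP estimate and posterior SD are read off the grid. All names and parameter values are illustrative.

```python
import numpy as np

def update_posterior(prior, grid, a, b, correct):
    """Reweight a gridded ability posterior by one 2PL item response."""
    p = 1.0 / (1.0 + np.exp(-a * (grid - b)))   # P(correct | theta)
    post = prior * (p if correct else 1.0 - p)
    return post / post.sum()                    # renormalize

# Illustrative run: standard-normal prior, then three responses.
grid = np.linspace(-4, 4, 161)
post = np.exp(-0.5 * grid**2)
post /= post.sum()
for a, b, u in [(1.2, 0.0, 1), (0.9, 0.5, 0), (1.5, -0.3, 1)]:
    post = update_posterior(post, grid, a, b, u)
eap = (grid * post).sum()                       # posterior mean (EAP)
sd = np.sqrt(((grid - eap) ** 2 * post).sum())  # posterior SD
print(f"EAP = {eap:.3f}, SD = {sd:.3f}")
```

The same loop structure carries over to the other processes the abstract mentions; only the likelihood term changes.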

%0 Journal Article %J Applied Psychological Measurement %D 2016 %T Bayesian Networks in Educational Assessment: The State of the Field %A Culbertson, Michael J. %X Bayesian networks (BN) provide a convenient and intuitive framework for specifying complex joint probability distributions and are thus well suited for modeling content domains of educational assessments at a diagnostic level. BN have been used extensively in the artificial intelligence community as student models for intelligent tutoring systems (ITS) but have received less attention among psychometricians. This critical review outlines the existing research on BN in educational assessment, providing an introduction to the ITS literature for the psychometric community, and points out several promising research paths. The online appendix lists 40 assessment systems that serve as empirical examples of the use of BN for educational assessment in a variety of domains. %B Applied Psychological Measurement %V 40 %P 3-21 %U http://apm.sagepub.com/content/40/1/3.abstract %R 10.1177/0146621615590401 %0 Journal Article %J Educational and Psychological Measurement %D 2015 %T Best Design for Multidimensional Computerized Adaptive Testing With the Bifactor Model %A Seo, Dong Gi %A Weiss, David J. %X Most computerized adaptive tests (CATs) have been studied using the framework of unidimensional item response theory. However, many psychological variables are multidimensional and might benefit from using a multidimensional approach to CATs. This study investigated the accuracy, fidelity, and efficiency of a fully multidimensional CAT algorithm (MCAT) with a bifactor model using simulated data. Four item selection methods in MCAT were examined for three bifactor pattern designs using two multidimensional item response theory models. To compare MCAT item selection and estimation methods, a fixed test length was used. The Ds-optimality item selection improved θ estimates with respect to a general factor, and either D- or A-optimality improved estimates of the group factors in three bifactor pattern designs under two multidimensional item response theory models. The MCAT model without a guessing parameter functioned better than the MCAT model with a guessing parameter. The MAP (maximum a posteriori) estimation method provided more accurate θ estimates than the EAP (expected a posteriori) method under most conditions, and MAP showed lower observed standard errors than EAP under most conditions, except for a general factor condition using Ds-optimality item selection. %B Educational and Psychological Measurement %V 75 %P 954-978 %U http://epm.sagepub.com/content/75/6/954.abstract %R 10.1177/0013164415575147 %0 Journal Article %J Educational and Psychological Measurement %D 2012 %T Balancing Flexible Constraints and Measurement Precision in Computerized Adaptive Testing %A Moyer, Eric L. %A Galindo, Jennifer L. %A Dodd, Barbara G. %X

Managing test specifications—both multiple nonstatistical constraints and flexibly defined constraints—has become an important part of designing item selection procedures for computerized adaptive tests (CATs) in achievement testing. This study compared the effectiveness of three procedures: constrained CAT, flexible modified constrained CAT, and the weighted penalty model in balancing multiple flexible constraints and maximizing measurement precision in a fixed-length CAT. The study also addressed the effect of two different test lengths—25 items and 50 items—and of including or excluding the randomesque item exposure control procedure with the three methods, all of which were found effective in selecting items that met flexible test constraints when used in the item selection process for longer tests. When the randomesque method was included to control for item exposure, the weighted penalty model and the flexible modified constrained CAT models performed better than did the constrained CAT procedure in maintaining measurement precision. When no item exposure control method was used in the item selection process, no practical difference was found in the measurement precision of each balancing method. %B Educational and Psychological Measurement %V 72 %P 629-648 %U http://epm.sagepub.com/content/72/4/629.abstract %R 10.1177/0013164411431838
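
The randomesque exposure-control procedure named in this abstract has a compact core: instead of always administering the single most informative item, draw at random from the k most informative eligible items. A minimal sketch, assuming 2PL item information; the function and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def item_information(theta, a, b):
    """Fisher information of 2PL items at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def randomesque_pick(theta, a, b, administered, k=5):
    """Draw at random from the k most informative unused items."""
    info = item_information(theta, a, b)
    info[administered] = -np.inf           # exclude already-used items
    return int(rng.choice(np.argsort(info)[-k:]))

a = rng.uniform(0.7, 2.0, 200)             # illustrative 200-item pool
b = rng.normal(0.0, 1.0, 200)
used = np.zeros(200, dtype=bool)
print(randomesque_pick(0.3, a, b, used))
```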

%0 Journal Article %J Journal of Methods and Measurement in the Social Sciences %D 2011 %T Better Data From Better Measurements Using Computerized Adaptive Testing %A Weiss, D. J. %X The process of constructing a fixed-length conventional test frequently focuses on maximizing internal consistency reliability by selecting test items that are of average difficulty and high discrimination (a “peaked” test). The effect of constructing such a test, when viewed from the perspective of item response theory, is test scores that are precise for examinees whose trait levels are near the point at which the test is peaked; as examinee trait levels deviate from the mean, the precision of their scores decreases substantially. Results of a small simulation study demonstrate that when peaked tests are “off target” for an examinee, their scores are biased and have spuriously high standard deviations, reflecting substantial amounts of error. These errors can reduce the correlations of these kinds of scores with other variables and adversely affect the results of standard statistical tests. By contrast, scores from adaptive tests are essentially unbiased and have standard deviations that are much closer to true values. Basic concepts of adaptive testing are introduced and fully adaptive computerized tests (CATs) based on IRT are described. Several examples of response records from CATs are discussed to illustrate how CATs function. Some operational issues, including item exposure, content balancing, and enemy items, are also briefly discussed. It is concluded that because CAT constructs a unique test for each examinee, scores from CATs will be more precise and should provide better data for social science research and applications. %B Journal of Methods and Measurement in the Social Sciences %V 2 %P 1-27 %N 1 %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T Building Affordable CD-CAT Systems for Schools To Address Today's Challenges In Assessment %A Chang, Hua-Hua %K affordability %K CAT %K cost %B Annual Conference of the International Association for Computerized Adaptive Testing %G eng %0 Journal Article %J Psicologica %D 2010 %T Bayesian item selection in constrained adaptive testing %A Veldkamp, B. P. %K computerized adaptive testing %X Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item selection process. The Shadow Test Approach is a general-purpose algorithm for administering constrained CAT. In this paper it is shown how the approach can be slightly modified to handle Bayesian item selection criteria. No differences in performance were found between the shadow test approach and the modified approach. In a simulation study of the LSAT, the effects of Bayesian item selection criteria are illustrated. The results are compared to item selection based on Fisher Information. General recommendations about the use of Bayesian item selection criteria are provided. %B Psicologica %V 31 %P 149-169 %G eng
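
Veldkamp's modification embeds a Bayesian criterion inside the shadow-test approach, which solves an integer program at every step. The sketch below leaves out that machinery, substituting a simple eligibility filter, and only illustrates a common Bayesian criterion: choose the item minimizing the expected posterior variance of ability, computed over a gridded posterior. All names are illustrative, and this is not the shadow-test algorithm itself.

```python
import numpy as np

def expected_posterior_variance(post, grid, a, b):
    """Expected posterior variance of theta after one 2PL item (a, b)."""
    p = 1.0 / (1.0 + np.exp(-a * (grid - b)))   # P(correct | theta)
    p_correct = (post * p).sum()                # predictive P(correct)
    epv = 0.0
    for like, prob in ((p, p_correct), (1.0 - p, 1.0 - p_correct)):
        new = post * like
        new /= new.sum()
        mean = (grid * new).sum()
        epv += prob * ((grid - mean) ** 2 * new).sum()
    return epv

def bayesian_pick(post, grid, a, b, eligible):
    """Among constraint-eligible items, minimize expected posterior variance."""
    epv = [expected_posterior_variance(post, grid, a[i], b[i])
           if eligible[i] else np.inf for i in range(len(a))]
    return int(np.argmin(epv))

# Illustrative: uniform posterior, 3-item pool, item 2 ruled out by constraints.
grid = np.linspace(-4, 4, 161)
post = np.full(grid.size, 1.0 / grid.size)
print(bayesian_pick(post, grid, [1.0, 1.4, 1.8], [0.0, 0.2, 0.1],
                    [True, True, False]))
```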
%0 Book Section %D 2009 %T A burdened CAT: Incorporating response burden with maximum Fisher's information for item selection %A Swartz, R. J. %A Choi, S. W. %X Widely used in various educational and vocational assessment applications, computerized adaptive testing (CAT) has recently begun to be used to measure patient-reported outcomes. Although successful in reducing respondent burden, most current CAT algorithms do not formally consider it as part of the item selection process. This study used a loss function approach motivated by decision theory to develop an item selection method that incorporates respondent burden into the item selection process based on maximum Fisher information item selection. Several different loss functions placing varying degrees of importance on respondent burden were compared, using an item bank of 62 polytomous items measuring depressive symptoms. One dataset consisted of the real responses from the 730 subjects who responded to all the items. A second dataset consisted of simulated responses to all the items based on a grid of latent trait scores with replicates at each grid point. The algorithm enables a CAT administrator to control respondent burden more efficiently than MFI alone, without severely affecting measurement precision. In particular, the loss function incorporating respondent burden protected respondents from receiving longer tests when their estimated trait score fell in a region where there were few informative items. %C In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Journal of Applied Measurement %D 2008 %T Binary items and beyond: a simulation of computer adaptive testing using the Rasch partial credit model %A Lange, R. %K *Data Interpretation, Statistical %K *User-Computer Interface %K Educational Measurement/*statistics & numerical data %K Humans %K Illinois %K Models, Statistical %X Past research on Computer Adaptive Testing (CAT) has focused almost exclusively on the use of binary items and minimizing the number of items to be administered. To address this situation, extensive computer simulations were performed using partial credit items with two, three, four, and five response categories. Other variables manipulated include the number of available items, the number of respondents used to calibrate the items, and various manipulations of respondents' true locations. Three item selection strategies were used, and the theoretically optimal Maximum Information method was compared to random item selection and Bayesian Maximum Falsification approaches. The Rasch partial credit model proved to be quite robust to various imperfections, and systematic distortions did occur mainly in the absence of sufficient numbers of items located near the trait or performance levels of interest. The findings further indicate that having small numbers of items is more problematic in practice than having small numbers of respondents to calibrate these items. Most importantly, increasing the number of response categories consistently improved CAT's efficiency as well as the general quality of the results. In fact, increasing the number of response categories proved to have a greater positive impact than did the choice of item selection method, as the Maximum Information approach performed only slightly better than the Maximum Falsification approach.
Accordingly, issues related to the efficiency of item selection methods are far less important than is commonly suggested in the literature. However, being based on computer simulations only, the preceding presumes that actual respondents behave according to the Rasch model. CAT research could thus benefit from empirical studies aimed at determining whether, and if so, how, selection strategies impact performance. %B Journal of Applied Measurement %7 2008/01/09 %V 9 %P 81-104 %@ 1529-7713 (Print); 1529-7713 (Linking) %G eng %M 18180552 %0 Book Section %D 2007 %T Bundle models for computerized adaptive testing in e-learning assessment %A Scalise, K. %A Wilson, M. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J International Journal of Artificial Intelligence in Education %D 2005 %T A Bayesian student model without hidden nodes and its comparison with item response theory %A Desmarais, M. C. %A Pu, X. %K Bayesian Student Model %K computer adaptive testing %K hidden nodes %K Item Response Theory %X The Bayesian framework offers a number of techniques for inferring an individual's knowledge state from evidence of mastery of concepts or skills. A typical application where such a technique can be useful is Computer Adaptive Testing (CAT). A Bayesian modeling scheme, POKS, is proposed and compared to the traditional Item Response Theory (IRT), which has been the prevalent CAT approach for the last three decades. POKS is based on the theory of knowledge spaces and constructs item-to-item graph structures without hidden nodes. It aims to offer an effective knowledge assessment method with an efficient algorithm for learning the graph structure from data. We review the different Bayesian approaches to modeling student ability assessment and discuss how POKS relates to them. The performance of POKS is compared to the IRT two-parameter logistic model. Experimental results over a 34-item UNIX test and a 160-item French language test show that both approaches can classify examinees as master or non-master effectively and efficiently, with relatively comparable performance. However, more significant differences are found in favor of POKS for a second task that consists in predicting individual question item outcome. Implications of these results for adaptive testing and student modeling are discussed, as well as the limitations and advantages of POKS, namely the issue of integrating concepts into its structure. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B International Journal of Artificial Intelligence in Education %I IOS Press: Netherlands %V 15 %P 291-323 %@ 1560-4292 (Print); 1560-4306 (Electronic) %G eng %M 2006-10770-003 %0 Book Section %D 2003 %T Bayesian checks on outlying response times in computerized adaptive testing %A van der Linden, W. J. %C H. Yanai, A. Okada, K. Shigemasu, Y. Kano, and J. J. Meulman (Eds.), New developments in psychometrics (pp. 215-222). New York: Springer-Verlag. %G eng %0 Journal Article %J Applied Psychological Measurement %D 2003 %T A Bayesian method for the detection of item preknowledge in computerized adaptive testing %A McLeod, L. %A Lewis, C. %A Thissen, D.
%K Adaptive Testing %K Cheating %K Computer Assisted Testing %K computerized adaptive testing %K Individual Differences %K Item Analysis (Statistical) %K Item Response Theory %K Mathematical Modeling %X With the increased use of continuous testing in computerized adaptive testing, new concerns about test security have evolved, such as how to ensure that items in an item pool are safeguarded from theft. In this article, procedures to detect test takers using item preknowledge are explored. When test takers use item preknowledge, their item responses deviate from the underlying item response theory (IRT) model, and estimated abilities may be inflated. This deviation may be detected through the use of person-fit indices. A Bayesian posterior log odds ratio index is proposed for detecting the use of item preknowledge. In this approach to person fit, the estimated probability that each test taker has preknowledge of items is updated after each item response. These probabilities are based on the IRT parameters, a model specifying the probability that each item has been memorized, and the test taker's item responses. Simulations based on an operational computerized adaptive test (CAT) pool are used to demonstrate the use of the odds ratio index. (PsycINFO Database Record (c) 2005 APA) %B Applied Psychological Measurement %V 27 %P 121-137 %G eng %0 Journal Article %J Psychometrika %D 1999 %T A Bayesian random effects model for testlets %A Bradlow, E. T. %A Wainer, H. %A Wang, X. %B Psychometrika %V 64 %P 153-168 %G eng %0 Journal Article %J European Journal of Psychological Assessment %D 1999 %T Benefits from computerized adaptive testing as seen in simulation studies %A Hornke, L. F. %B European Journal of Psychological Assessment %V 15 %P 91-98 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1998 %T A Bayesian approach to detection of item preknowledge in a CAT %A McLeod, L. D. %A Lewis, C. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Journal Article %J Journal of the American Statistical Association %D 1998 %T Bayesian identification of outliers in computerized adaptive testing %A Bradlow, E. T. %A Weiss, R. E. %A Cho, M. %X We consider the problem of identifying examinees with aberrant response patterns in a computerized adaptive test (CAT). The vector of responses yi of person i from the CAT comprises a multivariate response vector. Multivariate observations may be outlying in many different directions, and we characterize specific directions as corresponding to outliers with different interpretations. We develop a class of outlier statistics to identify different types of outliers based on a control chart type methodology. The outlier methodology is adaptable to general longitudinal discrete data structures. We consider several procedures to judge how extreme a particular outlier is. Data from the National Council Licensure Examination (NCLEX) motivates our development and is used to illustrate the results. %B Journal of the American Statistical Association %V 93 %P 910-919 %G eng
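
The posterior log odds ratio index in the McLeod, Lewis, and Thissen abstract updates, item by item, the odds that an examinee has preknowledge. A rough sketch of the updating logic follows; it uses a plug-in ability estimate where the published index works with the ability posterior, and it assumes a per-item memorization probability m, so treat every name and value as illustrative.

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def update_log_odds(log_odds, u, theta_hat, a, b, c, m):
    """Add one item's log likelihood ratio (preknowledge vs. honest)."""
    p_honest = p_3pl(theta_hat, a, b, c)
    p_pre = m + (1.0 - m) * p_honest   # correct if memorized, else honest
    if u == 1:
        return log_odds + math.log(p_pre / p_honest)
    return log_odds + math.log((1.0 - p_pre) / (1.0 - p_honest))

# Illustrative: correct answers on hard items push the odds upward.
log_odds = math.log(0.01 / 0.99)       # prior P(preknowledge) = .01
for b in (1.5, 2.0, 2.5):
    log_odds = update_log_odds(log_odds, 1, theta_hat=0.0,
                               a=1.0, b=b, c=0.2, m=0.3)
print(f"posterior log odds of preknowledge: {log_odds:.2f}")
```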
%0 Journal Article %J Psychometrika %D 1998 %T Bayesian item selection criteria for adaptive testing %A van der Linden, W. J. %B Psychometrika %V 63 %P 201-216 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T A Bayesian enhancement of Mantel-Haenszel DIF analysis for computer adaptive tests %A Zwick, R. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Generic %D 1996 %T Bayesian item selection criteria for adaptive testing (Research Report 96-01) %A van der Linden, W. J. %C Twente, The Netherlands: Department of Educational Measurement and Data Analysis %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1996 %T Building a statistical foundation for computerized adaptive testing %A Chang, Hua-Hua %A Ying, Z. %B Paper presented at the annual meeting of the Psychometric Society %C Banff, Alberta, Canada %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1995 %T A Bayesian computerized mastery model with multiple cut scores %A Smith, R. L. %A Lewis, C. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Francisco CA %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the Psychometric Society %D 1995 %T Bayesian item selection in adaptive testing %A van der Linden, W. J. %B Paper presented at the Annual Meeting of the Psychometric Society %C Minneapolis MN %G eng %0 Journal Article %J Journal of Educational Measurement %D 1991 %T Building algebra testlets: A comparison of hierarchical and linear structures %A Wainer, H. %A Lewis, C. %A Kaplan, B. %A Braswell, J. %B Journal of Educational Measurement %V 8 %P xxx-xxx %G eng %0 Journal Article %J Journal of Educational Computing Research %D 1989 %T Bayesian adaptation during computer-based tests and computer-guided practice exercises %A Frick, T. W. %B Journal of Educational Computing Research %V 5 %P 89-114 %N 1 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1984 %T Bias and information of Bayesian adaptive testing %A Weiss, D. J. %A J. R. McBride %B Applied Psychological Measurement %V 8 %P 273-285 %N 3 %G eng %0 Generic %D 1983 %T Bias and information of Bayesian adaptive testing (Research Report 83-2) %A Weiss, D. J. %A J. R. McBride %C Minneapolis: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory %G eng %0 Generic %D 1979 %T Bayesian sequential design and analysis of dichotomous experiments with special reference to mental testing %A Owen, R. J. %C Princeton NJ: Educational Testing Service %G eng %0 Journal Article %J Applied Psychological Measurement %D 1977 %T Bayesian tailored testing and the influence of item bank characteristics %A Jensema, C. J. %B Applied Psychological Measurement %V 1 %P 111-120 %N 1 %G eng %0 Book Section %D 1977 %T A brief overview of adaptive testing %A J. R. McBride
%C D. J. Weiss (Ed.), Applications of computerized testing (Research Report 77-1). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Journal Article %J Applied Psychological Measurement %D 1977 %T A broad-range tailored test of verbal ability %A Lord, F. M. %B Applied Psychological Measurement %V 1 %P 95-100 %N 1 %G eng %0 Book Section %D 1976 %T Bandwidth, fidelity, and adaptive tests %A J. R. McBride %C T. J. McConnell, Jr. (Ed.), CAT/C 2 1975: The second conference on computer-assisted test construction. Atlanta GA: Atlanta Public Schools. %G eng %0 Book Section %D 1976 %T Bayesian tailored testing and the influence of item bank characteristics %A Jensema, C. J. %C C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 82-89). Washington DC: U.S. Government Printing Office. %G eng %0 Book Section %D 1976 %T A broad range tailored test of verbal ability %A Lord, F. M. %C C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 75-78). Washington DC: U.S. Government Printing Office. %G eng %0 Generic %D 1975 %T A basic test theory generalizable to tailored testing (Technical Report No. 1) %A Cliff, N. A. %C Los Angeles CA: University of Southern California, Department of Psychology. %G eng %0 Journal Article %J Journal of the American Statistical Association %D 1975 %T A Bayesian sequential procedure for quantal response in the context of adaptive mental testing %A Owen, R. J. %B Journal of the American Statistical Association %V 70 %P 351-356 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1975 %T Behavior of the maximum likelihood estimate in a simulated tailored testing situation %A Samejima, F. %B Paper presented at the annual meeting of the Psychometric Society %C Iowa City %G eng %0 Generic %D 1975 %T Best test design and self-tailored testing (Research Memorandum No. 19) %A Wright, B. D. %A Douglas, G. A. %C Chicago: University of Chicago, Department of Education, Statistical Laboratory. %G eng %0 Generic %D 1975 %T A broad range test of verbal ability (RB-75-5) %A Lord, F. M. %C Princeton NJ: Educational Testing Service %G eng %0 Conference Paper %B American Educational Research Association %D 1974 %T A Bayesian approach in sequential testing %A Hsu, T. %A Pingel, K. %B American Educational Research Association %C Chicago IL %8 04/1974 %G eng %0 Generic %D 1969 %T A Bayesian approach to tailored testing (Research Report 69-92) %A Owen, R. J. %C Princeton NJ: Educational Testing Service %G eng %0 Generic %D 1969 %T Bayesian methods in psychological testing (Research Bulletin RB-69-31) %A Novick, M. R. %C Princeton NJ: Educational Testing Service %G eng
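
Owen's procedure (the 1969 report and 1975 JASA article above) keeps a normal approximation to the ability posterior and updates its mean and variance in closed form after each response. The sketch below gives the moment-matching update for the simplified case of a normal-ogive item with unit discrimination and no guessing; Owen's published formulas also handle a guessing parameter, so this illustrates the idea rather than reproducing his exact procedure.

```python
from math import sqrt
from scipy.stats import norm

def owen_style_update(mu, var, b, correct):
    """Normal moment-matching update for likelihood Phi(theta - b)."""
    s = 1.0 if correct else -1.0
    z = s * (mu - b) / sqrt(1.0 + var)
    v = norm.pdf(z) / norm.cdf(z)      # inverse Mills ratio
    mu_new = mu + s * var / sqrt(1.0 + var) * v
    var_new = var * (1.0 - var / (1.0 + var) * v * (v + z))
    return mu_new, var_new

mu, var = 0.0, 1.0                     # standard-normal prior
for b, u in [(0.0, 1), (0.7, 1), (1.2, 0)]:   # illustrative responses
    mu, var = owen_style_update(mu, var, b, u)
print(f"approximate posterior: N({mu:.3f}, {var:.3f})")
```

Because each update needs only the current mean and variance, the procedure was cheap enough for the interactive testing systems of the 1970s; the grid-based updates sketched earlier in this section are its modern, model-agnostic counterpart.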