%0 Journal Article %J Applied Psychological Measurement %D 2020 %T Framework for Developing Multistage Testing With Intersectional Routing for Short-Length Tests %A Kyung (Chris) T. Han %X Multistage testing (MST) has many practical advantages over typical item-level computerized adaptive testing (CAT), but there is a substantial tradeoff when using MST because of its reduced level of adaptability. In typical MST, the first stage almost always performs as a routing stage in which all test takers see a linear test form. If multiple test sections measure different but moderately or highly correlated traits, then a score estimate for one section might be capable of adaptively selecting item modules for following sections without having to administer routing stages repeatedly for each section. In this article, a new framework for developing MST with intersectional routing (ISR) was proposed and evaluated under several research conditions with different MST structures, section score distributions and relationships, and types of regression models for ISR. The overall findings of the study suggested that the MST-with-ISR approach could improve measurement efficiency and test optimality, especially for short-length tests. %B Applied Psychological Measurement %V 44 %P 87-102 %U https://doi.org/10.1177/0146621619837226 %R 10.1177/0146621619837226 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2018 %T Factors Affecting the Classification Accuracy and Average Length of a Variable-Length Cognitive Diagnostic Computerized Test %A Huebner, Alan %A Finkelman, Matthew D. 
%A Weissman, Alexander %B Journal of Computerized Adaptive Testing %V 6 %P 1-14 %U http://iacat.org/jcat/index.php/jcat/article/view/55/30 %N 1 %R 10.7333/1802-060101 %0 Journal Article %J Practical Assessment, Research & Evaluation %D 2018 %T From Simulation to Implementation: Two CAT Case Studies %A John J Barnard %B Practical Assessment, Research & Evaluation %V 23 %G English %U http://pareonline.net/getvn.asp?v=23&n=14 %N 14 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T FastCAT – Customizing CAT Administration Rules to Increase Response Efficiency %A Richard C. Gershon %K Administration Rules %K Efficiency %K FastCAT %X
A typical prerequisite for CAT administration is the existence of an underlying item bank completely covering the range of the trait being measured. When a bank fails to cover the full range of the trait, examinees who are close to the floor or ceiling will often never achieve a standard error cut-off and will be forced to answer items increasingly less relevant to their trait level. This scenario is fairly typical for many patients responding to patient-reported outcome measures (PROMs). For example, in the assessment of physical functioning, many item banks have a ceiling at about the 50th percentile. For most healthy patients, after a few items the only items remaining in the bank will represent decreasing ability (even though the patient has already indicated that they are at or above the mean for the population). Another example would be a patient with no pain taking a Pain CAT. They will probably answer “Never” to every succeeding pain item, out to the maximum test length. For this project we sought to reduce patient burden, while maintaining test accuracy, through the reduction of CAT length using novel stopping rules.
We studied CAT administration assessment histories for patients who were administered Patient Reported Outcomes Measurement Information System (PROMIS) CATs. In the PROMIS 1 Wave 2 Back Pain/Depression Study, CATs were administered to N=417 cases assessed across 11 PROMIS domains. Original CAT administration rules were: start with a pre-identified item of moderate difficulty; administer a minimum of four items per case; stop when the estimated theta’s SE declines below 0.3 OR when a maximum of 12 items has been administered.
Original CAT. 12,622 CAT administrations were analyzed. CATs ranged in number of items administered from 4 to 12 items; 72.5% were 4-item CATs. The second and third most frequently occurring CATs were 5-item (n=1102; 8.7%) and 12-item CATs (n=964; 7.6%). 64,062 items total were administered, averaging 5.1 items per CAT. Customized CAT. Three new CAT stopping rules were introduced, each with potential to increase item-presentation efficiency and maintain required score precision: stop if a case responds to the first two items administered using an “extreme” response category (towards the ceiling or floor of the item bank); administer a minimum of two items per case; stop if the decrease in the SE estimate from the previous to the current item administration is positive but < 0.01.
The three new stopping rules reduced the total number of items administered by 25,643, from 64,062 to 38,419 items (a 40.0% reduction). After four items were administered, only n=1,824 CATs (14.5%) were still in assessment mode (vs. n=3,477 (27.5%) in the original CATs). On average, cases completed 3.0 items per CAT (vs. 5.1).
Each new rule addressed a specific inefficiency in the original CAT administration process: stopping early for cases not having, or possessing a low/clinically unimportant level of, the assessed domain; allowing the SE < 0.3 stopping criterion to take effect earlier in the administration process; and stopping early for cases poorly measured by the domain item bank (e.g., “floor” and “ceiling” cases).
%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1oPJV-x0p9hRmgJ7t6k-MCC1nAoBSFM1w %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T From Blueprints to Systems: An Integrated Approach to Adaptive Testing %A Gage Kingsbury %A Tony Zara %K CAT %K integrated approach %K Keynote %XFor years, test blueprints have told test developers how many items and what types of items will be included in a test. Adaptive testing adopted this approach from paper testing, and it is reasonably useful. Unfortunately, 'how many items and what types of items' are not all the elements one should consider when choosing items for an adaptive test. To fill in gaps, practitioners have developed tools to allow an adaptive test to behave appropriately (e.g., exposure control, content balancing, and item drift procedures). Each of these tools involves the use of a separate process external to the primary item selection process.
The use of these subsidiary processes makes item selection less optimal and makes it difficult to prioritize aspects of selection. This discussion describes systems-based adaptive testing. This approach uses metadata concerning items, test takers and test elements to select items. These elements are weighted by the stakeholders to shape an expanded blueprint designed for adaptive testing.
%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1CBaAfH4ES7XivmvrMjPeKyFCsFZOpQMJ %0 Journal Article %J Practical Assessment Research & Evaluation %D 2011 %T A framework for the development of computerized adaptive tests %A Thompson, N. A. %A Weiss, D. J. %X A substantial amount of research has been conducted over the past 40 years on technical aspects of computerized adaptive testing (CAT), such as item selection algorithms, item exposure controls, and termination criteria. However, there is little literature providing practical guidance on the development of a CAT. This paper seeks to collate some of the available research methodologies into a general framework for the development of any CAT assessment. %B Practical Assessment Research & Evaluation %I Practical Assessment Research & Evaluation %V 16 %G eng %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T From Reliability to Validity: Expanding Adaptive Testing Practice to Find the Most Valid Score for Each Test Taker %A Steven L. Wise %K CAT %K CIV %K construct-irrelevant variance %K Individual Score Validity %K ISV %K low test taking motivation %K Reliability %K validity %XCAT is an exception to the traditional conception of validity. It is one of the few examples of individualized testing. Item difficulty is tailored to each examinee. The intent, however, is increased efficiency. Focus on reliability (reduced standard error); Equivalence with paper & pencil tests is valued; Validity is enhanced through improved reliability.
How Else Might We Individualize Testing Using CAT?
An ISV-Based View of Validity
Test Event -- An examinee encounters a series of items in a particular context.
CAT Goal: individualize testing to address CIV threats to score validity (i.e., maximize ISV).
Some Research Issues:
%0 Journal Article %J Applied Psychological Measurement %D 2006 %T A Feedback Control Strategy for Enhancing Item Selection Efficiency in Computerized Adaptive Testing %A Weissman, A. %X A computerized adaptive test (CAT) may be modeled as a closed-loop system, where item selection is influenced by trait level (θ) estimation and vice versa. When discrepancies exist between an examinee's estimated and true θ levels, nonoptimal item selection is a likely result. Nevertheless, examinee response behavior consistent with optimal item selection can be predicted using item response theory (IRT), without knowledge of an examinee's true θ level, yielding a specific reference point for applying an internal correcting or feedback control mechanism. Incorporating such a mechanism in a CAT is shown to be an effective strategy for increasing item selection efficiency. Results from simulation studies using maximum likelihood (ML) and modal a posteriori (MAP) trait-level estimation and Fisher information (FI) and Fisher interval information (FII) item selection are provided.
%B Applied Psychological Measurement %V 30 %P 84-99 %U http://apm.sagepub.com/content/30/2/84.abstract %R 10.1177/0146621605282774 %0 Book Section %D 2005 %T Features of the estimated sampling distribution of the ability estimate in computerized adaptive testing according to two stopping rules %A Blais, J-G. %A Raîche, G. %C D. G. Englehard (Eds.), Objective measurement: Theory into practice. Volume 6. %G eng %0 Journal Article %J Journal of Educational Measurement %D 2004 %T Effects of practical constraints on item selection rules at the early stages of computerized adaptive testing %A Chen, Y.-Y. %A Ankenmann, R. D. %B Journal of Educational Measurement %V 41 %P 149-174 %G eng %0 Journal Article %J Quality of Life Research %D 2003 %T The feasibility of applying item response theory to measures of migraine impact: a re-analysis of three clinical studies %A Bjorner, J. B. %A Kosinski, M. %A Ware, J. E., Jr. %K *Sickness Impact Profile %K Adolescent %K Adult %K Aged %K Comparative Study %K Cost of Illness %K Factor Analysis, Statistical %K Feasibility Studies %K Female %K Human %K Male %K Middle Aged %K Migraine/*psychology %K Models, Psychological %K Psychometrics/instrumentation/*methods %K Quality of Life/*psychology %K Questionnaires %K Support, Non-U.S. Gov't %X BACKGROUND: Item response theory (IRT) is a powerful framework for analyzing multi-item scales and is central to the implementation of computerized adaptive testing. OBJECTIVES: To explain the use of IRT to examine measurement properties and to apply IRT to a questionnaire for measuring migraine impact--the Migraine Specific Questionnaire (MSQ). METHODS: Data from three clinical studies that employed the MSQ-version 1 were analyzed by confirmatory factor analysis for categorical data and by IRT modeling. RESULTS: Confirmatory factor analyses showed very high correlations between the factors hypothesized by the original test constructions. 
Further, high item loadings on one common factor suggest that migraine impact may be adequately assessed by only one score. IRT analyses of the MSQ were feasible and provided several suggestions as to how to improve the items and in particular the response choices. Out of 15 items, 13 showed adequate fit to the IRT model. In general, IRT scores were strongly associated with the scores proposed by the original test developers and with the total item sum score. Analysis of response consistency showed that more than 90% of the patients answered consistently according to a unidimensional IRT model. For the remaining patients, scores on the dimension of emotional function were less strongly related to the overall IRT scores that mainly reflected role limitations. Such response patterns can be detected easily using response consistency indices. Analysis of test precision across score levels revealed that the MSQ was most precise at one standard deviation worse than the mean impact level for migraine patients that are not in treatment. Thus, gains in test precision can be achieved by developing items aimed at less severe levels of migraine impact. CONCLUSIONS: IRT proved useful for analyzing the MSQ. The approach warrants further testing in a more comprehensive item pool for headache impact that would enable computerized adaptive testing. %B Quality of Life Research %V 12 %P 887-902 %G eng %M 14661765 %0 Journal Article %J Journal of Technology, Learning, and Assessment %D 2003 %T A feasibility study of on-the-fly item generation in adaptive testing %A Bejar, I. I. %A Lawless, R. R., %A Morley, M. E., %A Wagner, M. E., %A Bennett R. E., %A Revuelta, J. %B Journal of Technology, Learning, and Assessment %V 2 %G eng %N 3 %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T Fairness issues in adaptive tests with strict time limits %A Bridgeman, B. %A Cline, F. 
%B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Journal Article %J Quality of Life Research %D 2002 %T Feasibility and acceptability of computerized adaptive testing (CAT) for fatigue monitoring in clinical practice %A Davis, K. M. %A Chang, C-H. %A Lai, J-S. %A Cella, D. %B Quality of Life Research %V 11(7) %P 134 %G eng %0 Generic %D 2002 %T A feasibility study of on-the-fly item generation in adaptive testing (GRE Board Report No 98-12) %A Bejar, I. I. %A Lawless, R. R %A Morley, M. E %A Wagner, M. E. %A Bennett, R. E. %A Revuelta, J. %C Educational Testing Service RR02-23. Princeton NJ: Educational Testing Service. Note = “{PDF file, 193 KB} %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T A further study on adjusting CAT item selection starting point for individual examinees %A Fan, M. %A Zhu. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Book %D 2001 %T The FastTEST Professional Testing System, Version 1.6 [Computer software] %A Assessment-Systems-Corporation %C St. Paul MN: Author %G eng %0 Journal Article %J American School Board Journal %D 2001 %T Final answer? %A Coyle, J. %K computerized adaptive testing %X The Northwest Evaluation Association helped an Indiana school district develop a computerized adaptive testing system that was aligned with its curriculum and geared toward measuring individual student growth. Now the district can obtain such information from semester to semester and year to year, get immediate results, and test students on demand. (MLH) %B American School Board Journal %V 188 %P 24-26 %G eng %M EJ623034 %0 Generic %D 2000 %T A framework for comparing adaptive test designs %A Stocking, M. L. 
%C Unpublished manuscript %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council of Measurement in Education %D 2000 %T From simulation to application: Examinees react to computerized testing %A Pommerich, M %A Burden, T. %B Paper presented at the annual meeting of the National Council of Measurement in Education %C New Orleans, April 2000 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Fairness in computer-based testing %A Gallagher, A. %A Bridgeman, B. %A Calahan, C %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Conference Paper %B Paper presented at the annual meeting of the *?*. %D 1999 %T Formula score and direct optimization algorithms in CAT ASVAB on-line calibration %A Levine, M. V. %A Krass, I. A. %B Paper presented at the annual meeting of the *?*. %G eng %0 Generic %D 1998 %T Feasibility studies of two-stage testing in large-scale educational assessment: Implications for NAEP %A Bock, R. D. %A Zimowski, M. F. %C American Institutes for Research, CA %G eng %0 Generic %D 1998 %T A framework for comparing adaptive test designs %A Stocking, M. L. %C Princeton NJ: Educational Testing Service %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the National Council for Measurement in Education %D 1998 %T A framework for exploring and controlling risks associated with test item exposure over time %A Luecht, RM %B Paper presented at the Annual Meeting of the National Council for Measurement in Education %C San Diego, CA %G eng %0 Journal Article %J Journal of Educational Measurement %D 1997 %T Flawed items in computerized adaptive testing %A Potenza, M. T. %A Stocking, M. L. %B Journal of Educational Measurement %V 34 %P 79-96 %G eng %0 Book Section %D 1995 %T From adaptive testing to automated scoring of architectural simulations %A Bejar, I. I. 
%C L. E. Mancall and P. G. Bashook (Eds.), Assessing clinical reasoning: The oral examination and alternative methods (pp. 115-130). Evanston, IL: The American Board of Medical Specialties. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1994 %T A few more issues to consider in multidimensional computerized adaptive testing %A Luecht, RM %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco %G eng %0 Generic %D 1993 %T Field test of a computer-based GRE general test (GRE Board Technical Report 88-8; Educational Testing Service Research Rep No RR 93-07) %A Schaeffer, G. A. %A Reese, C. M. %A Steffen, M. %A McKinley, R. L. %A Mills, C. N. %C Princeton NJ: Educational Testing Service. %G eng %0 Book Section %D 1990 %T Future challenges %A Wainer, H., %A Dorans, N. J. %A Green, B. F. %A Mislevy, R. J. %A Steinberg, L. %A Thissen, D. %C H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 233-272). Hillsdale NJ: Erlbaum. %G eng %0 Generic %D 1990 %T Future directions for the National Council: The Computerized Adaptive Testing Project %A Bouchard, J. %C Issues, 11, 1-5 (National Council of State Boards of Nursing) %G eng %0 Journal Article %J Issues %D 1990 %T Future directions for the National Council: the Computerized Adaptive Testing Project %A Bouchard, J. %K *Computers %K *Licensure %K Educational Measurement/*methods %K Societies, Nursing %K United States %B Issues %V 11 %P 1, 3, 5 %G eng %M 2074153 %0 Conference Paper %B Unpublished manuscript. %D 1988 %T Fitting the two-parameter model to personality data: The parameterization of the Multidimensional Personality Questionnaire %A Reise, S. P. %A Waller, N. G. %B Unpublished manuscript. %G eng %0 Generic %D 1988 %T The four generations of computerized educational measurement (Research Report 98-35) %A Bunderson, C. V %A Inouye, D. K %A Olsen, J. B. 
%C Princeton NJ: Educational Testing Service. %0 Generic %D 1987 %T Final report: Feasibility study of a computerized test administration of the CLAST %A Legg, S. M. %A Buhr, D. C. %C University of Florida: Institute for Student Assessment and Evaluation %G eng %0 Generic %D 1987 %T Full-information item factor analysis from the ASVAB CAT item pool (Methodology Research Center Report 87-1) %A Zimowski, M. F. %A Bock, R. D. %C Chicago IL: University of Chicago %G eng %0 Generic %D 1987 %T Functional and design specifications for the National Council of State Boards of Nursing adaptive testing system %A A Zara %A Bosma, J. %A Kaplan, R. %C Unpublished manuscript %G eng %0 Generic %D 1986 %T Final report: Adaptive testing of spatial abilities (ONR 150 531) %A Bejar, I. I. %C Princeton, NJ: Educational Testing Service %G eng %0 Generic %D 1986 %T Final report: The use of tailored testing with instructional programs (Research Report ONR 86-1) %A Reckase, M. D. %C Iowa City IA: The American College Testing Program, Assessment Programs Area, Test Development Division. %G eng %0 Book Section %D 1986 %T The four generations of computerized educational measurement %A Bunderson, C. V %A Inouye, D. K %A Olsen, J. B. %C In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 367-407). New York: Macmillan. %G eng %0 Generic %D 1985 %T Final report: Computerized adaptive measurement of achievement and ability %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Generic %D 1983 %T Final report: Computer-based measurement of intellectual capabilities %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Generic %D 1981 %T Factors influencing the psychometric characteristics of an adaptive testing strategy for test batteries (Research Rep. No. 81-4) %A Maurelli, V. A. %A Weiss, D. J. 
%C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory %G eng %0 Generic %D 1981 %T Final report: Computerized adaptive ability testing %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Generic %D 1981 %T Final report: Procedures for criterion referenced tailored testing %A Reckase, M. D. %C Columbia: University of Missouri, Educational Psychology Department %G eng %0 Generic %D 1980 %T Final report: Computerized adaptive performance evaluation %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory %G eng %0 Generic %D 1980 %T Final Report: Computerized adaptive testing, assessment of requirements %A Rehab-Group-Inc. %C Falls Church VA: Author %G eng %0 Journal Article %J Programmed Learning and Educational Technology %D 1979 %T Four realizations of pyramidal adaptive testing %A Hornke, L. F. %B Programmed Learning and Educational Technology %V 16 %P 164-169 %G eng %0 Generic %D 1977 %T Flexilevel adaptive testing paradigm: Validation in technical training %A Hansen, D. N. %A Ross, S. %A Harris, D. A. %C AFHRL Technical Report 77-35 (I) %G eng %0 Generic %D 1977 %T Flexilevel adaptive training paradigm: Hierarchical concept structures %A Hansen, D. N. %A Ross, S. %A Harris, D. A. %C AFHRL Technical Report 77-35 (II) %G eng %0 Conference Paper %B Paper presented at the Third International Symposium on Educational Testing %D 1977 %T Four realizations of pyramidal adaptive testing strategies %A Hornke, L. F. %B Paper presented at the Third International Symposium on Educational Testing %C University of Leiden, The Netherlands %G eng %0 Book Section %D 1976 %T A five-year quest: Is computerized adaptive testing feasible? %A Urry, V. W. %C C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 97-102). 
Washington DC: U.S. Government Printing Office. %G eng %0 Generic %D 1972 %T Fully adaptive sequential testing: A Bayesian procedure for efficient ability measurement %A Wood, R. L. %C Unpublished manuscript, University of Chicago %G eng %0 Journal Article %D 1964 %T Feasibility of a programmed testing machine %A Bayroff, A. G. %C US Army Personnel Research Office Research Study 64-3.