TY - JOUR T1 - Optimizing cognitive ability measurement with multidimensional computer adaptive testing JF - International Journal of Testing Y1 - In Press A1 - Makransky, G. A1 - Glas, C. A. W. ER - TY - JOUR T1 - An Investigation of Exposure Control Methods With Variable-Length CAT Using the Partial Credit Model JF - Applied Psychological Measurement Y1 - 2019 A1 - Audrey J. Leroux A1 - J. Kay Waid-Ebbs A1 - Pey-Shan Wen A1 - Drew A. Helmer A1 - David P. Graham A1 - Maureen K. O’Connor A1 - Kathleen Ray AB - The purpose of this simulation study was to investigate the effect of several different item exposure control procedures in computerized adaptive testing (CAT) with variable-length stopping rules using the partial credit model. Previous simulation studies on CAT exposure control methods with polytomous items rarely considered variable-length tests. The four exposure control techniques examined were the randomesque with a group of three items, randomesque with a group of six items, progressive-restricted standard error (PR-SE), and no exposure control. The two variable-length stopping rules included were the SE and predicted standard error reduction (PSER), along with three item pools of varied sizes (43, 86, and 172 items). Descriptive statistics on number of nonconvergent cases, measurement precision, testing burden, item overlap, item exposure, and pool utilization were calculated. Results revealed that the PSER stopping rule administered fewer items on average while maintaining measurement precision similar to the SE stopping rule across the different item pool sizes and exposure controls. The PR-SE exposure control procedure surpassed the randomesque methods by further reducing test overlap, maintaining maximum exposure rates at the target rate or lower, and utilizing all items from the pool with a minimal increase in number of items administered and nonconvergent cases. VL - 43 UR - https://doi.org/10.1177/0146621618824856 ER - TY - JOUR T1 - Item Selection Methods in Multidimensional Computerized Adaptive Testing With Polytomously Scored Items JF - Applied Psychological Measurement Y1 - 2018 A1 - Dongbo Tu A1 - Yuting Han A1 - Yan Cai A1 - Xuliang Gao AB - Multidimensional computerized adaptive testing (MCAT) has been developed over the past decades, and most of them can only deal with dichotomously scored items. However, polytomously scored items have been broadly used in a variety of tests for their advantages of providing more information and testing complicated abilities and skills. The purpose of this study is to discuss the item selection algorithms used in MCAT with polytomously scored items (PMCAT). Several promising item selection algorithms used in MCAT are extended to PMCAT, and two new item selection methods are proposed to improve the existing selection strategies. Two simulation studies are conducted to demonstrate the feasibility of the extended and proposed methods. The simulation results show that most of the extended item selection methods for PMCAT are feasible and the new proposed item selection methods perform well. Combined with the security of the pool, when two dimensions are considered (Study 1), the proposed modified continuous entropy method (MCEM) is the ideal of all in that it gains the lowest item exposure rate and has a relatively high accuracy. As for high dimensions (Study 2), results show that mutual information (MUI) and MCEM keep relatively high estimation accuracy, and the item exposure rates decrease as the correlation increases. 
VL - 42 UR - https://doi.org/10.1177/0146621618762748 ER - TY - JOUR T1 - Measuring patient-reported outcomes adaptively: Multidimensionality matters! JF - Applied Psychological Measurement Y1 - 2018 A1 - Paap, Muirne C. S. A1 - Kroeze, Karel A. A1 - Glas, C. A. W. A1 - Terwee, C. B. A1 - van der Palen, Job A1 - Veldkamp, Bernard P. ER - TY - JOUR T1 - Using Automatic Item Generation to Create Solutions and Rationales for Computerized Formative Testing JF - Applied Psychological Measurement Y1 - 2018 A1 - Mark J. Gierl A1 - Hollis Lai AB - Computerized testing provides many benefits to support formative assessment. However, the advent of computerized formative testing has also raised formidable new challenges, particularly in the area of item development. Large numbers of diverse, high-quality test items are required because items are continuously administered to students. Hence, hundreds of items are needed to develop the banks necessary for computerized formative testing. One promising approach that may be used to address this test development challenge is automatic item generation. Automatic item generation is a relatively new but rapidly evolving research area where cognitive and psychometric modeling practices are used to produce items with the aid of computer technology. The purpose of this study is to describe a new method for generating both the items and the rationales required to solve the items to produce the required feedback for computerized formative testing. The method for rationale generation is demonstrated and evaluated in the medical education domain. VL - 42 UR - https://doi.org/10.1177/0146621617726788 ER - TY - JOUR T1 - ATS-PD: An Adaptive Testing System for Psychological Disorders JF - Educational and Psychological Measurement Y1 - 2017 A1 - Ivan Donadello A1 - Andrea Spoto A1 - Francesco Sambo A1 - Silvana Badaloni A1 - Umberto Granziol A1 - Giulio Vidotto AB - The clinical assessment of mental disorders can be a time-consuming and error-prone procedure, consisting of a sequence of diagnostic hypothesis formulation and testing aimed at restricting the set of plausible diagnoses for the patient. In this article, we propose a novel computerized system for the adaptive testing of psychological disorders. The proposed system combines a mathematical representation of psychological disorders, known as the “formal psychological assessment,” with an algorithm designed for the adaptive assessment of an individual’s knowledge. The assessment algorithm is extended and adapted to the new application domain. Testing the system on a real sample of 4,324 healthy individuals, screened for obsessive-compulsive disorder, we demonstrate the system’s ability to support clinical testing, both by identifying the correct critical areas for each individual and by reducing the number of posed questions with respect to a standard written questionnaire. VL - 77 UR - https://doi.org/10.1177/0013164416652188 ER - TY - CONF T1 - FastCAT – Customizing CAT Administration Rules to Increase Response Efficiency T2 - IACAT 2017 Conference Y1 - 2017 A1 - Richard C. Gershon KW - Administration Rules KW - Efficiency KW - FastCAT AB -

A typical prerequisite for CAT administration is the existence of an underlying item bank that completely covers the range of the trait being measured. When a bank fails to cover the full range of the trait, examinees who are close to the floor or ceiling will often never reach a standard error cut-off and will be forced to answer items increasingly less relevant to their trait level. This scenario is fairly typical for many patients responding to patient-reported outcome measures (PROMs). For example, in the assessment of physical functioning, many item banks have a ceiling at about the 50th percentile. For most healthy patients, after a few items the only items remaining in the bank will represent decreasing ability (even though the patient has already indicated being at or above the mean for the population). Another example would be a patient with no pain taking a Pain CAT: they will probably answer “Never” to every succeeding item out to the maximum test length. For this project we sought to reduce patient burden, while maintaining test accuracy, by reducing CAT length using novel stopping rules.

We studied CAT administration histories for patients who were administered Patient-Reported Outcomes Measurement Information System (PROMIS) CATs. In the PROMIS 1 Wave 2 Back Pain/Depression Study, CATs were administered to N = 417 cases assessed across 11 PROMIS domains. The original CAT administration rules were: start with a pre-identified item of moderate difficulty; administer a minimum of four items per case; stop when the estimated theta’s SE declines to < 0.3 OR a maximum of 12 items has been administered.

Original CAT. In total, 12,622 CAT administrations were analyzed. CATs ranged from 4 to 12 items administered; 72.5% were 4-item CATs. The second and third most frequently occurring CATs were 5-item (n = 1,102; 8.7%) and 12-item CATs (n = 964; 7.6%). A total of 64,062 items were administered, averaging 5.1 items per CAT. Customized CAT. Three new CAT stopping rules were introduced, each with the potential to increase item-presentation efficiency while maintaining the required score precision: stop if a case responds to the first two items administered using an “extreme” response category (towards the ceiling or floor of the item bank); administer a minimum of two items per case; stop if the change in the SE estimate (previous to current item administration) is positive but < 0.01.
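
The customized stopping logic described in the preceding paragraph can be illustrated with a minimal sketch. This is not the authors' implementation; the function and parameter names are hypothetical, and the sketch assumes a generic IRT-based CAT loop that records the standard error of the trait estimate after each item and codes response categories so that 0 and n_categories - 1 are the floor and ceiling of the response scale.

def should_stop(responses, se_history, n_categories,
                se_cutoff=0.3, se_delta=0.01, min_items=2, max_items=12):
    """Illustrative stopping check combining the original PROMIS rules
    (SE < 0.3 or 12 items) with the three customized rules described in
    the abstract. Hypothetical interface, not the authors' code."""
    n = len(responses)
    if n >= max_items:                       # original rule: hard cap on test length
        return True
    if n < min_items:                        # customized rule: minimum of two items
        return False
    extreme = {0, n_categories - 1}          # floor and ceiling response categories
    if responses[0] in extreme and responses[1] in extreme:
        return True                          # customized rule: two extreme responses
    if se_history[-1] < se_cutoff:           # original rule: precision target reached
        return True
    delta = se_history[-2] - se_history[-1]  # customized rule: SE barely improved
    if 0 < delta < se_delta:
        return True
    return False

In a live administration loop, a check like this would run after each item is scored and the trait estimate (and its SE) is updated.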

The three new stopping rules reduced the total number of items administered by 25,643 to 38,419 items (40.0% reduction). After four items were administered, only n=1,824 CATs (14.5%) were still in assessment mode (vs. n=3,477 (27.5%) in the original CATs). On average, cases completed 3.0 items per CAT (vs. 5.1).

Each new rule addressed a specific inefficiency in the original CAT administration process: cases not having, or possessing only a low or clinically unimportant level of, the assessed domain; allowing the SE < 0.3 stopping criterion to come into effect earlier in the CAT administration process; and cases experiencing poor measurement from the domain item bank (e.g., “floor” and “ceiling” cases).

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1oPJV-x0p9hRmgJ7t6k-MCC1nAoBSFM1w ER - TY - CONF T1 - Generating Rationales to Support Formative Feedback in Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Mark Gierl A1 - Okan Bulut KW - Adaptive Testing KW - formative feedback KW - Item generation AB -

Computer adaptive testing offers many important benefits to support and promote life-long learning. Computers permit testing on demand, thereby allowing students to take the test at any time during instruction; items on computerized tests are scored immediately, thereby providing students with instant feedback; and computerized tests permit continuous administration, thereby allowing students more choice about when they write their exams. But despite these important benefits, the advent of computer adaptive testing has also raised formidable challenges, particularly in the area of item development. Educators must have access to large numbers of diverse, high-quality test items to implement computerized adaptive testing because items are continuously administered to students. Hence, hundreds or even thousands of items are needed to develop the test item banks necessary for computer adaptive testing. Unfortunately, educational test items, as they are currently created, are time-consuming and expensive to develop because each individual item is written, initially, by a content specialist and then reviewed, edited, and revised by groups of content specialists to ensure the items yield reliable and valid information. Hence, item development is one of the most important problems that must be solved before we can migrate to computer adaptive testing to support life-long learning, because large numbers of high-quality, content-specific test items are required.

One promising item development method that may be used to address this challenge is automatic item generation. Automatic item generation is a relatively new but rapidly evolving research area where cognitive and psychometric modelling practices are used to produce hundreds of new test items with the aid of computer technology. The purpose of our presentation is to describe a new methodology for generating both the items and the rationales required to solve each generated item, in order to produce the feedback needed to support life-long learning. Our item generation methodology will first be described. To ensure our description is practical, the method will also be illustrated with generated items from the health sciences to demonstrate how item generation can promote life-long learning for medical educators and practitioners.



JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1O5KDFtQlDLvhNoDr7X4JO4arpJkIHKUP ER - TY - JOUR T1 - The Information Product Methods: A Unified Approach to Dual-Purpose Computerized Adaptive Testing JF - Applied Psychological MeasurementApplied Psychological Measurement Y1 - 2017 A1 - Zheng, Chanjin A1 - He, Guanrui A1 - Gao, Chunlei AB - This article gives a brief summary of major approaches in dual-purpose computerized adaptive testing (CAT) in which the test is tailored interactively to both an examinee?s overall ability level, ?, and attribute mastery level, α. It also proposes an information product approach whose connections to the current methods are revealed. An updated comprehensive empirical study demonstrated that the information product approach not only can offer a unified framework to connect all other approaches but also can mitigate the weighting issue in the dual-information approach. VL - 42 SN - 0146-6216 UR - https://doi.org/10.1177/0146621617730392 IS - 4 JO - Applied Psychological Measurement ER - TY - CONF T1 - Issues in Trait Range Coverage for Patient Reported Outcome Measure CATs - Extending the Ceiling for Above-average Physical Functioning T2 - IACAT 2017 Conference Y1 - 2017 A1 - Richard C. Gershon KW - CAT KW - Issues KW - Patient Reported Outcome AB -

The use of a measure which fails to cover the upper range of functioning may produce results which can lead to serious misinterpretation. Scores produced by such a measure may fail to recognize significant improvement, or may not be able to demonstrate functioning commensurate with an important milestone. Accurate measurement of this range is critical for the assessment of physically active adults, e.g., athletes recovering from injury and active military personnel who wish to return to active service. Alternatively, a PF measure with a low ceiling might fail to differentiate patients in rehabilitation who continue to improve but whose scores hit a ceiling because of the measure used.

The assessment of physical function (PF) has greatly benefited from modern psychometric theory and the resulting scales, such as the Patient-Reported Outcomes Measurement Information System (PROMIS®) PF instruments. While PROMIS PF has extended the range of function upwards relative to older “legacy” instruments, few PROMIS PF items assess high levels of function. We report here on the development of higher-functioning items for the PROMIS PF bank.

An expert panel representing orthopedics, sports/military medicine, and rehabilitation reviewed existing instruments and wrote new items. After internal review, cognitive interviews were conducted with 24 individuals of average and high levels of physical function. The remaining candidate items were administered along with 50 existing PROMIS anchor items to an internet panel screened for low, average, and high levels of physical function (N = 1,600), as well as to members of Boston-area gyms (N = 344). The resulting data were subjected to standard psychometric analyses, along with multiple linking methods to place the new items on the existing PF metric. The new items were added to the full PF bank for simulated computerized adaptive testing (CAT).

Item response data were collected on 54 candidate items. Items that exhibited local dependence (LD) or differential item functioning (DIF) related to gender, age, race, education, or PF status were removed from consideration. Of the 50 existing PROMIS PF items, 31 were free of DIF and LD and were used as anchors. The parameters for the remaining new candidate items were estimated twice: freely estimated and then linked via linking coefficients, and with fixed-anchor calibration. Both methods were comparable and had appropriate fit. The new items were added to the full PF bank for simulated CATs. The resulting CAT was able to extend the ceiling with high precision to a T-score of 68, suggesting accurate measurement for 97% of the general population.

Extending the range of items by which PF is measured will substantially improve measurement quality, applicability, and efficiency. The bank has incorporated these extension items and is available for use in research and clinics for brief CAT administration (see www.healthmeasures.net). Future research should focus on using the measure to track recovery trajectories for individuals with above-average function who are recovering from injury.


JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1ZC02F-dIyYovEjzpeuRdoXDiXMLFRuKb ER - TY - CONF T1 - Item Parameter Drifting and Online Calibration T2 - IACAT 2017 Conference Y1 - 2017 A1 - Hua-Hua Chang A1 - Rui Guo KW - online calibration KW - Parameter Drift AB -

Item calibration is one of the most important topics in item response theory (IRT). Since many large-scale testing programs have switched from paper-and-pencil (P&P) testing to computerized adaptive testing (CAT), developing methods for efficiently calibrating new items has become vital. Among the many item calibration processes proposed for CAT, online calibration is the most cost-effective. This presentation introduces an online (re)calibration design to detect item parameter drift for CAT in both unidimensional and multidimensional environments. Specifically, for optimal online calibration design in a unidimensional CAT model, a two-stage design is proposed that implements a proportional density index algorithm. For a multidimensional CAT model, a four-quadrant online calibration pretest item selection design with the proportional density index algorithm is proposed. Comparisons were made between different online calibration item selection strategies. Results showed that, under unidimensional CAT, the proposed modified two-stage item selection criterion with the proportional density algorithm outperformed the other existing methods in terms of item parameter calibration and item parameter drift detection; under multidimensional CAT, the online (re)calibration technique with the proposed four-quadrant item selection design and proportional density index outperformed the other methods.


JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Multi-stage Testing for a Multi-disciplined End-of primary-school Test T2 - IACAT 2017 Conference Y1 - 2017 A1 - Hendrik Straat A1 - Maaike van Groen A1 - Wobbe Zijlstra A1 - Marie-Anne Keizer-Mittelhaëuser A1 - Michel Lamoré KW - mst KW - Multidisciplined KW - proficiency AB -

The Dutch secondary education system consists of five levels: basic, lower, and middle vocational education, general secondary education, and pre-academic education. The individual decision for level of secondary education is based on a combination of the teacher’s judgment and an end-of-primary-school placement test.

This placement test encompasses the measurement of reading, language, mathematics, and writing, with each skill consisting of one to four subdomains. The Dutch end-of-primary-school test is currently administered in two linear 200-item paper-based versions. The two versions differ in difficulty so as to motivate both less able and more able students and to measure both groups precisely. The primary goal of the test is to provide placement advice for the five levels of secondary education. The secondary goal is the assessment of six fundamental reference levels defined for reading, language, and mathematics. Because of the high-stakes nature of the test’s advice, the Dutch parliament has mandated a change of format to a multistage test. A major advantage of multistage testing is that the tailoring of the tests depends more strongly on the ability of the students than on the teacher’s judgment. A separate multistage test is under development for each of the three skills measured by the reference levels, to increase classification accuracy for secondary education placement and to optimally measure performance on the reference-level-related skills.

This symposium consists of three presentations discussing the challenges of transitioning from a linear paper-based test to a computer-based multistage test within an existing curriculum, and the specification of the multistage test to meet its measurement purposes. The transition to a multistage test has to improve both classification accuracy and measurement precision.

First, we describe the Dutch educational system and the role of the end-of-primary-school placement test within this system. Special attention will be paid to the advantages of multistage testing over both linear testing and computerized adaptive testing, and to practical implications related to the transition from a linear to a multistage test.

Second, we discuss routing and reporting on the new multi-stage test. Both topics have a major impact on the quality of the placement advice and the reference mastery decisions. Several methods for routing and reporting are compared.

Third, the linear test contains 200 items to cover a broad range of different skills and to obtain a precise measurement of each skill separately. Multistage testing creates opportunities to reduce the cognitive burden on students while maintaining the same quality of placement advice and assessment of mastery of the reference levels. This presentation focuses on the optimal allocation of items to test modules, the optimal number of stages and modules per stage, and test length reduction.



JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1C5ys178p_Wl9eemQuIsI56IxDTck2z8P ER - TY - CONF T1 - New Challenges (With Solutions) and Innovative Applications of CAT T2 - IACAT 2017 Conference Y1 - 2017 A1 - Chun Wang A1 - David J. Weiss A1 - Xue Zhang A1 - Jian Tao A1 - Yinhong He A1 - Ping Chen A1 - Shiyu Wang A1 - Susu Zhang A1 - Haiyan Lin A1 - Xiaohong Gao A1 - Hua-Hua Chang A1 - Zhuoran Shang KW - CAT KW - challenges KW - innovative applications AB -

Over the past several decades, computerized adaptive testing (CAT) has profoundly changed the administration of large-scale aptitude tests, state-wide achievement tests, professional licensure exams, and health outcome measures. While many challenges of CAT have been successfully addressed due to the continual efforts of researchers in the field, there are still many remaining, longstanding challenges that have yet to be resolved. This symposium will begin with three presentations, each of which provides a sound solution to one of the unresolved challenges. They are (1) item calibration when responses are “missing not at random” from CAT administration; (2) online calibration of new items when person traits have non-ignorable measurement error; (3) establishing consistency and asymptotic normality of latent trait estimation when allowing item response revision in CAT. In addition, this symposium also features innovative applications of CAT. In particular, there is emerging interest in using cognitive diagnostic CAT to monitor and detect learning progress (4th presentation). Last but not least, the 5th presentation illustrates the power of multidimensional polytomous CAT that permits rapid identification of hospitalized patients’ rehabilitative care needs in health outcomes measurement. We believe this symposium covers a wide range of interesting and important topics in CAT.


JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1Wvgxw7in_QCq_F7kzID6zCZuVXWcFDPa ER - TY - JOUR T1 - Investigation of Response Changes in the GRE Revised General Test JF - Educational and Psychological Measurement Y1 - 2015 A1 - Liu, Ou Lydia A1 - Bridgeman, Brent A1 - Gu, Lixiong A1 - Xu, Jun A1 - Kong, Nan AB - Research on examinees’ response changes on multiple-choice tests over the past 80 years has yielded some consistent findings, including that most examinees make score gains by changing answers. This study expands the research on response changes by focusing on a high-stakes admissions test—the Verbal Reasoning and Quantitative Reasoning measures of the GRE revised General Test. We analyzed data from 8,538 examinees for Quantitative and 9,140 for Verbal sections who took the GRE revised General Test in 12 countries. The analyses yielded findings consistent with prior research. In addition, as examinees’ ability increases, the benefit of response changing increases. The study yielded significant implications for both test agencies and test takers. Computer adaptive tests often do not allow the test takers to review and revise. Findings from this study confirm the benefit of such features. VL - 75 UR - http://epm.sagepub.com/content/75/6/1002.abstract ER - TY - JOUR T1 - A Comparison of Multi-Stage and Linear Test Designs for Medium-Size Licensure and Certification Examinations JF - Journal of Computerized Adaptive Testing Y1 - 2014 A1 - Brossman, Bradley. G. A1 - Guille, R.A. VL - 2 IS - 2 ER - TY - JOUR T1 - The applicability of multidimensional computerized adaptive testing to cognitive ability measurement in organizational assessment JF - International Journal of Testing Y1 - 2013 A1 - Makransky, G. A1 - Glas, C. A. W. VL - 13 IS - 2 ER - TY - JOUR T1 - The Applicability of Multidimensional Computerized Adaptive Testing for Cognitive Ability Measurement in Organizational Assessment JF - International Journal of Testing Y1 - 2013 A1 - Makransky, Guido A1 - Glas, Cees A. W. VL - 13 UR - http://www.tandfonline.com/doi/abs/10.1080/15305058.2012.672352 ER - TY - JOUR T1 - Balancing Flexible Constraints and Measurement Precision in Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2012 A1 - Moyer, Eric L. A1 - Galindo, Jennifer L. A1 - Dodd, Barbara G. AB -

Managing test specifications—both multiple nonstatistical constraints and flexibly defined constraints—has become an important part of designing item selection procedures for computerized adaptive tests (CATs) in achievement testing. This study compared the effectiveness of three procedures: constrained CAT, flexible modified constrained CAT, and the weighted penalty model in balancing multiple flexible constraints and maximizing measurement precision in a fixed-length CAT. The study also addressed the effect of two different test lengths—25 items and 50 items—and of including or excluding the randomesque item exposure control procedure with the three methods, all of which were found effective in selecting items that met flexible test constraints when used in the item selection process for longer tests. When the randomesque method was included to control for item exposure, the weighted penalty model and the flexible modified constrained CAT models performed better than did the constrained CAT procedure in maintaining measurement precision. When no item exposure control method was used in the item selection process, no practical difference was found in the measurement precision of each balancing method.

VL - 72 UR - http://epm.sagepub.com/content/72/4/629.abstract ER - TY - JOUR T1 - Comparison Between Dichotomous and Polytomous Scoring of Innovative Items in a Large-Scale Computerized Adaptive Test JF - Educational and Psychological Measurement Y1 - 2012 A1 - Jiao, H. A1 - Liu, J. A1 - Haynie, K. A1 - Woo, A. A1 - Gorham, J. AB -

This study explored the impact of partial credit scoring of one type of innovative items (multiple-response items) in a computerized adaptive version of a large-scale licensure pretest and operational test settings. The impacts of partial credit scoring on the estimation of the ability parameters and classification decisions in operational test settings were explored in one real data analysis and two simulation studies when two different polytomous scoring algorithms, automated polytomous scoring and rater-generated polytomous scoring, were applied. For the real data analyses, the ability estimates from dichotomous and polytomous scoring were highly correlated; the classification consistency between different scoring algorithms was nearly perfect. Information distribution changed slightly in the operational item bank. In the two simulation studies comparing each polytomous scoring with dichotomous scoring, the ability estimates resulting from polytomous scoring had slightly higher measurement precision than those resulting from dichotomous scoring. The practical impact related to classification decision was minor because of the extremely small number of items that could be scored polytomously in this current study.

VL - 72 ER - TY - JOUR T1 - Development of a computerized adaptive test for depression JF - Archives of General Psychiatry Y1 - 2012 A1 - Robert D. Gibbons A1 - David .J. Weiss A1 - Paul A. Pilkonis A1 - Ellen Frank A1 - Tara Moore A1 - Jong Bae Kim A1 - David J. Kupfer VL - 69 UR - WWW.ARCHGENPSYCHIATRY.COM IS - 11 ER - TY - JOUR T1 - Improving personality facet scores with multidimensional computerized adaptive testing: An illustration with the NEO PI-R JF - Assessment Y1 - 2012 A1 - Makransky, G. A1 - Mortensen, E. L. A1 - Glas, C. A. W. ER - TY - JOUR T1 - A Comment on Early Student Blunders on Computer-Based Adaptive Tests JF - Applied Psychological Measurement Y1 - 2011 A1 - Green, Bert F. AB -

This article refutes a recent claim that computer-based tests produce biased scores for very proficient test takers who make mistakes on one or two initial items and that the “bias” can be reduced by using a four-parameter IRT model. Because the same effect occurs with pattern scores on nonadaptive tests, the effect results from IRT scoring, not from adaptive testing. Because very proficient test takers rarely err on items of middle difficulty, the so-called bias is one of selective data analysis. Furthermore, the apparently large score penalty for one error on an otherwise perfect response pattern is shown to result from the relative stretching of the IRT scale at very high and very low proficiencies. The recommended use of a four-parameter IRT model is shown to have drawbacks.
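
For reference only (standard psychometric background, not material from the article itself): the four-parameter logistic IRT model discussed here is usually written as

P(X_i = 1 \mid \theta) = c_i + (d_i - c_i)\,\frac{\exp\{a_i(\theta - b_i)\}}{1 + \exp\{a_i(\theta - b_i)\}}, \qquad 0 \le c_i < d_i \le 1,

where a_i, b_i, c_i, and d_i are the discrimination, difficulty, lower-asymptote (guessing), and upper-asymptote (slipping) parameters; an upper asymptote d_i < 1 is what softens the score penalty when a highly proficient examinee misses an early item.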

VL - 35 IS - 2 UR - http://apm.sagepub.com/content/35/2/165.abstract ER - TY - JOUR T1 - Computer adaptive testing for small scale programs and instructional systems JF - Journal of Applied Testing Technology Y1 - 2011 A1 - Rudner, L. M. A1 - Guo, F. AB -

This study investigates measurement decision theory (MDT) as an underlying model for computer adaptive testing when the goal is to classify examinees into one of a finite number of groups. The first analysis compares MDT with a popular item response theory model and finds little difference in terms of the percentage of correct classifications. The second analysis examines the number of examinees needed to calibrate MDT item parameters and finds accurate classifications even with calibration sample sizes as small as 100 examinees.
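
The classification step in measurement decision theory can be sketched as a simple Bayesian update over a small number of mastery groups. The code below is illustrative only and is not taken from the article; the group labels, item parameters, and function name are invented, and it assumes each item's probability of a correct response is known within every group.

import numpy as np

def mdt_classify(responses, p_correct, priors):
    """Posterior group probabilities under measurement decision theory.

    responses : iterable of 0/1 item scores actually observed
    p_correct : array of shape (n_items, n_groups), P(correct | group)
    priors    : length n_groups array of prior group probabilities
    Illustrative sketch, not the authors' implementation.
    """
    posterior = np.asarray(priors, dtype=float).copy()
    for item, score in enumerate(responses):
        likelihood = p_correct[item] if score == 1 else 1.0 - p_correct[item]
        posterior *= likelihood          # multiply in this item's likelihood
        posterior /= posterior.sum()     # renormalize to a proper posterior
    return posterior

# Example: two items, three groups (non-master, partial master, master)
p = np.array([[0.2, 0.5, 0.8],
              [0.3, 0.6, 0.9]])
print(mdt_classify([1, 0], p, priors=[1/3, 1/3, 1/3]))

An adaptive version would, after each update, pick the next item expected to best separate the remaining plausible groups and stop once one posterior probability exceeds a decision threshold.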

VL - 12 IS - 1 ER - TY - JOUR T1 - Computerized adaptive assessment of personality disorder: Introducing the CAT–PD project JF - Journal of Personality Assessment Y1 - 2011 A1 - Simms, L. J. A1 - Goldberg, L .R. A1 - Roberts, J. E. A1 - Watson, D. A1 - Welte, J. A1 - Rotterman, J. H. AB - Assessment of personality disorders (PD) has been hindered by reliance on the problematic categorical model embodied in the most recent Diagnostic and Statistical Model of Mental Disorders (DSM), lack of consensus among alternative dimensional models, and inefficient measurement methods. This article describes the rationale for and early results from a multiyear study funded by the National Institute of Mental Health that was designed to develop an integrative and comprehensive model and efficient measure of PD trait dimensions. To accomplish these goals, we are in the midst of a 5-phase project to develop and validate the model and measure. The results of Phase 1 of the project—which was focused on developing the PD traits to be assessed and the initial item pool—resulted in a candidate list of 59 PD traits and an initial item pool of 2,589 items. Data collection and structural analyses in community and patient samples will inform the ultimate structure of the measure, and computerized adaptive testing will permit efficient measurement of the resultant traits. The resultant Computerized Adaptive Test of Personality Disorder (CAT–PD) will be well positioned as a measure of the proposed DSM–5 PD traits. Implications for both applied and basic personality research are discussed. VL - 93 SN - 0022-3891 ER - TY - JOUR T1 - Content range and precision of a computer adaptive test of upper extremity function for children with cerebral palsy JF - Physical & Occupational Therapy in Pediatrics Y1 - 2011 A1 - Montpetit, K. A1 - Haley, S. A1 - Bilodeau, N. A1 - Ni, P. A1 - Tian, F. A1 - Gorton, G., 3rd A1 - Mulcahey, M. J. AB - This article reports on the content range and measurement precision of an upper extremity (UE) computer adaptive testing (CAT) platform of physical function in children with cerebral palsy. Upper extremity items representing skills of all abilities were administered to 305 parents. These responses were compared with two traditional standardized measures: Pediatric Outcomes Data Collection Instrument and Functional Independence Measure for Children. The UE CAT correlated strongly with the upper extremity component of these measures and had greater precision when describing individual functional ability. The UE item bank has wider range with items populating the lower end of the ability spectrum. This new UE item bank and CAT have the capability to quickly assess children of all ages and abilities with good precision and, most importantly, with items that are meaningful and appropriate for their age and level of physical function. VL - 31 SN - 1541-3144 (Electronic)0194-2638 (Linking) N1 - Montpetit, KathleenHaley, StephenBilodeau, NathalieNi, PengshengTian, FengGorton, George 3rdMulcahey, M JEnglandPhys Occup Ther Pediatr. 2011 Feb;31(1):90-102. Epub 2010 Oct 13. JO - Phys Occup Ther Pediatr ER - TY - ABST T1 - Cross-cultural development of an item list for computer-adaptive testing of fatigue in oncological patients Y1 - 2011 A1 - Giesinger, J. M. A1 - Petersen, M. A. A1 - Groenvold, M. A1 - Aaronson, N. K. A1 - Arraras, J. I. A1 - Conroy, T. A1 - Gamper, E. M. A1 - Kemmler, G. A1 - King, M. T. A1 - Oberguggenberger, A. S. A1 - Velikova, G. A1 - Young, T. A1 - Holzner, B. A1 - Eortc-Qlg, E. O. 
AB - ABSTRACT: INTRODUCTION: Within an ongoing project of the EORTC Quality of Life Group, we are developing computerized adaptive test (CAT) measures for the QLQ-C30 scales. These new CAT measures are conceptualised to reflect the same constructs as the QLQ-C30 scales. Accordingly, the Fatigue-CAT is intended to capture physical and general fatigue. METHODS: The EORTC approach to CAT development comprises four phases (literature search, operationalisation, pre-testing, and field testing). Phases I-III are described in detail in this paper. A literature search for fatigue items was performed in major medical databases. After refinement through several expert panels, the remaining items were used as the basis for adapting items and/or formulating new items fitting the EORTC item style. To obtain feedback from patients with cancer, these English items were translated into Danish, French, German, and Spanish and tested in the respective countries. RESULTS: Based on the literature search a list containing 588 items was generated. After a comprehensive item selection procedure focusing on content, redundancy, item clarity and item difficulty a list of 44 fatigue items was generated. Patient interviews (n=52) resulted in 12 revisions of wording and translations. DISCUSSION: The item list developed in phases I-III will be further investigated within a field-testing phase (IV) to examine psychometric characteristics and to fit an item response theory model. The Fatigue CAT based on this item bank will provide scores that are backward-compatible to the original QLQ-C30 fatigue scale. JF - Health and Quality of Life Outcomes VL - 9 SN - 1477-7525 (Electronic)1477-7525 (Linking) N1 - Health Qual Life Outcomes. 2011 Mar 29;9(1):19. ER - TY - JOUR T1 - Design of a Computer-Adaptive Test to Measure English Literacy and Numeracy in the Singapore Workforce: Considerations, Benefits, and Implications JF - Journal of Applied Testing Technology Y1 - 2011 A1 - Jacobsen, J. A1 - Ackermann, R. A1 - Egüez, J. A1 - Ganguli, D. A1 - Rickard, P. A1 - Taylor, L. AB -

A computer adaptive test (CAT) is a delivery methodology that serves the larger goals of the assessment system in which it is embedded. A thorough analysis of the assessment system for which a CAT is being designed is critical to ensure that the delivery platform is appropriate and addresses all relevant complexities. As such, a CAT engine must be designed to conform to the validity and reliability of the overall system. This design takes the form of adherence to the assessment goals and objectives of the adaptive assessment system. When the assessment is adapted for use in another country, consideration must be given to any necessary revisions, including content differences. This article addresses these considerations while drawing, in part, on the process followed in the development of the CAT delivery system designed to test English language workplace skills for the Singapore Workforce Development Agency. Topics include item creation and selection, calibration of the item pool, analysis and testing of the psychometric properties, and reporting and interpretation of scores. The characteristics and benefits of the CAT delivery system are detailed, as well as implications for testing programs considering the use of a CAT delivery system.

VL - 12 UR - http://www.testpublishers.org/journal-of-applied-testing-technology IS - 1 ER - TY - CONF T1 - Item Selection Methods based on Multiple Objective Approaches for Classification of Respondents into Multiple Levels T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Maaike van Groen A1 - Theo Eggen A1 - Bernard Veldkamp KW - adaptive classification test KW - CAT KW - item selection KW - sequential classification test AB -

Is it possible to develop new item selection methods that take advantage of the fact that we want to classify respondents into multiple categories? The new methods take multiple points on the ability scale into account and are based on multiple objective approaches.


JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - A New Stopping Rule for Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2011 A1 - Choi, Seung W. A1 - Grady, Matthew W. A1 - Dodd, Barbara G. AB -

The goal of the current study was to introduce a new stopping rule for computerized adaptive testing (CAT). The predicted standard error reduction (PSER) stopping rule uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared with that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule in a series of simulated adaptive tests, drawn from a number of item pools. Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant.
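
A rough illustration of the predicted-standard-error-reduction idea: before giving another item, predict how much the standard error would shrink if the most informative remaining item were administered, and stop when that predicted gain is negligible. The sketch below approximates the posterior-variance update with Fisher information at the current trait estimate; it illustrates the general idea under that simplifying assumption, is not the exact computation used in the article, and all names are hypothetical.

import math

def pser_should_stop(current_se, remaining_item_info,
                     min_gain=0.02, se_cutoff=0.3):
    """Stop when the predicted SE reduction from the best remaining item is
    below `min_gain`, or when the SE target has already been reached.

    current_se         : standard error of the current trait estimate
    remaining_item_info: Fisher information of each remaining item evaluated
                         at the current trait estimate
    Illustrative approximation of a predicted-SE-reduction stopping rule.
    """
    if current_se <= se_cutoff or not remaining_item_info:
        return True
    best_info = max(remaining_item_info)
    predicted_se = 1.0 / math.sqrt(1.0 / current_se ** 2 + best_info)
    return (current_se - predicted_se) < min_gain

# Example: SE is still 0.45, but the best remaining item adds little information,
# so the rule stops rather than administer items with negligible payoff.
print(pser_should_stop(0.45, [0.05, 0.08, 0.03]))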

VL - 71 UR - http://epm.sagepub.com/content/71/1/37.abstract ER - TY - JOUR T1 - Polytomous Adaptive Classification Testing: Effects of Item Pool Size, Test Termination Criterion, and Number of Cutscores JF - Educational and Psychological Measurement Y1 - 2011 A1 - Gnambs, Timo A1 - Batinic, Bernad AB -

Computer-adaptive classification tests focus on classifying respondents in different proficiency groups (e.g., for pass/fail decisions). To date, adaptive classification testing has been dominated by research on dichotomous response formats and classifications in two groups. This article extends this line of research to polytomous classification tests for two- and three-group scenarios (e.g., inferior, mediocre, and superior proficiencies). Results of two simulation experiments with generated and real responses (N = 2,000) to established personality scales of different length (12, 20, or 29 items) demonstrate that adaptive item presentations significantly reduce the number of items required to make such classification decisions while maintaining a consistent classification accuracy. Furthermore, the simulations highlight the importance of the selected test termination criterion, which has a significant impact on the average test length.

VL - 71 UR - http://epm.sagepub.com/content/71/6/1006.abstract ER - TY - JOUR T1 - Unproctored Internet test verification: Using adaptive confirmation testing JF - Organizational Research Methods Y1 - 2011 A1 - Makransky, G. A1 - Glas, C. A. W. VL - 14 ER - TY - CONF T1 - Walking the Tightrope: Using Better Content Control to Improve CAT T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Kathleen A. Gialluca KW - CAT KW - CAT evolution KW - test content AB -

All testing involves a balance between measurement precision and content considerations. CAT item-selection algorithms have evolved to accommodate content considerations. The presentation reviews this evolution, including: original/“pure” adaptive exams; constrained CAT; the weighted-deviations method; the shadow-test approach; testlets instead of fully adaptive tests; cases in which administration of one item precludes the administration of other item(s); and item relationships.


JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - CHAP T1 - Adaptive Mastery Testing Using a Multidimensional IRT Model T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Glas, C. A. W. A1 - Vos, H. J. JF - Elements of Adaptive Testing ER - TY - JOUR T1 - An automatic online calibration design in adaptive testing JF - Journal of Applied Testing Technology Y1 - 2010 A1 - Makransky, G. A1 - Glas, C. A. W. VL - 11 ER - TY - JOUR T1 - Development and validation of patient-reported outcome measures for sleep disturbance and sleep-related impairments JF - Sleep Y1 - 2010 A1 - Buysse, D. J. A1 - Yu, L. A1 - Moul, D. E. A1 - Germain, A. A1 - Stover, A. A1 - Dodds, N. E. A1 - Johnston, K. L. A1 - Shablesky-Cade, M. A. A1 - Pilkonis, P. A. KW - *Outcome Assessment (Health Care) KW - *Self Disclosure KW - Adult KW - Aged KW - Aged, 80 and over KW - Cross-Sectional Studies KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Middle Aged KW - Psychometrics KW - Questionnaires KW - Reproducibility of Results KW - Sleep Disorders/*diagnosis KW - Young Adult AB - STUDY OBJECTIVES: To develop an archive of self-report questions assessing sleep disturbance and sleep-related impairments (SRI), to develop item banks from this archive, and to validate and calibrate the item banks using classic validation techniques and item response theory analyses in a sample of clinical and community participants. DESIGN: Cross-sectional self-report study. SETTING: Academic medical center and participant homes. PARTICIPANTS: One thousand nine hundred ninety-three adults recruited from an Internet polling sample and 259 adults recruited from medical, psychiatric, and sleep clinics. INTERVENTIONS: None. MEASUREMENTS AND RESULTS: This study was part of PROMIS (Patient-Reported Outcomes Information System), a National Institutes of Health Roadmap initiative. Self-report item banks were developed through an iterative process of literature searches, collecting and sorting items, expert content review, qualitative patient research, and pilot testing. Internal consistency, convergent validity, and exploratory and confirmatory factor analysis were examined in the resulting item banks. Factor analyses identified 2 preliminary item banks, sleep disturbance and SRI. Item response theory analyses and expert content review narrowed the item banks to 27 and 16 items, respectively. Validity of the item banks was supported by moderate to high correlations with existing scales and by significant differences in sleep disturbance and SRI scores between participants with and without sleep disorders. CONCLUSIONS: The PROMIS sleep disturbance and SRI item banks have excellent measurement properties and may prove to be useful for assessing general aspects of sleep and SRI with various groups of patients and interventions. VL - 33 SN - 0161-8105 (Print)0161-8105 (Linking) N1 - Buysse, Daniel JYu, LanMoul, Douglas EGermain, AnneStover, AngelaDodds, Nathan EJohnston, Kelly LShablesky-Cade, Melissa APilkonis, Paul AAR052155/AR/NIAMS NIH HHS/United StatesU01AR52155/AR/NIAMS NIH HHS/United StatesU01AR52158/AR/NIAMS NIH HHS/United StatesU01AR52170/AR/NIAMS NIH HHS/United StatesU01AR52171/AR/NIAMS NIH HHS/United StatesU01AR52177/AR/NIAMS NIH HHS/United StatesU01AR52181/AR/NIAMS NIH HHS/United StatesU01AR52186/AR/NIAMS NIH HHS/United StatesResearch Support, N.I.H., ExtramuralValidation StudiesUnited StatesSleepSleep. 2010 Jun 1;33(6):781-92. 
U2 - 2880437 ER - TY - JOUR T1 - Development of computerized adaptive testing (CAT) for the EORTC QLQ-C30 physical functioning dimension JF - Quality of Life Research Y1 - 2010 A1 - Petersen, M. A. A1 - Groenvold, M. A1 - Aaronson, N. K. A1 - Chie, W. C. A1 - Conroy, T. A1 - Costantini, A. A1 - Fayers, P. A1 - Helbostad, J. A1 - Holzner, B. A1 - Kaasa, S. A1 - Singer, S. A1 - Velikova, G. A1 - Young, T. AB - PURPOSE: Computerized adaptive test (CAT) methods, based on item response theory (IRT), enable a patient-reported outcome instrument to be adapted to the individual patient while maintaining direct comparability of scores. The EORTC Quality of Life Group is developing a CAT version of the widely used EORTC QLQ-C30. We present the development and psychometric validation of the item pool for the first of the scales, physical functioning (PF). METHODS: Initial developments (including literature search and patient and expert evaluations) resulted in 56 candidate items. Responses to these items were collected from 1,176 patients with cancer from Denmark, France, Germany, Italy, Taiwan, and the United Kingdom. The items were evaluated with regard to psychometric properties. RESULTS: Evaluations showed that 31 of the items could be included in a unidimensional IRT model with acceptable fit and good content coverage, although the pool may lack items at the upper extreme (good PF). There were several findings of significant differential item functioning (DIF). However, the DIF findings appeared to have little impact on the PF estimation. CONCLUSIONS: We have established an item pool for CAT measurement of PF and believe that this CAT instrument will clearly improve the EORTC measurement of PF. VL - 20 SN - 1573-2649 (Electronic)0962-9343 (Linking) N1 - Qual Life Res. 2010 Oct 23. ER - TY - BOOK T1 - Elements of Adaptive Testing Y1 - 2010 A1 - van der Linden, W. J. A1 - Glas, C. A. W. PB - Springer CY - New York ER - TY - CHAP T1 - Estimation of the Parameters in an Item-Cloning Model for Adaptive Testing T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Glas, C. A. W. A1 - van der Linden, W. J. A1 - Geerlings, H. JF - Elements of Adaptive Testing ER - TY - CHAP T1 - Item Parameter Estimation and Item Fit Analysis T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Glas, C. A. W. JF - Elements of Adaptive Testing ER - TY - COMP T1 - Manual for CATSim: Comprehensive simulation of computerized adaptive testing Y1 - 2010 A1 - Weiss, D. J. A1 - Guyer, R. D. PB - Assessment Systems Corporation CY - St. Paul, MN ER - TY - JOUR T1 - Marginal likelihood inference for a model for item responses and response times JF - British Journal of Mathematical and Statistical Psychology Y1 - 2010 A1 - Glas, C. A. W. A1 - van der Linden, W. J. AB -

Marginal maximum-likelihood procedures for parameter estimation and testing the fit of a hierarchical model for speed and accuracy on test items are presented. The model is a composition of two first-level models for dichotomous responses and response times along with multivariate normal models for their item and person parameters. It is shown how the item parameters can easily be estimated using Fisher's identity. To test the fit of the model, Lagrange multiplier tests of the assumptions of subpopulation invariance of the item parameters (i.e., no differential item functioning), the shape of the response functions, and three different types of conditional independence were derived. Simulation studies were used to show the feasibility of the estimation and testing procedures and to estimate the power and Type I error rate of the latter. In addition, the procedures were applied to an empirical data set from a computerized adaptive test of language comprehension.
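
As background (standard notation for this modeling framework, not quoted from the article): the response-time component of van der Linden's hierarchical model is usually written as a lognormal model,

\log T_{ij} \sim N\!\left(\beta_i - \tau_j,\; \alpha_i^{-2}\right),

where T_{ij} is the response time of person j on item i, \tau_j is the person's speed, \beta_i is the item's time intensity, and \alpha_i is the item's time-discrimination parameter; the response component is a standard dichotomous IRT model, and the item and person parameters of the two first-level models are tied together by multivariate normal distributions at the second level.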

VL - 63 SN - 0007-1102 (Print)0007-1102 (Linking) N1 - Glas, Cees A Wvan der Linden, Wim JResearch Support, Non-U.S. Gov'tEnglandThe British journal of mathematical and statistical psychologyBr J Math Stat Psychol. 2010 Nov;63(Pt 3):603-26. Epub 2010 Jan 28. ER - TY - JOUR T1 - A new stopping rule for computerized adaptive testing JF - Educational and Psychological Measurement Y1 - 2010 A1 - Choi, S. W. A1 - Grady, M. W. A1 - Dodd, B. G. AB - The goal of the current study was to introduce a new stopping rule for computerized adaptive testing. The predicted standard error reduction stopping rule (PSER) uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared to that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule in a series of simulated adaptive tests, drawn from a number of item pools. Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant. VL - 70 SN - 0013-1644 (Print)0013-1644 (Linking) N1 - U01 AR052177-04/NIAMS NIH HHS/Educ Psychol Meas. 2010 Dec 1;70(6):1-17. U2 - 3028267 ER - TY - CHAP T1 - Testlet-Based Adaptive Mastery Testing T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Vos, H. J. A1 - Glas, C. A. W. JF - Elements of Adaptive Testing ER - TY - JOUR T1 - The use of PROMIS and assessment center to deliver patient-reported outcome measures in clinical research JF - Journal of Applied Measurement Y1 - 2010 A1 - Gershon, R. C. A1 - Rothrock, N. A1 - Hanrahan, R. A1 - Bass, M. A1 - Cella, D. AB - The Patient-Reported Outcomes Measurement Information System (PROMIS) was developed as one of the first projects funded by the NIH Roadmap for Medical Research Initiative to re-engineer the clinical research enterprise. The primary goal of PROMIS is to build item banks and short forms that measure key health outcome domains that are manifested in a variety of chronic diseases which could be used as a "common currency" across research projects. To date, item banks, short forms and computerized adaptive tests (CAT) have been developed for 13 domains with relevance to pediatric and adult subjects. To enable easy delivery of these new instruments, PROMIS built a web-based resource (Assessment Center) for administering CATs and other self-report data, tracking item and instrument development, monitoring accrual, managing data, and storing statistical analysis results. Assessment Center can also be used to deliver custom researcher developed content, and has numerous features that support both simple and complicated accrual designs (branching, multiple arms, multiple time points, etc.). This paper provides an overview of the development of the PROMIS item banks and details Assessment Center functionality. VL - 11 SN - 1529-7713 ER - TY - ABST T1 - Validation of a computer-adaptive test to evaluate generic health-related quality of life Y1 - 2010 A1 - Rebollo, P. A1 - Castejon, I. A1 - Cuervo, J. A1 - Villa, G. A1 - Garcia-Cueto, E. A1 - Diaz-Cuervo, H. A1 - Zardain, P. C. A1 - Muniz, J. A1 - Alonso, J. AB - BACKGROUND: Health Related Quality of Life (HRQoL) is a relevant variable in the evaluation of health outcomes. Questionnaires based on Classical Test Theory typically require a large number of items to evaluate HRQoL. 
Computer Adaptive Testing (CAT) can be used to reduce tests length while maintaining and, in some cases, improving accuracy. This study aimed at validating a CAT based on Item Response Theory (IRT) for evaluation of generic HRQoL: the CAT-Health instrument. METHODS: Cross-sectional study of subjects aged over 18 attending Primary Care Centres for any reason. CAT-Health was administered along with the SF-12 Health Survey. Age, gender and a checklist of chronic conditions were also collected. CAT-Health was evaluated considering: 1) feasibility: completion time and test length; 2) content range coverage, Item Exposure Rate (IER) and test precision; and 3) construct validity: differences in the CAT-Health scores according to clinical variables and correlations between both questionnaires. RESULTS: 396 subjects answered CAT-Health and SF-12, 67.2% females, mean age (SD) 48.6 (17.7) years. 36.9% did not report any chronic condition. Median completion time for CAT-Health was 81 seconds (IQ range = 59-118) and it increased with age (p < 0.001). The median number of items administered was 8 (IQ range = 6-10). Neither ceiling nor floor effects were found for the score. None of the items in the pool had an IER of 100% and it was over 5% for 27.1% of the items. Test Information Function (TIF) peaked between levels -1 and 0 of HRQoL. Statistically significant differences were observed in the CAT-Health scores according to the number and type of conditions. CONCLUSIONS: Although domain-specific CATs exist for various areas of HRQoL, CAT-Health is one of the first IRT-based CATs designed to evaluate generic HRQoL and it has proven feasible, valid and efficient, when administered to a broad sample of individuals attending primary care settings. JF - Health and Quality of Life Outcomes VL - 8 SN - 1477-7525 (Electronic)1477-7525 (Linking) N1 - Rebollo, PabloCastejon, IgnacioCuervo, JesusVilla, GuillermoGarcia-Cueto, EduardoDiaz-Cuervo, HelenaZardain, Pilar CMuniz, JoseAlonso, JordiSpanish CAT-Health Research GroupEnglandHealth Qual Life Outcomes. 2010 Dec 3;8:147. U2 - 3022567 ER - TY - CHAP T1 - Applications of CAT in admissions to higher education in Israel: Twenty-two years of experience Y1 - 2009 A1 - Gafni, N. A1 - Cohen, Y. A1 - Roded, K A1 - Baumer, M A1 - Moshinsky, A. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 326 KB} ER - TY - CHAP T1 - Assessing the equivalence of Internet-based vs. paper-and-pencil psychometric tests. Y1 - 2009 A1 - Baumer, M A1 - Roded, K A1 - Gafni, N. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - PDF File, 142 K ER - TY - JOUR T1 - Constraint-weighted a-stratification for computerized adaptive testing with nonstatistical constraints: Balancing measurement efficiency and exposure control JF - Educational and Psychological Measurement Y1 - 2009 A1 - Cheng, Y A1 - Chang, Hua-Hua A1 - Douglas, J. A1 - Guo, F. VL - 69 ER - TY - CHAP T1 - Developing item variants: An empirical study Y1 - 2009 A1 - Wendt, A. A1 - Kao, S. A1 - Gorham, J. A1 - Woo, A. AB - Large-scale standardized test have been widely used for educational and licensure testing. In computerized adaptive testing (CAT), one of the practical concerns for maintaining large-scale assessments is to ensure adequate numbers of high-quality items that are required for item pool functioning. Developing items at specific difficulty levels and for certain areas of test plans is a wellknown challenge. 
The purpose of this study was to investigate strategies for varying items that can effectively generate items at targeted difficulty levels and specific test plan areas. Each variant item generation model was developed by decomposing selected source items possessing ideal measurement properties and targeting the desirable content domains. 341 variant items were generated from 72 source items. Data were collected from six pretest periods. Items were calibrated using the Rasch model. Initial results indicate that variant items showed desirable measurement properties. Additionally, compared to an average of approximately 60% of the items passing pretest criteria, an average of 84% of the variant items passed the pretest criteria. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 194 KB} ER - TY - JOUR T1 - Development of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis JF - Rehabilitation Psychology Y1 - 2009 A1 - Forkmann, T. A1 - Boecker, M. A1 - Norra, C. A1 - Eberle, N. A1 - Kircher, T. A1 - Schauerte, P. A1 - Mischke, K. A1 - Westhofen, M. A1 - Gauggel, S. A1 - Wirtz, M. KW - Adaptation, Psychological KW - Adult KW - Aged KW - Depressive Disorder/*diagnosis/psychology KW - Diagnosis, Computer-Assisted KW - Female KW - Heart Diseases/*psychology KW - Humans KW - Male KW - Mental Disorders/*psychology KW - Middle Aged KW - Models, Statistical KW - Otorhinolaryngologic Diseases/*psychology KW - Personality Assessment/statistics & numerical data KW - Personality Inventory/*statistics & numerical data KW - Psychometrics/statistics & numerical data KW - Questionnaires KW - Reproducibility of Results KW - Sick Role AB - OBJECTIVE: The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. The present study aimed at developing a new item bank that allows for assessing depression in persons with mental and persons with somatic diseases. METHOD: The sample consisted of 161 participants treated for a depressive syndrome, and 206 participants with somatic illnesses (103 cardiologic, 103 otorhinolaryngologic; overall mean age = 44.1 years, SD =14.0; 44.7% women) to allow for validation of the item bank in both groups. Persons answered a pool of 182 depression items on a 5-point Likert scale. RESULTS: Evaluation of Rasch model fit (infit < 1.3), differential item functioning, dimensionality, local independence, item spread, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 79 items with good psychometric properties. CONCLUSIONS: The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. It might also be useful for researchers who wish to develop new fixed-length scales for the assessment of depression in specific rehabilitation settings. VL - 54 SN - 0090-5550 (Print)0090-5550 (Linking) N1 - Forkmann, ThomasBoecker, MarenNorra, ChristineEberle, NicoleKircher, TiloSchauerte, PatrickMischke, KarlWesthofen, MartinGauggel, SiegfriedWirtz, MarkusResearch Support, Non-U.S. Gov'tUnited StatesRehabilitation psychologyRehabil Psychol. 2009 May;54(2):186-97. ER - TY - CHAP T1 - Effect of early misfit in computerized adaptive testing on the recovery of theta Y1 - 2009 A1 - Guyer, R. D. A1 - Weiss, D. J. CY - D. J. 
Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 212 KB} ER - TY - CHAP T1 - Guess what? Score differences with rapid replies versus omissions on a computerized adaptive test Y1 - 2009 A1 - Talento-Miller, E. A1 - Guo, F. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 215 KB} ER - TY - CHAP T1 - Limiting item exposure for target difficulty ranges in a high-stakes CAT Y1 - 2009 A1 - Li, X. A1 - Becker, K. A1 - Gorham, J. A1 - Woo, A. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. {PDF File, 1. N1 - MB} ER - TY - JOUR T1 - Measuring global physical health in children with cerebral palsy: Illustration of a multidimensional bi-factor model and computerized adaptive testing JF - Quality of Life Research Y1 - 2009 A1 - Haley, S. M. A1 - Ni, P. A1 - Dumas, H. M. A1 - Fragala-Pinkham, M. A. A1 - Hambleton, R. K. A1 - Montpetit, K. A1 - Bilodeau, N. A1 - Gorton, G. E. A1 - Watson, K. A1 - Tucker, C. A. KW - *Computer Simulation KW - *Health Status KW - *Models, Statistical KW - Adaptation, Psychological KW - Adolescent KW - Cerebral Palsy/*physiopathology KW - Child KW - Child, Preschool KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Massachusetts KW - Pennsylvania KW - Questionnaires KW - Young Adult AB - PURPOSE: The purposes of this study were to apply a bi-factor model for the determination of test dimensionality and a multidimensional CAT using computer simulations of real data for the assessment of a new global physical health measure for children with cerebral palsy (CP). METHODS: Parent respondents of 306 children with cerebral palsy were recruited from four pediatric rehabilitation hospitals and outpatient clinics. We compared confirmatory factor analysis results across four models: (1) one-factor unidimensional; (2) two-factor multidimensional (MIRT); (3) bi-factor MIRT with fixed slopes; and (4) bi-factor MIRT with varied slopes. We tested whether the general and content (fatigue and pain) person score estimates could discriminate across severity and types of CP, and whether score estimates from a simulated CAT were similar to estimates based on the total item bank, and whether they correlated as expected with external measures. RESULTS: Confirmatory factor analysis suggested separate pain and fatigue sub-factors; all 37 items were retained in the analyses. From the bi-factor MIRT model with fixed slopes, the full item bank scores discriminated across levels of severity and types of CP, and compared favorably to external instruments. CAT scores based on 10- and 15-item versions accurately captured the global physical health scores. CONCLUSIONS: The bi-factor MIRT CAT application, especially the 10- and 15-item versions, yielded accurate global physical health scores that discriminated across known severity groups and types of CP, and correlated as expected with concurrent measures. The CATs have potential for collecting complex data on the physical health of children with CP in an efficient manner. VL - 18 SN - 0962-9343 (Print)0962-9343 (Linking) N1 - Haley, Stephen MNi, PengshengDumas, Helene MFragala-Pinkham, Maria AHambleton, Ronald KMontpetit, KathleenBilodeau, NathalieGorton, George EWatson, KyleTucker, Carole AK02 HD045354-01A1/HD/NICHD NIH HHS/United StatesK02 HD45354-01A1/HD/NICHD NIH HHS/United StatesResearch Support, N.I.H., ExtramuralResearch Support, Non-U.S. 
Gov'tNetherlandsQuality of life research : an international journal of quality of life aspects of treatment, care and rehabilitationQual Life Res. 2009 Apr;18(3):359-70. Epub 2009 Feb 17. U2 - 2692519 ER - TY - CHAP T1 - Quantifying the impact of compromised items in CAT Y1 - 2009 A1 - Guo, F. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 438 KB} ER - TY - CHAP T1 - Using automatic item generation to address item demands for CAT Y1 - 2009 A1 - Lai, H. A1 - Alves, C. A1 - Gierl, M. J. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 320 KB} ER - TY - JOUR T1 - Validation of the MMPI-2 computerized adaptive version (MMPI-2-CA) in a correctional intake facility JF - Psychological Services Y1 - 2009 A1 - Forbey, J. D. A1 - Ben-Porath, Y. S. A1 - Gartland, D. AB - Computerized adaptive testing in personality assessment can improve efficiency by significantly reducing the number of items administered to answer an assessment question. The time savings afforded by this technique could be of particular benefit in settings where large numbers of psychological screenings are conducted, such as correctional facilities. In the current study, item and time savings, as well as the test–retest and extratest correlations associated with an audio augmented administration of all the scales of the Minnesota Multiphasic Personality Inventory (MMPI)-2 Computerized Adaptive (MMPI-2-CA) are reported. Participants include 366 men, ages 18 to 62 years (M = 33.04, SD = 10.40), undergoing intake into a large Midwestern state correctional facility. Results of the current study indicate considerable item and corresponding time savings for the MMPI-2-CA compared to conventional administration of the test, as well as comparability in terms of test–retest and correlations with external measures. Future directions of adaptive personality testing are discussed. VL - 6 SN - 1939-148X ER - TY - JOUR T1 - CAT-MD: Computerized adaptive testing on mobile devices JF - International Journal of Web-Based Learning and Teaching Technologies Y1 - 2008 A1 - Triantafillou, E. A1 - Georgiadou, E. A1 - Economides, A. A. VL - 3 ER - TY - JOUR T1 - Computer Adaptive-Attribute Testing A New Approach to Cognitive Diagnostic Assessment JF - Zeitschrift für Psychologie / Journal of Psychology Y1 - 2008 A1 - Gierl, M. J. A1 - Zhou, J. KW - cognition and assessment KW - cognitive diagnostic assessment KW - computer adaptive testing AB -

The influence of interdisciplinary forces stemming from developments in cognitive science, mathematical statistics, educational psychology, and computing science is beginning to appear in educational and psychological assessment. Computer adaptive-attribute testing (CA-AT) is one example. The concepts and procedures in CA-AT can be found at the intersection between computer adaptive testing and cognitive diagnostic assessment. CA-AT allows us to fuse the administrative benefits of computer adaptive testing with the psychological benefits of cognitive diagnostic assessment to produce an innovative psychologically-based adaptive testing approach. We describe the concepts behind CA-AT as well as illustrate how it can be used to promote formative, computer-based, classroom assessment.

VL - 216 IS - 1 ER - TY - JOUR T1 - Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes JF - Archives of Physical Medicine and Rehabilitation Y1 - 2008 A1 - Haley, S. M. A1 - Gandek, B. A1 - Siebens, H. A1 - Black-Schaffer, R. M. A1 - Sinclair, S. J. A1 - Tao, W. A1 - Coster, W. J. A1 - Ni, P. A1 - Jette, A. M. KW - *Activities of Daily Living KW - *Adaptation, Physiological KW - *Computer Systems KW - *Questionnaires KW - Adult KW - Aged KW - Aged, 80 and over KW - Chi-Square Distribution KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Longitudinal Studies KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Patient Discharge KW - Prospective Studies KW - Rehabilitation/*standards KW - Subacute Care/*standards AB - OBJECTIVES: To measure participation outcomes with a computerized adaptive test (CAT) and compare CAT and traditional fixed-length surveys in terms of score agreement, respondent burden, discriminant validity, and responsiveness. DESIGN: Longitudinal, prospective cohort study of patients interviewed approximately 2 weeks after discharge from inpatient rehabilitation and 3 months later. SETTING: Follow-up interviews conducted in patient's home setting. PARTICIPANTS: Adults (N=94) with diagnoses of neurologic, orthopedic, or medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Participation domains of mobility, domestic life, and community, social, & civic life, measured using a CAT version of the Participation Measure for Postacute Care (PM-PAC-CAT) and a 53-item fixed-length survey (PM-PAC-53). RESULTS: The PM-PAC-CAT showed substantial agreement with PM-PAC-53 scores (intraclass correlation coefficient, model 3,1, .71-.81). On average, the PM-PAC-CAT was completed in 42% of the time and with only 48% of the items as compared with the PM-PAC-53. Both formats discriminated across functional severity groups. The PM-PAC-CAT had modest reductions in sensitivity and responsiveness to patient-reported change over a 3-month interval as compared with the PM-PAC-53. CONCLUSIONS: Although continued evaluation is warranted, accurate estimates of participation status and responsiveness to change for group-level analyses can be obtained from CAT administrations, with a sizeable reduction in respondent burden. VL - 89 SN - 1532-821X (Electronic)0003-9993 (Linking) N1 - Haley, Stephen MGandek, BarbaraSiebens, HilaryBlack-Schaffer, Randie MSinclair, Samuel JTao, WeiCoster, Wendy JNi, PengshengJette, Alan MK02 HD045354-01A1/HD/NICHD NIH HHS/United StatesK02 HD45354-01/HD/NICHD NIH HHS/United StatesR01 HD043568/HD/NICHD NIH HHS/United StatesR01 HD043568-01/HD/NICHD NIH HHS/United StatesResearch Support, N.I.H., ExtramuralUnited StatesArchives of physical medicine and rehabilitationArch Phys Med Rehabil. 2008 Feb;89(2):275-83. U2 - 2666330 ER - TY - BOOK T1 - Effect of early misfit in computerized adaptive testing on the recovery of theta Y1 - 2008 A1 - Guyer, R. D. CY - Unpublished Ph.D. dissertation, University of Minnesota, Minneapolis MN. N1 - {PDF file, 1,004 KB} ER - TY - JOUR T1 - Using computerized adaptive testing to reduce the burden of mental health assessment JF - Psychiatric Services Y1 - 2008 A1 - Gibbons, R. D. A1 - Weiss, D. J. A1 - Kupfer, D. J. A1 - Frank, E. A1 - Fagiolini, A. A1 - Grochocinski, V. J. A1 - Bhaumik, D. K. A1 - Stover, A. A1 - Bock, R. D. A1 - Immekus, J. C. 
KW - *Diagnosis, Computer-Assisted KW - *Questionnaires KW - Adolescent KW - Adult KW - Aged KW - Agoraphobia/diagnosis KW - Anxiety Disorders/diagnosis KW - Bipolar Disorder/diagnosis KW - Female KW - Humans KW - Male KW - Mental Disorders/*diagnosis KW - Middle Aged KW - Mood Disorders/diagnosis KW - Obsessive-Compulsive Disorder/diagnosis KW - Panic Disorder/diagnosis KW - Phobic Disorders/diagnosis KW - Reproducibility of Results KW - Time Factors AB - OBJECTIVE: This study investigated the combination of item response theory and computerized adaptive testing (CAT) for psychiatric measurement as a means of reducing the burden of research and clinical assessments. METHODS: Data were from 800 participants in outpatient treatment for a mood or anxiety disorder; they completed 616 items of the 626-item Mood and Anxiety Spectrum Scales (MASS) at two times. The first administration was used to design and evaluate a CAT version of the MASS by using post hoc simulation. The second confirmed the functioning of CAT in live testing. RESULTS: Tests of competing models based on item response theory supported the scale's bifactor structure, consisting of a primary dimension and four group factors (mood, panic-agoraphobia, obsessive-compulsive, and social phobia). Both simulated and live CAT showed a 95% average reduction (585 items) in items administered (24 and 30 items, respectively) compared with administration of the full MASS. The correlation between scores on the full MASS and the CAT version was .93. For the mood disorder subscale, differences in scores between two groups of depressed patients--one with bipolar disorder and one without--on the full scale and on the CAT showed effect sizes of .63 (p<.003) and 1.19 (p<.001) standard deviation units, respectively, indicating better discriminant validity for CAT. CONCLUSIONS: Instead of using small fixed-length tests, clinicians can create item banks with a large item pool, and a small set of the items most relevant for a given individual can be administered with no loss of information, yielding a dramatic reduction in administration time and patient and clinician burden. VL - 59 SN - 1075-2730 (Print) N1 - Gibbons, Robert DWeiss, David JKupfer, David JFrank, EllenFagiolini, AndreaGrochocinski, Victoria JBhaumik, Dulal KStover, AngelaBock, R DarrellImmekus, Jason CR01-MH-30915/MH/United States NIMHR01-MH-66302/MH/United States NIMHResearch Support, N.I.H., ExtramuralUnited StatesPsychiatric services (Washington, D.C.)Psychiatr Serv. 2008 Apr;59(4):361-8. ER - TY - CHAP T1 - CAT Security: A practitioner’s perspective Y1 - 2007 A1 - Guo, F. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 104 KB} ER - TY - CHAP T1 - Computerized adaptive testing with the bifactor model Y1 - 2007 A1 - Weiss, D. J. A1 - Gibbons, R. D. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 159 KB} ER - TY - CHAP T1 - Computerized attribute-adaptive testing: A new computerized adaptive testing approach incorporating cognitive psychology Y1 - 2007 A1 - Zhou, J. A1 - Gierl, M. J. A1 - Cui, Y. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 296 KB} ER - TY - JOUR T1 - The design and evaluation of a computerized adaptive test on mobile devices JF - Computers & Education Y1 - 2007 A1 - Triantafillou, E. A1 - Georgiadou, E. A1 - Economides, A. A. VL - 49. 
ER - TY - CHAP T1 - Designing optimal item pools for computerized adaptive tests with Sympson-Hetter exposure control Y1 - 2007 A1 - Gu, L. A1 - Reckase, M. D. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing N1 - 3 MB} ER - TY - JOUR T1 - The future of outcomes measurement: item banking, tailored short-forms, and computerized adaptive assessment JF - Quality of Life Research Y1 - 2007 A1 - Cella, D. A1 - Gershon, R. C. A1 - Lai, J-S. A1 - Choi, S. W. AB - The use of item banks and computerized adaptive testing (CAT) begins with clear definitions of important outcomes, and references those definitions to specific questions gathered into large and well-studied pools, or “banks” of items. Items can be selected from the bank to form customized short scales, or can be administered in a sequence and length determined by a computer programmed for precision and clinical relevance. Although far from perfect, such item banks can form a common definition and understanding of human symptoms and functional problems such as fatigue, pain, depression, mobility, social function, sensory function, and many other health concepts that we can only measure by asking people directly. The support of the National Institutes of Health (NIH), as witnessed by its cooperative agreement with measurement experts through the NIH Roadmap Initiative known as PROMIS (www.nihpromis.org), is a big step in that direction. Our approach to item banking and CAT is practical; as focused on application as it is on science or theory. From a practical perspective, we frequently must decide whether to re-write and retest an item, add more items to fill gaps (often at the ceiling of the measure), re-test a bank after some modifications, or split up a bank into units that are more unidimensional, yet less clinically relevant or complete. These decisions are not easy, and yet they are rarely unforgiving. We encourage people to build practical tools that are capable of producing multiple short form measures and CAT administrations from common banks, and to further our understanding of these banks with various clinical populations and ages, so that with time the scores that emerge from these many activities begin to have not only a common metric and range, but a shared meaning and understanding across users. In this paper, we provide an overview of item banking and CAT, discuss our approach to item banking and its byproducts, describe testing options, discuss an example of CAT for fatigue, and discuss models for long term sustainability of an entity such as PROMIS. Some barriers to success include limitations in the methods themselves, controversies and disagreements across approaches, and end-user reluctance to move away from the familiar. VL - 16 SN - 0962-9343 ER - TY - JOUR T1 - IRT health outcomes data analysis project: an overview and summary JF - Quality of Life Research Y1 - 2007 A1 - Cook, K. F. A1 - Teal, C. R. A1 - Bjorner, J. B. A1 - Cella, D. A1 - Chang, C-H. A1 - Crane, P. K. A1 - Gibbons, L. E. A1 - Hays, R. D. A1 - McHorney, C. A. A1 - Ocepek-Welikson, K. A1 - Raczek, A. E. A1 - Teresi, J. A. A1 - Reeve, B. B. 
KW - *Data Interpretation, Statistical KW - *Health Status KW - *Quality of Life KW - *Questionnaires KW - *Software KW - Female KW - HIV Infections/psychology KW - Humans KW - Male KW - Neoplasms/psychology KW - Outcome Assessment (Health Care)/*methods KW - Psychometrics KW - Stress, Psychological AB - BACKGROUND: In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, "Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment." A component of the conference was presentation of a psychometric and content analysis of a secondary dataset. OBJECTIVES: A thorough psychometric and content analysis was conducted of two primary domains within a cancer health-related quality of life (HRQOL) dataset. RESEARCH DESIGN: HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared. SUBJECTS: The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites. MEASURES: Items from 4 HRQOL instruments were evaluated: Cancer Rehabilitation Evaluation System-Short Form, European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, Functional Assessment of Cancer Therapy and Medical Outcomes Study Short-Form Health Survey. RESULTS AND CONCLUSIONS: Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance in the measurement of HRQOL of construct definition. With respect to these lessons, areas for future research are suggested. The feasibility of developing item banks for broad definitions of health is discussed. VL - 16 SN - 0962-9343 (Print) N1 - Cook, Karon FTeal, Cayla RBjorner, Jakob BCella, DavidChang, Chih-HungCrane, Paul KGibbons, Laura EHays, Ron DMcHorney, Colleen AOcepek-Welikson, KatjaRaczek, Anastasia ETeresi, Jeanne AReeve, Bryce B1U01AR52171-01/AR/United States NIAMSR01 (CA60068)/CA/United States NCIY1-PC-3028-01/PC/United States NCIResearch Support, N.I.H., ExtramuralNetherlandsQuality of life research : an international journal of quality of life aspects of treatment, care and rehabilitationQual Life Res. 2007;16 Suppl 1:121-32. Epub 2007 Mar 10. ER - TY - CHAP T1 - Patient-reported outcomes measurement and computerized adaptive testing: An application of post-hoc simulation to a diagnostic screening instrument Y1 - 2007 A1 - Immekus, J. C. A1 - Gibbons, R. D. A1 - Rush, J. A. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 203 KB} ER - TY - JOUR T1 - The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years JF - Medical Care Y1 - 2007 A1 - Cella, D. A1 - Yount, S. A1 - Rothrock, N. A1 - Gershon, R. C. A1 - Cook, K. F. A1 - Reeve, B. A1 - Ader, D. A1 - Fries, J.F. A1 - Bruce, B. A1 - Rose, M. 
AB - The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) Roadmap initiative (www.nihpromis.org) is a 5-year cooperative group program of research designed to develop, validate, and standardize item banks to measure patient-reported outcomes (PROs) relevant across common medical conditions. In this article, we will summarize the organization and scientific activity of the PROMIS network during its first 2 years. VL - 45 ER - TY - JOUR T1 - Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS) JF - Medical Care Y1 - 2007 A1 - Reeve, B. B. A1 - Hays, R. D. A1 - Bjorner, J. B. A1 - Cook, K. F. A1 - Crane, P. K. A1 - Teresi, J. A. A1 - Thissen, D. A1 - Revicki, D. A. A1 - Weiss, D. J. A1 - Hambleton, R. K. A1 - Liu, H. A1 - Gershon, R. C. A1 - Reise, S. P. A1 - Lai, J. S. A1 - Cella, D. KW - *Health Status KW - *Information Systems KW - *Quality of Life KW - *Self Disclosure KW - Adolescent KW - Adult KW - Aged KW - Calibration KW - Databases as Topic KW - Evaluation Studies as Topic KW - Female KW - Humans KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Psychometrics KW - Questionnaires/standards KW - United States AB - BACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. ANALYSES: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment. VL - 45 SN - 0025-7079 (Print) N1 - Reeve, Bryce BHays, Ron DBjorner, Jakob BCook, Karon FCrane, Paul KTeresi, Jeanne AThissen, DavidRevicki, Dennis AWeiss, David JHambleton, Ronald KLiu, HonghuGershon, RichardReise, Steven PLai, Jin-sheiCella, DavidPROMIS Cooperative GroupAG015815/AG/United States NIAResearch Support, N.I.H., ExtramuralUnited StatesMedical careMed Care. 2007 May;45(5 Suppl 1):S22-31. ER - TY - JOUR T1 - A review of item exposure control strategies for computerized adaptive testing developed from 1983 to 2005 JF - Journal of Technology,Learning, and Assessment, Y1 - 2007 A1 - Georgiadou, E. A1 - Triantafillou, E. A1 - Economides, A. A. AB - Since researchers acknowledged the several advantages of computerized adaptive testing (CAT) over traditional linear test administration, the issue of item exposure control has received increased attention. 
Due to CAT’s underlying philosophy, particular items in the item pool may be presented too often and become overexposed, while other items are rarely selected by the CAT algorithm and thus become underexposed. Several item exposure control strategies have been presented in the literature aiming to prevent overexposure of some items and to increase the use rate of rarely or never selected items. This paper reviews such strategies that appeared in the relevant literature from 1983 to 2005. The focus of this paper is on studies that have been conducted in order to evaluate the effectiveness of item exposure control strategies for dichotomous scoring, polytomous scoring and testlet-based CAT systems. In addition, the paper discusses the strengths and weaknesses of each strategy group using examples from simulation studies. No new research is presented but rather a compendium of models is reviewed with an overall objective of providing researchers of this field, especially newcomers, a wide view of item exposure control strategies. VL - 5(8) N1 - http://www.jtla.org. {PDF file, 326 KB} ER - TY - CHAP T1 - Statistical aspects of adaptive testing Y1 - 2007 A1 - van der Linden, W. J. A1 - Glas, C. A. W. CY - C. R. Rao and S. Sinharay (Eds.), Handbook of statistics (Vol. 27: Psychometrics) (pp. 801838). Amsterdam: North-Holland. ER - TY - JOUR T1 - Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes JF - Archives of Physical Medicine and Rehabilitation Y1 - 2006 A1 - Haley, S. M. A1 - Siebens, H. A1 - Coster, W. J. A1 - Tao, W. A1 - Black-Schaffer, R. M. A1 - Gandek, B. A1 - Sinclair, S. J. A1 - Ni, P. KW - *Activities of Daily Living KW - *Adaptation, Physiological KW - *Computer Systems KW - *Questionnaires KW - Adult KW - Aged KW - Aged, 80 and over KW - Chi-Square Distribution KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Longitudinal Studies KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Patient Discharge KW - Prospective Studies KW - Rehabilitation/*standards KW - Subacute Care/*standards AB - OBJECTIVE: To examine score agreement, precision, validity, efficiency, and responsiveness of a computerized adaptive testing (CAT) version of the Activity Measure for Post-Acute Care (AM-PAC-CAT) in a prospective, 3-month follow-up sample of inpatient rehabilitation patients recently discharged home. DESIGN: Longitudinal, prospective 1-group cohort study of patients followed approximately 2 weeks after hospital discharge and then 3 months after the initial home visit. SETTING: Follow-up visits conducted in patients' home setting. PARTICIPANTS: Ninety-four adults who were recently discharged from inpatient rehabilitation, with diagnoses of neurologic, orthopedic, and medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from AM-PAC-CAT, including 3 activity domains of movement and physical, personal care and instrumental, and applied cognition were compared with scores from a traditional fixed-length version of the AM-PAC with 66 items (AM-PAC-66). RESULTS: AM-PAC-CAT scores were in good agreement (intraclass correlation coefficient model 3,1 range, .77-.86) with scores from the AM-PAC-66. On average, the CAT programs required 43% of the time and 33% of the items compared with the AM-PAC-66. Both formats discriminated across functional severity groups. 
The standardized response mean (SRM) was greater for the movement and physical fixed form than the CAT; the effect size and SRM of the 2 other AM-PAC domains showed similar sensitivity between CAT and fixed formats. Using patients' own report as an anchor-based measure of change, the CAT and fixed length formats were comparable in responsiveness to patient-reported change over a 3-month interval. CONCLUSIONS: Accurate estimates for functional activity group-level changes can be obtained from CAT administrations, with a considerable reduction in administration time. VL - 87 SN - 0003-9993 (Print) N1 - Haley, Stephen MSiebens, HilaryCoster, Wendy JTao, WeiBlack-Schaffer, Randie MGandek, BarbaraSinclair, Samuel JNi, PengshengK0245354-01/phsR01 hd043568/hd/nichdResearch Support, N.I.H., ExtramuralUnited StatesArchives of physical medicine and rehabilitationArch Phys Med Rehabil. 2006 Aug;87(8):1033-42. ER - TY - JOUR T1 - Effects of Estimation Bias on Multiple-Category Classification With an IRT-Based Adaptive Classification Procedure JF - Educational and Psychological Measurement Y1 - 2006 A1 - Yang, Xiangdong A1 - Poggio, John C. A1 - Glasnapp, Douglas R. AB -

The effects of five ability estimators, that is, maximum likelihood estimator, weighted likelihood estimator, maximum a posteriori, expected a posteriori, and Owen's sequential estimator, on the performances of the item response theory–based adaptive classification procedure on multiple categories were studied via simulations. The following results were found. (a) The Bayesian estimators were more likely to misclassify examinees into an inward category because of their inward biases, when a fixed start value of zero was assigned to every examinee. (b) When moderately accurate start values were available, however, Bayesian estimators produced classifications that were slightly more accurate than was the maximum likelihood estimator or weighted likelihood estimator. Expected a posteriori was the procedure that produced the most accurate results among the three Bayesian methods. (c) All five estimators produced equivalent efficiencies in terms of number of items required, which was 50 or more items except for abilities that were less than -2.00 or greater than 2.00.

VL - 66 UR - http://epm.sagepub.com/content/66/4/545.abstract ER - TY - JOUR T1 - Evaluation parameters for computer adaptive testing JF - British Journal of Educational Technology Y1 - 2006 A1 - Georgiadou, E. A1 - Triantafillou, E. A1 - Economides, A. A. VL - Vol. 37 IS - No 2 ER - TY - JOUR T1 - Item banks and their potential applications to health status assessment in diverse populations JF - Medical Care Y1 - 2006 A1 - Hahn, E. A. A1 - Cella, D. A1 - Bode, R. K. A1 - Gershon, R. C. A1 - Lai, J. S. AB - In the context of an ethnically diverse, aging society, attention is increasingly turning to health-related quality of life measurement to evaluate healthcare and treatment options for chronic diseases. When evaluating and treating symptoms and concerns such as fatigue, pain, or physical function, reliable and accurate assessment is a priority. Modern psychometric methods have enabled us to move from long, static tests that provide inefficient and often inaccurate assessment of individual patients, to computerized adaptive tests (CATs) that can precisely measure individuals on health domains of interest. These modern methods, collectively referred to as item response theory (IRT), can produce calibrated "item banks" from larger pools of questions. From these banks, CATs can be conducted on individuals to produce their scores on selected domains. Item banks allow for comparison of patients across different question sets because the patient's score is expressed on a common scale. Other advantages of using item banks include flexibility in terms of the degree of precision desired; interval measurement properties under most circumstances; realistic capability for accurate individual assessment over time (using CAT); and measurement equivalence across different patient populations. This work summarizes the process used in the creation and evaluation of item banks and reviews their potential contributions and limitations regarding outcome assessment and patient care, particularly when they are applied across people of different cultural backgrounds. VL - 44 N1 - 0025-7079 (Print)Journal ArticleResearch Support, N.I.H., ExtramuralResearch Support, Non-U.S. Gov't ER - TY - JOUR T1 - Multidimensional computerized adaptive testing of the EORTC QLQ-C30: basic developments and evaluations JF - Quality of Life Research Y1 - 2006 A1 - Petersen, M. A. A1 - Groenvold, M. A1 - Aaronson, N. K. A1 - Fayers, P. A1 - Sprangers, M. A1 - Bjorner, J. B. KW - *Quality of Life KW - *Self Disclosure KW - Adult KW - Female KW - Health Status KW - Humans KW - Male KW - Middle Aged KW - Questionnaires/*standards KW - User-Computer Interface AB - OBJECTIVE: Self-report questionnaires are widely used to measure health-related quality of life (HRQOL). Ideally, such questionnaires should be adapted to the individual patient and at the same time scores should be directly comparable across patients. This may be achieved using computerized adaptive testing (CAT). Usually, CAT is carried out for a single domain at a time. However, many HRQOL domains are highly correlated. Multidimensional CAT may utilize these correlations to improve measurement efficiency. We investigated the possible advantages and difficulties of multidimensional CAT. STUDY DESIGN AND SETTING: We evaluated multidimensional CAT of three scales from the EORTC QLQ-C30: the physical functioning, emotional functioning, and fatigue scales. Analyses utilised a database with 2958 European cancer patients. 
RESULTS: It was possible to obtain scores for the three domains with five to seven items administered using multidimensional CAT that were very close to the scores obtained using all 12 items and with no or little loss of measurement precision. CONCLUSION: The findings suggest that multidimensional CAT may significantly improve measurement precision and efficiency and encourage further research into multidimensional CAT. Particularly, the estimation of the model underlying the multidimensional CAT and the conceptual aspects need further investigations. VL - 15 SN - 0962-9343 (Print) N1 - Petersen, Morten AaGroenvold, MogensAaronson, NeilFayers, PeterSprangers, MirjamBjorner, Jakob BEuropean Organisation for Research and Treatment of Cancer Quality of Life GroupResearch Support, Non-U.S. Gov'tNetherlandsQuality of life research : an international journal of quality of life aspects of treatment, care and rehabilitationQual Life Res. 2006 Apr;15(3):315-29. ER - TY - CHAP T1 - Applications of item response theory to improve health outcomes assessment: Developing item banks, linking instruments, and computer-adaptive testing T2 - Outcomes assessment in cancer Y1 - 2005 A1 - Hambleton, R. K. ED - C. C. Gotay ED - C. Snyder KW - Computer Assisted Testing KW - Health KW - Item Response Theory KW - Measurement KW - Test Construction KW - Treatment Outcomes AB - (From the chapter) The current chapter builds on Reise's introduction to the basic concepts, assumptions, popular models, and important features of IRT and discusses the applications of item response theory (IRT) modeling to health outcomes assessment. In particular, we highlight the critical role of IRT modeling in: developing an instrument to match a study's population; linking two or more instruments measuring similar constructs on a common metric; and creating item banks that provide the foundation for tailored short-form instruments or for computerized adaptive assessments. (PsycINFO Database Record (c) 2005 APA ) JF - Outcomes assessment in cancer PB - Cambridge University Press CY - Cambridge, UK N1 - Using Smart Source ParsingOutcomes assessment in cancer: Measures, methods, and applications. (pp. 445-464). New York, NY : Cambridge University Press. xiv, 662 pp ER - TY - JOUR T1 - An Authoring Environment for Adaptive Testing JF - Educational Technology & Society Y1 - 2005 A1 - Guzmán, E A1 - Conejo, R A1 - García-Hervás, E KW - Adaptability KW - Adaptive Testing KW - Authoring environment KW - Item Response Theory AB -

SIETTE is a web-based adaptive testing system that implements computerized adaptive tests: tailor-made, theory-based tests in which the questions shown to students, the finalization of the test, and the estimation of student knowledge are all carried out adaptively. To construct these tests, SIETTE has an authoring environment comprising a suite of tools that helps teachers create questions and tests properly and analyze students' performance after taking a test. In this paper, we present this authoring environment in the framework of adaptive testing. As will be shown, this set of visual tools, which contains some adaptable features, can be useful for teachers lacking skills in this kind of testing. Additionally, other systems that implement adaptive testing are reviewed.

VL - 8 IS - 3 ER - TY - JOUR T1 - Computer adaptive testing JF - Journal of Applied Measurement Y1 - 2005 A1 - Gershon, R. C. KW - *Internet KW - *Models, Statistical KW - *User-Computer Interface KW - Certification KW - Health Surveys KW - Humans KW - Licensure KW - Microcomputers KW - Quality of Life AB - The creation of item response theory (IRT) and Rasch models, inexpensive accessibility to high speed desktop computers, and the growth of the Internet, has led to the creation and growth of computerized adaptive testing or CAT. This form of assessment is applicable for both high stakes tests such as certification or licensure exams, as well as health related quality of life surveys. This article discusses the historical background of CAT including its many advantages over conventional (typically paper and pencil) alternatives. The process of CAT is then described including descriptions of the specific differences of using CAT based upon 1-, 2- and 3-parameter IRT and various Rasch models. Numerous specific topics describing CAT in practice are described including: initial item selection, content balancing, test difficulty, test length and stopping rules. The article concludes with the author's reflections regarding the future of CAT. VL - 6 SN - 1529-7713 (Print) N1 - Gershon, Richard CReviewUnited StatesJournal of applied measurementJ Appl Meas. 2005;6(1):109-27. ER - TY - JOUR T1 - Computer adaptive testing JF - Journal of Applied Measurement Y1 - 2005 A1 - Gershon, R. C. VL - 6 ER - TY - JOUR T1 - Computerized Adaptive Testing With the Partial Credit Model: Estimation Procedures, Population Distributions, and Item Pool Characteristics JF - Applied Psychological Measurement Y1 - 2005 A1 - Gorin, Joanna S. A1 - Dodd, Barbara G. A1 - Fitzpatrick, Steven J. A1 - Shieh, Yann Yann AB -

The primary purpose of this research is to examine the impact of estimation methods, actual latent trait distributions, and item pool characteristics on the performance of a simulated computerized adaptive testing (CAT) system. In this study, three estimation procedures are compared for accuracy of estimation: maximum likelihood estimation (MLE), expected a posteriori (EAP), and Warm's weighted likelihood estimation (WLE). Some research has shown that MLE and EAP perform equally well under certain conditions in polytomous CAT systems, such that they match the actual latent trait distribution. However, little research has compared these methods when prior estimates of θ distributions are extremely poor. In general, it appears that MLE, EAP, and WLE procedures perform equally well when using an optimal item pool. However, the use of EAP procedures may be advantageous under nonoptimal testing conditions when the item pool is not appropriately matched to the examinees.

VL - 29 UR - http://apm.sagepub.com/content/29/6/433.abstract ER - TY - JOUR T1 - Computerized adaptive testing with the partial credit model: Estimation procedures, population distributions, and item pool characteristics JF - Applied Psychological Measurement Y1 - 2005 A1 - Gorin, J. A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. A1 - Shieh, Y. Y. VL - 29 ER - TY - JOUR T1 - Computerized adaptive testing with the partial credit model: Estimation procedures, population distributions, and item pool characteristics JF - Applied Psychological Measurement Y1 - 2005 A1 - Gorin, J. S. VL - 29 SN - 0146-6216 ER - TY - CHAP T1 - The development of the adaptive item language assessment (AILA) for mixed-ability students Y1 - 2005 A1 - Giouroglou, H. A1 - Economides, A. A. CY - Proceedings E-Learn 2005 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, 643-650, Vancouver, Canada, AACE, October 2005. ER - TY - CHAP T1 - An implemented theoretical framework for a common European foreign language adaptive assessment Y1 - 2005 A1 - Giouroglou, H. A1 - Economides, A. A. CY - Proceedings ICODL 2005, 3rd ternational Conference on Open and Distance Learning 'Applications of Pedagogy and Technology',339-350,Greek Open University, Patra, Greece ER - TY - JOUR T1 - An item bank was created to improve the measurement of cancer-related fatigue JF - Journal of Clinical Epidemiology Y1 - 2005 A1 - Lai, J-S. A1 - Cella, D. A1 - Dineen, K. A1 - Bode, R. A1 - Von Roenn, J. A1 - Gershon, R. C. A1 - Shevrin, D. KW - Adult KW - Aged KW - Aged, 80 and over KW - Factor Analysis, Statistical KW - Fatigue/*etiology/psychology KW - Female KW - Humans KW - Male KW - Middle Aged KW - Neoplasms/*complications/psychology KW - Psychometrics KW - Questionnaires AB - OBJECTIVE: Cancer-related fatigue (CRF) is one of the most common unrelieved symptoms experienced by patients. CRF is underrecognized and undertreated due to a lack of clinically sensitive instruments that integrate easily into clinics. Modern computerized adaptive testing (CAT) can overcome these obstacles by enabling precise assessment of fatigue without requiring the administration of a large number of questions. A working item bank is essential for development of a CAT platform. The present report describes the building of an operational item bank for use in clinical settings with the ultimate goal of improving CRF identification and treatment. STUDY DESIGN AND SETTING: The sample included 301 cancer patients. Psychometric properties of items were examined by using Rasch analysis, an Item Response Theory (IRT) model. RESULTS AND CONCLUSION: The final bank includes 72 items. These 72 unidimensional items explained 57.5% of the variance, based on factor analysis results. Excellent internal consistency (alpha=0.99) and acceptable item-total correlation were found (range: 0.51-0.85). The 72 items covered a reasonable range of the fatigue continuum. No significant ceiling effects, floor effects, or gaps were found. A sample short form was created for demonstration purposes. The resulting bank is amenable to the development of a CAT platform. VL - 58 SN - 0895-4356 (Print)0895-4356 (Linking) N1 - Lai, Jin-SheiCella, DavidDineen, KellyBode, RitaVon Roenn, JamieGershon, Richard CShevrin, DanielEnglandJ Clin Epidemiol. 2005 Feb;58(2):190-7. ER - TY - JOUR T1 - Item response theory in computer adaptive testing: implications for outcomes measurement in rehabilitation JF - Rehabil Psychol Y1 - 2005 A1 - Ware, J. E A1 - Gandek, B. 
A1 - Sinclair, S. J. A1 - Bjorner, J. B. VL - 50 ER - TY - CHAP T1 - The ABCs of Computerized Adaptive Testing Y1 - 2004 A1 - Gershon, R. C. CY - T. M. Wood and W. Zhi (Eds.), Measurement issues and practice in physical activity. Champaign, IL: Human kinetics. ER - TY - JOUR T1 - Computerized adaptive measurement of depression: A simulation study JF - BMC Psychiatry Y1 - 2004 A1 - Gardner, W. A1 - Shear, K. A1 - Kelleher, K. J. A1 - Pajer, K. A. A1 - Mammen, O. A1 - Buysse, D. A1 - Frank, E. KW - *Computer Simulation KW - Adult KW - Algorithms KW - Area Under Curve KW - Comparative Study KW - Depressive Disorder/*diagnosis/epidemiology/psychology KW - Diagnosis, Computer-Assisted/*methods/statistics & numerical data KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Internet KW - Male KW - Mass Screening/methods KW - Patient Selection KW - Personality Inventory/*statistics & numerical data KW - Pilot Projects KW - Prevalence KW - Psychiatric Status Rating Scales/*statistics & numerical data KW - Psychometrics KW - Research Support, Non-U.S. Gov't KW - Research Support, U.S. Gov't, P.H.S. KW - Severity of Illness Index KW - Software AB - Background: Efficient, accurate instruments for measuring depression are increasingly important in clinical practice. We developed a computerized adaptive version of the Beck Depression Inventory (BDI). We examined its efficiency and its usefulness in identifying Major Depressive Episodes (MDE) and in measuring depression severity. Methods: Subjects were 744 participants in research studies in which each subject completed both the BDI and the SCID. In addition, 285 patients completed the Hamilton Depression Rating Scale. Results: The adaptive BDI had an AUC as an indicator of a SCID diagnosis of MDE of 88%, equivalent to the full BDI. The adaptive BDI asked fewer questions than the full BDI (5.6 versus 21 items). The adaptive latent depression score correlated r = .92 with the BDI total score and the latent depression score correlated more highly with the Hamilton (r = .74) than the BDI total score did (r = .70). Conclusions: Adaptive testing for depression may provide greatly increased efficiency without loss of accuracy in identifying MDE or in measuring depression severity. VL - 4 ER - TY - RPRT T1 - Evaluating scale stability of a computer adaptive testing system Y1 - 2004 A1 - Guo, F. A1 - Wang, L. PB - GMAC CY - McLean, VA ER - TY - CHAP T1 - A Learning Environment for English for Academic Purposes Based on Adaptive Tests and Task-Based Systems T2 - Intelligent Tutoring Systems Y1 - 2004 A1 - Gonçalves, Jean P. A1 - Aluisio, Sandra M. A1 - de Oliveira, Leandro H.M. A1 - Oliveira Jr., Osvaldo N. ED - Lester, James C. ED - Vicari, Rosa Maria ED - Paraguaçu, Fábio JF - Intelligent Tutoring Systems T3 - Lecture Notes in Computer Science PB - Springer Berlin / Heidelberg VL - 3220 SN - 978-3-540-22948-3 UR - http://dx.doi.org/10.1007/978-3-540-30139-4_1 ER - TY - ABST T1 - Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project Y1 - 2004 A1 - Holman, R. A1 - Glas, C. A. A1 - Lindeboom, R. A1 - Zwinderman, A. H. A1 - de Haan, R. J.
KW - *Disability Evaluation KW - *Health Surveys KW - *Logistic Models KW - *Questionnaires KW - Activities of Daily Living/*classification KW - Data Interpretation, Statistical KW - Health Status KW - Humans KW - Pilot Projects KW - Probability KW - Quality of Life KW - Severity of Illness Index AB - BACKGROUND: Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. METHODS: The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. RESULTS: The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. CONCLUSIONS: The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. JF - Health and Quality of Life Outcomes VL - 2 SN - 1477-7525 (Electronic)1477-7525 (Linking) N1 - Holman, RebeccaGlas, Cees A WLindeboom, RobertZwinderman, Aeilko Hde Haan, Rob JEnglandHealth Qual Life Outcomes. 2004 Jun 16;2:29. U2 - 441407 ER - TY - JOUR T1 - Siette: a web-based tool for adaptive testing JF - International Journal of Artificial Intelligence in Education Y1 - 2004 A1 - Conejo, R A1 - Guzmán, E A1 - Millán, E A1 - Trella, M A1 - Pérez-De-La-Cruz, JL A1 - Ríos, A KW - computerized adaptive testing VL - 14 ER - TY - CHAP T1 - State-of-the-art and adaptive open-closed items in adaptive foreign language assessment Y1 - 2004 A1 - Giouroglou, H. A1 - Economides, A. A. CY - Proceedings 4th Hellenic Conference with ternational Participation: Informational and Communication Technologies in Education, Athens,747-756 ER - TY - JOUR T1 - Testing vocabulary knowledge: Size, strength, and computer adaptiveness JF - Language Learning Y1 - 2004 A1 - Laufer, B. A1 - Goldstein, Z. AB - (from the journal abstract) In this article, we describe the development and trial of a bilingual computerized test of vocabulary size, the number of words the learner knows, and strength, a combination of four aspects of knowledge of meaning that are assumed to constitute a hierarchy of difficulty: passive recognition (easiest), active recognition, passive recall, and active recall (hardest). The participants were 435 learners of English as a second language. 
We investigated whether the above hierarchy was valid and which strength modality correlated best with classroom language performance. Results showed that the hypothesized hierarchy was present at all word frequency levels, that passive recall was the best predictor of classroom language performance, and that growth in vocabulary knowledge was different for the different strength modalities. (PsycINFO Database Record (c) 2004 APA, all rights reserved). VL - 54 N1 - References .Blackwell Publishing, United Kingdom ER - TY - CONF T1 - Cognitive CAT in foreign language assessment T2 - Proceedings 11th International PEG Conference Y1 - 2003 A1 - Giouroglou, H. A1 - Economides, A. A. JF - Proceedings 11th International PEG Conference CY - Powerful ICT Tools for Learning and Teaching, PEG '03, CD-ROM, 2003 ER - TY - JOUR T1 - Computerized adaptive rating scales for measuring managerial performance JF - International Journal of Selection and Assessment Y1 - 2003 A1 - Schneider, R. J. A1 - Goff, M. A1 - Anderson, S. A1 - Borman, W. C. KW - Adaptive Testing KW - Algorithms KW - Associations KW - Citizenship KW - Computer Assisted Testing KW - Construction KW - Contextual KW - Item Response Theory KW - Job Performance KW - Management KW - Management Personnel KW - Rating Scales KW - Test AB - Computerized adaptive rating scales (CARS) had been developed to measure contextual or citizenship performance. This rating format used a paired-comparison protocol, presenting pairs of behavioral statements scaled according to effectiveness levels, and an iterative item response theory algorithm to obtain estimates of ratees' citizenship performance (W. C. Borman et al, 2001). In the present research, we developed CARS to measure the entire managerial performance domain, including task and citizenship performance, thus addressing a major limitation of the earlier CARS. The paper describes this development effort, including an adjustment to the algorithm that reduces substantially the number of item pairs required to obtain almost as much precision in the performance estimates. (PsycINFO Database Record (c) 2005 APA ) VL - 11 ER - TY - JOUR T1 - Computerized adaptive testing with item cloning JF - Applied Psychological Measurement Y1 - 2003 A1 - Glas, C. A. W. A1 - van der Linden, W. J. KW - computerized adaptive testing AB - (from the journal abstract) To increase the number of items available for adaptive testing and reduce the cost of item writing, the use of techniques of item cloning has been proposed. An important consequence of item cloning is possible variability between the item parameters. To deal with this variability, a multilevel item response (IRT) model is presented which allows for differences between the distributions of item parameters of families of item clones. A marginal maximum likelihood and a Bayesian procedure for estimating the hyperparameters are presented. In addition, an item-selection procedure for computerized adaptive testing with item cloning is presented which has the following two stages: First, a family of item clones is selected to be optimal at the estimate of the person parameter. Second, an item is randomly selected from the family for administration. Results from simulation studies based on an item pool from the Law School Admission Test (LSAT) illustrate the accuracy of these item pool calibration and adaptive testing procedures. (PsycINFO Database Record (c) 2003 APA, all rights reserved). 
VL - 27 N1 - References .Sage Publications, US ER - TY - JOUR T1 - Development and psychometric evaluation of the Flexilevel Scale of Shoulder Function (FLEX-SF) JF - Medical Care (in press) Y1 - 2003 A1 - Cook, K. F. A1 - Roddey, T. S. A1 - Gartsman, G M A1 - Olson, S L N1 - #CO03-01 ER - TY - CONF T1 - Evaluating the comparability of English- and French-speaking examinees on a science achievement test administered using two-stage testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Puhan, G. A1 - Gierl, M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - PDF file, 568 K ER - TY - CONF T1 - Online calibration and scale stability of a CAT program T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2003 A1 - Guo, F. A1 - Wang, G. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL N1 - {PDF file, 274 KB} ER - TY - CONF T1 - Standard-setting issues in computerized-adaptive testing T2 - Paper Prepared for Presentation at the Annual Conference of the Canadian Society for Studies in Education Y1 - 2003 A1 - Gushta, M. M. JF - Paper Prepared for Presentation at the Annual Conference of the Canadian Society for Studies in Education CY - Halifax, Nova Scotia, May 30th, 2003 ER - TY - CONF T1 - Content-stratified random item selection in computerized classification testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Guille, R. Lipner, R. S. A1 - Norcini, J. J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA N1 - #GU02-01 ER - TY - JOUR T1 - The implications of the use of non-optimal items in a Computer Adaptive Testing (CAT) environment JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 2002 A1 - Grodenchik, D. J. KW - computerized adaptive testing AB - This study describes the effects of manipulating item difficulty in a computer adaptive testing (CAT) environment. There are many potential benefits when using CATS as compared to traditional tests. These include increased security, shorter tests, and more precise measurement. According to IRT, the theory underlying CAT, as the computer continually recalculates ability, items that match that current estimate of ability are administered. Such items provide maximum information about examinees during the test. Herein, however, lies a potential problem. These optimal CAT items result in an examinee having only a 50% chance of a correct response. Some examinees may consider such items unduly challenging. Further, when test anxiety is a factor, it is possible that test scores may be negatively affected. This research was undertaken to determine the effects of administering easier CAT items on ability estimation and test length using computer simulations. Also considered was the administration of different numbers of initial items prior to the start of the adaptive portion of the test, using three different levels of measurement precision. Results indicate that regardless of the number of initial items administered, the level of precision employed, or the modifications made to item difficulty, the approximation of estimated ability to true ability is good in all cases. 
Additionally, the standard deviations of the ability estimates closely approximate the theoretical levels of precision used as stopping rules for the simulated CATs. Since optimal CAT items are not used, each item administered provides less information about examinees than optimal CAT items. This results in longer tests. Fortunately, using easier items that provide up to a 66.4% chance of a correct response results in tests that only modestly increase in length, across levels of precision. For larger standard errors, even easier items (up to a 73.5% chance of a correct response) result in only negligible to modest increases in test length. Examinees who find optimal CAT items difficult or examinees with test anxiety may find CATs that implement easier items enhance the already existing benefits of CAT. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 63 ER - TY - JOUR T1 - Multidimensional adaptive testing for mental health problems in primary care JF - Medical Care Y1 - 2002 A1 - Gardner, W. A1 - Kelleher, K. J. A1 - Pajer, K. A. KW - Adolescent KW - Child KW - Child Behavior Disorders/*diagnosis KW - Child Health Services/*organization & administration KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Linear Models KW - Male KW - Mass Screening/*methods KW - Parents KW - Primary Health Care/*organization & administration AB - OBJECTIVES: Efficient and accurate instruments for assessing child psychopathology are increasingly important in clinical practice and research. For example, screening in primary care settings can identify children and adolescents with disorders that may otherwise go undetected. However, primary care offices are notorious for the brevity of visits and screening must not burden patients or staff with long questionnaires. One solution is to shorten assessment instruments, but dropping questions typically makes an instrument less accurate. An alternative is adaptive testing, in which a computer selects the items to be asked of a patient based on the patient's previous responses. This research used a simulation to test a child mental health screen based on this technology. RESEARCH DESIGN: Using half of a large sample of data, a computerized version was developed of the Pediatric Symptom Checklist (PSC), a parental-report psychosocial problem screen. With the unused data, a simulation was conducted to determine whether the Adaptive PSC can reproduce the results of the full PSC with greater efficiency. SUBJECTS: PSCs were completed by parents on 21,150 children seen in a national sample of primary care practices. RESULTS: Four latent psychosocial problem dimensions were identified through factor analysis: internalizing problems, externalizing problems, attention problems, and school problems. A simulated adaptive test measuring these traits asked an average of 11.6 questions per patient, and asked five or fewer questions for 49% of the sample. There was high agreement between the adaptive test and the full (35-item) PSC: only 1.3% of screening decisions were discordant (kappa = 0.93). This agreement was higher than that obtained using a comparable length (12-item) short-form PSC (3.2% of decisions discordant; kappa = 0.84). CONCLUSIONS: Multidimensional adaptive testing may be an accurate and efficient technology for screening for mental health problems in primary care settings. 
VL - 40 SN - 0025-7079 (Print); 0025-7079 (Linking) N1 - Gardner, William; Kelleher, Kelly J.; Pajer, Kathleen A.; MCJ-177022/PHS HHS; MH30915/MH/NIMH NIH HHS; MH50629/MH/NIMH NIH HHS. Med Care. 2002 Sep;40(9):812-23. ER - TY - ABST T1 - CB BULATS: Examining the reliability of a computer based test using test-retest method Y1 - 2001 A1 - Geranpayeh, A. CY - Cambridge ESOL Research Notes, Issue 5, July 2001, pp. 14-16 N1 - #GE01-01 {PDF file, 456 KB} ER - TY - CONF T1 - Deriving a stopping rule for sequential adaptive tests T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Grabovsky, I. A1 - Chang, Hua-Hua A1 - Ying, Z. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA N1 - {PDF file, 111 KB} ER - TY - CONF T1 - Impact of item location effects on ability estimation in CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Liu, M. A1 - Zhu, R. A1 - Guo, F. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA N1 - #LI01-01 ER - TY - CONF T1 - The influence of item characteristics and administration position on CAT Scores T2 - Paper presented at the 33rd annual meeting of the Northeastern Educational Research Association Y1 - 2001 A1 - Wang, L. A1 - Gawlick, L. JF - Paper presented at the 33rd annual meeting of the Northeastern Educational Research Association CY - Hudson Valley, NY, October 26, 2001 ER - TY - CONF T1 - Modeling variability in item parameters in CAT T2 - Paper presented at the Annual Meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Glas, C. A. W. A1 - van der Linden, W. J. JF - Paper presented at the Annual Meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - CONF T1 - Multidimensional IRT-based adaptive sequential mastery testing T2 - Paper presented at the Annual Meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Vos, H. J. A1 - Glas, C. A. W. JF - Paper presented at the Annual Meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - JOUR T1 - Nouveaux développements dans le domaine du testing informatisé [New developments in the area of computerized testing] JF - Psychologie Française Y1 - 2001 A1 - Meijer, R. R. A1 - Grégoire, J. KW - Adaptive Testing KW - Computer Applications KW - Computer Assisted KW - Diagnosis KW - Psychological Assessment KW - computerized adaptive testing AB - The use of computer-assisted assessment has grown considerably since the first formulation of its basic principles in the 1960s and 1970s. This article offers an introduction to the latest developments in computer-assisted assessment, in particular computerized adaptive testing (CAT). Ability estimation, item selection, and item bank development in CAT are discussed. In addition, examples of innovative uses of the computer in integrated testing systems and in testing over the Internet are presented. The article closes with some illustrations of new applications of computerized testing and suggestions for future research. Discusses the latest developments in computerized psychological assessment, with emphasis on computerized adaptive testing (CAT). Ability estimation, item selection, and item pool development in CAT are described.
Examples of some innovative approaches to CAT are presented. (PsycINFO Database Record (c) 2005 APA ) VL - 46 ER - TY - CONF T1 - On-line Calibration Using PARSCALE Item Specific Prior Method: Changing Test Population and Sample Size T2 - Paper presented at National Council on Measurement in Education Annual Meeting Y1 - 2001 A1 - Guo, F. A1 - Stone, E. A1 - Cruz, D. JF - Paper presented at National Council on Measurement in Education Annual Meeting CY - Seattle, Washington ER - TY - ABST T1 - Scoring alternatives for incomplete computerized adaptive tests (Research Report 01-20) Y1 - 2001 A1 - Way, W. D. A1 - Gawlick, L. A. A1 - Eignor, D. R. CY - Princeton NJ: Educational Testing Service ER - TY - ABST T1 - Adaptive mastery testing using a multidimensional IRT model and Bayesian sequential decision theory (Research Report 00-06) Y1 - 2000 A1 - Glas, C. A. W. A1 - Vos, H. J. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - JOUR T1 - Capitalization on item calibration error in adaptive testing JF - Applied Measurement in Education Y1 - 2000 A1 - van der Linden, W. J. A1 - Glas, C. A. W. KW - computerized adaptive testing AB - (from the journal abstract) In adaptive testing, item selection is sequentially optimized during the test. Because the optimization takes place over a pool of items calibrated with estimation error, capitalization on chance is likely to occur. How serious the consequences of this phenomenon are depends not only on the distribution of the estimation errors in the pool or the conditional ratio of the test length to the pool size given ability, but may also depend on the structure of the item selection criterion used. A simulation study demonstrated a dramatic impact of capitalization on estimation errors on ability estimation. Four different strategies to minimize the likelihood of capitalization on error in computerized adaptive testing are discussed. VL - 13 N1 - References .Lawrence Erlbaum, US ER - TY - JOUR T1 - CAT administration of language placement examinations JF - Journal of Applied Measurement Y1 - 2000 A1 - Stahl, J. A1 - Bergstrom, B. A1 - Gershon, R. C. KW - *Language KW - *Software KW - Aptitude Tests/*statistics & numerical data KW - Educational Measurement/*statistics & numerical data KW - Humans KW - Psychometrics KW - Reproducibility of Results KW - Research Support, Non-U.S. Gov't AB - This article describes the development of a computerized adaptive test for Cegep de Jonquiere, a community college located in Quebec, Canada. Computerized language proficiency testing allows the simultaneous presentation of sound stimuli as the question is being presented to the test-taker. With a properly calibrated bank of items, the language proficiency test can be offered in an adaptive framework. By adapting the test to the test-taker's level of ability, an assessment can be made with significantly fewer items. We also describe our initial attempt to detect instances in which "cheating low" is occurring. In the "cheating low" situation, test-takers deliberately answer questions incorrectly, questions that they are fully capable of answering correctly had they been taking the test honestly. VL - 1 N1 - 1529-7713Journal Article ER - TY - BOOK T1 - Computerized adaptive testing: A primer (2nd edition) Y1 - 2000 A1 - Wainer, H., A1 - Dorans, N. A1 - Eignor, D. R. A1 - Flaugher, R. A1 - Green, B. F. A1 - Mislevy, R. A1 - Steinberg, L. A1 - Thissen, D. 
CY - Hillsdale, NJ: Lawrence Erlbaum Associates ER - TY - BOOK T1 - Computerized adaptive testing: Theory and practice Y1 - 2000 A1 - van der Linden, W. J. A1 - Glas, C. A. W. PB - Kluwer Academic Publishers CY - Dordrecht, The Netherlands ER - TY - CHAP T1 - Cross-validating item parameter estimation in adaptive testing Y1 - 2000 A1 - van der Linden, W. J. A1 - Glas, C. A. W. CY - A. Boorsma, M. A. J. van Duijn, and T. A. B. Snijders (Eds.) (pp. 205-219), Essays on item response theory. New York: Springer. ER - TY - JOUR T1 - Detection of known items in adaptive testing with a statistical quality control method JF - Journal of Educational and Behavioral Statistics Y1 - 2000 A1 - Veerkamp, W. J. J. A1 - Glas, C. A. W. VL - 25 ER - TY - CHAP T1 - Item calibration and parameter drift Y1 - 2000 A1 - Glas, C. A. W. CY - W. J. van der Linden and C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 183-199). Norwell MA: Kluwer Academic. ER - TY - JOUR T1 - Item selection algorithms in computerized adaptive testing JF - Psicothema Y1 - 2000 A1 - Garcia, David A. A1 - Santa Cruz, C. A1 - Dorronsoro, J. R. A1 - Rubio Franco, V. J. AB - Studied the efficacy of 3 different item selection algorithms in computerized adaptive testing. Ss were 395 university students (aged 20-25 yrs) in Spain. Ss were asked to submit answers via computer to 28 items of a personality questionnaire using item selection algorithms based on maximum item information, entropy, or mixed item-entropy algorithms. The results were evaluated according to ability of Ss to use item selection algorithms and number of questions. Initial results indicate that mixed criteria algorithms were more efficient than information or entropy algorithms for up to 15 questionnaire items, but that differences in efficiency decreased with increasing item number. Implications for developing computer adaptive testing methods are discussed. (PsycINFO Database Record (c) 2002 APA, all rights reserved). VL - 12 N1 - Spanish. Algoritmo mixto mínima entropía-máxima información para la selección de items en un test adaptativo informatizado. Universidad de Oviedo, Spain ER - TY - CHAP T1 - MML and EAP estimation in testlet-based adaptive testing T2 - Computerized adaptive testing: Theory and practice Y1 - 2000 A1 - Glas, C. A. W. A1 - Wainer, H. A1 - Bradlow, E. T. JF - Computerized adaptive testing: Theory and practice PB - Kluwer Academic Publishers CY - Dordrecht, The Netherlands ER - TY - CONF T1 - Test security and the development of computerized tests T2 - Paper presented at the National Council on Measurement in Education invited symposium: Maintaining test security in computerized programs–Implications for practice Y1 - 2000 A1 - Guo, F. A1 - Way, W. D. A1 - Reshetar, R. JF - Paper presented at the National Council on Measurement in Education invited symposium: Maintaining test security in computerized programs–Implications for practice CY - New Orleans ER - TY - CHAP T1 - Testlet-based adaptive mastery testing Y1 - 2000 A1 - Vos, H. J. A1 - Glas, C. A. W. CY - W. J. van der Linden (Ed.), Computerized adaptive testing: Theory and practice (pp. 289-309). Norwell MA: Kluwer. ER - TY - CONF T1 - CAT administration of language placement exams T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Stahl, J. A1 - Gershon, R. C. A1 - Bergstrom, B.
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - JOUR T1 - Evaluating the usefulness of computerized adaptive testing for medical in-course assessment JF - Academic Medicine Y1 - 1999 A1 - Kreiter, C. D. A1 - Ferguson, K. A1 - Gruppen, L. D. KW - *Automation KW - *Education, Medical, Undergraduate KW - Educational Measurement/*methods KW - Humans KW - Internal Medicine/*education KW - Likelihood Functions KW - Psychometrics/*methods KW - Reproducibility of Results AB - PURPOSE: This study investigated the feasibility of converting an existing computer-administered, in-course internal medicine test to an adaptive format. METHOD: A 200-item internal medicine extended matching test was used for this research. Parameters were estimated with commercially available software with responses from 621 examinees. A specially developed simulation program was used to retrospectively estimate the efficiency of the computer-adaptive exam format. RESULTS: It was found that the average test length could be shortened by almost half with measurement precision approximately equal to that of the full 200-item paper-and-pencil test. However, computer-adaptive testing with this item bank provided little advantage for examinees at the upper end of the ability continuum. An examination of classical item statistics and IRT item statistics suggested that adding more difficult items might extend the advantage to this group of examinees. CONCLUSIONS: Medical item banks presently used for in-course assessment might be advantageously employed in adaptive testing. However, it is important to evaluate the match between the items and the measurement objective of the test before implementing this format. VL - 74 SN - 1040-2446 (Print) N1 - Kreiter, C. D.; Ferguson, K.; Gruppen, L. D. United States. Academic Medicine: Journal of the Association of American Medical Colleges. Acad Med. 1999 Oct;74(10):1125-8. JO - Acad Med ER - TY - CONF T1 - Fairness in computer-based testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Gallagher, A. A1 - Bridgeman, B. A1 - Calahan, C. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CHAP T1 - Item calibration and parameter drift Y1 - 1999 A1 - Glas, C. A. W. A1 - Veerkamp, W. J. J. CY - W. J. van der Linden and C. A. W. Glas (Eds.), Computer adaptive testing: Theory and practice. Norwell MA: Kluwer. ER - TY - CONF T1 - Managing CAT item development in the face of uncertainty T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Guo, F. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - ABST T1 - Adaptive mastery testing using the Rasch model and Bayesian sequential decision theory (Research Report 98-15) Y1 - 1998 A1 - Glas, C. A. W. A1 - Vos, H. J. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - ABST T1 - Capitalization on item calibration error in adaptive testing (Research Report 98-07) Y1 - 1998 A1 - van der Linden, W. J. A1 - Glas, C. A. W.
CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - ABST T1 - Comparability of paper-and-pencil and computer adaptive test scores on the GRE General Test (GRE Board Professional Report No. 95-08P; Educational Testing Service Research Report 98-38) Y1 - 1998 A1 - Schaeffer, G. A1 - Bridgeman, B. A1 - Golub-Smith, M. L. A1 - Lewis, C. A1 - Potenza, M. T. A1 - Steffen, M. CY - Princeton, NJ: Educational Testing Service ER - TY - RPRT T1 - Comparability of paper-and-pencil and computer adaptive test scores on the GRE General Test Y1 - 1998 A1 - Schaeffer, G. A. A1 - Bridgeman, B. A1 - Golub-Smith, M. L. A1 - Lewis, C. A1 - Potenza, M. T. A1 - Steffen, M. PB - Educational Testing Service CY - Princeton, NJ SN - ETS Research Report 98-38 ER - TY - ABST T1 - Quality control of on-line calibration in computerized adaptive testing (Research Report 98-03) Y1 - 1998 A1 - Glas, C. A. W. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - RPRT T1 - Statistical tests for person misfit in computerized adaptive testing Y1 - 1998 A1 - Glas, C. A. W. A1 - Meijer, R. R. A1 - van Krimpen-Stoop, E. M. PB - Faculty of Educational Science and Technology, University of Twente CY - Enschede, The Netherlands SN - 98-01 ER - TY - ABST T1 - Statistical tests for person misfit in computerized adaptive testing (Research Report 98-01) Y1 - 1998 A1 - Glas, C. A. W. A1 - Meijer, R. R. A1 - van Krimpen-Stoop, E. M. L. A. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - JOUR T1 - Testing word knowledge by telephone to estimate general cognitive aptitude using an adaptive test JF - Intelligence Y1 - 1998 A1 - Legree, P. J. A1 - Fischl, M. A. A1 - Gade, P. A. A1 - Wilson, M. VL - 26 ER - TY - CONF T1 - Alternate methods of scoring computer-based adaptive tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Green, B. F. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CONF T1 - Assessing speededness in variable-length computer-adaptive tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Bontempo, B. A1 - Julian, E. R. A1 - Gorham, J. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - JOUR T1 - A computerized adaptive testing system for speech discrimination measurement: The Speech Sound Pattern Discrimination Test JF - Journal of the Acoustical Society of America Y1 - 1997 A1 - Bochner, J. A1 - Garrison, W. A1 - Palmer, L. A1 - MacKenzie, D. A1 - Braveman, A. KW - *Diagnosis, Computer-Assisted KW - *Speech Discrimination Tests KW - *Speech Perception KW - Adolescent KW - Adult KW - Audiometry, Pure-Tone KW - Human KW - Middle Age KW - Psychometrics KW - Reproducibility of Results AB - A computerized, adaptive test-delivery system for the measurement of speech discrimination, the Speech Sound Pattern Discrimination Test, is described and evaluated.
Using a modified discrimination task, the testing system draws on a pool of 130 items spanning a broad range of difficulty to estimate an examinee's location along an underlying continuum of speech processing ability, yet does not require the examinee to possess a high level of English language proficiency. The system is driven by a mathematical measurement model which selects only test items which are appropriate in difficulty level for a given examinee, thereby individualizing the testing experience. Test items were administered to a sample of young deaf adults, and the adaptive testing system evaluated in terms of respondents' sensory and perceptual capabilities, acoustic and phonetic dimensions of speech, and theories of speech perception. Data obtained in this study support the validity, reliability, and efficiency of this test as a measure of speech processing ability. VL - 101 N1 - 972575560001-4966Journal Article ER - TY - JOUR T1 - Developing and scoring an innovative computerized writing assessment JF - Journal of Educational Measurement Y1 - 1997 A1 - Davey, T. A1 - Godwin, J., A1 - Mittelholz, D. VL - 34 ER - TY - CHAP T1 - Adaptive assessment using granularity hierarchies and Bayesian nets Y1 - 1996 A1 - Collins, J. A. A1 - Greer, J. E. A1 - Huang, S. X. CY - Frasson, C. and Gauthier, G. and Lesgold, A. (Eds.) Intelligent Tutoring Systems, Third International Conference, ITS'96, Montréal, Canada, June 1996 Proceedings. Lecture Notes in Computer Science 1086. Berlin Heidelberg: Springer-Verlag 569-577. ER - TY - JOUR T1 - The effect of individual differences variables on the assessment of ability for Computerized Adaptive Testing JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 1996 A1 - Gershon, R. C. KW - computerized adaptive testing AB - Computerized Adaptive Testing (CAT) continues to gain momentum as the accepted testing modality for a growing number of certification, licensure, education, government and human resource applications. However, the developers of these tests have for the most part failed to adequately explore the impact of individual differences such as test anxiety on the adaptive testing process. It is widely accepted that non-cognitive individual differences variables interact with the assessment of ability when using written examinations. Logic would dictate that individual differences variables would equally affect CAT. Two studies were used to explore this premise. In the first study, 507 examinees were given a test anxiety survey prior to taking a high stakes certification exam using CAT or using a written format. All examinees had already completed their course of study, and the examination would be their last hurdle prior to being awarded certification. High test anxious examinees performed worse than their low anxious counterparts on both testing formats. The second study replicated the finding that anxiety depresses performance in CAT. It also addressed the differential effect of anxiety on within test performance. Examinees were candidates taking their final certification examination following a four year college program. Ability measures were calculated for each successive part of the test for 923 subjects. Within subject performance varied depending upon test position. High anxious examinees performed poorly at all points in the test, while low and medium anxious examinee performance peaked in the middle of the test. 
If test anxiety and performance measures were actually the same trait, then low anxious individuals should have performed equally well throughout the test. The observed interaction of test anxiety and time on task serves as strong evidence that test anxiety has motivationally mediated as well as cognitively mediated effects. The results of the studies are discussed. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 57 ER - TY - CONF T1 - The effects of methods of theta estimation, prior distribution, and number of quadrature points on CAT using the graded response model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1996 A1 - Hou, L. A1 - Chen, S. A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York NY ER - TY - JOUR T1 - A comparison of item selection routines in linear and adaptive tests JF - Journal of Educational Measurement Y1 - 1995 A1 - Schnipke, D. L. A1 - Green, B. F. VL - 32 ER - TY - CONF T1 - Does cheating on CAT pay: Not T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1995 A1 - Gershon, R. C. A1 - Bergstrom, B. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco N1 - ERIC ED 392 844 ER - TY - BOOK T1 - CAT software system [computer program] Y1 - 1994 A1 - Gershon, R. C. CY - Chicago IL: Computer Adaptive Technologies ER - TY - JOUR T1 - Computer adaptive testing JF - International Journal of Educational Research Y1 - 1994 A1 - Lunz, M. E. A1 - Bergstrom, Betty A. A1 - Gershon, R. C. VL - 6 ER - TY - CONF T1 - Computerized adaptive testing exploring examinee response time using hierarchical linear modeling T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1994 A1 - Bergstrom, B. A1 - Gershon, R. C. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA N1 - (ERIC No. ED 400 286). ER - TY - JOUR T1 - Computerized adaptive testing for licensure and certification JF - CLEAR Exam Review Y1 - 1994 A1 - Bergstrom, Betty A. A1 - Gershon, R. C. VL - Winter 1994 ER - TY - ABST T1 - Introduction of a computer adaptive GRE General test (Research Report 93-57) Y1 - 1993 A1 - Schaeffer, G. A. A1 - Steffen, M. A1 - Golub-Smith, M. L. CY - Princeton NJ: Educational Testing Service ER - TY - JOUR T1 - Altering the level of difficulty in computer adaptive testing JF - Applied Measurement in Education Y1 - 1992 A1 - Bergstrom, Betty A. A1 - Lunz, M. E. A1 - Gershon, R. C. KW - computerized adaptive testing AB - Examines the effect of altering test difficulty on examinee ability measures and test length in a computer adaptive test. The 225 Ss were randomly assigned to 3 test difficulty conditions and given a variable length computer adaptive test. Examinees in the hard, medium, and easy test condition took a test targeted at the 50%, 60%, or 70% probability of correct response. The results show that altering the probability of a correct response does not affect estimation of examinee ability and that taking an easier computer adaptive test only slightly increases the number of items necessary to reach specified levels of precision. (PsycINFO Database Record (c) 2002 APA, all rights reserved).
VL - 5 N1 - Lawrence Erlbaum, US ER - TY - CONF T1 - Comparison of item targeting strategies for pass/fail adaptive tests T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1992 A1 - Bergstrom, B. A1 - Gershon, R. C. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA N1 - (ERIC NO. ED 400 287). ER - TY - CONF T1 - Student attitudes toward computer-adaptive test administration T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1992 A1 - Baghi, H A1 - Ferrara, S. F A1 - Gabrys, R. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA ER - TY - CONF T1 - Applications of computer-adaptive testing in Maryland T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1991 A1 - Baghi, H A1 - Gabrys, R. A1 - Ferrara, S. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL ER - TY - ABST T1 - Collected works on the legal aspects of computerized adaptive testing Y1 - 1991 A1 - Stenson, H. A1 - Graves, P. A1 - Gardiner, J. A1 - Dally, L. CY - Chicago, IL: National Council of State Boards of Nursing, Inc ER - TY - CONF T1 - Development and evaluation of hierarchical testlets in two-stage tests using integer linear programming T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1991 A1 - Lam, T. L. A1 - Goong, Y. Y. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL ER - TY - CONF T1 - Individual differences in computer adaptive testing: Anxiety, computer literacy, and satisfaction T2 - Paper presented at the annual meeting of the National Council on Measurement in Education. Y1 - 1991 A1 - Gershon, R. C. A1 - Bergstrom, B. JF - Paper presented at the annual meeting of the National Council on Measurement in Education. ER - TY - BOOK T1 - Computerized adaptive testing: A primer (Eds.) Y1 - 1990 A1 - Wainer, H., A1 - Dorans, N. J. A1 - Flaugher, R. A1 - Green, B. F. A1 - Mislevy, R. J. A1 - Steinberg, L. A1 - Thissen, D. CY - Hillsdale NJ: Erlbaum ER - TY - CHAP T1 - Future challenges Y1 - 1990 A1 - Wainer, H., A1 - Dorans, N. J. A1 - Green, B. F. A1 - Mislevy, R. J. A1 - Steinberg, L. A1 - Thissen, D. CY - H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 233-272). Hillsdale NJ: Erlbaum. ER - TY - Generic T1 - Test-retest consistency of computer adaptive tests. T2 - annual meeting of the National Council on Measurement in Education Y1 - 1990 A1 - Lunz, M. E. A1 - Bergstrom, Betty A. A1 - Gershon, R. C. JF - annual meeting of the National Council on Measurement in Education CY - Boston, MA USA ER - TY - ABST T1 - Utility of predicting starting abilities in sequential computer-based adaptive tests (Research Report 90-1) Y1 - 1990 A1 - Green, B. F. A1 - Thomas, T. J. CY - Baltimore MD: Johns Hopkins University, Department of Psychology ER - TY - JOUR T1 - Adaptive Estimation When the Unidimensionality Assumption of IRT is Violated JF - Applied Psychological Measurement Y1 - 1989 A1 - Folk, V.G. A1 - Green, B. F. VL - 13 IS - 4 ER - TY - JOUR T1 - Adaptive estimation when the unidimensionality assumption of IRT is violated JF - Applied Psychological Measurement Y1 - 1989 A1 - Folk, V.G., A1 - Green, B. F. 
VL - 13 ER - TY - BOOK T1 - CAT administrator [Computer program] Y1 - 1989 A1 - Gershon, R. C. CY - Chicago: Micro Connections ER - TY - ABST T1 - Computerized adaptive tests Y1 - 1989 A1 - Grist, S. A1 - Rudner, L. M. A1 - Wise CY - ERIC Clearinghouse on Tests, Measurement, and Evaluation, no. 107 ER - TY - ABST T1 - Item-presentation controls for computerized adaptive testing: Content-balancing versus min-CAT (Research Report 89-1) Y1 - 1989 A1 - Thomas, T. J. A1 - Green, B. F. CY - Baltimore MD: Johns Hopkins University, Department of Psychology, Psychometric Laboratory ER - TY - CHAP T1 - Construct validity of computer-based tests Y1 - 1988 A1 - Green, B. F. CY - H. Wainer and H. Braun (Eds.), Test validity (pp. 77-103). Hillsdale NJ: Erlbaum. ER - TY - JOUR T1 - Critical problems in computer-based psychological measurement JF - Applied Measurement in Education Y1 - 1988 A1 - Green, B. F. VL - 1 ER - TY - CONF T1 - The Rasch model and missing data, with an emphasis on tailoring test items T2 - annual meeting of the American Educational Research Association Y1 - 1988 A1 - de Gruijter, D. N. M. AB - Many applications of educational testing have a missing data aspect (MDA). This MDA is perhaps most pronounced in item banking, where each examinee responds to a different subtest of items from a large item pool and where both person and item parameter estimates are needed. The Rasch model is emphasized, and its non-parametric counterpart (the Mokken scale) is considered. The possibility of tailoring test items in combination with their estimation is discussed; however, most methods for the estimation of item parameters are inadequate under tailoring. Without special measures, only marginal maximum likelihood produces adequate item parameter estimates under item tailoring. Fischer's approximate minimum-chi-square method for estimation of item parameters for the Rasch model is discussed, which efficiently produces item parameters. (TJH) JF - annual meeting of the American Educational Research Association CY - New Orleans, LA, USA ER - TY - JOUR T1 - The Rasch model and multi-stage testing JF - Journal of Educational and Behavioral Statistics Y1 - 1988 A1 - Glas, C. A. W. VL - 13 ER - TY - JOUR T1 - Wilcox' closed sequential testing procedure in stratified item domains JF - Methodika Y1 - 1987 A1 - de Gruijter, D. N. VL - 1(1) ER - TY - JOUR T1 - An application of computer adaptive testing with communication handicapped examinees JF - Educational and Psychological Measurement Y1 - 1986 A1 - Garrison, W. M. A1 - Baumgarten, B. S. KW - computerized adaptive testing AB - This study was conducted to evaluate a computerized adaptive testing procedure for the measurement of mathematical skills of entry level deaf college students. The theoretical basis of the study was the Rasch model for person measurement. Sixty persons were tested using an Apple II Plus microcomputer. Ability estimates provided by the computerized procedure were compared for stability with those obtained six to eight weeks earlier from conventional (written) testing of the same subject matter. Students' attitudes toward their testing experiences also were measured. Substantial increases in measurement efficiency (by reducing test length) were realized through the adaptive testing procedure. Because the item pool used was not specifically designed for adaptive testing purposes, the psychometric quality of measurements resulting from the different testing methods was approximately equal.
Attitudes toward computerized testing were favorable. VL - 46 SN - 0013-1644 N1 - Journal article, March. doi:10.1177/0013164486461003 ER - TY - JOUR T1 - Equivalence of conventional and computer presentation of speed tests JF - Applied Psychological Measurement Y1 - 1986 A1 - Greaud, V. A. A1 - Green, B. F. VL - 10 ER - TY - CONF T1 - Operational characteristics of adaptive testing procedures using partial credit scoring T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1986 A1 - Koch, W. R. A1 - Dodd, B. G. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA N1 - #KO86-01 ER - TY - JOUR T1 - Monitoring item calibrations from data yielded by an adaptive testing procedure JF - Educational Research Quarterly Y1 - 1985 A1 - Garrison, W. M. VL - 10 ER - TY - ABST T1 - Analysis of experimental CAT ASVAB data Y1 - 1984 A1 - Allred, L. A. A1 - Green, B. F. CY - Baltimore MD: Johns Hopkins University, Department of Psychology ER - TY - ABST T1 - Analysis of speeded test data from experimental CAT system Y1 - 1984 A1 - Greaud, V. A. A1 - Green, B. F. CY - Baltimore MD: Johns Hopkins University, Department of Psychology ER - TY - ABST T1 - Evaluation plan for the computerized adaptive vocational aptitude battery (Research Report 82-1) Y1 - 1984 A1 - Green, B. F. A1 - Bock, R. D. A1 - Humphreys, L. G. A1 - Linn, R. L. A1 - Reckase, M. D. N1 - Baltimore MD: The Johns Hopkins University, Department of Psychology. ER - TY - JOUR T1 - A plan for scaling the computerized adaptive Armed Services Vocational Aptitude Battery JF - Journal of Educational Measurement Y1 - 1984 A1 - Green, B. F. A1 - Bock, R. D. A1 - Linn, R. L. A1 - Lord, F. M. A1 - Reckase, M. D. VL - 21 ER - TY - JOUR T1 - Technical guidelines for assessing computerized adaptive tests JF - Journal of Educational Measurement Y1 - 1984 A1 - Green, B. F. A1 - Bock, R. D. A1 - Humphreys, L. G. A1 - Linn, R. L. A1 - Reckase, M. D. KW - computerized adaptive testing KW - Mode effects KW - paper-and-pencil VL - 21 SN - 1745-3984 ER - TY - CHAP T1 - Adaptive testing by computer Y1 - 1983 A1 - Green, B. F. CY - R. B. Ekstrom (Ed.), Measurement, technology, and individuality in education. New directions for testing and measurement, Number 17. San Francisco: Jossey-Bass. ER - TY - JOUR T1 - An application of computerized adaptive testing in U. S. Army recruiting JF - Journal of Computer-Based Instruction Y1 - 1983 A1 - Sands, W. A. A1 - Gade, P. A. VL - 10 ER - TY - CHAP T1 - The promise of tailored tests Y1 - 1983 A1 - Green, B. F. CY - H. Wainer and S. Messick (Eds.), Principals of modern psychological measurement (pp. 69-80). Hillsdale NJ: Erlbaum. ER - TY - CONF T1 - Assessing mathematics achievement with a tailored testing program T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1982 A1 - Garrison, W. M. A1 - Baumgarten, B. S. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York N1 - #GA82-01 ER - TY - BOOK T1 - A comparative evaluation of two Bayesian adaptive ability estimation procedures Y1 - 1980 A1 - Gorman, S. CY - Unpublished doctoral dissertation, the Catholic University of America ER - TY - THES T1 - A comparative evaluation of two Bayesian adaptive ability estimation procedures with a conventional test strategy Y1 - 1980 A1 - Gorman, S. PB - Catholic University of America CY - Washington DC VL - Ph.D.
ER - TY - CHAP T1 - A comparison of the accuracy of Bayesian adaptive and static tests using a correction for regression Y1 - 1980 A1 - Gorman, S. CY - D. J. Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 35-50). Minneapolis MN: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory. N1 - {PDF file, 735 KB} ER - TY - ABST T1 - Effects of computerized adaptive testing on Black and White students (Research Report 79-2) Y1 - 1980 A1 - Pine, S. M. A1 - Church, A. T. A1 - Gialluca, K. A. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program N1 - {PDF file, 2.323 MB} ER - TY - RPRT T1 - Efficiency of an adaptive inter-subtest branching strategy in the measurement of classroom achievement (Research Report 79-6) Y1 - 1979 A1 - Gialluca, K. A. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - JOUR T1 - Computer-assisted tailored testing: Examinee reactions and evaluation JF - Educational and Psychological Measurement Y1 - 1978 A1 - Schmidt, F. L. A1 - Urry, V. W. A1 - Gugel, J. F. VL - 38 ER - TY - ABST T1 - Computer-assisted tailored testing: Examinee reactions and evaluation (PB-276 748) Y1 - 1977 A1 - Schmidt, F. L. A1 - Urry, V. W. A1 - Gugel, J. F. CY - Washington DC: U. S. Civil Service Commission, Personnel Research and Development Center. N1 - #SC77-01 ER - TY - CHAP T1 - Computerized Adaptive Testing with a Military Population Y1 - 1977 A1 - Gorman, S. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - ABST T1 - An information comparison of conventional and adaptive tests in the measurement of classroom achievement (Research Report 77-7) Y1 - 1977 A1 - Bejar, I. I. A1 - Weiss, D. J. A1 - Gialluca, K. A. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - ABST T1 - A two-stage testing procedure (Memorandum 403-77) Y1 - 1977 A1 - de Gruijter, D. N. M. CY - University of Leyden, The Netherlands, Educational Research Center ER - TY - CHAP T1 - Discussion Y1 - 1976 A1 - Green, B. F. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 118-119). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 347 KB} ER - TY - CHAP T1 - Effectiveness of the ancillary estimation procedure Y1 - 1976 A1 - Gugel, J. F. A1 - Schmidt, F. L. A1 - Urry, V. W. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 103-106). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 252 KB} ER - TY - CHAP T1 - Opening remarks Y1 - 1976 A1 - Gorham, W. A. CY - W. H. Gorham (Chair), Computers and testing: Steps toward the inevitable conquest (PS 76-1). Symposium presented at the 83rd annual convention of the APA, Chicago IL. Washington DC: U.S. Civil Service Commission, Personnel Research and Development Center ER - TY - JOUR T1 - Individual intelligence testing without the examiner: reliability of an automated method JF - Journal of Consulting and Clinical Psychology Y1 - 1972 A1 - Elwood, D. L. A1 - Griffin, H. R. VL - 38 ER - TY - CHAP T1 - Comments on tailored testing Y1 - 1970 A1 - Green, B. F. CY - W. H. Holtzman (Ed.), Computer-assisted instruction, testing, and guidance (pp. 184-197). New York: Harper and Row.
ER - TY - JOUR T1 - Adaptive testing in an older population JF - Journal of Psychology Y1 - 1965 A1 - Greenwood, D. I. A1 - Taylor, C. VL - 60 ER -