TY - JOUR
T1 - Development of a Computerized Adaptive Test for Anxiety Based on the Dutch–Flemish Version of the PROMIS Item Bank
JF - Assessment
Y1 - In Press
A1 - Gerard Flens
A1 - Niels Smits
A1 - Caroline B. Terwee
A1 - Joost Dekker
A1 - Irma Huijbrechts
A1 - Philip Spinhoven
A1 - Edwin de Beurs
AB - We used the Dutch–Flemish version of the USA PROMIS adult V1.0 item bank for Anxiety as input for developing a computerized adaptive test (CAT) to measure the entire latent anxiety continuum. First, psychometric analysis of a combined clinical and general population sample (N = 2,010) showed that the 29-item bank has the psychometric properties required for CAT administration. Second, a post hoc CAT simulation showed efficient and highly precise measurement, with an average of 8.64 items for the clinical sample and 9.48 items for the general population sample. Furthermore, the accuracy of our CAT version was highly similar to that of the full item bank administration, both in final score estimates and in distinguishing clinical subjects from persons without a mental health disorder. We discuss the future directions and limitations of CAT development with the Dutch–Flemish version of the PROMIS Anxiety item bank.
UR - https://doi.org/10.1177/1073191117746742
ER - 
TY - JOUR
T1 - Measurement Efficiency for Fixed-Precision Multidimensional Computerized Adaptive Tests: Comparing Health Measurement and Educational Testing Using Example Banks
JF - Applied Psychological Measurement
Y1 - In Press
A1 - Paap, Muirne C. S.
A1 - Born, Sebastian
A1 - Braeken, Johan
ER - 
TY - JOUR
T1 - Optimizing Cognitive Ability Measurement With Multidimensional Computer Adaptive Testing
JF - International Journal of Testing
Y1 - In Press
A1 - Makransky, G.
A1 - Glas, C. A. W.
ER - 
TY - JOUR
T1 - The Influence of Computerized Adaptive Testing on Psychometric Theory and Practice
JF - Journal of Computerized Adaptive Testing
Y1 - 2024
A1 - Reckase, Mark D.
KW - computerized adaptive testing
KW - Item Response Theory
KW - paradigm shift
KW - scaling theory
KW - test design
AB - The major premise of this article is that part of the stimulus for the evolution of psychometric theory since the 1950s was the introduction of the concept of computerized adaptive testing (CAT) or its earlier non-CAT variations. The conceptual underpinning of CAT that had the most influence on psychometric theory was the shift of emphasis from the test (or test score) as the focus of analysis to the test item (or item score). The change in focus allowed a change in the way that test results are conceived of as measurements. It also resolved the conflict among a number of ideas that were present in the early work on psychometric theory. Some of the conflicting ideas are summarized below to show how work on the development of CAT resolved some of those conflicts.
VL - 11
UR - https://jcatpub.net/index.php/jcat/issue/view/34/9
IS - 1
ER - 
TY - JOUR
T1 - Expanding the Meaning of Adaptive Testing to Enhance Validity
JF - Journal of Computerized Adaptive Testing
Y1 - 2023
A1 - Steven L. Wise
KW - Adaptive Testing
KW - CAT
KW - CBT
KW - test-taking disengagement
KW - validity
VL - 10
IS - 2
ER - 
TY - JOUR
T1 - An Extended Taxonomy of Variants of Computerized Adaptive Testing
JF - Journal of Computerized Adaptive Testing
Y1 - 2023
A1 - Roy Levy
A1 - John T. Behrens
A1 - Robert J. Mislevy
KW - Adaptive Testing
KW - evidence-centered design
KW - Item Response Theory
KW - knowledge-based model construction
KW - missingness
VL - 10
IS - 1
ER - 
TY - JOUR
T1 - How Do Trait Change Patterns Affect the Performance of Adaptive Measurement of Change?
JF - Journal of Computerized Adaptive Testing
Y1 - 2023
A1 - Ming Him Tai
A1 - Allison W. Cooperman
A1 - Joseph N. DeWeese
A1 - David J. Weiss
KW - adaptive measurement of change
KW - computerized adaptive testing
KW - longitudinal measurement
KW - trait change patterns
VL - 10
IS - 3
ER - 
TY - JOUR
T1 - Improving Precision of CAT Measures
JF - Journal of Computerized Adaptive Testing
Y1 - 2022
A1 - John J. Barnard
KW - dichotomously scored items
KW - option probability theory
KW - scoring methods
KW - subjective probability
VL - 9
IS - 1
ER - 
TY - JOUR
T1 - The (non)Impact of Misfitting Items in Computerized Adaptive Testing
JF - Journal of Computerized Adaptive Testing
Y1 - 2022
A1 - Christine E. DeMars
KW - computerized adaptive testing
KW - item fit
KW - three-parameter logistic model
VL - 9
UR - https://jcatpub.net/index.php/jcat/issue/view/26
IS - 2
ER - 
TY - JOUR
T1 - A Blocked-CAT Procedure for CD-CAT
JF - Applied Psychological Measurement
Y1 - 2020
A1 - Mehmet Kaplan
A1 - Jimmy de la Torre
AB - This article introduces a blocked-design procedure for cognitive diagnosis computerized adaptive testing (CD-CAT), which allows examinees to review items and change their answers during test administration. Four blocking versions of the new procedure were proposed. In addition, the impact of several factors, namely, item quality, generating model, block size, and test length, on the classification rates was investigated. Three popular item selection indices in CD-CAT were used and their efficiency compared using the new procedure. An additional study was carried out to examine the potential benefit of item review. The results showed that the new procedure is promising in that allowing item review resulted in only a small loss in attribute classification accuracy under some conditions. Moreover, using a blocked-design CD-CAT is beneficial to the extent that it alleviates the negative impact of test anxiety on examinees’ true performance.
VL - 44
UR - https://doi.org/10.1177/0146621619835500
ER - 
TY - JOUR
T1 - Computerized Adaptive Testing to Screen Children for Emotional and Behavioral Problems by Preventive Child Healthcare
JF - BMC Pediatrics
Y1 - 2020
A1 - Theunissen, Meinou H. C.
A1 - de Wolff, Marianne S.
A1 - Deurloo, Jacqueline A.
A1 - Vogels, Anton G. C.
AB - Background: Questionnaires to detect emotional and behavioral problems (EBP) in Preventive Child Healthcare (PCH) should be short, which potentially affects their validity and reliability. Simulation studies have shown that computerized adaptive testing (CAT) could overcome these weaknesses. We studied the applicability (using the measures participation rate, satisfaction, and efficiency) and the validity of CAT in routine PCH practice. Methods: We analyzed data on 461 children aged 10–11 years (response 41%), who were assessed during routine well-child examinations by PCH professionals. Before the visit, parents completed the CAT and the Child Behavior Checklist (CBCL). Satisfaction was measured by parent and PCH professional report. Efficiency of the CAT procedure was measured as the number of items needed to assess whether a child has serious problems or not. Its validity was assessed using the CBCL as the criterion. Results: Parents and PCH professionals rated the CAT on average as good. The procedure required on average 16 items to assess whether a child has serious problems or not. Agreement of scores on the CAT scales with corresponding CBCL scales was high (range of Spearman correlations 0.59–0.72). Areas under the curve (AUC) were high (range: 0.95–0.97) for the Psycat total, externalizing, and hyperactivity scales using corresponding CBCL scale scores as criterion. For the Psycat internalizing scale the AUC was somewhat lower but still high (0.86). Conclusions: CAT is a valid procedure for the identification of emotional and behavioral problems in children aged 10–11 years. It may support the efficient and accurate identification of children with overall, and potentially also specific, emotional and behavioral problems in routine PCH.
VL - 20
UR - https://bmcpediatr.biomedcentral.com/articles/10.1186/s12887-020-2018-1
IS - Article number: 119
ER - 
TY - JOUR
T1 - A Dynamic Stratification Method for Improving Trait Estimation in Computerized Adaptive Testing Under Item Exposure Control
JF - Applied Psychological Measurement
Y1 - 2020
A1 - Jyun-Hong Chen
A1 - Hsiu-Yi Chao
A1 - Shu-Ying Chen
AB - When computerized adaptive testing (CAT) is under stringent item exposure control, the precision of trait estimation will substantially decrease. A new item selection method, the dynamic Stratification method based on Dominance Curves (SDC), which is aimed at improving trait estimation, is proposed to mitigate this problem. The objective function of the SDC in item selection is to maximize the sum of test information for all examinees rather than maximizing item information for individual examinees at a single-item administration, as in conventional CAT. To achieve this objective, the SDC uses dominance curves to stratify an item pool into strata, with the number of strata equal to the test length, to precisely and accurately increase the quality of the administered items as the test progresses, reducing the likelihood that a high-discrimination item will be administered to an examinee whose ability is not close to the item difficulty. Furthermore, the SDC incorporates a dynamic process for on-the-fly item–stratum adjustment to optimize the use of quality items. Simulation studies were conducted to investigate the performance of the SDC in CAT under item exposure control at different levels of severity. According to the results, the SDC can efficiently improve trait estimation in CAT, yielding greater precision and more accurate trait estimates than those generated by other methods (e.g., the maximum Fisher information method) in most conditions.
VL - 44
UR - https://doi.org/10.1177/0146621619843820
ER - 
TY - JOUR
T1 - Framework for Developing Multistage Testing With Intersectional Routing for Short-Length Tests
JF - Applied Psychological Measurement
Y1 - 2020
A1 - Kyung (Chris) T. Han
AB - Multistage testing (MST) has many practical advantages over typical item-level computerized adaptive testing (CAT), but there is a substantial tradeoff when using MST because of its reduced level of adaptability. In typical MST, the first stage almost always performs as a routing stage in which all test takers see a linear test form. If multiple test sections measure different but moderately or highly correlated traits, then a score estimate for one section might be capable of adaptively selecting item modules for following sections without having to administer routing stages repeatedly for each section. In this article, a new framework for developing MST with intersectional routing (ISR) was proposed and evaluated under several research conditions with different MST structures, section score distributions and relationships, and types of regression models for ISR. The overall findings of the study suggested that the MST with ISR approach could improve measurement efficiency and test optimality, especially for tests with short lengths.
VL - 44
UR - https://doi.org/10.1177/0146621619837226
ER - 
TY - JOUR
T1 - Item Calibration Methods With Multiple Subscale Multistage Testing
JF - Journal of Educational Measurement
Y1 - 2020
A1 - Wang, Chun
A1 - Chen, Ping
A1 - Jiang, Shengyu
KW - EM
KW - marginal maximum likelihood
KW - missing data
KW - multistage testing
AB - Many large-scale educational surveys have moved from linear form design to multistage testing (MST) design. One advantage of MST is that it can provide more accurate latent trait (θ) estimates using fewer items than required by linear tests. However, MST generates incomplete response data by design; hence, questions remain as to how to calibrate items using the incomplete data from an MST design. Further complication arises when there are multiple correlated subscales per test, and when items from different subscales need to be calibrated according to their respective score reporting metric. The current calibration-per-subscale method produced biased item parameters, and there is no available method for resolving the challenge. Deriving from the missing data principle, we showed that when calibrating all items together, Rubin's ignorability assumption is satisfied such that the traditional single-group calibration is sufficient. When calibrating items per subscale, we proposed a simple modification to the current calibration-per-subscale method that helps reinstate the missing-at-random assumption and therefore corrects for the estimation bias that is otherwise present. Three mainstream calibration methods are discussed in the context of MST: the marginal maximum likelihood estimation, the expectation maximization method, and the fixed parameter calibration. An extensive simulation study is conducted and a real data example from NAEP is analyzed to provide convincing empirical evidence.
VL - 57
UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12241
ER - 
TY - JOUR
T1 - Item Selection and Exposure Control Methods for Computerized Adaptive Testing with Multidimensional Ranking Items
JF - Journal of Educational Measurement
Y1 - 2020
A1 - Chen, Chia-Wen
A1 - Wang, Wen-Chung
A1 - Chiu, Ming Ming
A1 - Ro, Sage
AB - The use of computerized adaptive testing algorithms for ranking items (e.g., college preferences, career choices) involves two major challenges: unacceptably high computation times (selecting from a large item pool with many dimensions) and biased results (enhanced preferences or intensified examinee responses because of repeated statements across items). To address these issues, we introduce subpool partition strategies for item selection and within-person statement exposure control procedures. Simulations showed that the multinomial method reduces computation time while maintaining measurement precision. Both the freeze and revised Sympson-Hetter online (RSHO) methods controlled the statement exposure rate; RSHO sacrificed some measurement precision but increased pool use. Furthermore, preventing a statement's repetition on consecutive items neither hindered the effectiveness of the freeze or RSHO method nor reduced measurement precision.
VL - 57
UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12252
ER - 
TY - JOUR
T1 - Multidimensional Test Assembly Using Mixed-Integer Linear Programming: An Application of Kullback–Leibler Information
JF - Applied Psychological Measurement
Y1 - 2020
A1 - Dries Debeer
A1 - Peter W. van Rijn
A1 - Usama S. Ali
AB - Many educational testing programs require different test forms with minimal or no item overlap. At the same time, the test forms should be parallel in terms of their statistical and content-related properties. A well-established method to assemble parallel test forms is to apply combinatorial optimization using mixed-integer linear programming (MILP). Using this approach, in the unidimensional case, Fisher information (FI) is commonly used as the statistical target to obtain parallelism. In the multidimensional case, however, FI is a multidimensional matrix, which complicates its use as a statistical target. Previous research addressing this problem focused on item selection criteria for multidimensional computerized adaptive testing (MCAT). Yet these selection criteria are not directly transferable to the assembly of linear parallel test forms. To bridge this gap, the authors derive different statistical targets, based on either FI or the Kullback–Leibler (KL) divergence, that can be applied in MILP models to assemble multidimensional parallel test forms. Using simulated item pools and an item pool based on empirical items, the proposed statistical targets are compared and evaluated. Promising results with respect to the KL-based statistical targets are presented and discussed.
VL - 44
UR - https://doi.org/10.1177/0146621619827586
ER - 
TY - JOUR
T1 - New Efficient and Practicable Adaptive Designs for Calibrating Items Online
JF - Applied Psychological Measurement
Y1 - 2020
A1 - Yinhong He
A1 - Ping Chen
A1 - Yong Li
AB - When calibrating new items online, it is practicable to first compare all new items according to some criterion and then assign the most suitable one to the current examinee who reaches a seeding location. The modified D-optimal design proposed by van der Linden and Ren (denoted as D-VR design) works within this practicable framework with the aim of directly optimizing the estimation of item parameters. However, the optimal design point for a given new item should be obtained by comparing all examinees in a static examinee pool. Thus, D-VR design still has room for improvement in calibration efficiency from the view of traditional optimal design. To this end, this article incorporates the idea of traditional optimal design into D-VR design and proposes a new online calibration design criterion, namely, the excellence degree (ED) criterion. Four different schemes are developed to measure the information provided by the current examinee when implementing this new criterion, and four new ED designs equipped with them are put forward accordingly. Simulation studies were conducted under a variety of conditions to compare the D-VR design and the four proposed ED designs in terms of calibration efficiency. Results showed that the four ED designs outperformed the D-VR design in almost all simulation conditions.
VL - 44
UR - https://doi.org/10.1177/0146621618824854
ER - 
TY - JOUR
T1 - The Optimal Item Pool Design in Multistage Computerized Adaptive Tests With the p-Optimality Method
JF - Educational and Psychological Measurement
Y1 - 2020
A1 - Lihong Yang
A1 - Mark D. Reckase
AB - The present study extended the p-optimality method to the multistage computerized adaptive test (MST) context in developing optimal item pools to support different MST panel designs under different test configurations. Using the Rasch model, simulated optimal item pools were generated with and without practical constraints of exposure control. A total of 72 simulated optimal item pools were generated and evaluated by an overall sample and conditional sample using various statistical measures. Results showed that the optimal item pools built with the p-optimality method provide sufficient measurement accuracy under all simulated MST panel designs. Exposure control affected the item pool size, but not the item distributions and item pool characteristics. This study demonstrated that the p-optimality method can adapt to MST item pool design, facilitate the MST assembly process, and improve its scoring accuracy.
VL - 80
UR - https://doi.org/10.1177/0013164419901292
ER - 
TY - JOUR
T1 - Stratified Item Selection Methods in Cognitive Diagnosis Computerized Adaptive Testing
JF - Applied Psychological Measurement
Y1 - 2020
A1 - Jing Yang
A1 - Hua-Hua Chang
A1 - Jian Tao
A1 - Ningzhong Shi
AB - Cognitive diagnosis computerized adaptive testing (CD-CAT) aims to obtain more useful diagnostic information by taking advantage of computerized adaptive testing (CAT). Cognitive diagnosis models (CDMs) have been developed to classify examinees into the correct proficiency classes so as to get more efficient remediation, whereas CAT tailors optimal items to the examinee’s mastery profile. The item selection method is the key factor of the CD-CAT procedure. In recent years, a large number of parametric/nonparametric item selection methods have been proposed. In this article, the authors proposed a series of stratified item selection methods in CD-CAT, which are combined with the posterior-weighted Kullback–Leibler (PWKL), nonparametric item selection (NPS), and weighted nonparametric item selection (WNPS) methods, and named S-PWKL, S-NPS, and S-WNPS, respectively. Two different types of stratification indices were used: original versus novel. The performances of the proposed item selection methods were evaluated via simulation studies and compared with the PWKL, NPS, and WNPS methods without stratification. Manipulated conditions included calibration sample size, item quality, number of attributes, number of strata, and data generation models. Results indicated that the S-WNPS and S-NPS methods performed similarly, and both outperformed the S-PWKL method. Item selection methods with novel stratification indices performed slightly better than those with original stratification indices, and methods without stratification performed the worst.
VL - 44
UR - https://doi.org/10.1177/0146621619893783
ER - 
TY - JOUR
T1 - Three Measures of Test Adaptation Based on Optimal Test Information
JF - Journal of Computerized Adaptive Testing
Y1 - 2020
A1 - G. Gage Kingsbury
A1 - Steven L. Wise
VL - 8
UR - http://iacat.org/jcat/index.php/jcat/article/view/80/37
IS - 1
ER - 
TY - JOUR
T1 - Adaptive Testing With a Hierarchical Item Response Theory Model
JF - Applied Psychological Measurement
Y1 - 2019
A1 - Wenhao Wang
A1 - Neal Kingston
AB - The hierarchical item response theory (H-IRT) model is very flexible and allows a general factor and subfactors within an overall structure of two or more levels. When an H-IRT model with a large number of dimensions is used for an adaptive test, the computational burden associated with interim scoring and selection of subsequent items is heavy. An alternative approach for any high-dimension adaptive test is to reduce dimensionality for interim scoring and item selection and then revert to full dimensionality for final score reporting, thereby significantly reducing the computational burden. This study compared the accuracy and efficiency of final scoring for multidimensional, local multidimensional, and unidimensional item selection and interim scoring methods, using both simulated and real item pools. The simulation study was conducted under 10 conditions (i.e., five test lengths and two H-IRT models) with a simulated sample of 10,000 students. The study with the real item pool was conducted using item parameters from an actual 45-item adaptive test with a simulated sample of 10,000 students. Results indicate that the theta estimations provided by the local multidimensional and unidimensional item selection and interim scoring methods were relatively as accurate as the theta estimation provided by the multidimensional item selection and interim scoring method, especially during the real item pool study. In addition, the multidimensional method required the longest computation time and the unidimensional method required the shortest computation time.
VL - 43
UR - https://doi.org/10.1177/0146621618765714
ER - 
TY - JOUR
T1 - Application of Dimension Reduction to CAT Item Selection Under the Bifactor Model
JF - Applied Psychological Measurement
Y1 - 2019
A1 - Xiuzhen Mao
A1 - Jiahui Zhang
A1 - Tao Xin
AB - Multidimensional computerized adaptive testing (MCAT) based on the bifactor model is suitable for tests with multidimensional bifactor measurement structures. Several item selection methods that proved to be more advantageous than the maximum Fisher information method are not practical for bifactor MCAT due to time-consuming computations resulting from high dimensionality. To make them applicable in bifactor MCAT, dimension reduction is applied to four item selection methods: the posterior-weighted Fisher D-optimality (PDO) and three non-Fisher information-based methods, namely, posterior expected Kullback–Leibler information (PKL), continuous entropy (CE), and mutual information (MI). They were compared with the Bayesian D-optimality (BDO) method in terms of estimation precision. When both the general and group factors are the measurement objectives, BDO, PDO, CE, and MI perform equally well and better than PKL. When the group factors represent nuisance dimensions, MI and CE perform the best in estimating the general factor, followed by the BDO, PDO, and PKL. How the bifactor pattern and test length affect estimation accuracy was also discussed.
VL - 43
UR - https://doi.org/10.1177/0146621618813086
ER - 
TY - JOUR
T1 - Computerized Adaptive Testing for Cognitively Based Multiple-Choice Data
JF - Applied Psychological Measurement
Y1 - 2019
A1 - Hulya D. Yigit
A1 - Miguel A. Sorrel
A1 - Jimmy de la Torre
AB - Cognitive diagnosis models (CDMs) are latent class models that hold great promise for providing diagnostic information about student knowledge profiles. The increasing use of computers in classrooms enhances the advantages of CDMs for more efficient diagnostic testing by using adaptive algorithms, referred to as cognitive diagnosis computerized adaptive testing (CD-CAT). When multiple-choice items are involved, CD-CAT can be further improved by using polytomous scoring (i.e., considering the specific options students choose), instead of dichotomous scoring (i.e., marking answers as either right or wrong). In this study, the authors propose and evaluate the performance of the Jensen–Shannon divergence (JSD) index as an item selection method for the multiple-choice deterministic inputs, noisy “and” gate (MC-DINA) model. Attribute classification accuracy and item usage are evaluated under different conditions of item quality and test termination rule. The proposed approach is compared with the random selection method and an approximate approach based on dichotomized responses. The results show that under the MC-DINA model, JSD improves the attribute classification accuracy significantly by considering the information from distractors, even with a very short test length. This result has important implications in practical classroom settings as it can allow for dramatically reduced testing times, thus resulting in more targeted learning opportunities.
VL - 43
UR - https://doi.org/10.1177/0146621618798665
ER - 
TY - JOUR
T1 - Computerized Adaptive Testing in Early Education: Exploring the Impact of Item Position Effects on Ability Estimation
JF - Journal of Educational Measurement
Y1 - 2019
A1 - Albano, Anthony D.
A1 - Cai, Liuhan
A1 - Lease, Erin M.
A1 - McConnell, Scott R.
AB - Studies have shown that item difficulty can vary significantly based on the context of an item within a test form. In particular, item position may be associated with practice and fatigue effects that influence item parameter estimation. The purpose of this research was to examine the relevance of item position specifically for assessments used in early education, an area of testing that has received relatively limited psychometric attention. In an initial study, multilevel item response models fit to data from an early literacy measure revealed statistically significant increases in difficulty for items appearing later in a 20-item form. The estimated linear change in logits for an increase of 1 in position was .024, resulting in a predicted change of .46 logits for a shift from the beginning to the end of the form. A subsequent simulation study examined impacts of item position effects on person ability estimation within computerized adaptive testing. Implications and recommendations for practice are discussed.
VL - 56
UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12215
ER - 
TY - JOUR
T1 - Developing Multistage Tests Using the D-Scoring Method
JF - Educational and Psychological Measurement
Y1 - 2019
A1 - Kyung (Chris) T. Han
A1 - Dimiter M. Dimitrov
A1 - Faisal Al-Mashary
AB - The D-scoring method for scoring and equating tests with binary items proposed by Dimitrov offers some of the advantages of item response theory, such as item-level difficulty information and score computation that reflects the item difficulties, while retaining the merits of classical test theory, such as the simplicity of number-correct score computation and relaxed requirements for model sample sizes. Because of its unique combination of those merits, the D-scoring method has seen quick adoption in the educational and psychological measurement field. Because item-level difficulty information is available with the D-scoring method and item difficulties are reflected in test scores, it conceptually makes sense to use the D-scoring method with adaptive test designs such as multistage testing (MST). In this study, we developed and compared several versions of the MST mechanism using the D-scoring approach and also proposed and implemented a new framework for conducting MST simulation under the D-scoring method. Our findings suggest that the score recovery performance under MST with D-scoring was promising, as it retained score comparability across different MST paths. We found that MST using the D-scoring method can achieve improvements in measurement precision and efficiency over linear-based tests that use the D-scoring method.
VL - 79
UR - https://doi.org/10.1177/0013164419841428
ER - 
TY - JOUR
T1 - Efficiency of Targeted Multistage Calibration Designs Under Practical Constraints: A Simulation Study
JF - Journal of Educational Measurement
Y1 - 2019
A1 - Berger, Stéphanie
A1 - Verschoor, Angela J.
A1 - Eggen, Theo J. H. M.
A1 - Moser, Urs
AB - Calibration of an item bank for computer adaptive testing requires substantial resources. In this study, we investigated whether the efficiency of calibration under the Rasch model could be enhanced by improving the match between item difficulty and student ability. We introduced targeted multistage calibration designs, a design type that considers ability-related background variables and performance for assigning students to suitable items. Furthermore, we investigated whether uncertainty about item difficulty could impair the assembling of efficient designs. The results indicated that targeted multistage calibration designs were more efficient than ordinary targeted designs under optimal conditions. Limited knowledge about item difficulty reduced the efficiency of one of the two investigated targeted multistage calibration designs, whereas targeted designs were more robust.
VL - 56
UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12203
ER - 
TY - JOUR
T1 - How Adaptive Is an Adaptive Test: Are All Adaptive Tests Adaptive?
JF - Journal of Computerized Adaptive Testing Y1 - 2019 A1 - Mark Reckase A1 - Unhee Ju A1 - Sewon Kim KW - computerized adaptive test KW - multistage test KW - statistical indicators of amount of adaptation VL - 7 UR - http://iacat.org/jcat/index.php/jcat/article/view/69/34 IS - 1 ER - TY - JOUR T1 - Imputation Methods to Deal With Missing Responses in Computerized Adaptive Multistage Testing JF - Educational and Psychological Measurement Y1 - 2019 A1 - Dee Duygu Cetin-Berber A1 - Halil Ibrahim Sari A1 - Anne Corinne Huggins-Manley AB - Routing examinees to modules based on their ability level is a very important aspect in computerized adaptive multistage testing. However, the presence of missing responses may complicate estimation of examinee ability, which may result in misrouting of individuals. Therefore, missing responses should be handled carefully. This study investigated multiple missing data methods in computerized adaptive multistage testing, including two imputation techniques, the use of full information maximum likelihood and the use of scoring missing data as incorrect. These methods were examined under the missing completely at random, missing at random, and missing not at random frameworks, as well as other testing conditions. Comparisons were made to baseline conditions where no missing data were present. The results showed that imputation and the full information maximum likelihood methods outperformed incorrect scoring methods in terms of average bias, average root mean square error, and correlation between estimated and true thetas. VL - 79 UR - https://doi.org/10.1177/0013164418805532 ER - TY - JOUR T1 - An Investigation of Exposure Control Methods With Variable-Length CAT Using the Partial Credit Model JF - Applied Psychological Measurement Y1 - 2019 A1 - Audrey J. Leroux A1 - J. Kay Waid-Ebbs A1 - Pey-Shan Wen A1 - Drew A. Helmer A1 - David P. Graham A1 - Maureen K. 
O’Connor A1 - Kathleen Ray AB - The purpose of this simulation study was to investigate the effect of several different item exposure control procedures in computerized adaptive testing (CAT) with variable-length stopping rules using the partial credit model. Previous simulation studies on CAT exposure control methods with polytomous items rarely considered variable-length tests. The four exposure control techniques examined were the randomesque with a group of three items, randomesque with a group of six items, progressive-restricted standard error (PR-SE), and no exposure control. The two variable-length stopping rules included were the SE and predicted standard error reduction (PSER), along with three item pools of varied sizes (43, 86, and 172 items). Descriptive statistics on number of nonconvergent cases, measurement precision, testing burden, item overlap, item exposure, and pool utilization were calculated. Results revealed that the PSER stopping rule administered fewer items on average while maintaining measurement precision similar to the SE stopping rule across the different item pool sizes and exposure controls. The PR-SE exposure control procedure surpassed the randomesque methods by further reducing test overlap, maintaining maximum exposure rates at the target rate or lower, and utilizing all items from the pool with a minimal increase in number of items administered and nonconvergent cases. VL - 43 UR - https://doi.org/10.1177/0146621618824856 ER - TY - JOUR T1 - Item Selection Criteria With Practical Constraints in Cognitive Diagnostic Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2019 A1 - Chuan-Ju Lin A1 - Hua-Hua Chang AB - For item selection in cognitive diagnostic computerized adaptive testing (CD-CAT), ideally, a single item selection index should be created to simultaneously regulate precision, exposure status, and attribute balancing. 
For this purpose, in this study, we first proposed an attribute-balanced item selection criterion, namely, the standardized weighted deviation global discrimination index (SWDGDI), and subsequently formulated the constrained progressive index (CP_SWDGDI) by casting the SWDGDI in a progressive algorithm. A simulation study revealed that the SWDGDI method was effective in balancing attribute coverage and the CP_SWDGDI method was able to simultaneously balance attribute coverage and item pool usage while maintaining acceptable estimation precision. This research also demonstrates the advantage of a relatively low number of attributes in CD-CAT applications. VL - 79 UR - https://doi.org/10.1177/0013164418790634 ER - TY - JOUR T1 - Measurement Efficiency for Fixed-Precision Multidimensional Computerized Adaptive Tests: Comparing Health Measurement and Educational Testing Using Example Banks JF - Applied Psychological Measurement Y1 - 2019 A1 - Muirne C. S. Paap A1 - Sebastian Born A1 - Johan Braeken AB - It is currently not entirely clear to what degree the research on multidimensional computerized adaptive testing (CAT) conducted in the field of educational testing can be generalized to fields such as health assessment, where CAT design factors differ considerably from those typically used in educational testing. In this study, the impact of a number of important design factors on CAT performance is systematically evaluated, using realistic example item banks for two main scenarios: health assessment (polytomous items, small to medium item bank sizes, high discrimination parameters) and educational testing (dichotomous items, large item banks, small- to medium-sized discrimination parameters). Measurement efficiency is evaluated for both between-item multidimensional CATs and separate unidimensional CATs for each latent dimension.
In this study, we focus on fixed-precision (variable-length) CATs because they are both feasible and desirable in health settings, but so far most research regarding CAT has focused on fixed-length testing. This study shows that the benefits associated with fixed-precision multidimensional CAT hold under a wide variety of circumstances. VL - 43 UR - https://doi.org/10.1177/0146621618765719 ER - TY - JOUR T1 - Multidimensional Computerized Adaptive Testing Using Non-Compensatory Item Response Theory Models JF - Applied Psychological Measurement Y1 - 2019 A1 - Chia-Ling Hsu A1 - Wen-Chung Wang AB - To date, multidimensional computerized adaptive testing (MCAT) has been developed in conjunction with compensatory multidimensional item response theory (MIRT) models rather than with non-compensatory ones. In recognition of the usefulness of MCAT and the complications associated with non-compensatory data, this study aimed to develop MCAT algorithms using non-compensatory MIRT models and to evaluate their performance. For the purpose of the study, three item selection methods were adapted and compared, namely, the Fisher information method, the mutual information method, and the Kullback–Leibler information method. The results of a series of simulations showed that the Fisher information and mutual information methods performed similarly, and both outperformed the Kullback–Leibler information method. In addition, it was found that the more stringent the termination criterion and the higher the correlation between the latent traits, the higher the resulting measurement precision and test reliability. Test reliability was very similar across the dimensions, regardless of the correlation between the latent traits and termination criterion. On average, the difficulties of the administered items were found to be at a lower level than the examinees’ abilities, which shed light on item bank construction for non-compensatory items.
VL - 43 UR - https://doi.org/10.1177/0146621618800280 ER - TY - JOUR T1 - Nonparametric CAT for CD in Educational Settings With Small Samples JF - Applied Psychological Measurement Y1 - 2019 A1 - Yuan-Pei Chang A1 - Chia-Yi Chiu A1 - Rung-Ching Tsai AB - Cognitive diagnostic computerized adaptive testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation. Although model-based CD-CAT is relatively well researched in the context of large-scale assessment systems, this type of system has not received the same degree of research and development in small-scale settings, such as at the course-based level, where this system would be the most useful. The main obstacle is that the statistical estimation techniques that are successfully applied within the context of a large-scale assessment require large samples to guarantee reliable calibration of the item parameters and an accurate estimation of the examinees’ proficiency class membership. Such samples are simply not obtainable in course-based settings. Therefore, this study proposes a nonparametric item selection (NPS) method that does not require any parameter calibration and thus can be used in small educational programs. The proposed nonparametric CD-CAT uses the nonparametric classification (NPC) method to estimate an examinee’s attribute profile and, based on the examinee’s item responses, the item that best discriminates the estimated attribute profile from the other attribute profiles is then selected. The simulation results show that the NPS method outperformed the compared parametric CD-CAT algorithms and the differences were substantial when the calibration samples were small.
VL - 43 UR - https://doi.org/10.1177/0146621618813113 ER - TY - JOUR T1 - Routing Strategies and Optimizing Design for Multistage Testing in International Large-Scale Assessments JF - Journal of Educational Measurement Y1 - 2019 A1 - Svetina, Dubravka A1 - Liaw, Yuan-Ling A1 - Rutkowski, Leslie A1 - Rutkowski, David AB - This study investigates the effect of several design and administration choices on item exposure and person/item parameter recovery under a multistage test (MST) design. In a simulation study, we examine whether number-correct (NC) or item response theory (IRT) methods are differentially effective at routing students to the correct next stage(s) and whether routing choices (optimal versus suboptimal routing) have an impact on achievement precision. Additionally, we examine the impact of testlet length on both person and item recovery. Overall, our results suggest that no single approach works best across the studied conditions. With respect to the mean person parameter recovery, IRT scoring (via either Fisher information or preliminary EAP estimates) outperformed classical NC methods, although differences in bias and root mean squared error were generally small. Item exposure rates were found to be more evenly distributed when suboptimal routing methods were used, and item recovery (both difficulty and discrimination) was most precisely observed for items with moderate difficulties. Based on the results of the simulation study, we draw conclusions and discuss implications for practice in the context of international large-scale assessments that recently introduced adaptive assessment in the form of MST. Future research directions are also discussed.
VL - 56 UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12206 ER - TY - JOUR T1 - Time-Efficient Adaptive Measurement of Change JF - Journal of Computerized Adaptive Testing Y1 - 2019 A1 - Matthew Finkelman A1 - Chun Wang KW - adaptive measurement of change KW - computerized adaptive testing KW - Fisher information KW - item selection KW - response-time modeling AB -

The adaptive measurement of change (AMC) refers to the use of computerized adaptive testing (CAT) at multiple occasions to efficiently assess a respondent’s improvement, decline, or sameness from occasion to occasion. Whereas previous AMC research focused on administering the most informative item to a respondent at each stage of testing, the current research proposes the use of Fisher information per time unit as an item selection procedure for AMC. The latter procedure incorporates not only the amount of information provided by a given item but also the expected amount of time required to complete it. In a simulation study, the use of Fisher information per time unit item selection resulted in a lower false positive rate in the majority of conditions studied, and a higher true positive rate in all conditions studied, compared to item selection via Fisher information without accounting for the expected time taken. Future directions of research are suggested.
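The selection rule described in this abstract can be illustrated with a minimal sketch. This is not code from the paper: it assumes a 2PL item response model and a hypothetical bank of `(a, b, expected_time)` triples, and simply picks the unadministered item that maximizes Fisher information divided by expected completion time.

```python
import numpy as np

def fisher_info_2pl(a, b, theta):
    """Fisher information of a 2PL item at ability level theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def select_item_per_time(theta, items, administered):
    """Select the unadministered item maximizing information per
    expected time unit, rather than raw information.

    items: list of (a, b, expected_time) triples (hypothetical bank).
    administered: set of indices of items already given.
    """
    best, best_rate = None, -np.inf
    for idx, (a, b, exp_time) in enumerate(items):
        if idx in administered:
            continue
        rate = fisher_info_2pl(a, b, theta) / exp_time
        if rate > best_rate:
            best, best_rate = idx, rate
    return best
```

For two items with identical parameters, this rule prefers the one with the shorter expected response time; plain maximum-information selection would treat them as interchangeable.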

VL - 7 UR - http://iacat.org/jcat/index.php/jcat/article/view/73/35 IS - 2 ER - TY - JOUR T1 - Adaptive Item Selection Under Matroid Constraints JF - Journal of Computerized Adaptive Testing Y1 - 2018 A1 - Daniel Bengs A1 - Ulf Brefeld A1 - Ulf Kröhne VL - 6 UR - http://www.iacat.org/jcat/index.php/jcat/article/view/64/32 IS - 2 ER - TY - JOUR T1 - A Comparison of Constraint Programming and Mixed-Integer Programming for Automated Test-Form Generation JF - Journal of Educational Measurement Y1 - 2018 A1 - Li, Jie A1 - van der Linden, Wim J. AB - The final step of the typical process of developing educational and psychological tests is to place the selected test items in a formatted form. The step involves the grouping and ordering of the items to meet a variety of formatting constraints. As this activity tends to be time-intensive, the use of mixed-integer programming (MIP) has been proposed to automate it. The goal of this article is to show how constraint programming (CP) can be used as an alternative to automate test-form generation problems with a large variety of formatting constraints, and how it compares with MIP-based form generation in terms of its models, solutions, and running times. Two empirical examples are presented: (i) automated generation of a computerized fixed-form; and (ii) automated generation of shadow tests for multistage testing. Both examples show that CP works well, with feasible solutions and running times likely to be better than those for MIP-based applications. VL - 55 UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12187 ER - TY - JOUR T1 - Constructing Shadow Tests in Variable-Length Adaptive Testing JF - Applied Psychological Measurement Y1 - 2018 A1 - Qi Diao A1 - Hao Ren AB - Imposing content constraints is very important in most operational computerized adaptive testing (CAT) programs in educational measurement.
The shadow test approach to CAT (Shadow CAT) offers an elegant solution to imposing statistical and nonstatistical constraints by projecting future consequences of item selection. The original form of Shadow CAT presumes fixed test lengths. The goal of the current study was to extend Shadow CAT to tests under variable-length termination conditions and evaluate its performance relative to other content balancing approaches. The study demonstrated the feasibility of constructing Shadow CAT with variable test lengths, including in operational CAT programs. The results indicated the superiority of the approach compared with other content balancing methods. VL - 42 UR - https://doi.org/10.1177/0146621617753736 ER - TY - JOUR T1 - A Continuous a-Stratification Index for Item Exposure Control in Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2018 A1 - Alan Huebner A1 - Chun Wang A1 - Bridget Daly A1 - Colleen Pinkelman AB - The method of a-stratification aims to reduce item overexposure in computerized adaptive testing, as items that are administered at very high rates may threaten the validity of test scores. In existing methods of a-stratification, the item bank is partitioned into a fixed number of nonoverlapping strata according to the items’ a, or discrimination, parameters. This article introduces a continuous a-stratification index which incorporates exposure control into the item selection index itself and thus eliminates the need for fixed discrete strata. The new continuous a-stratification index is compared with existing stratification methods via simulation studies in terms of ability estimation bias, mean squared error, and control of item exposure rates.
VL - 42 UR - https://doi.org/10.1177/0146621618758289 ER - TY - JOUR T1 - Evaluation of a New Method for Providing Full Review Opportunities in Computerized Adaptive Testing—Computerized Adaptive Testing With Salt JF - Journal of Educational Measurement Y1 - 2018 A1 - Cui, Zhongmin A1 - Liu, Chunyan A1 - He, Yong A1 - Chen, Hanwei AB - Abstract Allowing item review in computerized adaptive testing (CAT) is getting more attention in the educational measurement field as more and more testing programs adopt CAT. The research literature has shown that allowing item review in an educational test could result in more accurate estimates of examinees’ abilities. The practice of item review in CAT, however, is hindered by the potential danger of test-manipulation strategies. To provide review opportunities to examinees while minimizing the effect of test-manipulation strategies, researchers have proposed different algorithms to implement CAT with restricted revision options. In this article, we propose and evaluate a new method that implements CAT without any restriction on item review. In particular, we evaluate the new method in terms of the accuracy on ability estimates and the robustness against test-manipulation strategies. This study shows that the newly proposed method is promising in a win-win situation: examinees have full freedom to review and change answers, and the impacts of test-manipulation strategies are undermined. VL - 55 UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12193 ER - TY - JOUR T1 - Factors Affecting the Classification Accuracy and Average Length of a Variable-Length Cognitive Diagnostic Computerized Test JF - Journal of Computerized Adaptive Testing Y1 - 2018 A1 - Huebner, Alan A1 - Finkelman, Matthew D. 
A1 - Weissman, Alexander VL - 6 UR - http://iacat.org/jcat/index.php/jcat/article/view/55/30 IS - 1 ER - TY - JOUR T1 - From Simulation to Implementation: Two CAT Case Studies JF - Practical Assessment, Research & Evaluation Y1 - 2018 A1 - John J Barnard VL - 23 UR - http://pareonline.net/getvn.asp?v=23&n=14 IS - 14 ER - TY - JOUR T1 - A Hybrid Strategy to Construct Multistage Adaptive Tests JF - Applied Psychological Measurement Y1 - 2018 A1 - Xinhui Xiong AB - How to effectively construct multistage adaptive test (MST) panels is a topic that has spurred recent advances. The most commonly used approaches for MST assembly use one of two strategies: bottom-up and top-down. The bottom-up approach splits the whole test into several modules, and each module is built first, then all modules are compiled to obtain the whole test, while the top-down approach follows the opposite direction. Both methods have their pros and cons, and sometimes neither is convenient for practitioners. This study provides an innovative hybrid strategy to build optimal MST panels efficiently most of the time. Empirical data and results by using this strategy will be provided. VL - 42 UR - https://doi.org/10.1177/0146621618762739 ER - TY - JOUR T1 - Implementing Three CATs Within Eighteen Months JF - Journal of Computerized Adaptive Testing Y1 - 2018 A1 - Christian Spoden A1 - Andreas Frey A1 - Raphael Bernhardt VL - 6 UR - http://iacat.org/jcat/index.php/jcat/article/view/70/33 IS - 3 ER - TY - JOUR T1 - Item Selection Methods in Multidimensional Computerized Adaptive Testing With Polytomously Scored Items JF - Applied Psychological Measurement Y1 - 2018 A1 - Dongbo Tu A1 - Yuting Han A1 - Yan Cai A1 - Xuliang Gao AB - Multidimensional computerized adaptive testing (MCAT) has been developed over the past decades, and most existing approaches can only deal with dichotomously scored items. However, polytomously scored items have been broadly used in a variety of tests for their advantages of providing more information and testing complicated abilities and skills. The purpose of this study is to discuss the item selection algorithms used in MCAT with polytomously scored items (PMCAT). Several promising item selection algorithms used in MCAT are extended to PMCAT, and two new item selection methods are proposed to improve the existing selection strategies. Two simulation studies are conducted to demonstrate the feasibility of the extended and proposed methods. The simulation results show that most of the extended item selection methods for PMCAT are feasible and the new proposed item selection methods perform well.
When pool security is also taken into account, for the two-dimensional case (Study 1) the proposed modified continuous entropy method (MCEM) performed best overall, achieving the lowest item exposure rate with relatively high accuracy. For higher dimensions (Study 2), results show that the mutual information (MUI) and MCEM methods maintain relatively high estimation accuracy, and item exposure rates decrease as the correlation increases. VL - 42 UR - https://doi.org/10.1177/0146621618762748 ER - TY - JOUR T1 - Latent Class Analysis of Recurrent Events in Problem-Solving Items JF - Applied Psychological Measurement Y1 - 2018 A1 - Haochen Xu A1 - Guanhua Fang A1 - Yunxiao Chen A1 - Jingchen Liu A1 - Zhiliang Ying AB - Computer-based assessment of complex problem-solving abilities is becoming more and more popular. In such an assessment, the entire problem-solving process of an examinee is recorded, providing detailed information about the individual, such as behavioral patterns, speed, and learning trajectory. The problem-solving processes are recorded in a computer log file which is a time-stamped documentation of events related to task completion. As opposed to cross-sectional response data from traditional tests, process data in log files are massive and irregularly structured, calling for effective exploratory data analysis methods. Motivated by a specific complex problem-solving item “Climate Control” in the 2012 Programme for International Student Assessment, the authors propose a latent class analysis approach to analyzing the events that occur in the problem-solving processes. The exploratory latent class analysis yields meaningful latent classes. Simulation studies are conducted to evaluate the proposed approach. VL - 42 UR - https://doi.org/10.1177/0146621617748325 ER - TY - JOUR T1 - Measuring patient-reported outcomes adaptively: Multidimensionality matters! JF - Applied Psychological Measurement Y1 - 2018 A1 - Paap, Muirne C. S. A1 - Kroeze, Karel A.
A1 - Glas, C. A. W. A1 - Terwee, C. B. A1 - van der Palen, Job A1 - Veldkamp, Bernard P. ER - TY - JOUR T1 - On-the-Fly Constraint-Controlled Assembly Methods for Multistage Adaptive Testing for Cognitive Diagnosis JF - Journal of Educational Measurement Y1 - 2018 A1 - Liu, Shuchang A1 - Cai, Yan A1 - Tu, Dongbo AB - This study applied the mode of on-the-fly assembled multistage adaptive testing to cognitive diagnosis (CD-OMST). Several module assembly methods for CD-OMST were proposed and compared in terms of measurement precision, test security, and constraint management. The module assembly methods in the study included the maximum priority index method (MPI), the revised maximum priority index (RMPI), the weighted deviation model (WDM), and the two revised Monte Carlo methods (R1-MC, R2-MC). Simulation results showed that on the whole the CD-OMST performs well in that it not only has acceptable attribute pattern correct classification rates but also satisfies both statistical and nonstatistical constraints; the RMPI method was generally better than the MPI method, the R2-MC method was generally better than the R1-MC method, and the two revised Monte Carlo methods performed best in terms of test security and constraint management, whereas the RMPI and WDM methods worked best in terms of measurement precision. The study is expected not only to provide information about how to combine MST and CD using an on-the-fly method and how these assembly methods in CD-OMST perform relative to each other, but also to offer guidance for practitioners assembling modules in CD-OMST under both statistical and nonstatistical constraints. VL - 55 UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12194 ER - TY - JOUR T1 - Some recommendations for developing multidimensional computerized adaptive tests for patient-reported outcomes JF - Quality of Life Research Y1 - 2018 A1 - Smits, Niels A1 - Paap, Muirne C. S. A1 - Böhnke, Jan R.
AB - Multidimensional item response theory and computerized adaptive testing (CAT) are increasingly used in mental health, quality of life (QoL), and patient-reported outcome measurement. Although multidimensional assessment techniques hold promise, they are more challenging in their application than unidimensional ones. The authors comment on minimal standards when developing multidimensional CATs. VL - 27 UR - https://doi.org/10.1007/s11136-018-1821-8 ER - TY - JOUR T1 - A Top-Down Approach to Designing the Computerized Adaptive Multistage Test JF - Journal of Educational Measurement Y1 - 2018 A1 - Luo, Xiao A1 - Kim, Doyoung AB - The top-down approach to designing a multistage test is relatively understudied in the literature and underused in research and practice. This study introduced a route-based top-down design approach that directly sets design parameters at the test level and utilizes an advanced automated test assembly algorithm seeking global optimality. The design process in this approach consists of five sub-processes: (1) route mapping, (2) setting objectives, (3) setting constraints, (4) routing error control, and (5) test assembly. Results from a simulation study confirmed that the assembly, measurement, and routing results of the top-down design eclipsed those of the bottom-up design. Additionally, the top-down design approach provided unique insights into design decisions that could be used to refine the test. Despite these advantages, it is recommended that both top-down and bottom-up approaches be applied in a complementary manner in practice. VL - 55 UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12174 ER - TY - JOUR T1 - Using Automatic Item Generation to Create Solutions and Rationales for Computerized Formative Testing JF - Applied Psychological Measurement Y1 - 2018 A1 - Mark J. Gierl A1 - Hollis Lai AB - Computerized testing provides many benefits to support formative assessment.
However, the advent of computerized formative testing has also raised formidable new challenges, particularly in the area of item development. Large numbers of diverse, high-quality test items are required because items are continuously administered to students. Hence, hundreds of items are needed to develop the banks necessary for computerized formative testing. One promising approach that may be used to address this test development challenge is automatic item generation. Automatic item generation is a relatively new but rapidly evolving research area where cognitive and psychometric modeling practices are used to produce items with the aid of computer technology. The purpose of this study is to describe a new method for generating both the items and the rationales required to solve the items to produce the required feedback for computerized formative testing. The method for rationale generation is demonstrated and evaluated in the medical education domain. VL - 42 UR - https://doi.org/10.1177/0146621617726788 ER - TY - JOUR T1 - What Information Works Best?: A Comparison of Routing Methods JF - Applied Psychological Measurement Y1 - 2018 A1 - Halil Ibrahim Sari A1 - Anthony Raborn AB - There are many item selection methods proposed for computerized adaptive testing (CAT) applications. However, not all of them have been used in computerized multistage testing (ca-MST). This study uses some item selection methods as a routing method in ca-MST framework. These are maximum Fisher information (MFI), maximum likelihood weighted information (MLWI), maximum posterior weighted information (MPWI), Kullback–Leibler (KL), and posterior Kullback–Leibler (KLP). The main purpose of this study is to examine the performance of these methods when they are used as a routing method in ca-MST applications. These five information methods under four ca-MST panel designs and two test lengths (30 items and 60 items) were tested using the parameters of a real item bank. 
Results were evaluated with overall findings (mean bias, root mean square error, correlation between true and estimated thetas, and module exposure rates) and conditional findings (conditional absolute bias, standard error of measurement, and root mean square error). It was found that test length affected the outcomes much more than other study conditions. Under 30-item conditions, 1-3 designs outperformed other panel designs. Under 60-item conditions, 1-3-3 designs were better than other panel designs. Each routing method performed well under particular conditions; there was no clear best method in the studied conditions. Recommendations for routing methods under particular conditions, as well as the limitations of these results, are provided for researchers and practitioners. VL - 42 UR - https://doi.org/10.1177/0146621617752990 ER - TY - CONF T1 - Adapting Linear Models for Optimal Test Design to More Complex Test Specifications T2 - IACAT 2017 Conference Y1 - 2017 A1 - Maxim Morin KW - Complex Test Specifications KW - Linear Models KW - Optimal Test Design AB -

Combinatorial optimization (CO) has proven to be a very helpful approach for addressing test assembly issues and for providing solutions. Furthermore, CO has been applied to several test designs, including: (1) the development of linear test forms; (2) computerized adaptive testing; and (3) multistage testing. In his seminal work, van der Linden (2006) laid out the basis for using linear models for simultaneously assembling exams and item pools in a variety of conditions: (1) for single tests and multiple tests; (2) with item sets, etc. However, for some testing programs, the number and complexity of test specifications can grow rapidly. Consequently, the mathematical representation of the test assembly problem goes beyond most approaches reported either in van der Linden’s book or in the majority of other publications related to test assembly. In this presentation, we extend van der Linden’s framework by including the concept of blocks for test specifications. We modify the usual mathematical notation of a test assembly problem by including this concept and we show how it can be applied to various test designs. Finally, we will demonstrate an implementation of this approach in a stand-alone software tool called ATASolver.
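The linear-model view of test assembly described above can be sketched in miniature. The toy pool, content areas, and brute-force search below are illustrative assumptions only (operational assembly uses a MIP or CP solver, not enumeration): each item is a 0-1 decision variable, the objective is total information at a target ability level, and the constraints are test length plus a minimum count per content area.

```python
from itertools import combinations

# Hypothetical item pool: (information at the target theta, content area).
pool = [
    (0.8, "algebra"), (0.6, "algebra"), (0.9, "geometry"),
    (0.5, "geometry"), (0.7, "numbers"), (0.4, "numbers"),
]

def assemble(pool, length, min_per_area):
    """Maximize total information subject to a fixed test length and a
    minimum number of items per content area (a 0-1 model, solved here
    by exhaustive search purely for illustration)."""
    best, best_info = None, -1.0
    for combo in combinations(range(len(pool)), length):
        areas = [pool[i][1] for i in combo]
        if any(areas.count(a) < k for a, k in min_per_area.items()):
            continue  # violates a content constraint
        info = sum(pool[i][0] for i in combo)
        if info > best_info:
            best, best_info = combo, info
    return best, best_info
```

With a length of 3 and at least one item per area, the search returns the most informative item from each content area, which is exactly the solution a MIP solver would certify as optimal for this pool.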


JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Adaptive Item and Feedback Selection in Personalized Learning with a Network Approach T2 - IACAT 2017 Conference Y1 - 2017 A1 - Nikky van Buuren A1 - Hendrik Straat A1 - Theo Eggen A1 - Jean-Paul Fox KW - feedback selection KW - item selection KW - network approach KW - personalized learning AB -

Personalized learning is a term used to describe educational systems that adapt student-specific curriculum sequencing, pacing, and presentation based on their unique backgrounds, knowledge, preferences, interests, and learning goals (Chen, 2008; Netcoh, 2016). The technological approach to personalized learning provides data-driven models to incorporate these adaptations automatically. Examples of applications include online learning systems, educational games, and revision-aid systems. In this study, we introduce Bayesian networks as a methodology to implement an adaptive framework within a personalized learning environment. Existing ideas from Computerized Adaptive Testing (CAT) with Item Response Theory (IRT), where choices about content provision are based on maximizing information, are related to the goals of personalized learning environments. Personalized learning entails other goals besides efficient ability estimation by maximizing information, such as an adaptive configuration of preferences and feedback to the student. These considerations will be discussed and their application in networks will be illustrated.

Adaptivity in Personalized Learning. In standard CATs, there is a focus on selecting items that provide maximum information about the ability of an individual at a certain point in time (Van der Linden & Glas, 2000). When learning is the main goal of testing, alternative adaptive item selection methods apply, as explored by Eggen (2012). The adaptive choices made in personalized learning applications require additional adaptivity with respect to the following aspects: the moment of feedback, the kind of feedback, and the possibility for students to actively influence the learning process.

Bayesian Networks and Personalized Learning. Personalized learning aims at constructing a framework that incorporates all the aspects mentioned above. The goal of this framework is therefore not only to retrieve ability estimates by choosing items on maximum information, but also to allow these other factors to play a role. Plajner and Vomlel (2016) have already applied Bayesian networks to adaptive testing, selecting items with the help of entropy reduction. Almond et al. (2015) provide a reference work on Bayesian networks in educational assessment. Both acknowledge the potential of the method in terms of features such as modularity options to build finer-grained models. IRT does not easily allow modeling sub-skills or gathering information at a fine-grained level, because it generally depends on the assumption of a single underlying trait. The local independence assumption in IRT implies an interest mainly in the student's overall ability on the subject of interest. When the goal is to improve students' learning, however, we are not just interested in efficiently arriving at a test score on a global subject. One wants a model that can map educational problems and talents in detail over the whole educational program, while allowing for dependency between items. Some topics may be better mastered at one moment in time than others, and this is exactly what we want to get out of a model. The possibility to model flexible structures, to estimate abilities at a very detailed level for sub-skills, and to easily incorporate other variables such as feedback makes Bayesian networks a very promising method for making adaptive choices in personalized learning. This research shows how item and feedback selection can be performed with the help of Bayesian networks. A student involvement possibility is also introduced and evaluated.
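As a toy illustration of the entropy-reduction idea attributed above to Plajner and Vomlel (2016), the sketch below (our own construction, not the authors' code; the slip/guess values and item pool are purely illustrative) selects, from a small pool, the item whose answer is expected to reduce the entropy of a joint posterior over two binary sub-skills the most:

```python
import itertools
import math

# Joint prior over two binary sub-skills (uniform to start).
states = list(itertools.product([0, 1], repeat=2))
prior = {s: 0.25 for s in states}

# Each item taps one sub-skill: (skill index, slip, guess) -- illustrative values.
items = {"i1": (0, 0.1, 0.2), "i2": (1, 0.1, 0.2), "i3": (0, 0.2, 0.3)}

def p_correct(state, item):
    skill, slip, guess = item
    return 1.0 - slip if state[skill] == 1 else guess

def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def update(dist, item, response):
    # Bayes rule: multiply prior by the likelihood of the observed response.
    post = {s: p * (p_correct(s, item) if response else 1 - p_correct(s, item))
            for s, p in dist.items()}
    z = sum(post.values())
    return {s: v / z for s, v in post.items()}

def expected_entropy_reduction(dist, item):
    # Average posterior entropy over both possible responses, weighted by
    # their predictive probabilities; subtract from current entropy.
    p1 = sum(p * p_correct(s, item) for s, p in dist.items())
    expected = (p1 * entropy(update(dist, item, 1))
                + (1 - p1) * entropy(update(dist, item, 0)))
    return entropy(dist) - expected

# Ask the item expected to tell us most about the skill profile.
best = max(items, key=lambda k: expected_entropy_reduction(prior, items[k]))
```

The noisier item (`i3`, with larger slip and guess) is never preferred over its cleaner counterpart, which is the behavior the selection rule is designed to produce.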

References

Almond, R. G., Mislevy, R. J., Steinberg, L. S., Yan, D., & Williamson, D. M. (2015). Bayesian networks in educational assessment. New York: Springer Science+Business Media. http://doi.org/10.1007/978-0-387-98138-3

Eggen, T. J. H. M. (2012). Computerized adaptive testing item selection in computerized adaptive learning systems. In T. J. H. M. Eggen & B. P. Veldkamp (Eds.), Psychometrics in practice at RCEC. Enschede: RCEC.

Netcoh, S. (2016, March). “What do you mean by ‘personalized learning’?” Crosscutting Conversations in Education – Research, Reflections & Practice. Blog post.

Plajner, M., & Vomlel, J. (2016). Student Skill Models in Adaptive Testing. In Proceedings of the Eighth International Conference on Probabilistic Graphical Models (pp. 403-414).

Van der Linden, W. J., & Glas, C. A. (2000). Computerized adaptive testing: Theory and practice. Dordrecht: Kluwer Academic Publishers.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Adaptivity in a Diagnostic Educational Test T2 - IACAT 2017 Conference Y1 - 2017 A1 - Sanneke Schouwstra KW - CAT KW - Diagnostic tests KW - Education AB -

During the past five years a diagnostic educational test for three subjects (writing Dutch, writing English, and math) has been developed in the Netherlands. The test informs students and their teachers about the students’ strengths and weaknesses in such a manner that the learning process can be adjusted to their personal needs. It is a computer-based assessment for students in five different educational tracks midway through secondary education that can yield diagnoses of many sub-skills. One of the main challenges at the outset of the development was to devise a way to deliver many diagnoses within a reasonable testing time. The answer to this challenge was to make the Diagnostic Educational Test (DET) adaptive.

In this presentation we will first discuss how the adaptivity is shaped toward the purpose of the Diagnostic Educational Test. The adaptive design, particularly working with item blocks, will be discussed, as well as the implemented adaptive rules. We will also show a simulation of different adaptive paths of students and some empirical information on the paths students took through the test.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Analysis of CAT Precision Depending on Parameters of the Item Pool T2 - IACAT 2017 Conference Y1 - 2017 A1 - Anatoly Maslak A1 - Stanislav Pozdniakov KW - CAT KW - Item parameters KW - Precision AB -

The purpose of this research project is to analyze the measurement precision of a latent variable depending on parameters of the item pool. The influence of the following factors is analyzed:

Factor A – range of variation of items in the pool. This factor varies on three levels with the following ranges in logits: a1 – [-3.0; +3.0], a2 – [-4.0; +4.0], a3 – [-5.0; +5.0].

Factor B – number of items in the pool. This factor varies on six levels with the following number of items per level: b1 – 128, b2 – 256, b3 – 512, b4 – 1024, b5 – 2048, b6 – 4096. The items are evenly distributed in each of the variation ranges.

Factor C – examinees’ proficiency varies at 30 levels (c1, c2, …, c30), which are evenly distributed in the range [-3.0; +3.0] logit.

The investigation was based on a simulation experiment within the framework of the theory of latent variables.

Response Y is the precision of measurement of examinees’ proficiency, calculated as the difference between the true levels of examinees’ proficiency and the estimates obtained by means of adaptive testing. Three-factor ANOVA was used for data processing.

The following results were obtained:

1. Factor A is significant. Ceteris paribus, the greater the range of variation of items in the pool, the higher the estimation precision is.

2. Factor B is significant. Ceteris paribus, the greater the number of items in the pool, the higher the estimation precision is.

3. Factor C is statistically insignificant at level α = .05. This means that the precision of estimation of examinees’ proficiency is the same across the range of proficiency variation.

4. The only significant interaction among all interactions is AB. The significance of this interaction is explained by the fact that increasing the number of items in the pool decreases the effect of the range of variation of items in the pool. 
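A minimal version of such a simulation can be sketched as follows. This is our own illustrative reconstruction under a Rasch model with maximum-information item selection and grid-based EAP scoring, not the authors' code; precision is summarized here as the RMSE between true and estimated proficiency for one cell of the design:

```python
import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(-4.0, 4.0, 161)

def simulate_rmse(pool_range, n_items, test_len=20, n_examinees=200):
    """RMSE of adaptive-test proficiency estimates for one (range, pool size) cell."""
    pool = np.linspace(-pool_range, pool_range, n_items)
    errors = []
    for theta in rng.uniform(-3.0, 3.0, n_examinees):
        post = np.exp(-0.5 * grid**2)            # N(0, 1) prior on the grid
        post /= post.sum()
        est, used = 0.0, set()
        for _ in range(test_len):
            # Under the Rasch model, the most informative unused item is the
            # one whose difficulty is nearest the current ability estimate.
            j = min((i for i in range(n_items) if i not in used),
                    key=lambda i: abs(pool[i] - est))
            used.add(j)
            x = rng.random() < 1.0 / (1.0 + np.exp(-(theta - pool[j])))
            p = 1.0 / (1.0 + np.exp(-(grid - pool[j])))
            post *= p if x else 1.0 - p
            post /= post.sum()
            est = float(grid @ post)             # EAP estimate
        errors.append(est - theta)
    return float(np.sqrt(np.mean(np.square(errors))))

rmse = simulate_rmse(pool_range=3.0, n_items=128)
```

Running the function over the crossed levels of factors A and B (and binning examinees by true theta for factor C) would yield the response Y analyzed in the study.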

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/file/d/1Bwe58kOQRgCSbB8x6OdZTDK4OIm3LQI3/view?usp=drive_web ER - TY - JOUR T1 - Application of Binary Searching for Item Exposure Control in Cognitive Diagnostic Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2017 A1 - Chanjin Zheng A1 - Chun Wang AB - Cognitive diagnosis has emerged as a new generation of testing theory for educational assessment after the item response theory (IRT). One distinct feature of cognitive diagnostic models (CDMs) is that they assume the latent trait to be discrete instead of continuous as in IRT. From this perspective, cognitive diagnosis bears a close resemblance to searching problems in computer science and, similarly, item selection problem in cognitive diagnostic computerized adaptive testing (CD-CAT) can be considered as a dynamic searching problem. Previously, item selection algorithms in CD-CAT were developed from information indices in information science and attempted to achieve a balance among several objectives by assigning different weights. As a result, they suffered from low efficiency from a tug-of-war competition among multiple goals in item selection and, at the same time, put an undue responsibility of assigning the weights for these goals by trial and error on users. Based on the searching problem perspective on CD-CAT, this article adapts the binary searching algorithm, one of the most well-known searching algorithms in searching problems, to item selection in CD-CAT. The two new methods, the stratified dynamic binary searching (SDBS) algorithm for fixed-length CD-CAT and the dynamic binary searching (DBS) algorithm for variable-length CD-CAT, can achieve multiple goals without any of the aforementioned issues. The simulation studies indicate their performances are comparable or superior to the previous methods. 
VL - 41 UR - https://doi.org/10.1177/0146621617707509 ER - TY - JOUR T1 - ATS-PD: An Adaptive Testing System for Psychological Disorders JF - Educational and Psychological Measurement Y1 - 2017 A1 - Ivan Donadello A1 - Andrea Spoto A1 - Francesco Sambo A1 - Silvana Badaloni A1 - Umberto Granziol A1 - Giulio Vidotto AB - The clinical assessment of mental disorders can be a time-consuming and error-prone procedure, consisting of a sequence of diagnostic hypothesis formulation and testing aimed at restricting the set of plausible diagnoses for the patient. In this article, we propose a novel computerized system for the adaptive testing of psychological disorders. The proposed system combines a mathematical representation of psychological disorders, known as the “formal psychological assessment,” with an algorithm designed for the adaptive assessment of an individual’s knowledge. The assessment algorithm is extended and adapted to the new application domain. Testing the system on a real sample of 4,324 healthy individuals, screened for obsessive-compulsive disorder, we demonstrate the system’s ability to support clinical testing, both by identifying the correct critical areas for each individual and by reducing the number of posed questions with respect to a standard written questionnaire. VL - 77 UR - https://doi.org/10.1177/0013164416652188 ER - TY - CONF T1 - Bayesian Perspectives on Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Wim J. van der Linden A1 - Bingnan Jiang A1 - Hao Ren A1 - Seung W. Choi A1 - Qi Diao KW - Bayesian Perspective KW - CAT AB -

Although adaptive testing is usually treated from the perspective of maximum-likelihood parameter estimation and maximum-information item selection, a Bayesian perspective is more natural, statistically efficient, and computationally tractable. This observation not only holds for the core process of ability estimation but also includes such processes as item calibration and real-time monitoring of item security. Key elements of the approach are parametric modeling of each relevant process, updating of the parameter estimates after the arrival of each new response, and optimal design of the next step.

The purpose of the symposium is to illustrate the role of Bayesian statistics in this approach. The first presentation discusses a basic Bayesian algorithm for the sequential update of any parameter in adaptive testing and illustrates the idea of Bayesian optimal design for the two processes of ability estimation and online item calibration. The second presentation generalizes the ideas to the case of adaptive testing with polytomous items. The third presentation uses the fundamental Bayesian idea of sampling from updated posterior predictive distributions (“multiple imputations”) to deal with the problem of scoring incomplete adaptive tests.
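The sequential updating idea described in the first presentation can be illustrated with a grid-based posterior update of ability under a Rasch model. This is a generic sketch under our own assumptions, not the presenters' algorithm:

```python
import numpy as np

grid = np.linspace(-4.0, 4.0, 81)
posterior = np.exp(-0.5 * grid**2)        # standard-normal prior, unnormalized
posterior /= posterior.sum()

def update(posterior, b, response):
    """Fold the likelihood of one scored response (item difficulty b) into the posterior."""
    p = 1.0 / (1.0 + np.exp(-(grid - b)))
    post = posterior * (p if response == 1 else 1.0 - p)
    return post / post.sum()

# Two correct responses shift the posterior mean upward and shrink its spread,
# and the updated posterior is ready to drive the design of the next step.
post = update(update(posterior, 0.0, 1), 0.5, 1)
eap = float(grid @ post)
```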

Session Video 1

Session Video 2

 

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Is CAT Suitable for Automated Speaking Test? T2 - IACAT 2017 Conference Y1 - 2017 A1 - Shingo Imai KW - Automated Speaking Test KW - CAT KW - language testing AB -

We have developed an automated scoring system for Japanese speaking proficiency, namely SJ-CAT (Speaking Japanese Computerized Adaptive Test), which has been operational for the last few months. One of the unique features of the test is that it is an adaptive test based on polytomous IRT.

SJ-CAT consists of two sections: Section 1 has sentence reading-aloud tasks and multiple-choice reading tasks, and Section 2 has sentence generation tasks and open-answer tasks. In a reading-aloud task, a test taker reads a phoneme-balanced sentence on the screen after listening to a model reading. In a multiple-choice reading task, a test taker sees a picture and reads aloud the one sentence, among three on the screen, that describes the scene most appropriately. In a sentence generation task, a test taker sees a picture or watches a video clip and describes the scene in his/her own words for about ten seconds. In an open-answer task, the test taker expresses support for or opposition to a topic (e.g., nuclear power generation), with reasons, for about 30 seconds.

In the course of developing the test, we found many unexpected and unique characteristics of a speaking CAT that are not found in usual CATs with multiple-choice items. In this presentation, we will discuss some factors that did not arise in our previous project of developing the dichotomous J-CAT (Japanese Computerized Adaptive Test), which consists of vocabulary, grammar, reading, and listening. First, we will claim that the distribution of item difficulty parameters depends on the types of items: with an item pool of unrestricted item types, such as open questions, it is difficult to achieve an ideal distribution, whether normal or uniform. Second, contrary to our expectations, open questions are not necessarily more difficult to handle in an automated scoring system than more restricted questions such as sentence reading, provided a suitable scoring algorithm for open questions can be set up. Third, we will show that the standard deviation of the posterior distribution (the standard error of the theta parameter) converges faster under the polytomous IRT used for SJ-CAT than under the dichotomous IRT used in J-CAT. Fourth, we will discuss problems in the equating of items in SJ-CAT, and suggest introducing deep learning with reinforcement learning instead of equating. Finally, we will discuss issues in operating SJ-CAT on the web, including speed of scoring, operation costs, and security, among others.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Comparison of Pretest Item Calibration Methods in a Computerized Adaptive Test (CAT) T2 - IACAT 2017 Conference Y1 - 2017 A1 - Huijuan Meng A1 - Chris Han KW - CAT KW - Pretest Item Calibration AB -

Calibration methods for pretest items in a computerized adaptive test (CAT) are not a new area of research inquiry. After decades of research on CAT, the fixed item parameter calibration (FIPC) method has been widely accepted and used by practitioners to address two CAT calibration issues: (a) the restricted ability range each item is exposed to, and (b) a sparse response data matrix. In FIPC, the parameters of the operational items are fixed at their original values, and multiple expectation maximization (EM) cycles are used to estimate the parameters of the pretest items, with the prior ability distribution being updated multiple times (Ban, Hanson, Wang, Yi, & Harris, 2001; Kang & Petersen, 2009; Pommerich & Segall, 2003).

Another calibration method is the fixed person parameter calibration (FPPC) method proposed by Stocking (1988) as “Method A.” Under this approach, candidates’ ability estimates are fixed in the calibration of pretest items, and they define the scale on which the parameter estimates are reported. The logic of FPPC is suitable for CAT applications because the person parameters are estimated from the operational items and are available for pretest item calibration. In Stocking (1988), the FPPC was evaluated using the LOGIST computer program developed by Wood, Wingersky, and Lord (1976). She reported that “Method A” produced larger root mean square errors (RMSEs) in the middle ability range than “Method B,” which required the use of anchor items (administered non-adaptively) and linking steps to correct for the potential scale drift due to the use of imperfect ability estimates.

Since then, new commercial software tools such as BILOG-MG and flexMIRT (Cai, 2013) have been developed to handle the FPPC method with different implementations (e.g., the MH-RM algorithm with flexMIRT). The performance of the FPPC method with those new software tools, however, has rarely been researched in the literature.

In our study, we evaluated the performance of the two pretest item calibration methods using flexMIRT, the newer software tool. FIPC and FPPC are compared under various CAT settings. Each simulated exam contains 75% operational items and 25% pretest items, and real item parameters are used to generate the CAT data. This study also addresses the lack of guidelines in the existing CAT item calibration literature regarding population ability shift and exam length (more accurate theta estimates are expected in longer exams). It therefore investigates four factors and their impact on parameter estimation accuracy: (1) candidate population changes (3 ability distributions); (2) exam length (20: 15 OP + 5 PT; 40: 30 OP + 10 PT; and 60: 45 OP + 15 PT); (3) data-model fit (3PL, and 3PL with fixed c); and (4) pretest item calibration sample size (300, 500, and 1,000). The findings will fill a gap in this area of research and provide new information on which practitioners can base their decisions when selecting a pretest calibration method for their exams.
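The logic of the FPPC ("Method A") approach can be sketched for the 1PL case: with person parameters held fixed, each pretest item's difficulty is a one-dimensional maximum-likelihood estimate. The code below is our own illustrative sketch with simulated data and Newton-Raphson, not the flexMIRT (or LOGIST) implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Fixed person parameters (in practice, theta-hats from the operational CAT items).
thetas = rng.normal(0.0, 1.0, 500)

# Simulate responses to one pretest item with true difficulty 0.7 under the 1PL model.
true_b = 0.7
x = (rng.random(500) < 1.0 / (1.0 + np.exp(-(thetas - true_b)))).astype(float)

def calibrate_b(thetas, x, iters=25):
    """Newton-Raphson MLE of a 1PL item difficulty, with thetas treated as known."""
    b = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(thetas - b)))
        # Score function sum(p - x) and observed information sum(p * (1 - p)).
        b += np.sum(p - x) / np.sum(p * (1.0 - p))
    return float(b)

b_hat = calibrate_b(thetas, x)
```

With 500 fixed thetas the estimate lands close to the generating difficulty, which is the sense in which the fixed person parameters "define the scale" for the pretest items.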

References

Ban, J. C., Hanson, B. A., Wang, T., Yi, Q., & Harris, D. J. (2001). A comparative study of online pretest item—Calibration/scaling methods in computerized adaptive testing. Journal of Educational Measurement, 38(3), 191–212.

Cai, L. (2013). flexMIRT® Flexible Multilevel Multidimensional Item Analysis and Test Scoring (Version 2) [Computer software]. Chapel Hill, NC: Vector Psychometric Group.

Kang, T., & Petersen, N. S. (2009). Linking item parameters to a base scale (Research Report No. 2009– 2). Iowa City, IA: ACT.

Pommerich, M., & Segall, D.O. (2003, April). Calibrating CAT pools and online pretest items using marginal maximum likelihood methods. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

Stocking, M. L. (1988). Scale drift in online calibration (Research Report No. 88–28). Princeton, NJ: Educational Testing Service.

Wood, R. L., Wingersky, M. S., & Lord, F. M. (1976). LOGIST: A computer program for estimating examinee ability and item characteristic curve parameters (RM76-6) [Computer program]. Princeton, NJ: Educational Testing Service.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - A Comparison of Three Empirical Reliability Estimates for Computerized Adaptive Testing T2 - IACAT 2017 conference Y1 - 2017 A1 - Dong Gi Seo KW - CAT KW - Reliability AB -

Reliability estimates in computerized adaptive testing (CAT) are derived from estimated thetas and their standard errors. In practice, the observed standard error (OSE) of an estimated theta can be obtained from the test information function for each examinee under item response theory (IRT). Unlike in classical test theory (CTT), OSEs in IRT are conditional on each estimated theta, so these values must be marginalized to obtain a test reliability. Arithmetic-mean, harmonic-mean, and Jensen-equality marginalizations of the OSEs were applied to estimate CAT reliability. The three resulting empirical CAT reliabilities, based on the different marginalization methods, were compared with true reliability. Results showed that all three empirical CAT reliabilities underestimated true reliability at short test lengths (< 40 items), whereas at long test lengths (> 40) the magnitude of the reliability estimates followed the order Jensen equality, harmonic mean, and arithmetic mean. Specifically, Jensen equality overestimated true reliability across all conditions at long test lengths (> 50).
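The marginalization step can be illustrated as follows. This is our own reading of the abstract with simulated theta estimates and conditional OSEs (not the author's code), comparing arithmetic-mean and harmonic-mean marginalizations of the conditional error variances:

```python
import numpy as np

rng = np.random.default_rng(7)
theta_hat = rng.normal(0.0, 1.0, 1000)    # final CAT ability estimates
se = rng.uniform(0.3, 0.6, 1000)          # conditional OSEs, one per examinee

def marginal_reliability(theta_hat, error_variance):
    """Classical marginal-reliability form: estimate variance over total variance."""
    v = np.var(theta_hat)
    return v / (v + error_variance)

# Two ways to marginalize the conditional error variances se_i^2.
rel_arithmetic = marginal_reliability(theta_hat, np.mean(se**2))
rel_harmonic = marginal_reliability(theta_hat, len(se) / np.sum(1.0 / se**2))
```

Because the harmonic mean of the error variances can never exceed their arithmetic mean, the harmonic-mean marginalization always produces the larger reliability estimate of the two.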

Session Video 

JF - IACAT 2017 conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/file/d/1gXgH-epPIWJiE0LxMHGiCAxZZAwy4dAH/view?usp=sharing ER - TY - JOUR T1 - Is a Computerized Adaptive Test More Motivating Than a Fixed-Item Test? JF - Applied Psychological Measurement Y1 - 2017 A1 - Guangming Ling A1 - Yigal Attali A1 - Bridgid Finn A1 - Elizabeth A. Stone AB - Computer adaptive tests provide important measurement advantages over traditional fixed-item tests, but research on the psychological reactions of test takers to adaptive tests is lacking. In particular, it has been suggested that test-taker engagement, and possibly test performance as a consequence, could benefit from the control that adaptive tests have on the number of test items examinees answer correctly. However, previous research on this issue found little support for this possibility. This study expands on previous research by examining this issue in the context of a mathematical ability assessment and by considering the possible effect of immediate feedback of response correctness on test engagement, test anxiety, time on task, and test performance. Middle school students completed a mathematics assessment under one of three test type conditions (fixed, adaptive, or easier adaptive) and either with or without immediate feedback about the correctness of responses. Results showed little evidence for test type effects. The easier adaptive test resulted in higher engagement and lower anxiety than either the adaptive or fixed-item tests; however, no significant differences in performance were found across test types, although performance was significantly higher across all test types when students received immediate feedback. In addition, these effects were not related to ability level, as measured by the state assessment achievement levels. 
The possibility that test experiences in adaptive tests may not in practice be significantly different than in fixed-item tests is raised and discussed to explain the results of this and previous studies. VL - 41 UR - https://doi.org/10.1177/0146621617707556 ER - TY - CONF T1 - Computerized Adaptive Testing for Cognitive Diagnosis in Classroom: A Nonparametric Approach T2 - IACAT 2017 Conference Y1 - 2017 A1 - Yuan-Pei Chang A1 - Chia-Yi Chiu A1 - Rung-Ching Tsai KW - CD-CAT KW - non-parametric approach AB -

In the past decade, cognitive diagnosis models (CDMs) of educational test performance have received increasing attention among educational researchers (for details, see Fu & Li, 2007, and Rupp, Templin, & Henson, 2010). CDMs decompose the ability domain of a given test into specific skills, called attributes, each of which an examinee may or may not have mastered. The resulting attribute profile documents the individual’s strengths and weaknesses within the ability domain. Cognitive diagnostic computerized adaptive testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation (e.g., Cheng & Chang, 2007; Cheng, 2009; Liu, You, Wang, Ding, & Chang, 2013; Tatsuoka & Tatsuoka, 1997). While model-based CD-CAT is relatively well researched in the context of large-scale assessments, it has not received the same degree of development in small-scale settings, where it would be most useful. The main challenge is that the statistical estimation techniques successfully applied to parametric CD-CAT require large samples to guarantee reliable calibration of item parameters and accurate estimation of examinees’ attribute profiles. In response to this challenge, a nonparametric approach that does not require any parameter calibration, and thus can be used in small educational programs, is proposed. The proposed nonparametric CD-CAT relies on the same principle as the regular CAT algorithm, but uses the nonparametric classification method (Chiu & Douglas, 2013) to assess and update the student’s ability state as the test proceeds. Based on a student’s initial responses, a neighborhood of candidate proficiency classes is identified, and items not characteristic of the chosen proficiency classes are precluded from being chosen next. The response to the next item then allows for an update of the skill profile, and the set of possible proficiency classes is further narrowed. In this manner, the nonparametric CD-CAT cycles through item administration and update stages until the most likely proficiency class has been pinpointed. The simulation results show that the proposed method outperformed the compared parametric CD-CAT algorithms, and the differences were significant when the item parameter calibration was not optimal.
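The nonparametric classification step (Chiu & Douglas, 2013) can be sketched with a toy Q-matrix: an examinee is classified into the attribute profile whose ideal response pattern is nearest in Hamming distance. This is our own minimal illustration, using a conjunctive (DINA-type) ideal-response rule and made-up items:

```python
import itertools

# Toy Q-matrix: which of two attributes each of three items requires.
Q = [(1, 0), (0, 1), (1, 1)]

def ideal_response(profile, q):
    # Conjunctive rule: correct iff every required attribute is mastered.
    return int(all(a >= r for a, r in zip(profile, q)))

def classify(responses):
    """Return the attribute profile whose ideal pattern is closest in Hamming distance."""
    profiles = list(itertools.product([0, 1], repeat=2))
    return min(profiles,
               key=lambda prof: sum(r != ideal_response(prof, q)
                                    for r, q in zip(responses, Q)))
```

For example, the response pattern (1, 0, 0) is classified as mastery of the first attribute only; in the adaptive version, this classification is re-run after every response over the current neighborhood of candidate profiles.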

References

Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74, 619-632.

Cheng, Y., & Chang, H. (2007). The modified maximum global discrimination index method for cognitive diagnostic CAT. In D. Weiss (Ed.) Proceedings of the 2007 GMAC Computerized Adaptive Testing Conference.

Chiu, C.-Y., & Douglas, J. A. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225-250.

Fu, J., & Li, Y. (2007). An integrative review of cognitively diagnostic psychometric models. Paper presented at the Annual Meeting of the National Council on Measurement in Education. Chicago, Illinois.

Liu, H., You, X., Wang, W., Ding, S., & Chang, H. (2013). The development of computerized adaptive testing with cognitive diagnosis for an English achievement test in China. Journal of Classification, 30, 152-172.

Rupp, A. A., Templin, J. L., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford.

Tatsuoka, K. K., & Tatsuoka, M. M. (1997). Computerized cognitive diagnostic adaptive testing: Effect on remedial instruction as empirical validation. Journal of Educational Measurement, 34, 3–20.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Concerto 5 Open Source CAT Platform: From Code to Nodes T2 - IACAT 2017 Conference Y1 - 2017 A1 - David Stillwell KW - Concerto 5 KW - Open Source CAT AB -

Concerto 5 is the newest version of the Concerto open source R-based Computer-Adaptive Testing platform, which is currently used in educational testing and in clinical trials. In our quest to make CAT accessible to all, the latest version uses flowchart nodes to connect different elements of a test, so that CAT test creation is an intuitive high-level process that does not require writing code.

A test creator might connect an Info Page node to a Consent Page node, to a CAT node, and to a Feedback node; after uploading their items, the test is done.

This talk will show the new flowchart interface, and demonstrate the creation of a CAT test from scratch in less than 10 minutes.

Concerto 5 also includes a new Polytomous CAT node, so CATs with Likert items can be easily created in the flowchart interface. This node is currently used in depression and anxiety tests in a clinical trial.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=11eu1KKILQEoK5c-CYO1P1AiJgiQxX0E0 ER - TY - CONF T1 - Considerations in Performance Evaluations of Computerized Formative Assessments T2 - IACAT 2017 Conference Y1 - 2017 A1 - Michael Chajewski A1 - John Harnisher KW - algebra KW - Formative Assessment KW - Performance Evaluations AB -

Computerized adaptive instruments have been widely established and used in the context of summative assessments for purposes including licensure, admissions, and proficiency testing. The benefits of examinee-tailored examinations, which can provide performance estimates that are more reliable and valid, have in recent years attracted a wider audience (e.g., patient-reported outcomes, test preparation). Formative assessments, most widely understood in their implementation as diagnostic tools, have recently started to expand into lesser-known areas of computerized testing, such as implementations of instructional designs that aim to maximize examinee learning through targeted practice.

Using a CAT instrument within the framework of evaluating repeated examinee performances (in settings such as quiz-bank practice, for example) poses unique challenges not germane to summative assessments. The scale on which item parameters (and subsequently examinee performance estimates, such as maximum likelihood estimates) are determined usually does not take change over time into consideration. While vertical scaling resolves the learning acquisition problem, most content practice engines do not use explicit practice windows that could be vertically aligned. Alternatively, multidimensional (MIRT) and hierarchical (HIRT) item response theory models allow the specification of random effects associated with change over time in examinees’ skills, but they are often complex and require content and usage resources not often available.

The research submitted for consideration simulated examinees’ repeated, variable-length quiz-bank practice in algebra using a 500-item 1PL operational pool. The stability simulations sought to determine which rolling item-interval size would provide the most informative insight into examinees’ learning progression over time. Estimates were evaluated in terms of reduction in estimate uncertainty, bias, and RMSD relative to the true and total-item-based ability estimates. It was found that rolling intervals of 20–25 items provided the best reduction of uncertainty around the estimate without compromising the ability to provide informed performance estimates to students. However, while intervals of 20–25 items asymptotically tended to provide adequate estimates of performance, changes over shorter periods of time assessed with shorter quizzes could not be detected, as those changes were suppressed in favor of performance based on the full interval considered. Implications for infrastructure (such as recommendation engines), product, and scale development are discussed.
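The rolling-interval estimate examined here can be sketched as re-scoring only the most recent k responses under a 1PL model. This is a generic grid-based sketch under our own assumptions, not the study's code; the default k = 20 echoes the interval size the study found adequate:

```python
import numpy as np

grid = np.linspace(-4.0, 4.0, 161)

def rolling_estimate(difficulties, responses, k=20):
    """EAP ability estimate computed from only the last k (difficulty, response) pairs."""
    post = np.exp(-0.5 * grid**2)             # N(0, 1) prior
    for b, x in zip(difficulties[-k:], responses[-k:]):
        p = 1.0 / (1.0 + np.exp(-(grid - b)))
        post *= p if x else 1.0 - p
    post /= post.sum()
    return float(grid @ post)

# A learner whose recent window is all-correct gets a high rolling estimate,
# regardless of weaker performance before the window entered the interval.
est = rolling_estimate([0.0] * 40, [0] * 20 + [1] * 20)
```

This also illustrates the trade-off noted above: the window deliberately forgets old responses to track learning, so changes occurring well inside a single window are averaged away.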

Session video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Construction of Gratitude Scale Using Polytomous Item Response Theory Model T2 - IACAT 2017 Conference Y1 - 2017 A1 - Nurul Arbiyah KW - Gratitude Scale KW - polytomous items AB -

Various studies have shown that gratitude is essential to increasing the happiness and quality of life of every individual. Unfortunately, research on gratitude has received little attention, and there is no standardized measurement for it. Existing gratitude scales were developed overseas and have not been adapted to the Indonesian cultural context. Moreover, scale development is generally performed with a classical test theory approach, which has some drawbacks. This research develops a gratitude scale using a polytomous item response theory (IRT) model, the Partial Credit Model (PCM).

The pilot study results showed that the gratitude scale (44 items) is reliable (α = 0.944) and valid (meeting both convergent and discriminant validity requirements). The pilot study results also showed that the gratitude scale satisfies the unidimensionality assumption.

Testing with the PCM showed that the gratitude scale fit the model. Of the 44 items, one did not fit and was eliminated. A second analysis of the remaining 43 items showed that they all fit the model and were suitable for measuring gratitude. Analysis using differential item functioning (DIF) showed that four items had a response bias based on gender. Thus, 39 items remain in the scale.
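For reference, the category response probabilities under the Partial Credit Model used in this analysis take the standard form below (the step parameters shown are illustrative, not the scale's estimates):

```python
import numpy as np

def pcm_probs(theta, deltas):
    """PCM category probabilities: P(X = k) proportional to exp(sum_{j<=k}(theta - delta_j))."""
    # Cumulative sums of (theta - delta_j), with 0 for the lowest category.
    cum = np.concatenate(([0.0], np.cumsum(theta - np.asarray(deltas, dtype=float))))
    e = np.exp(cum - cum.max())               # stabilized softmax over categories
    return e / e.sum()

# Four response categories from three (illustrative) step parameters.
probs = pcm_probs(0.0, [-1.0, 0.0, 1.0])
```

With symmetric steps and theta at the item location, the extreme categories are equally likely; as theta grows, probability mass shifts toward the highest category.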

Session Video 

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1pHhO4cq2-wh24ht3nBAoXNHv7234_mjH ER - TY - CONF T1 - Developing a CAT: An Integrated Perspective T2 - IACAT 2017 Conference Y1 - 2017 A1 - Nathan Thompson KW - CAT Development KW - integrated approach AB -

Most resources on computerized adaptive testing (CAT) tend to focus on psychometric aspects such as mathematical formulae for item selection or ability estimation. However, development of a CAT assessment requires a holistic view of project management, financials, content development, product launch and branding, and more. This presentation will develop such a holistic view, which serves several purposes, including providing a framework for validity, estimating costs and ROI, and making better decisions regarding the psychometric aspects.

Thompson and Weiss (2011) presented a five-step model for developing computerized adaptive tests (CATs). This model will be presented and discussed as the core of this holistic framework, then applied to real-life examples. While most CAT research focuses on developing new quantitative algorithms, this presentation is instead intended to help researchers evaluate and select the algorithms that are most appropriate for their needs. It is therefore ideal for practitioners who are familiar with the basics of item response theory and CAT and wish to explore how they might apply these methodologies to improve their assessments.

Steps include:

1. Feasibility, applicability, and planning studies

2. Develop item bank content or utilize existing bank

3. Pretest and calibrate item bank

4. Determine specifications for final CAT

5. Publish live CAT.

For example, Step 1 will contain simulation studies that estimate item bank requirements, which can then be used to determine the costs of content development, which in turn can be integrated into an estimated project cost and timeline. Such information is vital in determining whether the CAT should be developed in the first place.
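As an illustration of the kind of Step 1 feasibility simulation described above, the sketch below estimates how average CAT length to a fixed precision target varies with bank size. It is a hypothetical setup, not the framework's own tooling: a 2PL bank with greedy maximum-information selection and a crude one-step ability update, with all parameter ranges illustrative.

```python
import math
import random

random.seed(3)

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def cat_length(bank, true_theta, se_target=0.3, max_len=50):
    # Greedy maximum-information CAT; returns how many items were needed
    # to push the ability standard error below se_target
    theta, used, info = 0.0, set(), 0.0
    for n in range(1, max_len + 1):
        def item_info(i):
            a, b = bank[i]
            p = p_2pl(theta, a, b)
            return a * a * p * (1.0 - p)
        i = max((j for j in range(len(bank)) if j not in used), key=item_info)
        used.add(i)
        a, b = bank[i]
        x = 1 if random.random() < p_2pl(true_theta, a, b) else 0
        p = p_2pl(theta, a, b)
        info += a * a * p * (1.0 - p)
        theta += a * (x - p) / info          # crude one-step update
        if 1.0 / math.sqrt(info) < se_target:
            return n
    return max_len

# How does bank size affect average test length at a fixed precision?
for size in (100, 300):
    bank = [(random.uniform(0.8, 2.0), random.gauss(0.0, 1.0))
            for _ in range(size)]
    lengths = [cat_length(bank, random.gauss(0.0, 1.0)) for _ in range(50)]
    print(size, sum(lengths) / len(lengths))
```

Average length per bank size is exactly the kind of number that feeds the content-development cost estimate mentioned above.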

References

Thompson, N. A., & Weiss, D. J. (2011). A Framework for the Development of Computerized Adaptive Tests. Practical Assessment, Research & Evaluation, 16(1). Retrieved from http://pareonline.net/getvn.asp?v=16&n=1.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1Jv8bpH2zkw5TqSMi03e5JJJ98QtXf-Cv ER - TY - JOUR T1 - Development of a Computer Adaptive Test for Depression Based on the Dutch-Flemish Version of the PROMIS Item Bank JF - Evaluation & the Health Professions Y1 - 2017 A1 - Gerard Flens A1 - Niels Smits A1 - Caroline B. Terwee A1 - Joost Dekker A1 - Irma Huijbrechts A1 - Edwin de Beurs AB - We developed a Dutch-Flemish version of the patient-reported outcomes measurement information system (PROMIS) adult V1.0 item bank for depression as input for computerized adaptive testing (CAT). As item bank, we used the Dutch-Flemish translation of the original PROMIS item bank (28 items) and additionally translated 28 U.S. depression items that failed to make the final U.S. item bank. Through psychometric analysis of a combined clinical and general population sample (N = 2,010), 8 added items were removed. With the final item bank, we performed several CAT simulations to assess the efficiency of the extended (48 items) and the original item bank (28 items), using various stopping rules. Both item banks resulted in highly efficient and precise measurement of depression and showed high similarity between the CAT simulation scores and the full item bank scores. We discuss the implications of using each item bank and stopping rule for further CAT development. VL - 40 UR - https://doi.org/10.1177/0163278716684168 ER - TY - CONF T1 - The Development of a Web-Based CAT in China T2 - IACAT 2017 Conference Y1 - 2017 A1 - Chongli Liang A1 - Danjun Wang A1 - Dan Zhou A1 - Peida Zhan KW - China KW - Web-Based CAT AB -

Cognitive ability assessment is widely used as a recruitment tool in hiring potential employees. Traditional cognitive ability tests face threats from item exposure and long administration times. In China especially, campus recruitment places a premium on short testing time and cheating prevention. Beisen, the largest domestic online assessment software provider in China, developed a web-based CAT for cognitive ability, assessing verbal, quantitative, logical, and spatial ability, in order to shorten testing time, improve assessment accuracy, and reduce threats from cheating and faking in online ability testing. The web-based test is convenient for examinees, who can easily access it over the Internet by logging in to the test website at any time and place, through any Internet-enabled device (e.g., laptop, tablet, or smartphone).

We designed the CAT around strategies for establishing the item bank, setting the starting point, selecting items, scoring, and terminating. We also paid close attention to administering the test via the web. For the CAT procedures, we employed online calibration to establish a stable and expanding item bank, and integrated maximum Fisher information, the a-stratified strategy, and randomization for item selection and item exposure control. Fixed-length and variable-length strategies were combined to terminate the test. To deliver fluid web-based testing, we employed cloud computing techniques and carefully engineered each computing process. Distributed computation executes EAP scoring and item selection at high speed. Caching all items on the servers in advance shortens the process of loading items onto examinees’ terminal devices. Horizontally scalable cloud servers cope with high concurrency. The heavy computation in item selection was converted into lookups in a precomputed information matrix table.
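The item selection logic described here, maximum Fisher information with randomization layered on top for exposure control, might be sketched as follows. This is not Beisen's implementation: the 2PL bank, parameter ranges, and the top-5 "randomesque" window are all illustrative assumptions.

```python
import math
import random

random.seed(0)

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_item(theta, bank, used, top_k=5):
    # Rank unadministered items by Fisher information at the current
    # theta estimate, then draw randomly among the top_k (randomization
    # layered on top of maximum-information selection to limit exposure)
    ranked = sorted(((fisher_info(theta, a, b), i)
                     for i, (a, b) in enumerate(bank) if i not in used),
                    reverse=True)
    return random.choice(ranked[:top_k])[1]

bank = [(random.uniform(0.5, 2.0), random.gauss(0.0, 1.0)) for _ in range(300)]
used = set()
first = select_item(0.0, bank, used)
used.add(first)
second = select_item(0.0, bank, used)
print(first, second)
```

Precomputing `fisher_info` over a theta grid turns the ranking step into the table lookup the abstract describes.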

We examined the average accuracy, bank usage, and computing performance under both laboratory and live testing conditions. In a test of almost 28,000 examinees, we found that bank usage averaged 50%, and that 80% of tests terminated at a test information of 10, with an average of 9.6. Under high concurrency, testing proceeded unhindered, and scoring plus item selection took an average of 0.23 s per examinee.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - JOUR T1 - The Development of MST Test Information for the Prediction of Test Performances JF - Educational and Psychological Measurement Y1 - 2017 A1 - Ryoungsun Park A1 - Jiseon Kim A1 - Hyewon Chung A1 - Barbara G. Dodd AB - The current study proposes novel methods to predict multistage testing (MST) performance without conducting simulations. This method, called MST test information, is based on analytic derivation of standard errors of ability estimates across theta levels. We compared standard errors derived analytically to the simulation results to demonstrate the validity of the proposed method in both measurement precision and classification accuracy. The results indicate that the MST test information effectively predicted the performance of MST. In addition, the results of the current study highlighted the relationship among the test construction, MST design factors, and MST performance. VL - 77 UR - http://dx.doi.org/10.1177/0013164416662960 ER - TY - CONF T1 - DIF-CAT: Doubly Adaptive CAT Using Subgroup Information to Improve Measurement Precision T2 - IACAT 2017 Conference Y1 - 2017 A1 - Joy Wang A1 - David J. Weiss A1 - Chun Wang KW - DIF-CAT KW - Doubly Adaptive CAT KW - Measurement Precision KW - subgroup information AB -

Differential item functioning (DIF) is usually regarded as a test fairness issue in high-stakes tests. In low-stakes tests, it is more of an accuracy problem. However, in low-stakes tests the same treatment, deleting items that demonstrate significant DIF, is still employed. When political concerns are not important, such as in low-stakes tests and instruments that are not used to make decisions about people, deleting items might not be optimal. Computerized adaptive testing (CAT) is increasingly used in low-stakes testing. The DIF-CAT method evaluated in this research is designed to cope with DIF in a CAT environment. Using this method, item parameters are estimated separately for the focal group and the reference group in a DIF study; CATs are then administered using the group-specific sets of item parameters.

To evaluate the performance of the DIF-CAT procedure, it was compared in a simulation study to (1) deleting all the DIF items in a CAT bank and (2) ignoring DIF. A 300-item flat item bank and a 300-item peaked item bank were simulated using the three-parameter logistic IRT model with D = 1.7. In each bank, 40% of the items showed DIF. The DIF size was 0.5 in b and/or a, while the original b ranged from -3 to 3 and a ranged from 0.3 to 2.1. Three types of DIF were considered: (1) uniform DIF caused by differences in b, (2) non-uniform DIF caused by differences in a, and (3) non-uniform DIF caused by differences in both a and b. 500 normally distributed simulees in each of the reference and focal groups were used in item parameter recalibration. In the Delete-DIF method, only DIF-free items were calibrated. In the Ignore-DIF method, all items were calibrated using all simulees without differentiating the groups. In the DIF-CAT method, the DIF-free items were used as anchor items to estimate the item parameters for the focal and reference groups, and the recalibrated item parameters were used. All simulees used the same item parameters in the Delete and Ignore methods, whereas CATs in the DIF-CAT method used group-specific item parameters. In the CAT stage, 100 simulees were generated for each of the reference and focal groups at each of six discrete θ levels ranging from -2.5 to 2.5. CAT test length was fixed at 40 items. Bias, average absolute difference, RMSE, standard error of θ estimates, and person fit were used to compare the performance of the DIF methods. DIF item usage was also recorded for the Ignore method and the DIF-CAT method.

Generally, the DIF-CAT method outperformed both the Delete method and the Ignore method in dealing with DIF items in CAT. The Delete method, which is the most frequently used method for handling DIF, performed the worst of the three methods in a CAT environment, as reflected in multiple indices of measurement precision. Even the Ignore method, which simply left DIF items in the item bank, provided θ estimates of higher precision than the Delete method. This poor performance of the Delete method was probably due to reduction in size of the item bank available for each CAT.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1Gu4FR06qM5EZNp_Ns0Kt3HzBqWAv3LPy ER - TY - JOUR T1 - Dual-Objective Item Selection Criteria in Cognitive Diagnostic Computerized Adaptive Testing JF - Journal of Educational Measurement Y1 - 2017 A1 - Kang, Hyeon-Ah A1 - Zhang, Susu A1 - Chang, Hua-Hua AB - The development of cognitive diagnostic-computerized adaptive testing (CD-CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual-objective CD-CAT that simultaneously addresses examinees' attribute mastery status and overall test performance. The new procedure is based on the Jensen-Shannon (JS) divergence, a symmetrized version of the Kullback-Leibler divergence. We show that the JS divergence resolves the noncomparability problem of the dual information index and has close relationships with Shannon entropy, mutual information, and Fisher information. The performance of the JS divergence is evaluated in simulation studies in comparison with the methods available in the literature. Results suggest that the JS divergence achieves parallel or more precise recovery of latent trait variables compared to the existing methods and maintains practical advantages in computation and item pool usage. VL - 54 UR - http://dx.doi.org/10.1111/jedm.12139 ER - TY - CONF T1 - Efficiency of Item Selection in CD-CAT Based on Conjunctive Bayesian Network Modeling Hierarchical attributes T2 - IACAT 2017 Conference Y1 - 2017 A1 - Soo-Yun Han A1 - Yun Joo Yoo KW - CD-CAT KW - Conjuctive Bayesian Network Modeling KW - item selection AB -

Cognitive diagnosis models (CDMs) aim to diagnose examinees’ mastery status on multiple fine-grained skills. As new developments in cognitive diagnosis methods emerge, much attention is also being given to cognitive diagnostic computerized adaptive testing (CD-CAT). Topics such as item selection methods, item exposure control strategies, and online calibration methods, which have been well studied for traditional item response theory (IRT) based CAT, are likewise being investigated in the context of CD-CAT (e.g., Xu, Chang, & Douglas, 2003; Wang, Chang, & Huebner, 2011; Chen et al., 2012).

In CDM framework, some researchers suggest to model structural relationship between cognitive skills, or namely, attributes. Especially, attributes can be hierarchical, such that some attributes must be acquired before the subsequent ones are mastered. For example, in mathematics, addition must be mastered before multiplication, which gives a hierarchy model for addition skill and multiplication skill. Recently, new CDMs considering attribute hierarchies have been suggested including the Attribute Hierarchy Method (AHM; Leighton, Gierl, & Hunka, 2004) and the Hierarchical Diagnostic Classification Models (HDCM; Templin & Bradshaw, 2014).

Bayesian Networks (BN), the probabilistic graphical models representing the relationship of a set of random variables using a directed acyclic graph with conditional probability distributions, also provide an efficient framework for modeling the relationship between attributes (Culbertson, 2016). Among various BNs, conjunctive Bayesian network (CBN; Beerenwinkel, Eriksson, & Sturmfels, 2007) is a special kind of BN, which assumes partial ordering between occurrences of events and conjunctive constraints between them.

In this study, we propose using the CBN for modeling attribute hierarchies and discuss its advantages for CDMs. We then explore the impact of CBN modeling on the efficiency of item selection methods for CD-CAT when the attributes are truly hierarchical. To this end, two simulation studies, one for fixed-length CAT and another for variable-length CAT, were conducted. In each study, two attribute hierarchy structures, with 5 and 8 attributes, were assumed. Among the various item selection methods developed for CD-CAT, six algorithms were considered: the posterior-weighted Kullback-Leibler index (PWKL; Cheng, 2009), the modified PWKL index (MPWKL; Kaplan, de la Torre, & Barrada, 2015), Shannon entropy (SHE; Tatsuoka, 2002), mutual information (MI; Wang, 2013), the posterior-weighted CDM discrimination index (PWCDI; Zheng & Chang, 2016), and the posterior-weighted attribute-level CDM discrimination index (PWACDI; Zheng & Chang, 2016). The impact of Q-matrix structure, item quality, and test termination rules on the efficiency of the item selection algorithms was also investigated. Evaluation measures included attribute classification accuracy (fixed-length experiment) and CD-CAT test length until stopping (variable-length experiment).
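Of the six algorithms, the SHE criterion is the simplest to sketch: it selects the candidate item that minimizes the expected Shannon entropy of the attribute-pattern posterior. The sketch below assumes a DINA response model with illustrative slip/guess values and a toy 3-attribute Q-matrix; it is a sketch of the general index, not the study's code.

```python
import math
from itertools import product

def p_correct(pattern, q_row, slip, guess):
    # DINA: answer correctly with prob 1-slip iff all attributes required
    # by the Q-matrix row are mastered, otherwise with prob guess
    mastered = all(p >= q for p, q in zip(pattern, q_row))
    return 1.0 - slip if mastered else guess

def entropy(post):
    return -sum(p * math.log(p) for p in post if p > 0.0)

def expected_entropy(post, patterns, q_row, slip=0.1, guess=0.2):
    # SHE index: expected Shannon entropy of the attribute posterior
    # after observing the response to this candidate item
    pc = [p_correct(pat, q_row, slip, guess) for pat in patterns]
    marg_correct = sum(w * p for w, p in zip(post, pc))
    ent = 0.0
    for x, marg in ((1, marg_correct), (0, 1.0 - marg_correct)):
        if marg > 0.0:
            upd = [w * (p if x else 1.0 - p) / marg
                   for w, p in zip(post, pc)]
            ent += marg * entropy(upd)
    return ent

patterns = list(product((0, 1), repeat=3))      # 8 attribute profiles
post = [1.0 / len(patterns)] * len(patterns)    # uniform prior
q_matrix = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)]
scores = [expected_entropy(post, patterns, q) for q in q_matrix]
best = min(range(len(q_matrix)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

A CBN-style hierarchy constraint would simply zero out the prior mass of patterns that violate the partial order, shrinking the space the index has to discriminate over.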

The results indicate that the efficiency of item selection is improved by directly modeling the attribute hierarchies using the CBN. The test length needed to reach the diagnosis probability threshold was reduced to 50%-70% for CBN-based CAT compared with CD-CAT assuming independent attributes. The magnitude of improvement was greater when the cognitive model of the test included more attributes and when the test length was shorter. We conclude by discussing how Q-matrix structure, item quality, and test termination rules affect efficiency.

References

Beerenwinkel, N., Eriksson, N., & Sturmfels, B. (2007). Conjunctive Bayesian networks. Bernoulli, 13(4), 893-909.

Chen, P., Xin, T., Wang, C., & Chang, H. H. (2012). Online calibration methods for the DINA model with independent attributes in CD-CAT. Psychometrika, 77(2), 201-222.

Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74(4), 619-632.

Culbertson, M. J. (2016). Bayesian networks in educational assessment: the state of the field. Applied Psychological Measurement, 40(1), 3-21.

Kaplan, M., de la Torre, J., & Barrada, J. R. (2015). New item selection methods for cognitive diagnosis computerized adaptive testing. Applied Psychological Measurement, 39(3), 167-188.

Leighton, J. P., Gierl, M. J., & Hunka, S. M. (2004). The attribute hierarchy method for cognitive assessment: a variation on Tatsuoka's rule‐space approach. Journal of Educational Measurement, 41(3), 205-237.

Tatsuoka, C. (2002). Data analytic methods for latent partially ordered classification models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 51(3), 337-350.

Templin, J., & Bradshaw, L. (2014). Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika, 79(2), 317-339.

Wang, C. (2013). Mutual information item selection method in cognitive diagnostic computerized adaptive testing with short test length. Educational and Psychological Measurement, 73(6), 1017-1035.

Wang, C., Chang, H. H., & Huebner, A. (2011). Restrictive stochastic item selection methods in cognitive diagnostic computerized adaptive testing. Journal of Educational Measurement, 48(3), 255-273.

Xu, X., Chang, H., & Douglas, J. (2003, April). A simulation study to compare CAT strategies for cognitive diagnosis. Paper presented at the annual meeting of National Council on Measurement in Education, Chicago.

Zheng, C., & Chang, H. H. (2016). High-efficiency response distribution–based item selection algorithms for short-length cognitive diagnostic computerized adaptive testing. Applied Psychological Measurement, 40(8), 608-624.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1RbO2gd4aULqsSgRi_VZudNN_edX82NeD ER - TY - CONF T1 - Efficiency of Targeted Multistage Calibration Designs under Practical Constraints: A Simulation Study T2 - IACAT 2017 Conference Y1 - 2017 A1 - Stephanie Berger A1 - Angela J. Verschoor A1 - Theo Eggen A1 - Urs Moser KW - CAT KW - Efficiency KW - Multistage Calibration AB -

Calibration of an item bank for computerized adaptive testing requires substantial resources. In this study, we focused on two related research questions. First, we investigated whether the efficiency of item calibration under the Rasch model could be enhanced by calibration designs that optimize the match between item difficulty and student ability (Berger, 1991). To this end, we introduced targeted multistage calibration designs, a design type that combines traditional targeted calibration designs with multistage designs. Targeted multistage calibration designs consider ability-related background variables (e.g., grade in school), as well as performance (i.e., the outcome of a preceding test stage), when assigning students to suitable items.

Second, we explored how limited a priori knowledge about item difficulty affects the efficiency of both targeted calibration designs and targeted multistage calibration designs. When arranging items within a given calibration design, test developers need to know the item difficulties to locate items optimally within the design. However, usually, no empirical information about item difficulty is available before item calibration. Owing to missing empirical data, test developers might fail to assign all items to the most suitable location within a calibration design.

Both research questions were addressed in a simulation study in which we varied the calibration design, as well as the accuracy of item distribution across the different booklets or modules within each design (i.e., the number of misplaced items). The results indicated that targeted multistage calibration designs were more efficient than ordinary targeted designs under optimal conditions. In particular, targeted multistage calibration designs provided more accurate estimates for very easy and very difficult items. Limited knowledge about item difficulty during test construction impaired the efficiency of all designs. The loss of efficiency was considerably large for one of the two investigated targeted multistage calibration designs, whereas targeted designs were more robust.

References

Berger, M. P. F. (1991). On the efficiency of IRT models when applied to different sampling designs. Applied Psychological Measurement, 15(3), 293–306. doi:10.1177/014662169101500310

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/file/d/1ko2LuiARKqsjL_6aupO4Pj9zgk6p_xhd/view?usp=sharing ER - TY - CONF T1 - An Empirical Simulation Study Using mstR for MST Designs T2 - IACAT 2017 Conference Y1 - 2017 A1 - Soo Lee KW - mstR KW - multistage testing AB -

Multistage testing (MST) combines many benefits of adaptive testing and linear testing, and has recently become the most sought-after format for computerized testing in educational assessment. It is well suited to testing educational achievement and can be adapted to practical educational survey testing. However, there are many practical considerations in MST design for operational implementation, including costs and benefits. Practitioners need to start with simulations to evaluate various MST designs and their performance before implementation. The recently released open-source R package mstR supports researchers and practitioners in running the MST simulations needed for implementation.

A conventional MST design has a three-stage module structure (i.e., a 1-2-3 design). Alternatively, the composition of modules can diverge from one design to another (e.g., a 1-3 design). For advance planning of equivalence studies, this paper utilizes both the 1-2-3 design and the 1-3 design for the MST structures. To study these designs broadly, the paper evaluates them through simulations using the R package mstR. The empirical simulation study provides an introductory overview of mstR and describes what it offers, using different MST structures built from a 2PL item bank. Further comparisons show the advantages of the different MST designs (e.g., the 1-2-3 design and the 1-3 design) for different practical implementations.
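A 1-3 design of the kind compared here can be sketched directly. The sketch below is a hypothetical Python analogue of what mstR automates in R, assuming a Rasch model; module sizes, difficulty targets, and the number-correct routing cuts are all illustrative.

```python
import math
import random

random.seed(2)

def p_rasch(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def administer(theta, items):
    # Simulate dichotomous responses to a fixed module
    return [1 if random.random() < p_rasch(theta, b) else 0 for b in items]

# 1-3 design: one routing module, then one of three difficulty-targeted modules
routing = [random.gauss(0.0, 0.7) for _ in range(10)]
stage2 = {
    "easy":   [random.gauss(-1.0, 0.5) for _ in range(10)],
    "medium": [random.gauss(0.0, 0.5) for _ in range(10)],
    "hard":   [random.gauss(1.0, 0.5) for _ in range(10)],
}

def run_mst(true_theta):
    score = sum(administer(true_theta, routing))
    # Number-correct routing with illustrative cut scores
    module = "easy" if score <= 3 else ("hard" if score >= 7 else "medium")
    return module, score + sum(administer(true_theta, stage2[module]))

module, total = run_mst(0.8)
print(module, total)
```

Replicating `run_mst` over a theta grid and scoring the response strings with an IRT estimator gives exactly the design-comparison statistics the abstract describes.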

Built in the open-source statistical environment R, mstR provides a convenient simulation tool and allows psychologists, social scientists, and educational measurement scientists to apply MST to innovative future assessments and operational use.

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Evaluation of Parameter Recovery, Drift, and DIF with CAT Data T2 - IACAT 2017 Conference Y1 - 2017 A1 - Nathan Thompson A1 - Jordan Stoeger KW - CAT KW - DIF KW - Parameter Drift KW - Parameter Recovery AB -

Parameter drift and differential item functioning (DIF) analyses are frequent components of a test maintenance plan. That is, after a test form is published, organizations will often calibrate post-publication data at a later date to evaluate whether the performance of the items or the test has changed over time. For example, if item content is leaked, the items might gradually become easier over time, and item statistics or parameters can reflect this.

When tests are published under a computerized adaptive testing (CAT) paradigm, they are nearly always calibrated with item response theory (IRT). IRT calibrations assume that range restriction is not an issue, that is, that each item is administered across a range of examinee ability. CAT data violate this assumption. However, some organizations still wish to evaluate the continuing performance of their items from a DIF or drift perspective.

This presentation will evaluate just how inaccurate DIF and drift analyses might be on CAT data, using a Monte Carlo parameter recovery methodology. Known item parameters will be used to generate both linear and CAT data sets, which are then calibrated for DIF and drift. In addition, we will implement randomesque item exposure constraints in some CAT conditions; this randomization directly alleviates the range restriction problem somewhat, but whether it improves the parameter recovery calibrations is an empirical question.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1F7HCZWD28Q97sCKFIJB0Yps0H66NPeKq ER - TY - CONF T1 - FastCAT – Customizing CAT Administration Rules to Increase Response Efficiency T2 - IACAT 2017 Conference Y1 - 2017 A1 - Richard C. Gershon KW - Administration Rules KW - Efficiency KW - FastCAT AB -

A typical prerequisite for CAT administration is an underlying item bank that completely covers the range of the trait being measured. When a bank fails to cover the full range of the trait, examinees who are close to the floor or ceiling will often never reach a standard error cut-off and will be forced to answer items increasingly less relevant to their trait level. This scenario is fairly typical for many patients responding to patient-reported outcome measures (PROMs). For example, in the assessment of physical functioning, many item banks have a ceiling at about the 50th percentile. For most healthy patients, after a few items the only items remaining in the bank will represent decreasing ability (even though the patient has already indicated being at or above the population mean). Another example is a patient with no pain taking a Pain CAT: they will probably answer “Never” to every succeeding item out to the maximum test length. For this project we sought to reduce patient burden, while maintaining test accuracy, by reducing CAT length with novel stopping rules.

We studied CAT administration histories for patients who were administered Patient-Reported Outcomes Measurement Information System (PROMIS) CATs. In the PROMIS 1 Wave 2 Back Pain/Depression Study, CATs were administered to N = 417 cases assessed across 11 PROMIS domains. The original CAT administration rules were: start with a pre-identified item of moderate difficulty; administer a minimum of four items per case; stop when the estimated theta’s SE declines to < 0.3 OR a maximum of 12 items has been administered.

Original CAT. 12,622 CAT administrations were analyzed. CATs ranged from 4 to 12 items administered; 72.5% were 4-item CATs. The second and third most frequently occurring CATs were 5-item (n = 1,102; 8.7%) and 12-item CATs (n = 964; 7.6%). A total of 64,062 items were administered, averaging 5.1 items per CAT.

Customized CAT. Three new CAT stopping rules were introduced, each with the potential to increase item-presentation efficiency while maintaining the required score precision: stop if a case responds to the first two items administered using an “extreme” response category (towards the ceiling or floor of the item bank); administer a minimum of two items per case; stop if the change in the SE estimate (from the previous to the current item administration) is positive but < 0.01.
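Combined with the original rules, the modified stopping logic amounts to a small decision function. The sketch below is one reading of the rules as stated, with response-category codes assumed for a 5-point scale (0 = floor, 4 = ceiling); it is illustrative, not the PROMIS engine.

```python
def should_stop(responses, se_history, min_items=2, max_items=12,
                se_cut=0.3, se_delta=0.01, extreme_cats=(0, 4)):
    # Sketch of the modified stopping logic; category codes are assumptions
    n = len(responses)
    if n < min_items:
        return False
    # New rule: first two responses both in an extreme category
    if all(r in extreme_cats for r in responses[:2]):
        return True
    # Original rules: SE below cutoff, or maximum length reached
    if se_history and se_history[-1] < se_cut:
        return True
    if n >= max_items:
        return True
    # New rule: SE improvement positive but below se_delta (estimation stalled)
    if len(se_history) >= 2:
        drop = se_history[-2] - se_history[-1]
        if 0.0 < drop < se_delta:
            return True
    return False

print(should_stop([4, 4], [0.9, 0.8]))            # extreme start -> True
print(should_stop([2, 3], [0.9, 0.8]))            # keep testing -> False
print(should_stop([2, 3, 1], [0.9, 0.5, 0.495]))  # stalled SE -> True
```

Calling such a check after each administered item is what produces the shortened 3.0-item average reported below.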

The three new stopping rules reduced the total number of items administered by 25,643 to 38,419 items (40.0% reduction). After four items were administered, only n=1,824 CATs (14.5%) were still in assessment mode (vs. n=3,477 (27.5%) in the original CATs). On average, cases completed 3.0 items per CAT (vs. 5.1).

Each new rule addressed a specific inefficiency in the original CAT administration process: cases not having, or having only a low or clinically unimportant level of, the assessed domain; allowing the SE < 0.3 stopping criterion to come into effect earlier in the CAT administration process; and cases poorly measured by the domain item bank (e.g., “floor” and “ceiling” cases).

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1oPJV-x0p9hRmgJ7t6k-MCC1nAoBSFM1w ER - TY - CONF T1 - From Blueprints to Systems: An Integrated Approach to Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Gage Kingsbury A1 - Tony Zara KW - CAT KW - integrated approach KW - Keynote AB -

For years, test blueprints have told test developers how many items, and what types of items, will be included in a test. Adaptive testing adopted this approach from paper testing, and it is reasonably useful. Unfortunately, 'how many items and what types of items' are not the only elements one should consider when choosing items for an adaptive test. To fill the gaps, practitioners have developed tools to make an adaptive test behave appropriately (e.g., exposure control, content balancing, and item drift procedures). Each of these tools involves a separate process external to the primary item selection process.

The use of these subsidiary processes makes item selection less optimal and makes it difficult to prioritize aspects of selection. This discussion describes systems-based adaptive testing. This approach uses metadata concerning items, test takers and test elements to select items. These elements are weighted by the stakeholders to shape an expanded blueprint designed for adaptive testing. 

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1CBaAfH4ES7XivmvrMjPeKyFCsFZOpQMJ ER - TY - CONF T1 - Generating Rationales to Support Formative Feedback in Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Mark Gierl A1 - Okan Bulut KW - Adaptive Testing KW - formative feedback KW - Item generation AB -

Computer adaptive testing offers many important benefits to support and promote life-long learning. Computers permit testing on-demand, thereby allowing students to take the test at any time during instruction; items on computerized tests are scored immediately, thereby providing students with instant feedback; computerized tests permit continuous administration, thereby allowing students to have more choice about when they write their exams. But despite these important benefits, the advent of computer adaptive testing has also raised formidable challenges, particularly in the area of item development. Educators must have access to large numbers of diverse, high-quality test items to implement computerized adaptive testing because items are continuously administered to students. Hence, hundreds or even thousands of items are needed to develop the test item banks necessary for computer adaptive testing. Unfortunately, educational test items, as they are currently created, are time-consuming and expensive to develop because each individual item is written, initially, by a content specialist and, then, reviewed, edited, and revised by groups of content specialists to ensure the items yield reliable and valid information. Hence, item development is one of the most important problems that must be solved before we can migrate to computer adaptive testing to support life-long learning, because large numbers of high-quality, content-specific test items are required.

One promising item development method that may address this challenge is automatic item generation. Automatic item generation is a relatively new but rapidly evolving research area where cognitive and psychometric modelling practices are used to produce hundreds of new test items with the aid of computer technology. The purpose of our presentation is to describe a new methodology for generating both the items and the rationales required to solve each generated item, in order to produce the feedback needed to support life-long learning. Our item generation methodology will first be described. To ensure the description is practical, the method will also be demonstrated using generated items from the health sciences, showing how item generation can promote life-long learning for medical educators and practitioners.

 

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1O5KDFtQlDLvhNoDr7X4JO4arpJkIHKUP ER - TY - CONF T1 - Grow a Tiger out of Your CAT T2 - IACAT 2017 Conference Y1 - 2017 A1 - Angela Verschoor KW - interoperability KW - Scalability KW - transparency AB -

The main focus of the community of test developers and researchers is on improving adaptive test procedures and methodologies. Yet the transition from research projects to larger-scale operational CATs faces its own challenges. Usually, these operational CATs find their origin in government tenders. "Scalability", "interoperability", and "transparency" are three keywords often found in these documents. Scalability is concerned with parallel system architectures based on stateless selection algorithms; design capacities often range from 10,000 to well over 100,000 concurrent students. Interoperability is implemented in standards like QTI, standards that were not designed with adaptive testing in mind. Transparency is realized by open-source software: the adaptive test should not be a black box. These three requirements often complicate the development of an adaptive test and sometimes even conflict with one another.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - JOUR T1 - Heuristic Constraint Management Methods in Multidimensional Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2017 A1 - Sebastian Born A1 - Andreas Frey AB - Although multidimensional adaptive testing (MAT) has been proven to be highly advantageous with regard to measurement efficiency when several highly correlated dimensions are measured, there are few operational assessments that use MAT. This may be due to issues of constraint management, which is more complex in MAT than it is in unidimensional adaptive testing. Very few studies have examined the performance of existing constraint management methods (CMMs) in MAT. The present article focuses on the effectiveness of two promising heuristic CMMs in MAT for varying levels of imposed constraints and for various correlations between the measured dimensions. Through a simulation study, the multidimensional maximum priority index (MMPI) and multidimensional weighted penalty model (MWPM), as an extension of the weighted penalty model, are examined with regard to measurement precision and constraint violations. The results show that both CMMs are capable of addressing complex constraints in MAT. However, measurement precision losses were found to differ between the MMPI and MWPM. While the MMPI appears to be more suitable for use in assessment situations involving few to a moderate number of constraints, the MWPM should be used when numerous constraints are involved. VL - 77 UR - http://dx.doi.org/10.1177/0013164416643744 ER - TY - CONF T1 - How Adaptive is an Adaptive Test: Are all Adaptive Tests Adaptive? T2 - 2017 IACAT Conference Y1 - 2017 A1 - Mark D Reckase KW - Adaptive Testing KW - CAT AB -

There are many different kinds of adaptive tests, but they all share the characteristic that some feature of the test is customized to the purpose of the test. In the time allotted, it is impossible to consider the adaptation of all of these types, so this address will focus on the "classic" adaptive test that matches the difficulty of the test to the capabilities of the person being tested. The address will first present information on the maximum level of adaptation that can occur, and then compare the amount of adaptation that typically occurs on an operational adaptive test to that maximum. An index is proposed to summarize the amount of adaptation, and it is argued that this type of index should be reported for operational adaptive tests to show how much adaptation typically occurs.
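The abstract does not define the proposed index. As a rough, hypothetical illustration of what such a summary could look like (function and variable names are ours, not the author's), the sketch below compares the difficulty of the items each examinee received with that examinee's ability estimate; a fully adaptive test drives both summaries toward 1, while a fixed linear form drives them toward 0:

```python
from statistics import mean, pstdev

def adaptation_index(thetas, administered_b):
    """Summarize how strongly item difficulty tracked examinee ability.

    thetas: final ability estimates, one per examinee.
    administered_b: one list of administered item difficulties per examinee.
    Returns (r, ratio): the correlation between ability and the mean
    difficulty each examinee received, and the ratio of the spread of
    those mean difficulties to the spread of abilities.
    """
    mean_b = [mean(bs) for bs in administered_b]
    mt, mb = mean(thetas), mean(mean_b)
    cov = mean((t - mt) * (b - mb) for t, b in zip(thetas, mean_b))
    st, sb = pstdev(thetas), pstdev(mean_b)
    r = cov / (st * sb) if st and sb else 0.0
    return r, sb / st
```

For a linear test every examinee sees the same items, so the mean administered difficulty has zero spread and both summaries collapse to 0.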

Click for Presentation Video 

JF - 2017 IACAT Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1Nj-zDCKk3DvHA4Jlp1qkb2XovmHeQfxu ER - TY - CONF T1 - The Implementation of Nationwide High Stakes Computerized (adaptive) Testing in the Netherlands T2 - IACAT 2017 Conference Y1 - 2017 A1 - Mia van Boxel A1 - Theo Eggen KW - High stakes CAT KW - Netherlands KW - WISCAT AB -

In this presentation, the challenges of implementing (adaptive) digital testing in the Facet system in the Netherlands are discussed. The Netherlands has a long tradition of implementing adaptive testing in educational settings. Since the late 1990s, adaptive testing has been used, mostly in low-stakes testing. Several CATs were implemented in student monitoring systems for primary education and in the general subjects of language and arithmetic in vocational education. The only nationwide implemented high-stakes CAT is the WISCAT-pabo: an arithmetic test for students in the first year of primary school teacher colleges. The psychometric advantages of item-level adaptive testing are obvious, for example efficiency and high measurement precision. But there are also some disadvantages, such as the impossibility of reviewing items during and after the test. During the test, the student is not in control of his own test; e.g., he can only navigate forward to the next item. This is one of the reasons that other methods of testing, such as multistage testing, with adaptivity not at the item level but at the subtest level, have become more popular for high-stakes testing.

A main challenge of computerized (adaptive) testing is the implementation of the item bank and the test workflow in a digital system. In 2014, a new nationwide digital system (Facet) was introduced in the Netherlands, with connections to the digital systems of different parties based on international standards (LTI and QTI). The first nationwide tests in the Facet system were flexible exams in Dutch and arithmetic for vocational (and secondary) education, taken as item response theory-based equated linear multiple-forms tests administered during five periods in a year. Nowadays there are implementations of different methods of (multistage) adaptive testing in the same Facet system (DTT and Acet).

In this conference, other presenters from Cito will elaborate on the psychometric characteristics of these other adaptive testing methods. In this contribution, the system architecture and interoperability of the Facet system will be explained. The emphasis is on the implementation and the problems to be solved in using this digital system in all phases of the (adaptive) testing process: item banking, test construction, design, publication, test taking, analysis, and reporting to the student. An evaluation of the use of the system will be presented.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1Kn1PvgioUYaOJ5pykq-_XWnwDU15rRsf ER - TY - CONF T1 - An Imputation Approach to Handling Incomplete Computerized Tests T2 - IACAT 2017 Conference Y1 - 2017 A1 - Troy Chen A1 - Chi-Yu Huang A1 - Chunyan Liu KW - CAT KW - imputation approach KW - incomplete computerized test AB -

As technology advances, computerized adaptive testing (CAT) is becoming increasingly popular as it allows tests to be tailored to an examinee’s ability.  Nevertheless, examinees might devise testing strategies to use CAT to their advantage.  For instance, if only the items that examinees answer count towards their score, then a higher theta score might be obtained by spending more time on items at the beginning of the test and skipping items at the end if time runs out. This type of gaming can be discouraged if examinees’ scores are lowered or “penalized” based on the amount of non-response.

The goal of this study was to devise a penalty function that would meet two criteria: (1) the greater the omit rate, the greater the penalty, and (2) examinees with the same ability and the same omit rate should receive the same penalty. To create the penalty, theta was calculated based on only the items the examinee responded to. Next, the expected number correct score (EXR) was obtained using that theta and the test characteristic curve. A penalized expected number correct score was obtained by multiplying EXR by the proportion of items the examinee responded to. Finally, the penalized theta was identified using the test characteristic curve. Based on the penalized theta and the item parameters of an unanswered item, the likelihood of a correct response is computed and employed to estimate the imputed score for the unanswered item.
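The symbols in the abstract did not survive formatting, so the sketch below reconstructs the described steps under stated assumptions: a 2PL item response model and bisection to invert the monotone test characteristic curve. Function names and parameter values are illustrative, not the authors':

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (an assumed model choice)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score."""
    return sum(p_correct(theta, a, b) for a, b in items)

def penalized_theta(theta_r, items, n_answered, lo=-6.0, hi=6.0):
    """Shrink theta in proportion to the number of omitted items.

    theta_r: ability estimated from the answered items only.
    items: (a, b) parameters for the full test.
    n_answered: how many of the items were actually answered.
    """
    # Penalized expected score: EXR times the proportion answered.
    target = tcc(theta_r, items) * n_answered / len(items)
    for _ in range(100):  # bisection inverts the monotone TCC
        mid = (lo + hi) / 2.0
        if tcc(mid, items) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def imputed_score(theta_p, a, b):
    """Imputed (fractional) score for an unanswered item."""
    return p_correct(theta_p, a, b)
```

With no omissions the target equals the examinee's own expected score, so the penalized theta reduces to the original estimate; any omission lowers it, satisfying both criteria.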

Two datasets were used to generate tests with completion rates of 50%, 80%, and 90%.  The first dataset included real data where approximately 4,500 examinees responded to a 21-item test which provided a baseline/truth. Sampling was done to achieve the three completion rate conditions. The second dataset consisted of simulated item scores for 50,000 simulees under a 1-2-4 multi-stage CAT design where each stage contained seven items. Imputed item scores for unanswered items were computed using a variety of values for G (and therefore T).  Three other approaches to handling unanswered items were also considered: all correct (i.e., T = 0), all incorrect (i.e., T = 1), and random scoring (i.e., T = 0.5).

The current study investigated the impact on theta estimates resulting from the proposed approach to handling unanswered items in a fixed-length CAT. In real testing situations, when examinees do not finish a test, it is hard to tell whether they tried diligently but ran out of time or whether they attempted to manipulate the scoring engine.  To handle unfinished tests with penalties, the proposed approach considers examinees’ abilities and incompletion rates. The results of this study provide direction for psychometric practitioners when considering penalties for omitted responses.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1vznZeO3nsZZK0k6_oyw5c9ZTP8uyGnXh ER - TY - JOUR T1 - The Information Product Methods: A Unified Approach to Dual-Purpose Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2017 A1 - Zheng, Chanjin A1 - He, Guanrui A1 - Gao, Chunlei AB - This article gives a brief summary of major approaches in dual-purpose computerized adaptive testing (CAT) in which the test is tailored interactively to both an examinee's overall ability level, θ, and attribute mastery level, α. It also proposes an information product approach whose connections to the current methods are revealed. An updated comprehensive empirical study demonstrated that the information product approach not only can offer a unified framework to connect all other approaches but also can mitigate the weighting issue in the dual-information approach. VL - 42 SN - 0146-6216 UR - https://doi.org/10.1177/0146621617730392 IS - 4 JO - Applied Psychological Measurement ER - TY - CONF T1 - Issues in Trait Range Coverage for Patient Reported Outcome Measure CATs - Extending the Ceiling for Above-average Physical Functioning T2 - IACAT 2017 Conference Y1 - 2017 A1 - Richard C. Gershon KW - CAT KW - Issues KW - Patient Reported Outcome AB -

The use of a measure that fails to cover the upper range of functioning may produce results that lead to serious misinterpretation. Scores produced by such a measure may fail to recognize significant improvement, or may not be able to demonstrate functioning commensurate with an important milestone. Accurate measurement of this range is critical for the assessment of physically active adults, e.g., athletes recovering from injury and active military personnel who wish to return to active service. Conversely, a PF measure with a low ceiling might fail to differentiate patients in rehabilitation who continue to improve but whose scores hit the ceiling of the measure used.

The assessment of physical function (PF) has greatly benefited from modern psychometric theory and resulting scales, such as the Patient-Reported Outcomes Measurement Information System (PROMIS®) PF instruments. While PROMIS PF has extended the range of function upwards relative to older "legacy" instruments, few PROMIS PF items assess high levels of function. We report here on the development of higher-functioning items for the PROMIS PF bank.

An expert panel representing orthopedics, sports/military medicine, and rehabilitation reviewed existing instruments and wrote new items. After internal review, cognitive interviews were conducted with 24 individuals of average and high levels of physical function. The remaining candidate items were administered along with 50 existing PROMIS anchor items to an internet panel screened for low, average, and high levels of physical function (N = 1,600), as well as members of Boston-area gyms (N= 344). The resulting data was subjected to standard psychometric analysis, along with multiple linking methods to place the new items on the existing PF metric. The new items were added to the full PF bank for simulated computerized adaptive testing (CAT).

Item response data were collected on 54 candidate items. Items that exhibited local dependence (LD) or differential item functioning (DIF) related to gender, age, race, education, or PF status were removed from consideration. Of the 50 existing PROMIS PF items, 31 were free of DIF and LD and were used as anchors. The parameters for the remaining new candidate items were estimated twice: freely estimated and then linked with coefficients, and with fixed-anchor calibration. Both methods were comparable and had appropriate fit. The new items were added to the full PF bank for simulated CATs. The resulting CAT was able to extend the ceiling with high precision to a T-score of 68, suggesting accurate measurement for 97% of the general population.

Extending the range of items by which PF is measured will substantially improve measurement quality, applicability, and efficiency. The bank has incorporated these extension items and is available for use in research and clinics for brief CAT administration (see www.healthmeasures.net). Future research projects should focus on recovery trajectories of the measure for individuals with above average function who are recovering from injury.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1ZC02F-dIyYovEjzpeuRdoXDiXMLFRuKb ER - TY - CONF T1 - Item Parameter Drifting and Online Calibration T2 - IACAT 2017 Conference Y1 - 2017 A1 - Hua-Hua Chang A1 - Rui Guo KW - online calibration KW - Parameter Drift AB -

Item calibration is one of the most important topics in item response theory (IRT). Since many large-scale testing programs have switched from paper-and-pencil (P&P) testing to computerized adaptive testing (CAT), developing methods for efficiently calibrating new items has become vital. Among the many item calibration processes proposed for CAT, online calibration is the most cost-effective. This presentation introduces an online (re)calibration design to detect item parameter drift in both unidimensional and multidimensional CAT environments. Specifically, for unidimensional CAT, a two-stage online calibration design is proposed that implements a proportional density index algorithm; for multidimensional CAT, a four-quadrant online calibration pretest item selection design with the proportional density index algorithm is proposed. Comparisons were made between different online calibration item selection strategies. Results showed that, in unidimensional CAT, the proposed modified two-stage item selection criterion with the proportional density algorithm outperformed existing methods in terms of item parameter calibration and item parameter drift detection, and that, in multidimensional CAT, the online (re)calibration technique with the proposed four-quadrant item selection design and proportional density index outperformed the other methods.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Item Pool Design and Evaluation T2 - IACAT 2017 Conference Y1 - 2017 A1 - Mark D Reckase A1 - Wei He A1 - Jing-Ru Xu A1 - Xuechun Zhou KW - CAT KW - Item Pool Design AB -

Early work on CAT tended to use existing sets of items that came from fixed-length test forms. These sets of items were selected to meet requirements much different from those needed for a CAT: decision making or covering a content domain. However, there was also some early work suggesting that items be equally distributed over the range of proficiency of interest, or concentrated at a decision point, and some work showing that proficiency estimates are biased when an item pool is too easy or too hard. These early findings eventually led to work on item pool design and, more recently, on item pool evaluation. This presentation gives a brief overview of these topics to provide context for the following presentations in this symposium.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1ZAsqm1yNZlliqxEHcyyqQ_vOSu20xxZs ER - TY - CONF T1 - Item Response Time on Task Effect in CAT T2 - IACAT 2017 Conference Y1 - 2017 A1 - Yang Shi KW - CAT KW - Response time KW - Task effect AB -

Introduction. In addition to reduced test length and increased measurement efficiency, computerized adaptive testing (CAT) can provide new insights into the cognitive process of task completion that cannot be mined via conventional tests. Response time is a primary characteristic of the task completion procedure. It has the potential to inform us about underlying processes. In this study, the relationship between response time and response accuracy will be investigated.

Hypothesis. The present study argues that the relationship between response time on task and response accuracy, which may be positive, negative, or curvilinear, depends on the cognitive nature of the task items, holding the ability of the subjects and the difficulty of the items constant. The interpretations of these associations are not uniform either.

Research question. Is there a homogeneous effect of response time on test outcome across Graduate Record Examination (GRE) quantitative and verbal items?

Proposed explanations. If the accuracy of cognitive test responses decreases with response time, then it is an indication that the underlying cognitive process is a degrading process such as knowledge retrieval. More accessible knowledge can be retrieved faster than less accessible knowledge. It is inherent to knowledge retrieval that the success rate declines with elapsing response time. For instance, in reading tasks, the time on task effect is negative and the more negative, the easier a task is. However, if the accuracy of cognitive test responses increases with response time, then the process is of an upgrading nature, with an increasing success rate as a function of response time. For example, problem-solving takes time, and fast responses are less likely to be well-founded responses. It is of course also possible that the relationship is curvilinear, as when an increasing success rate is followed by a decreasing success rate or vice versa.

Methodology. The data are from computer-based GRE quantitative and verbal tests and will be analyzed within a generalized linear mixed model (GLMM) framework, after controlling for the effects of ability and item difficulty as possible confounding factors. A linear model here means a linear combination of predictors determining the probability of person p answering item i correctly. The models are equivalent to advanced IRT models that go beyond the regular modeling of test responses in terms of one or more latent variables and item parameters. The lme4 package for R will be used to conduct the statistical calculations.
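The abstract specifies lme4 in R, with person and item effects controlled. As a stripped-down, hypothetical stand-in in Python, the sketch below fits only the fixed time-on-task effect with a hand-rolled logistic regression; the toy data are invented, and the point is simply that the sign of the fitted slope separates a degrading (retrieval) process from an upgrading (problem-solving) process:

```python
import math

def fit_logistic(xs, ys, lr=0.5, iters=3000):
    """Fit P(correct) = sigmoid(b0 + b1 * x) by gradient ascent on the
    log-likelihood -- a bare-bones stand-in for one GLMM fixed effect."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Invented toy data in which accuracy declines with response time,
# as in a degrading retrieval process; a positive b1 would instead
# point to an upgrading, problem-solving process.
times = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10]
correct = [1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0]
b0, b1 = fit_logistic([math.log(t) for t in times], correct)
```

On these data the fitted time-on-task slope b1 comes out negative, the pattern the abstract associates with knowledge retrieval.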

Implications. The right amount of testing time in CAT is important: too much is wasteful and costly, too little impacts score validity. The study is expected to provide new insight into the relationship between response time and response accuracy, which in turn will contribute to a better understanding of time effects and the relevant cognitive processes in CAT.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Item Selection Strategies for Developing CAT in Indonesia T2 - IACAT 2017 Conference Y1 - 2017 A1 - Istiani Chandra KW - CAT KW - Indonesia KW - item selection strategies AB -

Recently, the development of computerized testing in Indonesia has become quite promising. Many government institutions use the technology for recruitment. Since the Indonesian Army acknowledged the benefits of computerized adaptive testing (CAT) over conventional test administration, the issue of selecting the first item has attracted attention. Given CAT's basic philosophy, several methods can be used to select the first item, such as educational level or an ability estimate from item simulation. The question remains how to apply these methods most effectively in the context of constrained adaptive testing. This paper reviews such strategies as they appear in the relevant literature, focusing on studies that have evaluated the effectiveness of first-item selection strategies for dichotomous scoring. The strengths and weaknesses of each group of strategies are discussed using examples from simulation studies. No new research is presented; rather, a compendium of models is reviewed, offering newcomers a wide view of first-item selection strategies.

 

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://www.youtube.com/watch?v=2KuFrRATq9Q ER - TY - JOUR T1 - Item usage in a multidimensional computerized adaptive test (MCAT) measuring health-related quality of life JF - Quality of Life Research Y1 - 2017 A1 - Paap, Muirne C. S. A1 - Kroeze, Karel A. A1 - Terwee, Caroline B. A1 - van der Palen, Job A1 - Veldkamp, Bernard P. VL - 26 UR - https://doi.org/10.1007/s11136-017-1624-3 ER - TY - CONF T1 - A Large-Scale Progress Monitoring Application with Computerized Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Okan Bulut A1 - Damien Cormier KW - CAT KW - Large-Scale tests KW - Progress monitoring AB -

Many conventional assessment tools are available to teachers in schools for monitoring student progress in a formative manner. The outcomes of these assessment tools are essential to teachers' instructional modifications and schools' data-driven educational strategies, such as using remedial activities and planning instructional interventions for students with learning difficulties. When measuring student progress toward instructional goals or outcomes, assessments should be not only highly precise but also sensitive to individual change in learning. Unlike conventional paper-pencil assessments, which are usually not appropriate for every student, computerized adaptive tests (CATs) are highly capable of estimating growth with minimal and consistent error. Therefore, CATs can be used as a progress monitoring tool for measuring student growth.

This study focuses on an operational CAT assessment that has been used for measuring student growth in reading during the academic school year. The sample of this study consists of nearly 7 million students from the 1st grade to the 12th grade in the US. The students received a CAT-based reading assessment periodically during the school year. The purpose of these periodical assessments is to measure the growth in students’ reading achievement and identify the students who may need additional instructional support (e.g., academic interventions). Using real data, this study aims to address the following research questions: (1) How many CAT administrations are necessary to make psychometrically sound decisions about the need for instructional changes in the classroom or when to provide academic interventions?; (2) What is the ideal amount of time between CAT administrations to capture student growth for the purpose of producing meaningful decisions from assessment results?

To address these research questions, we first used the Theil-Sen estimator for robustly fitting a regression line to each student’s test scores obtained from a series of CAT administrations. Next, we used the conditional standard error of measurement (cSEM) from the CAT administrations to create an error band around the Theil-Sen slope (i.e., student growth rate). This process resulted in the normative slope values across all the grade levels. The optimal number of CAT administrations was established from grade-level regression results. The amount of time needed for progress monitoring was determined by calculating the amount of time required for a student to show growth beyond the median cSEM value for each grade level. The results showed that the normative slope values were the highest for lower grades and declined steadily as grade level increased. The results also suggested that the CAT-based reading assessment is most useful for grades 1 through 4, since most struggling readers requiring an intervention appear to be within this grade range. Because CAT yielded very similar cSEM values across administrations, the amount of error in the progress monitoring decisions did not seem to depend on the number of CAT administrations.
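The slope-fitting step described above can be sketched in a few lines of pure Python (names are illustrative; the study regressed scores on actual administration dates rather than an administration index, and applied the cSEM band per grade level):

```python
import math
from statistics import median

def theil_sen_slope(scores):
    """Robust growth rate per administration: the median of the slopes
    between every pair of test administrations."""
    slopes = [(scores[j] - scores[i]) / (j - i)
              for i in range(len(scores))
              for j in range(i + 1, len(scores))]
    return median(slopes)

def administrations_to_detect_growth(slope, median_csem):
    """How many administrations until cumulative growth exceeds the
    median conditional SEM -- a simplified reading of the criterion
    described in the abstract."""
    return 1 + math.ceil(median_csem / slope)
```

Because the estimator takes a median over all pairwise slopes, a single aberrant administration (e.g., one unmotivated test session) barely moves the fitted growth rate, which is why it suits noisy progress monitoring data.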

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1uGbCKenRLnqTxImX1fZicR2c7GRV6Udc ER - TY - JOUR T1 - Latent-Class-Based Item Selection for Computerized Adaptive Progress Tests JF - Journal of Computerized Adaptive Testing Y1 - 2017 A1 - van Buuren, Nikky A1 - Eggen, Theo J. H. M. KW - computerized adaptive progress test KW - item selection method KW - Kullback-Leibler information KW - Latent class analysis KW - log-odds scoring VL - 5 UR - http://iacat.org/jcat/index.php/jcat/article/view/62/29 IS - 2 ER - TY - CONF T1 - MHK-MST Design and the Related Simulation Study T2 - IACAT 2017 Conference Y1 - 2017 A1 - Ling Yuyu A1 - Zhou Chenglin A1 - Ren Jie KW - language testing KW - MHK KW - multistage testing AB -

The MHK is a national standardized exam that tests and rates Chinese language proficiency. It assesses non-native Chinese minorities' abilities to use the Chinese language in their daily, academic, and professional lives. Computerized multistage adaptive testing (MST) is a combination of conventional paper-and-pencil (P&P) testing and item-level computerized adaptive testing (CAT); it is a computer-based test format that takes the item set as the unit of scoring. MST estimates extreme ability values more accurately than conventional P&P testing, and it uses CAT's adaptive character to reduce test length and score reporting time. At present, MST is used in several large testing programs, such as the Uniform CPA Examination and the Graduate Record Examination (GRE). It is therefore worthwhile to develop MST applications in China.

Based on consideration of the MHK's characteristics and its future development, the researchers started with the design of the MHK-MST. This simulation study was conducted to validate the performance of the MHK-MST system. Real difficulty parameters of MHK items and simulated ability parameters of the candidates were used to generate the original score matrix, and the item modules were delivered to the candidates following the adaptive procedures set according to the path rules. This simulation study provides a sound basis for the implementation of the MHK-MST.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Multi-stage Testing for a Multi-disciplined End-of primary-school Test T2 - IACAT 2017 Conference Y1 - 2017 A1 - Hendrik Straat A1 - Maaike van Groen A1 - Wobbe Zijlstra A1 - Marie-Anne Keizer-Mittelhaëuser A1 - Michel Lamoré KW - mst KW - Multidisciplined KW - proficiency AB -

The Dutch secondary education system consists of five levels: basic, lower, and middle vocational education, general secondary education, and pre-academic education. The individual decision on the level of secondary education is based on a combination of the teacher's judgment and an end-of-primary-school placement test.

This placement test encompasses the measurement of reading, language, mathematics, and writing, each skill consisting of one to four subdomains. The Dutch end-of-primary-school test is currently administered in two linear 200-item paper-based versions. The two versions differ in difficulty so as to motivate both less able and more able students and to measure both groups precisely. The primary goal of the test is to provide a placement advice for the five levels of secondary education. The secondary goal is the assessment of six fundamental reference levels defined for reading, language, and mathematics. Because of the high-stakes advice of the test, the Dutch parliament has mandated a change of format to a multistage test. A major advantage of multistage testing is that the tailoring of the tests depends more strongly on the ability of the students than on the teacher's judgment. A separate multistage test is under development for each of the three skills measured by the reference levels, to increase the classification accuracy for secondary education placement and to optimally measure performance on the reference-level-related skills.

This symposium consists of three presentations discussing the challenges of transitioning from a linear paper-based test to a computer-based multistage test within an existing curriculum, and the specification of the multistage test to meet its measurement purposes. The transition to a multistage test has to improve both classification accuracy and measurement precision.

First, we describe the Dutch educational system and the role of the end-of-primary-school placement test within this system. Special attention will be paid to the advantages of multistage testing over both linear testing and computerized adaptive testing, and on practical implications related to the transitioning from a linear to a multistage test.

Second, we discuss routing and reporting on the new multi-stage test. Both topics have a major impact on the quality of the placement advice and the reference mastery decisions. Several methods for routing and reporting are compared.

Third, the linear test contains 200 items to cover a broad range of different skills and to obtain a precise measurement of those skills separately. Multistage testing creates opportunities to reduce the cognitive burden for the students while maintaining the same quality of placement advice and assessment of mastering of reference levels. This presentation focuses on optimal allocation of items to test modules, optimal number of stages and modules per stage and test length reduction.

Session Video 1

Session Video 2

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1C5ys178p_Wl9eemQuIsI56IxDTck2z8P ER - TY - CONF T1 - New Challenges (With Solutions) and Innovative Applications of CAT T2 - IACAT 2017 Conference Y1 - 2017 A1 - Chun Wang A1 - David J. Weiss A1 - Xue Zhang A1 - Jian Tao A1 - Yinhong He A1 - Ping Chen A1 - Shiyu Wang A1 - Susu Zhang A1 - Haiyan Lin A1 - Xiaohong Gao A1 - Hua-Hua Chang A1 - Zhuoran Shang KW - CAT KW - challenges KW - innovative applications AB -

Over the past several decades, computerized adaptive testing (CAT) has profoundly changed the administration of large-scale aptitude tests, statewide achievement tests, professional licensure exams, and health outcome measures. While many challenges of CAT have been successfully addressed through the continual efforts of researchers in the field, longstanding challenges remain. This symposium will begin with three presentations, each of which provides a sound solution to one of these unresolved challenges: (1) item calibration when responses are “missing not at random” from CAT administration; (2) online calibration of new items when person traits have non-ignorable measurement error; and (3) establishing consistency and asymptotic normality of latent trait estimation when item response revision is allowed in CAT. In addition, this symposium features innovative applications of CAT. In particular, there is emerging interest in using cognitive diagnostic CAT to monitor and detect learning progress (fourth presentation). Last but not least, the fifth presentation illustrates the power of multidimensional polytomous CAT, which permits rapid identification of hospitalized patients’ rehabilitative care needs in health outcomes measurement. We believe this symposium covers a wide range of interesting and important topics in CAT.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1Wvgxw7in_QCq_F7kzID6zCZuVXWcFDPa ER - TY - CONF T1 - A New Cognitive Diagnostic Computerized Adaptive Testing for Simultaneously Diagnosing Skills and Misconceptions T2 - IACAT 2017 Conference Y1 - 2017 A1 - Bor-Chen Kuo A1 - Chun-Hua Chen KW - CD-CAT KW - Misconceptions KW - Simultaneous diagnosis AB -

In educational diagnosis, diagnosing misconceptions is as important as diagnosing skills. However, traditional cognitive diagnostic computerized adaptive testing (CD-CAT) is usually developed to diagnose skills only. This study proposes a new CD-CAT that can simultaneously diagnose skills and misconceptions. The proposed CD-CAT is based on a recently published CDM, the simultaneously identifying skills and misconceptions (SISM) model (Kuo, Chen, & de la Torre, in press). A new item selection algorithm is also proposed to achieve high adaptive testing performance. In simulation studies, we compare the new item selection algorithm with three existing item selection methods: the Kullback–Leibler (KL) and posterior-weighted KL (PWKL) methods proposed by Cheng (2009) and the modified PWKL (MPWKL) method proposed by Kaplan, de la Torre, and Barrada (2015). The results show that the proposed CD-CAT can efficiently diagnose skills and misconceptions; that the accuracy of the new item selection algorithm is close to that of the MPWKL method but with less computational burden; and that the new algorithm outperforms the KL and PWKL methods in diagnosing skills and misconceptions.

References

Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74(4), 619–632. doi: 10.1007/s11336-009-9123-2

Kaplan, M., de la Torre, J., & Barrada, J. R. (2015). New item selection methods for cognitive diagnosis computerized adaptive testing. Applied Psychological Measurement, 39(3), 167–188. doi:10.1177/0146621614554650

Kuo, B.-C., Chen, C.-H., & de la Torre, J. (in press). A cognitive diagnosis model for identifying coexisting skills and misconceptions. Applied Psychological Measurement.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - New Results on Bias in Estimates due to Discontinue Rules in Intelligence Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Matthias von Davier A1 - Youngmi Cho A1 - Tianshu Pan KW - Bias KW - CAT KW - Intelligence Testing AB -

The presentation provides new results on a form of adaptive testing that is used frequently in intelligence testing. In these tests, items are presented in order of increasing difficulty, and the presentation of items is adaptive in the sense that each subtest session is discontinued once a test taker produces a certain number of incorrect responses in sequence. The subsequent (not observed) responses are commonly scored as wrong for that subtest, even though the test taker has not seen them. Discontinuation rules allow a certain form of adaptiveness in both paper-based and computer-based testing, and help reduce testing time.

Two relevant lines of research are studies that directly assess the impact of discontinuation rules, and studies that more broadly look at the impact of scoring rules on test results with a large number of not-administered or not-reached items. He and Wolfe (2012) compared different ability estimation methods for this type of discontinuation-rule adaptation of test length in a simulation study. However, to our knowledge there has been no rigorous analytical study of the underlying distributional changes of the response variables under discontinuation rules. It is important to point out that the results obtained by He and Wolfe (2012) agree with results presented by, for example, DeAyala, Plake, and Impara (2001) as well as Rose, von Davier, and Xu (2010) and Rose, von Davier, and Nagengast (2016), in that ability estimates are biased most when the not-observed responses are scored as wrong. Discontinuation rules combined with scoring the non-administered items as wrong are used operationally in several major intelligence tests, so more research is needed to improve this particular type of adaptiveness in testing practice.

The presentation extends existing research on adaptiveness via discontinue rules in intelligence tests in multiple ways. First, a rigorous analytical study of the distributional properties of discontinue-rule-scored items is presented. Second, an extended simulation is presented that includes additional alternative scoring rules as well as bias-corrected ability estimators that may improve results for discontinue-rule-scored intelligence tests.

References: DeAyala, R. J., Plake, B. S., & Impara, J. C. (2001). The impact of omitted responses on the accuracy of ability estimation in item response theory. Journal of Educational Measurement, 38, 213-234.

He, W., & Wolfe, E. W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72(5), 808–826. doi:10.1177/0013164412441937

Rose, N., von Davier, M., & Xu, X. (2010). Modeling non-ignorable missing data with item response theory (IRT; ETS RR-10-11). Princeton, NJ: Educational Testing Service.

Rose, N., von Davier, M., & Nagengast, B. (2016). Modeling omitted and not-reached items in IRT models. Psychometrika. doi:10.1007/s11336-016-9544-7

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - JOUR T1 - Projection-Based Stopping Rules for Computerized Adaptive Testing in Licensure Testing JF - Applied Psychological Measurement Y1 - 2017 A1 - Luo, Xiao A1 - Kim, Doyoung A1 - Dickison, Philip AB - The confidence interval (CI) stopping rule is commonly used in licensure settings to make classification decisions with fewer items in computerized adaptive testing (CAT). However, it tends to be less efficient in the near-cut regions of the θ scale, as the CI often fails to be narrow enough for an early termination decision prior to reaching the maximum test length. To solve this problem, this study proposed projection-based stopping rules that base the termination decisions on the algorithmically projected range of the final θ estimate at the hypothetical completion of the CAT. A simulation study and an empirical study were conducted to show the advantages of the projection-based rules over the CI rule, in which the projection-based rules reduced the test length without jeopardizing critical psychometric qualities of the test, such as θ estimation and classification precision. Operationally, these rules do not require additional regularization parameters, because the projection is simply a hypothetical extension of the current test within the existing CAT environment. Because these new rules are specifically designed to address the decreased efficiency in the near-cut regions as opposed to the entire scale, the authors recommend using them in conjunction with the CI rule in practice. VL - 42 SN - 0146-6216 UR - https://doi.org/10.1177/0146621617726790 IS - 4 JO - Applied Psychological Measurement ER - TY - CONF T1 - Response Time and Response Accuracy in Computerized Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Yang Shi KW - CAT KW - response accuracy KW - Response time AB -

Introduction. This study explores the relationship between response speed and response accuracy in computerized adaptive testing (CAT). CAT provides a score as well as item response times, which can offer additional diagnostic information about the behavioral processes of task completion that cannot be uncovered by paper-based instruments. The goal of this study is to investigate how the accuracy rate evolves as a function of response time. If the accuracy of cognitive test responses decreases with response time, this indicates that the underlying cognitive process is a degrading process, such as knowledge retrieval: more accessible knowledge can be retrieved faster than less accessible knowledge. For instance, in reading tasks the time-on-task effect is negative, and the more negative it is, the easier the task. However, if the accuracy of cognitive test responses increases with response time, the process is of an upgrading nature, with an increasing success rate as a function of response time. For example, problem solving takes time, and fast responses are less likely to be well-founded responses. It is, of course, also possible that the relationship is curvilinear, as when an increasing success rate is followed by a decreasing success rate or vice versa.

Hypothesis. The present study argues that the relationship between response time on task and response accuracy can be positive, negative, or curvilinear, depending on the cognitive nature of the items, holding the ability of the subjects and the difficulty of the items constant.

Methodology. Data from a subsection of the GRE quantitative test were available. We will use generalized linear mixed models, in which a linear combination of predictors determines the probability of person p answering item i correctly. Mixed effects means that both random effects and fixed effects are included; fixed effects are constants across test takers. These models are equivalent to advanced IRT models that go beyond the regular modeling of test responses in terms of one or more latent variables and item parameters. The lme4 package for R will be used for the statistical computation.
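The abstract's mixed-model analysis requires lme4 in R, but the fixed-effect part, accuracy regressed on time on task through a logistic link, can be sketched in plain Python. The sketch below fits a simple logistic regression by gradient ascent on synthetic data; the generating intercept and slope (2.0 and -1.5) are arbitrary assumptions for illustration, not estimates from the GRE data, and the random effects of the actual GLMM are omitted.

```python
import math
import random

def fit_logistic(times, correct, lr=0.1, epochs=2000):
    """Fit P(correct) = sigmoid(b0 + b1 * time) by batch gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(times)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for t, y in zip(times, correct):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * t)))
            g0 += y - p           # gradient of the log-likelihood w.r.t. b0
            g1 += (y - p) * t     # ... and w.r.t. b1
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Synthetic responses with a built-in negative time-on-task effect:
rng = random.Random(42)
times = [rng.uniform(0.0, 4.0) for _ in range(500)]
correct = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(2.0 - 1.5 * t))) else 0
           for t in times]
b0, b1 = fit_logistic(times, correct)
print(f"time-on-task slope: {b1:.2f}")  # negative: success rate declines with time
```

A negative fitted slope corresponds to the "degrading process" reading of the abstract; a positive one would indicate an upgrading process.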

Research questions. 1. What is the relationship between response accuracy and response speed? 2. What is the correlation between response accuracy and type of response time (fast vs. slow responses) after controlling for the ability of test takers?

Preliminary findings. 1. There is a negative relationship between response time and response accuracy: the success rate declines with elapsing response time. 2. The correlation between the two response-time latent variables (fast and slow) is 1.0, indicating that the time-on-task effects do not differ between response-time types.

Implications. The right amount of testing time in CAT is important: too much is wasteful and costly, too little impairs score validity. The study is expected to provide new insight into the relationship between response time and response accuracy, which, in turn, can contribute to better timing strategies in CAT, with or without time constraints.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1yYP01bzGrKvJnfLwepcAoQQ2F4TdSvZ2 ER - TY - JOUR T1 - Robust Automated Test Assembly for Testlet-Based Tests: An Illustration with Analytical Reasoning Items JF - Frontiers in Education Y1 - 2017 A1 - Veldkamp, Bernard P. A1 - Paap, Muirne C. S. VL - 2 UR - https://www.frontiersin.org/article/10.3389/feduc.2017.00063 ER - TY - CONF T1 - Scripted On-the-fly Multistage Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Edison Choe A1 - Bruce Williams A1 - Sung-Hyuck Lee KW - CAT KW - multistage testing KW - On-the-fly testing AB -

On-the-fly multistage testing (OMST) was introduced recently as a promising alternative to preassembled MST. A decidedly appealing feature of both is the reviewability of items within the current stage. However, the fundamental difference is that, instead of routing to a preassembled module, OMST adaptively assembles a module at each stage according to an interim ability estimate. This produces more individualized forms with finer measurement precision, but imposing nonstatistical constraints and controlling item exposure become more cumbersome. One recommendation is to use the maximum priority index followed by a remediation step to satisfy content constraints, and the Sympson-Hetter method with a stratified item bank for exposure control.

However, these methods can be computationally expensive, impeding practical implementation. This study therefore investigated the script method as a simpler solution to the challenges of strict content balancing and effective item exposure control in OMST. The script method was originally devised as an item selection algorithm for CAT and proceeds as follows: for a test with m items, there are m slots to be filled, and an item is selected according to predefined rules for each slot. For the first slot, randomly select an item from a designated content area (collection). For each subsequent slot, 1) discard any enemies of items already administered in previous slots; 2) draw a designated number of candidate items (the selection length) from the designated collection according to the current ability estimate; 3) randomly select one item from the set of candidates. The script method has two distinct features. First, a predetermined sequence of collections guarantees that content specifications are met; the specific ordering may be determined either randomly or deliberately by content experts. Second, steps 2 and 3 constitute a method of exposure control, in which the selection length balances item usage at the possible expense of ability estimation accuracy. The adaptation of the script method to OMST is straightforward: for the first module, randomly select each item from its designated collection; for each subsequent module, the process is the same as in scripted CAT (SCAT), except that the same ability estimate is used to select all items within the module. A series of simulations was conducted to evaluate the performance of scripted OMST (SOMST, with 3 or 4 evenly divided stages) relative to SCAT under various item exposure restrictions. In all conditions, reliability was maximized by an optimization algorithm that searches for the smallest possible selection length for each slot within the constraints.
Preliminary results indicated that SOMST is a capable design, with performance comparable to that of SCAT. The encouraging findings and ease of implementation strongly motivate its prospective operational use in large-scale assessments.
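The slot-by-slot procedure described above can be sketched as follows. Everything below is hypothetical: the item bank, collections, and enemy sets are invented, and "closest difficulty to the interim ability estimate" is an assumed reading of drawing candidates "according to the current ability estimate".

```python
import random

# Hypothetical item bank: (id, collection, difficulty, enemy ids).
BANK = [
    ("A1", "algebra",  -1.0, set()),
    ("A2", "algebra",   0.5, {"A1"}),
    ("G1", "geometry",  0.0, set()),
    ("G2", "geometry",  1.2, {"N1"}),
    ("N1", "numbers",  -0.5, {"G2"}),
]

def scripted_module(script, theta, selection_length, administered, rng):
    """Fill one slot per scripted collection, reusing the same interim
    theta for every slot in the module (the OMST adaptation)."""
    chosen = []
    for collection in script:
        taken = administered | set(chosen)
        # 1) Discard enemies of items already administered or chosen.
        enemies = set()
        for item_id, _, _, item_enemies in BANK:
            if item_id in taken:
                enemies |= item_enemies
        eligible = [(item_id, b) for item_id, c, b, _ in BANK
                    if c == collection and item_id not in taken
                    and item_id not in enemies]
        # 2) Candidates: selection_length items closest in difficulty to theta.
        eligible.sort(key=lambda ib: abs(ib[1] - theta))
        candidates = eligible[:selection_length]
        # 3) Random pick among candidates balances item exposure.
        chosen.append(rng.choice(candidates)[0])
    return chosen

# With selection_length = 1 the choice is fully deterministic:
print(scripted_module(["algebra", "geometry"], 0.4, 1, set(), random.Random(0)))
# → ['A2', 'G1']
```

Larger selection lengths trade estimation accuracy for exposure control, which is exactly the knob the abstract's optimization algorithm tunes per slot.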

Presentation Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1wKuAstITLXo6BM4APf2mPsth1BymNl-y ER - TY - CONF T1 - A Simulation Study to Compare Classification Method in Cognitive Diagnosis Computerized Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Jing Yang A1 - Jian Tao A1 - Hua-Hua Chang A1 - Ning-Zhong Shi AB -

Cognitive diagnostic computerized adaptive testing (CD-CAT) combines the strengths of both CAT and cognitive diagnosis. Cognitive diagnosis models, which can be viewed as restricted latent class models, have been developed to classify examinees by the profile of skills they have and have not mastered, so as to enable more efficient remediation. Chiu and Douglas (2013) introduced a nonparametric procedure that requires only the specification of a Q-matrix and classifies examinees by proximity to ideal response patterns. In this article, we compare the nonparametric procedure with a common profile estimation method, maximum a posteriori (MAP), in CD-CAT. The simulation studies consider a variety of Q-matrix structures, numbers of attributes, ways of generating attribute profiles, and levels of item quality. Results indicate that the nonparametric procedure consistently achieves higher pattern and attribute recovery rates in nearly all conditions.
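Classification by proximity to ideal response patterns can be sketched compactly. The toy Q-matrix below is invented for illustration, and a conjunctive (DINA-like) rule is assumed for the ideal responses; Chiu and Douglas (2013) also cover weighted distances, which this sketch omits.

```python
from itertools import product

def ideal_response(profile, q_row):
    """Under a conjunctive rule, the ideal response is 1 iff the profile
    masters every attribute the item requires."""
    return int(all(a >= q for a, q in zip(profile, q_row)))

def classify(responses, Q):
    """Return the attribute profile whose ideal response pattern is
    closest (in Hamming distance) to the observed responses."""
    K = len(Q[0])
    best, best_dist = None, None
    for profile in product((0, 1), repeat=K):
        ideal = [ideal_response(profile, row) for row in Q]
        dist = sum(o != i for o, i in zip(responses, ideal))
        if best_dist is None or dist < best_dist:
            best, best_dist = profile, dist
    return best

# Toy Q-matrix: 4 items measuring 2 attributes.
Q = [(1, 0), (0, 1), (1, 1), (1, 0)]
print(classify([1, 0, 0, 1], Q))  # → (1, 0): attribute 1 mastered, attribute 2 not
```

No item parameters are estimated, which is what makes the procedure nonparametric: only the Q-matrix is required.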

References

Chiu, C.-Y., & Douglas, J. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225-250. doi: 10.1007/s00357-013-9132-9

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1jCL3fPZLgzIdwvEk20D-FliZ15OTUtpr ER - TY - CONF T1 - Using Automated Item Generation in a Large-scale Medical Licensure Exam Program: Lessons Learned. T2 - 2017 IACAT Conference Y1 - 2017 A1 - André F. De Champlain KW - Automated item generation KW - large scale KW - medical licensure AB -

On-demand testing has become commonplace in most large-scale testing programs. Continuous testing is appealing for candidates in that it affords greater flexibility in scheduling a session at the desired location. Furthermore, the push for more comprehensive systems of assessment (e.g., CBAL) is predicated on the availability of more frequently administered tasks, given the purposeful link between instruction and assessment in these frameworks. However, continuous testing models impose several challenges on programs, including overexposure of items. Robust item banks are therefore needed to support routine retirement and replenishment of items. In a traditional approach to developing items, content experts select a topic and then develop an item consisting of a stem, a lead-in question, a correct answer, and a list of distractors. The item then undergoes review by a panel of experts to validate the content and identify any potential flaws. The process involved in developing quality MCQ items can be time-consuming as well as costly, with estimates as high as $1,500-$2,500 USD per item (Rudner, 2010). The Medical Council of Canada (MCC) has been exploring a novel item development process to supplement traditional approaches. Specifically, the use of automated item generation (AIG), which uses technology to generate test items from cognitive models, has been studied for over five years. Cognitive models are representations of the knowledge and skills required to solve a given problem. While developing a cognitive model for a medical scenario, for example, content experts are asked to deconstruct the (clinical) reasoning process involved via clearly stated variables and related elements. This information is then entered into a computer program that uses algorithms to generate MCQs. The MCC has been piloting AIG-based items for over five years with the MCC Qualifying Examination Part I (MCCQE I), a prerequisite for licensure in Canada.
The aim of this presentation is to provide an overview of the practical lessons learned from the use and operational rollout of AIG with the MCCQE I. Psychometrically, the quality of the items is at least equal, and in many instances superior, to that of traditionally written MCQs, based on difficulty, discrimination, and information. In fact, 96% of the AIG-based items piloted in a recent administration were retained for future operational scoring based on predefined inclusion criteria. AIG also offers a framework for the systematic creation of plausible distractors, in that the content experts must provide not only the clinical reasoning underlying a correct response but also the cognitive errors associated with each of the distractors (Lai et al., 2016). Consequently, AIG holds great promise for improving and tailoring diagnostic feedback for remedial purposes (Pugh, De Champlain, Gierl, Lai, & Touchie, 2016). Furthermore, our test development process has been greatly enhanced by the addition of AIG, as it requires that item writers use metacognitive skills to describe how they solve problems. We hope that sharing our experiences will not only help other testing organizations interested in adopting AIG but also foster discussion that benefits all participants.

References

Lai, H., Gierl, M.J., Touchie, C., Pugh, D., Boulais, A.P., & De Champlain, A.F. (2016). Using automatic item generation to improve the quality of MCQ distractors. Teaching and Learning in Medicine, 28, 166-173.

Pugh, D., De Champlain, A.F., Lai, H., Gierl, M., & Touchie, C. (2016). Using cognitive models to develop quality multiple choice questions. Medical Teacher, 38, 838-843.

Rudner, L. (2010). Implementing the Graduate Management Admission Test computerized adaptive test. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 151-165). New York, NY: Springer.

Presentation Video

JF - 2017 IACAT Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=14N8hUc8qexAy5W_94TykEDABGVIJHG1h ER - TY - CONF T1 - Using Bayesian Decision Theory in Cognitive Diagnosis Computerized Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Chia-Ling Hsu A1 - Wen-Chung Wang A1 - ShuYing Chen KW - Bayesian Decision Theory KW - CD-CAT AB -

Cognitive diagnosis computerized adaptive testing (CD-CAT) purports to provide each individual with a profile of the strengths and weaknesses of attributes or skills through computerized adaptive testing. In the CD-CAT literature, researchers have dedicated themselves to developing item selection algorithms that improve measurement efficiency, and most algorithms are based on information theory. Given the discontinuous nature of the latent variables in CD-CAT, this study introduces an alternative item selection method, the minimum expected cost (MEC) method, derived from Bayesian decision theory. Using simulations, the MEC method was evaluated against the posterior-weighted Kullback-Leibler (PWKL) information, modified PWKL (MPWKL), and mutual information (MI) methods by manipulating item bank quality, item selection algorithm, and termination rule. Results indicated that, regardless of item quality and termination criterion, the MEC, MPWKL, and MI methods performed very similarly, and all outperformed the PWKL method in classification accuracy and test efficiency, especially in short tests; the MEC method also used the item bank more efficiently than the MPWKL and MI methods. Moreover, the MEC method can consider the costs of incorrect decisions and improve classification accuracy and test efficiency when a particular profile is of concern. These results suggest the practicability of the MEC method in CD-CAT.
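The decision-theoretic idea behind such a method can be sketched as follows: for each candidate item, average over the two possible responses the cost of the classification decision that would be made after the Bayesian update. This sketch assumes a simple 0-1 loss (so expected cost reduces to expected posterior misclassification probability) and a two-class bank invented for illustration; the published MEC method's exact cost structure is not reproduced here.

```python
def posterior_update(prior, p_correct, x):
    """Bayes update of the latent-class posterior after response x (0 or 1)."""
    like = [p if x == 1 else 1.0 - p for p in p_correct]
    joint = [pr * l for pr, l in zip(prior, like)]
    total = sum(joint)
    return [j / total for j in joint]

def expected_cost(prior, p_correct):
    """Expected cost of a MAP classification after giving the item,
    under 0-1 loss (probability of misclassifying)."""
    ec = 0.0
    for x in (0, 1):
        px = sum(pr * (p if x == 1 else 1.0 - p)
                 for pr, p in zip(prior, p_correct))
        if px == 0.0:
            continue
        post = posterior_update(prior, p_correct, x)
        ec += px * (1.0 - max(post))  # MAP error probability, weighted by P(x)
    return ec

def mec_select(prior, bank):
    """Pick the item index with minimum expected cost."""
    return min(range(len(bank)), key=lambda j: expected_cost(prior, bank[j]))

# Two latent classes; item 1 separates them sharply, item 0 barely at all.
prior = [0.5, 0.5]
bank = [[0.55, 0.45],   # P(correct | class) for item 0
        [0.95, 0.10]]   # P(correct | class) for item 1
print(mec_select(prior, bank))  # → 1
```

Replacing the 0-1 loss with an asymmetric cost matrix is what lets the method privilege a particular profile of concern, as the abstract notes.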

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Using Computerized Adaptive Testing to Detect Students’ Misconceptions: Exploration of Item Selection T2 - IACAT 2017 Conference Y1 - 2017 A1 - Yawei Shen A1 - Yu Bao A1 - Shiyu Wang A1 - Laine Bradshaw KW - CAT KW - incorrect answering KW - Student Misconception AB -

Holding misconceptions impedes learning; thus, detecting misconceptions through assessment is crucial to facilitating teaching. However, most computerized adaptive testing (CAT) applications that diagnose examinees’ attribute profiles focus only on whether examinees have mastered correct concepts. In educational settings, teachers and students must figure out the misconceptions underlying incorrect answers after obtaining scores from assessments and then correct those misconceptions. The Scaling Individuals and Classifying Misconceptions (SICM) models proposed by Bradshaw and Templin (2014) fill this gap: they identify a student’s misconceptions directly from the distractors of multiple-choice questions and report whether the student holds each misconception. Simultaneously, SICM models estimate a continuous ability within the item response theory (IRT) framework to meet the needs of policy-driven assessment systems that rely on scaling examinees’ ability. However, the advantage of estimating two types of latent variables also complicates model estimation: more items are required to achieve the same accuracy for classification and estimation compared with dichotomous DCMs and IRT, respectively. Thus, we aim to develop a CAT using the SICM models (SICM-CAT) to estimate students’ misconceptions and continuous abilities simultaneously using fewer items than a linear test.

To achieve this goal, our research questions focus on establishing item selection rules that provide both accurate classification results and accurate continuous ability estimates in the SICM-CAT. The first research question is which information criterion to use. The Kullback–Leibler (KL) divergence is a natural first choice, as it can combine continuous and discrete latent variables. Based on this criterion, we propose an item selection index that integrates the two types of information, so that the items selected in real time discriminate the examinee’s current misconception-profile and ability estimates from other possible estimates to the greatest extent. The second research question concerns how to adaptively balance the estimation of the misconception profile and the continuous latent ability. Mimicking the idea of the hybrid design proposed by Wang et al. (2016), we propose a design framework in which item selection transitions from the group level to the item level. We aim to explore several design questions, such as how to select the transition point and which latent variable estimation should be targeted first.

Preliminary results indicated that, under all simulation conditions, the SICM-CAT based on the proposed item selection index classified examinees into latent classes and measured their latent abilities more accurately and reliably than the random selection method. As a next step, we plan to compare different CAT designs based on the proposed item selection rules with the best linear test. We expect the SICM-CAT to need a shorter test length while retaining the same accuracy and reliability.

References

Bradshaw, L., & Templin, J. (2014). Combining item response theory and diagnostic classification models: A psychometric model for scaling ability and diagnosing misconceptions. Psychometrika, 79(3), 403-425.

Wang, S., Lin, H., Chang, H. H., & Douglas, J. (2016). Hybrid computerized adaptive testing: from group sequential design to fully sequential design. Journal of Educational Measurement, 53(1), 45-62.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Using Determinantal Point Processes for Multistage Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Jill-Jênn Vie KW - Multidimentional CAT KW - multistage testing AB -

Multistage tests are a generalization of computerized adaptive tests (CATs) that allow batches of questions to be asked before the adaptation begins, instead of asking questions one by one. To be usable in real-world scenarios, they should be assembled on the fly, and recent models have been designed accordingly (Zheng & Chang, 2015). We will present a new algorithm for assembling multistage tests, based on a recent machine learning technique called determinantal point processes. We will illustrate the technique on various student data sets drawn from fraction subtraction items and massive open online courses.

In multidimensional CATs, feature vectors are estimated for students and questions, and the probability that a student answers a question correctly depends on how strongly their feature vector is correlated with the question's feature vector. In other words, questions that are close in the feature space elicit similar response patterns from students. Therefore, to maximize the information of a batch of questions, the volume spanned by their feature vectors should be as large as possible. Determinantal point processes allow efficient sampling of diverse batches of items from a bank, i.e., batches that span a large volume: it is possible to draw k items among n with O(nk^3) complexity, which is convenient for large banks of tens of thousands of items.
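The "diversity as spanned volume" idea can be illustrated with a greedy stand-in: repeatedly add the item that most increases the determinant of the Gram matrix of the selected feature vectors. This is a MAP-style heuristic, not the exact O(nk^3) DPP sampler the abstract refers to, and the three item feature vectors below are invented.

```python
def gram_det(vectors):
    """Determinant of the Gram matrix G[i][j] = <v_i, v_j>, i.e. the
    squared volume spanned by the vectors (Gaussian elimination)."""
    g = [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]
    n, det = len(g), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(g[r][i]))  # partial pivoting
        if abs(g[p][i]) < 1e-12:
            return 0.0
        if p != i:
            g[i], g[p] = g[p], g[i]
            det = -det
        det *= g[i][i]
        for r in range(i + 1, n):
            f = g[r][i] / g[i][i]
            g[r] = [x - f * y for x, y in zip(g[r], g[i])]
    return det

def greedy_diverse_batch(features, k):
    """Greedily pick k item indices maximizing the spanned volume."""
    selected = []
    for _ in range(k):
        best_j, best_vol = None, -1.0
        for j, v in enumerate(features):
            if j in selected:
                continue
            vol = gram_det([features[i] for i in selected] + [v])
            if vol > best_vol:
                best_j, best_vol = j, vol
        selected.append(best_j)
    return selected

items = [[1.0, 0.0],   # item 0
         [0.9, 0.1],   # item 1: nearly redundant with item 0
         [0.0, 1.0]]   # item 2: orthogonal to item 0
print(greedy_diverse_batch(items, 2))  # → [0, 2], the most diverse pair
```

The near-duplicate item 1 is skipped because adding it barely grows the spanned volume, which is exactly the redundancy a DPP penalizes.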

References

Zheng, Y., & Chang, H. H. (2015). On-the-fly assembled multistage adaptive testing. Applied Psychological Measurement, 39(2), 104-118.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1GkJkKTEFWK3srDX8TL4ra_Xbsliemu1R ER - TY - JOUR T1 - The validation of a computer-adaptive test (CAT) for assessing health-related quality of life in children and adolescents in a clinical sample: study design, methods and first results of the Kids-CAT study JF - Quality of Life Research Y1 - 2017 A1 - Barthel, D. A1 - Otto, C. A1 - Nolte, S. A1 - Meyrose, A.-K. A1 - Fischer, F. A1 - Devine, J. A1 - Walter, O. A1 - Mierke, A. A1 - Fischer, K. I. A1 - Thyen, U. A1 - Klein, M. A1 - Ankermann, T. A1 - Rose, M. A1 - Ravens-Sieberer, U. AB - Recently, we developed a computer-adaptive test (CAT) for assessing health-related quality of life (HRQoL) in children and adolescents: the Kids-CAT. It measures five generic HRQoL dimensions. The aims of this article were (1) to present the study design and (2) to investigate its psychometric properties in a clinical setting. VL - 26 UR - https://doi.org/10.1007/s11136-016-1437-9 ER - TY - JOUR T1 - Bayesian Networks in Educational Assessment: The State of the Field JF - Applied Psychological Measurement Y1 - 2016 A1 - Culbertson, Michael J. AB - Bayesian networks (BN) provide a convenient and intuitive framework for specifying complex joint probability distributions and are thus well suited for modeling content domains of educational assessments at a diagnostic level. BN have been used extensively in the artificial intelligence community as student models for intelligent tutoring systems (ITS) but have received less attention among psychometricians. This critical review outlines the existing research on BN in educational assessment, providing an introduction to the ITS literature for the psychometric community, and points out several promising research paths. The online appendix lists 40 assessment systems that serve as empirical examples of the use of BN for educational assessment in a variety of domains. 
VL - 40 UR - http://apm.sagepub.com/content/40/1/3.abstract ER - TY - JOUR T1 - A Comparison of Constrained Item Selection Methods in Multidimensional Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2016 A1 - Su, Ya-Hui AB - The construction of assessments in computerized adaptive testing (CAT) usually involves fulfilling a large number of statistical and non-statistical constraints to meet test specifications. To improve measurement precision and test validity, the multidimensional priority index (MPI) and the modified MPI (MMPI) can be used to monitor many constraints simultaneously under a between-item and a within-item multidimensional framework, respectively. As both item selection methods can be implemented easily and computed efficiently, they are important and useful for operational CATs; however, no thorough simulation study has compared the performance of these two item selection methods under two different item bank structures. The purpose of this study was to investigate the efficiency of the MMPI and the MPI item selection methods under the between-item and within-item multidimensional CAT through simulations. The MMPI and the MPI item selection methods yielded similar performance in measurement precision for both multidimensional pools and yielded similar performance in exposure control and constraint management for the between-item multidimensional pool. For the within-item multidimensional pool, the MPI method yielded slightly better performance in exposure control but yielded slightly worse performance in constraint management than the MMPI method. VL - 40 UR - http://apm.sagepub.com/content/40/5/346.abstract ER - TY - JOUR T1 - On Computing the Key Probability in the Stochastically Curtailed Sequential Probability Ratio Test JF - Applied Psychological Measurement Y1 - 2016 A1 - Huebner, Alan R. A1 - Finkelman, Matthew D. 
AB - The Stochastically Curtailed Sequential Probability Ratio Test (SCSPRT) is a termination criterion for computerized classification tests (CCTs) that has been shown to be more efficient than the well-known Sequential Probability Ratio Test (SPRT). The performance of the SCSPRT depends on computing the probability that at a given stage in the test, an examinee’s current interim classification status will not change before the end of the test. Previous work discusses two methods of computing this probability, an exact method in which all potential responses to remaining items are considered and an approximation based on the central limit theorem (CLT) requiring less computation. Generally, the CLT method should be used early in the test when the number of remaining items is large, and the exact method is more appropriate at later stages of the test when few items remain. However, there is currently a dearth of information as to the performance of the SCSPRT when using the two methods. For the first time, the exact and CLT methods of computing the crucial probability are compared in a simulation study to explore whether there is any effect on the accuracy or efficiency of the CCT. The article is aimed at practitioners and researchers interested in using the SCSPRT as a termination criterion in an operational CCT. VL - 40 UR - http://apm.sagepub.com/content/40/2/142.abstract ER - TY - JOUR T1 - On the effect of adding clinical samples to validation studies of patient-reported outcome item banks: a simulation study JF - Quality of Life Research Y1 - 2016 A1 - Smits, Niels AB - To increase the precision of estimated item parameters of item response theory models for patient-reported outcomes, general population samples are often enriched with samples of clinical respondents. Calibration studies provide little information on how this sampling scheme is incorporated into model estimation.
In a small simulation study the impact of ignoring the oversampling of clinical respondents on item and person parameters is illustrated. VL - 25 UR - https://doi.org/10.1007/s11136-015-1199-9 ER - TY - JOUR T1 - Effect of Imprecise Parameter Estimation on Ability Estimation in a Multistage Test in an Automatic Item Generation Context JF - Journal of Computerized Adaptive Testing Y1 - 2016 A1 - Colvin, Kimberly A1 - Keller, Lisa A A1 - Robin, Frederic KW - Adaptive Testing KW - automatic item generation KW - errors in item parameters KW - item clones KW - multistage testing VL - 4 UR - http://iacat.org/jcat/index.php/jcat/article/view/59/27 IS - 1 ER - TY - JOUR T1 - Exploration of Item Selection in Dual-Purpose Cognitive Diagnostic Computerized Adaptive Testing: Based on the RRUM JF - Applied Psychological Measurement Y1 - 2016 A1 - Dai, Buyun A1 - Zhang, Minqiang A1 - Li, Guangming AB - Cognitive diagnostic computerized adaptive testing (CD-CAT) can be divided into two broad categories: (a) single-purpose tests, which are based on the subject’s knowledge state (KS) alone, and (b) dual-purpose tests, which are based on both the subject’s KS and traditional ability level (θ). This article seeks to identify the most efficient item selection method for the latter type of CD-CAT corresponding to various conditions and various evaluation criteria, respectively, based on the reduced reparameterized unified model (RRUM) and the two-parameter logistic model of item response theory (IRT-2PLM). The Shannon entropy (SHE) and Fisher information methods were combined to produce a new synthetic item selection index, that is, the “dapperness with information (DWI)” index, which concurrently considers both KS and θ within one step. The new method was compared with four other methods. The results showed that, in most conditions, the new method exhibited the best performance in terms of KS estimation and the second-best performance in terms of θ estimation.
Item utilization uniformity and computing time are also considered for all the competing methods. VL - 40 UR - http://apm.sagepub.com/content/40/8/625.abstract ER - TY - JOUR T1 - High-Efficiency Response Distribution–Based Item Selection Algorithms for Short-Length Cognitive Diagnostic Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2016 A1 - Zheng, Chanjin A1 - Chang, Hua-Hua AB - Cognitive diagnostic computerized adaptive testing (CD-CAT) purports to obtain useful diagnostic information with the great efficiency brought by CAT technology. Most of the existing CD-CAT item selection algorithms are evaluated when test length is fixed and relatively long, but some applications of CD-CAT, such as in interim assessment, require obtaining the cognitive pattern from a short test. The mutual information (MI) algorithm proposed by Wang is the first endeavor to accommodate this need. To reduce the computational burden, Wang provided a simplified scheme, but at the price of scale/sign change in the original index. As a result, it is very difficult to combine it with some popular constraint management methods. The current study proposes two high-efficiency algorithms, posterior-weighted cognitive diagnostic model (CDM) discrimination index (PWCDI) and posterior-weighted attribute-level CDM discrimination index (PWACDI), by modifying the CDM discrimination index. They can be considered as an extension of the Kullback–Leibler (KL) and posterior-weighted KL (PWKL) methods. A pre-calculation strategy has also been developed to address the computational issue. Simulation studies indicate that the newly developed methods can produce results comparable with or better than the MI and PWKL in both short and long tests. The other major advantage is that the computational issue has been addressed more elegantly than in MI. PWCDI and PWACDI can run as fast as PWKL.
More importantly, they do not suffer from the problem of scale/sign change as MI and, thus, can be used with constraint management methods together in a straightforward manner. VL - 40 UR - http://apm.sagepub.com/content/40/8/608.abstract ER - TY - JOUR T1 - Hybrid Computerized Adaptive Testing: From Group Sequential Design to Fully Sequential Design JF - Journal of Educational Measurement Y1 - 2016 A1 - Wang, Shiyu A1 - Lin, Haiyan A1 - Chang, Hua-Hua A1 - Douglas, Jeff AB - Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing.  Though most designs of CAT and MST exhibit strength and weakness in recent large-scale implementations, there is no simple answer to the question of which design is better because different modes may fit different practical situations. This article proposes a hybrid adaptive framework to combine both CAT and MST, inspired by an analysis of the history of CAT and MST. The proposed procedure is a design which transitions from a group sequential design to a fully sequential design. This allows for the robustness of MST in early stages, but also shares the advantages of CAT in later stages with fine tuning of the ability estimator once its neighborhood has been identified. Simulation results showed that hybrid designs following our proposed principles provided comparable or even better estimation accuracy and efficiency than standard CAT and MST designs, especially for examinees at the two ends of the ability range. VL - 53 UR - http://dx.doi.org/10.1111/jedm.12100 ER - TY - JOUR T1 - On the Issue of Item Selection in Computerized Adaptive Testing With Response Times JF - Journal of Educational Measurement Y1 - 2016 A1 - Veldkamp, Bernard P. AB - Many standardized tests are now administered via computer rather than paper-and-pencil format. The computer-based delivery mode brings with it certain advantages. 
One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed computerized adaptive testing (CAT). A second advantage is the ability to record not only the test taker's response to each item (i.e., question), but also the amount of time the test taker spends considering and answering each item. Combining these two advantages, various methods were explored for utilizing response time data in selecting appropriate items for an individual test taker. Four strategies for incorporating response time data were evaluated, and the precision of the final test-taker score was assessed by comparing it to a benchmark value that did not take response time information into account. While differences in measurement precision and testing times were expected, results showed that the strategies did not differ much with respect to measurement precision but that there were differences with regard to the total testing time. VL - 53 UR - http://dx.doi.org/10.1111/jedm.12110 ER - TY - JOUR T1 - Maximum Likelihood Score Estimation Method With Fences for Short-Length Tests and Computerized Adaptive Tests JF - Applied Psychological Measurement Y1 - 2016 A1 - Han, Kyung T. AB - A critical shortcoming of the maximum likelihood estimation (MLE) method for test score estimation is that it does not work with certain response patterns, including ones consisting only of all 0s or all 1s. This can be problematic in the early stages of computerized adaptive testing (CAT) administration and for tests short in length. To overcome this challenge, test practitioners often set lower and upper bounds of theta estimation and truncate the score estimation to be one of those bounds when the log likelihood function fails to yield a peak due to responses consisting only of 0s or 1s. Even so, this MLE with truncation (MLET) method still cannot handle response patterns in which all harder items are correct and all easy items are incorrect.
Bayesian-based estimation methods such as the modal a posteriori (MAP) method or the expected a posteriori (EAP) method can be viable alternatives to MLE. The MAP or EAP methods, however, are known to result in estimates biased toward the center of a prior distribution, resulting in a shrunken score scale. This study introduces an alternative approach to MLE, called MLE with fences (MLEF). In MLEF, several imaginary “fence” items with fixed responses are introduced to form a workable log likelihood function even with abnormal response patterns. The findings of this study suggest that, unlike MLET, the MLEF can handle any response patterns and, unlike both MAP and EAP, results in score estimates that do not cause shrinkage of the theta scale. VL - 40 UR - http://apm.sagepub.com/content/40/4/289.abstract ER - TY - JOUR T1 - Modeling Student Test-Taking Motivation in the Context of an Adaptive Achievement Test JF - Journal of Educational Measurement Y1 - 2016 A1 - Wise, Steven L. A1 - Kingsbury, G. Gage AB - This study examined the utility of response time-based analyses in understanding the behavior of unmotivated test takers. For the data from an adaptive achievement test, patterns of observed rapid-guessing behavior and item response accuracy were compared to the behavior expected under several types of models that have been proposed to represent unmotivated test taking behavior. Test taker behavior was found to be inconsistent with these models, with the exception of the effort-moderated model. Effort-moderated scoring was found to both yield scores that were more accurate than those found under traditional scoring, and exhibit improved person fit statistics. In addition, an effort-guided adaptive test was proposed and shown by a simulation study to alleviate item difficulty mistargeting caused by unmotivated test taking. 
VL - 53 UR - http://dx.doi.org/10.1111/jedm.12102 ER - TY - JOUR T1 - Monitoring Items in Real Time to Enhance CAT Security JF - Journal of Educational Measurement Y1 - 2016 A1 - Zhang, Jinming A1 - Li, Jie AB - An IRT-based sequential procedure is developed to monitor items for enhancing test security. The procedure uses a series of statistical hypothesis tests to examine whether the statistical characteristics of each item under inspection have changed significantly during CAT administration. This procedure is compared with a previously developed CTT-based procedure through simulation studies. The results show that when the total number of examinees is fixed both procedures can control the rate of type I errors at any reasonable significance level by choosing an appropriate cutoff point and meanwhile maintain a low rate of type II errors. Further, the IRT-based method has a much lower type II error rate or more power than the CTT-based method when the number of compromised items is small (e.g., 5), which can be achieved if the IRT-based procedure can be applied in an active mode in the sense that flagged items can be replaced with new items. VL - 53 UR - http://dx.doi.org/10.1111/jedm.12104 ER - TY - JOUR T1 - Multidimensional Computerized Adaptive Testing for Classifying Examinees With Within-Dimensionality JF - Applied Psychological Measurement Y1 - 2016 A1 - van Groen, Maaike M. A1 - Eggen, Theo J. H. M. A1 - Veldkamp, Bernard P. AB - A classification method is presented for adaptive classification testing with a multidimensional item response theory (IRT) model in which items are intended to measure multiple traits, that is, within-dimensionality. The reference composite is used with the sequential probability ratio test (SPRT) to make decisions and decide whether testing can be stopped before reaching the maximum test length. 
Item-selection methods are provided that maximize the determinant of the information matrix at the cutoff point or at the projected ability estimate. A simulation study illustrates the efficiency and effectiveness of the classification method. Simulations were run with the new item-selection methods, random item selection, and maximization of the determinant of the information matrix at the ability estimate. The study also showed that the SPRT with multidimensional IRT has the same characteristics as the SPRT with unidimensional IRT and results in more accurate classifications than the latter when used for multidimensional data. VL - 40 UR - http://apm.sagepub.com/content/40/6/387.abstract ER - TY - JOUR T1 - Online Calibration of Polytomous Items Under the Generalized Partial Credit Model JF - Applied Psychological Measurement Y1 - 2016 A1 - Zheng, Yi AB - Online calibration is a technology-enhanced architecture for item calibration in computerized adaptive tests (CATs). Many CATs are administered continuously over a long term and rely on large item banks. To ensure test validity, these item banks need to be frequently replenished with new items, and these new items need to be pretested before being used operationally. Online calibration dynamically embeds pretest items in operational tests and calibrates their parameters as response data are gradually obtained through the continuous test administration. This study extends existing formulas, procedures, and algorithms for dichotomous item response theory models to the generalized partial credit model, a popular model for items scored in more than two categories. A simulation study was conducted to investigate the developed algorithms and procedures under a variety of conditions, including two estimation algorithms, three pretest item selection methods, three seeding locations, two numbers of score categories, and three calibration sample sizes. 
Results demonstrated acceptable estimation accuracy of the two estimation algorithms in some of the simulated conditions. A variety of findings were also revealed for the interaction effects of the included factors, and recommendations were made accordingly. VL - 40 UR - http://apm.sagepub.com/content/40/6/434.abstract ER - TY - JOUR T1 - Optimal Reassembly of Shadow Tests in CAT JF - Applied Psychological Measurement Y1 - 2016 A1 - Choi, Seung W. A1 - Moellering, Karin T. A1 - Li, Jie A1 - van der Linden, Wim J. AB - Even in the age of abundant and fast computing resources, concurrency requirements for large-scale online testing programs still put an uninterrupted delivery of computer-adaptive tests at risk. In this study, to increase the concurrency for operational programs that use the shadow-test approach to adaptive testing, we explored various strategies aimed at reducing the number of reassembled shadow tests without compromising the measurement quality. Strategies requiring fixed intervals between reassemblies, a certain minimal change in the interim ability estimate since the last assembly before triggering a reassembly, and a hybrid of the two strategies yielded substantial reductions in the number of reassemblies without degradation in the measurement accuracy. The strategies effectively prevented unnecessary reassemblies due to adapting to the noise in the early test stages. They also highlighted the practicality of the shadow-test approach by minimizing the computational load involved in its use of mixed-integer programming.
VL - 40 UR - http://apm.sagepub.com/content/40/7/469.abstract ER - TY - JOUR T1 - Parameter Drift Detection in Multidimensional Computerized Adaptive Testing Based on Informational Distance/Divergence Measures JF - Applied Psychological Measurement Y1 - 2016 A1 - Kang, Hyeon-Ah A1 - Chang, Hua-Hua AB - An informational distance/divergence-based approach is proposed to detect the presence of parameter drift in multidimensional computerized adaptive testing (MCAT). The study presents significance testing procedures for identifying changes in multidimensional item response functions (MIRFs) over time based on informational distance/divergence measures that capture the discrepancy between two probability functions. To approximate the MIRFs from the observed response data, the k-nearest neighbors algorithm is used with the random search method. A simulation study suggests that the distance/divergence-based drift measures perform effectively in identifying the instances of parameter drift in MCAT. They showed moderate power with small samples of 500 examinees and excellent power when the sample size was as large as 1,000. The proposed drift measures also adequately controlled for Type I error at the nominal level under the null hypothesis. VL - 40 UR - http://apm.sagepub.com/content/40/7/534.abstract ER - TY - JOUR T1 - Stochastic Curtailment of Questionnaires for Three-Level Classification: Shortening the CES-D for Assessing Low, Moderate, and High Risk of Depression JF - Applied Psychological Measurement Y1 - 2016 A1 - Smits, Niels A1 - Finkelman, Matthew D. A1 - Kelderman, Henk AB - In clinical assessment, efficient screeners are needed to ensure low respondent burden. In this article, Stochastic Curtailment (SC), a method for efficient computerized testing for classification into two classes for observable outcomes, was extended to three classes. 
In a post hoc simulation study using the item scores on the Center for Epidemiologic Studies–Depression Scale (CES-D) of a large sample, three versions of SC, SC via Empirical Proportions (SC-EP), SC via Simple Ordinal Regression (SC-SOR), and SC via Multiple Ordinal Regression (SC-MOR), were compared on both respondent burden and classification accuracy. All methods were applied under the regular item order of the CES-D and under an ordering that was optimal in terms of the predictive power of the items. Under the regular item ordering, the three methods were equally accurate, but SC-SOR and SC-MOR needed fewer items. Under the optimal ordering, additional gains in efficiency were found, but SC-MOR suffered substantially from capitalization on chance. It was concluded that SC-SOR is an efficient and accurate method for clinical screening. Strengths and weaknesses of the methods are discussed. VL - 40 UR - http://apm.sagepub.com/content/40/1/22.abstract ER - TY - JOUR T1 - Using Response Time to Detect Item Preknowledge in Computer-Based Licensure Examinations JF - Educational Measurement: Issues and Practice Y1 - 2016 A1 - Qian, H. A1 - Staniewska, D. A1 - Reckase, M. A1 - Woo, A. AB - This article addresses the issue of how to detect item preknowledge using item response time data in two computer-based large-scale licensure examinations. Item preknowledge is indicated by an unexpectedly short response time and a correct response. Two samples were used for detecting item preknowledge for each examination. The first sample was from the early stage of the operational test and was used for item calibration. The second sample was from the late stage of the operational test, which may feature item preknowledge. The purpose of this research was to explore whether there was evidence of item preknowledge and compromised items in the second sample using the parameters estimated from the first sample.
The results showed that for one nonadaptive operational examination, two items (of 111) were potentially exposed, and two candidates (of 1,172) showed some indications of preknowledge on multiple items. For another licensure examination that featured computerized adaptive testing, there was no indication of item preknowledge or compromised items. Implications for detected aberrant examinees and compromised items are discussed in the article. VL - 35 IS - 1 ER - TY - JOUR T1 - Assessing Individual-Level Impact of Interruptions During Online Testing JF - Journal of Educational Measurement Y1 - 2015 A1 - Sinharay, Sandip A1 - Wan, Ping A1 - Choi, Seung W. A1 - Kim, Dong-In AB - With an increase in the number of online tests, the number of interruptions during testing due to unexpected technical issues seems to be on the rise. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. Researchers such as Hill and Sinharay et al. examined the impact of interruptions at an aggregate level. However, there is a lack of research on the assessment of impact of interruptions at an individual level. We attempt to fill that void. We suggest four methodological approaches, primarily based on statistical hypothesis testing, linear regression, and item response theory, which can provide evidence on the individual-level impact of interruptions. We perform a realistic simulation study to compare the Type I error rate and power of the suggested approaches. We then apply the approaches to data from the 2013 Indiana Statewide Testing for Educational Progress-Plus (ISTEP+) test that experienced interruptions. VL - 52 UR - http://dx.doi.org/10.1111/jedm.12064 ER - TY - JOUR T1 - a-Stratified Computerized Adaptive Testing in the Presence of Calibration Error JF - Educational and Psychological Measurement Y1 - 2015 A1 - Cheng, Ying A1 - Patton, Jeffrey M. 
A1 - Shao, Can AB - a-Stratified computerized adaptive testing with b-blocking (AST), as an alternative to the widely used maximum Fisher information (MFI) item selection method, can effectively balance item pool usage while providing accurate latent trait estimates in computerized adaptive testing (CAT). However, previous comparisons of these methods have treated item parameter estimates as if they are the true population parameter values. Consequently, capitalization on chance may occur. In this article, we examined the performance of the AST method under more realistic conditions where item parameter estimates instead of true parameter values are used in the CAT. Its performance was compared against that of the MFI method when the latter is used in conjunction with Sympson–Hetter or randomesque exposure control. Results indicate that the MFI method, even when combined with exposure control, is susceptible to capitalization on chance. This is particularly true when the calibration sample size is small. On the other hand, AST is more robust to capitalization on chance. Consistent with previous investigations using true item parameter values, AST yields much more balanced item pool usage, with a small loss in the precision of latent trait estimates. The loss is negligible when the test is as long as 40 items. VL - 75 UR - http://epm.sagepub.com/content/75/2/260.abstract ER - TY - JOUR T1 - Best Design for Multidimensional Computerized Adaptive Testing With the Bifactor Model JF - Educational and Psychological Measurement Y1 - 2015 A1 - Seo, Dong Gi A1 - Weiss, David J. AB - Most computerized adaptive tests (CATs) have been studied using the framework of unidimensional item response theory. However, many psychological variables are multidimensional and might benefit from using a multidimensional approach to CATs. This study investigated the accuracy, fidelity, and efficiency of a fully multidimensional CAT algorithm (MCAT) with a bifactor model using simulated data. 
Four item selection methods in MCAT were examined for three bifactor pattern designs using two multidimensional item response theory models. To compare MCAT item selection and estimation methods, a fixed test length was used. The Ds-optimality item selection improved θ estimates with respect to a general factor, and either D- or A-optimality improved estimates of the group factors in three bifactor pattern designs under two multidimensional item response theory models. The MCAT model without a guessing parameter functioned better than the MCAT model with a guessing parameter. The MAP (maximum a posteriori) estimation method provided more accurate θ estimates than the EAP (expected a posteriori) method under most conditions, and MAP showed lower observed standard errors than EAP under most conditions, except for a general factor condition using Ds-optimality item selection. VL - 75 UR - http://epm.sagepub.com/content/75/6/954.abstract ER - TY - JOUR T1 - Comparing Simple Scoring With IRT Scoring of Personality Measures: The Navy Computer Adaptive Personality Scales JF - Applied Psychological Measurement Y1 - 2015 A1 - Oswald, Frederick L. A1 - Shaw, Amy A1 - Farmer, William L. AB -
This article analyzes data from U.S. Navy sailors (N = 8,956), with the central measure being the Navy Computer Adaptive Personality Scales (NCAPS). Analyses and results from this article extend and qualify those from previous research efforts by examining the properties of the NCAPS and its adaptive structure in more detail. Specifically, this article examines item exposure rates, the efficiency of item use based on item response theory (IRT)–based Expected A Posteriori (EAP) scoring, and a comparison of IRT-EAP scoring with much more parsimonious scoring methods that appear to work just as well (stem-level scoring and dichotomous scoring). The cutting-edge nature of adaptive personality testing will necessitate a series of future efforts like this: to examine the benefits of adaptive scoring schemes and novel measurement methods continually, while pushing testing technology further ahead.
VL - 39 UR - http://apm.sagepub.com/content/39/2/144.abstract ER - TY - JOUR T1 - A Comparison of IRT Proficiency Estimation Methods Under Adaptive Multistage Testing JF - Journal of Educational Measurement Y1 - 2015 A1 - Kim, Sooyeon A1 - Moses, Tim A1 - Yoo, Hanwook (Henry) AB - This inquiry is an investigation of item response theory (IRT) proficiency estimators’ accuracy under multistage testing (MST). We chose a two-stage MST design that includes four modules (one at Stage 1, three at Stage 2) and three difficulty paths (low, middle, high). We assembled various two-stage MST panels (i.e., forms) by manipulating two assembly conditions in each module, such as difficulty level and module length. For each panel, we investigated the accuracy of examinees’ proficiency levels derived from seven IRT proficiency estimators. The choice of Bayesian (prior) versus non-Bayesian (no prior) estimators was of more practical significance than the choice of number-correct versus item-pattern scoring estimators. The Bayesian estimators were slightly more efficient than the non-Bayesian estimators, resulting in smaller overall error. Possible score changes caused by the use of different proficiency estimators would be nonnegligible, particularly for low- and high-performing examinees. VL - 52 UR - http://dx.doi.org/10.1111/jedm.12063 ER - TY - JOUR T1 - Considering the Use of General and Modified Assessment Items in Computerized Adaptive Testing JF - Applied Measurement in Education Y1 - 2015 A1 - Wyse, A. E. A1 - Albano, A. D. AB - This article used several data sets from a large-scale state testing program to examine the feasibility of combining general and modified assessment items in computerized adaptive testing (CAT) for different groups of students. 
Results suggested that several of the assumptions made when employing this type of mixed-item CAT may not be met for students with disabilities who have typically taken alternate assessments based on modified achievement standards (AA-MAS). A simulation study indicated that the abilities of AA-MAS students can be underestimated or overestimated by the mixed-item CAT, depending on students’ location on the underlying ability scale. These findings held across grade levels and test lengths. The mixed-item CAT appeared to function well for non-AA-MAS students. VL - 28 IS - 2 ER - TY - JOUR T1 - The Effect of Upper and Lower Asymptotes of IRT Models on Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2015 A1 - Cheng, Ying A1 - Liu, Cheng AB - In this article, the effect of the upper and lower asymptotes in item response theory models on computerized adaptive testing is shown analytically. This is done by deriving the step size between adjacent latent trait estimates under the four-parameter logistic model (4PLM) and two models it subsumes, the usual three-parameter logistic model (3PLM) and the 3PLM with upper asymptote (3PLMU). The authors show analytically that the large effect of the discrimination parameter on the step size holds true for the 4PLM and the two models it subsumes under both the maximum information method and the b-matching method for item selection. Furthermore, the lower asymptote helps reduce the positive bias of ability estimates associated with early guessing, and the upper asymptote helps reduce the negative bias induced by early slipping. Relative step size between modeling versus not modeling the upper or lower asymptote under the maximum Fisher information method (MI) and the b-matching method is also derived. It is also shown analytically why the gain from early guessing is smaller than the loss from early slipping when the lower asymptote is modeled, and vice versa when the upper asymptote is modeled.
The benefit to loss ratio is quantified under both the MI and the b-matching method. Implications of the analytical results are discussed. VL - 39 UR - http://apm.sagepub.com/content/39/7/551.abstract ER - TY - JOUR T1 - Evaluating Content Alignment in Computerized Adaptive Testing JF - Educational Measurement: Issues and Practice Y1 - 2015 A1 - Wise, S. L. A1 - Kingsbury, G. G. A1 - Webb, N. L. AB - The alignment between a test and the content domain it measures represents key evidence for the validation of test score inferences. Although procedures have been developed for evaluating the content alignment of linear tests, these procedures are not readily applicable to computerized adaptive tests (CATs), which require large item pools and do not use fixed test forms. This article describes the decisions made in the development of CATs that influence and might threaten content alignment. It outlines a process for evaluating alignment that is sensitive to these threats and gives an empirical example of the process. VL - 34 IS - 4 ER - TY - JOUR T1 - Implementing a CAT: The AMC Experience JF - Journal of Computerized Adaptive Testing Y1 - 2015 A1 - Barnard, John J KW - adaptive KW - Assessment KW - computer KW - medical KW - online KW - Testing VL - 3 UR - http://www.iacat.org/jcat/index.php/jcat/article/view/52/25 IS - 1 ER - TY - JOUR T1 - Investigation of Response Changes in the GRE Revised General Test JF - Educational and Psychological Measurement Y1 - 2015 A1 - Liu, Ou Lydia A1 - Bridgeman, Brent A1 - Gu, Lixiong A1 - Xu, Jun A1 - Kong, Nan AB - Research on examinees’ response changes on multiple-choice tests over the past 80 years has yielded some consistent findings, including that most examinees make score gains by changing answers. This study expands the research on response changes by focusing on a high-stakes admissions test—the Verbal Reasoning and Quantitative Reasoning measures of the GRE revised General Test. 
We analyzed data from 8,538 examinees for the Quantitative section and 9,140 for the Verbal section who took the GRE revised General Test in 12 countries. The analyses yielded findings consistent with prior research. In addition, as examinees’ ability increases, the benefit of response changing increases. The study yielded significant implications for both test agencies and test takers. Computer adaptive tests often do not allow test takers to review and revise their answers; findings from this study support the benefit of allowing such revision. VL - 75 UR - http://epm.sagepub.com/content/75/6/1002.abstract ER - TY - JOUR T1 - New Item Selection Methods for Cognitive Diagnosis Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2015 A1 - Kaplan, Mehmet A1 - de la Torre, Jimmy A1 - Barrada, Juan Ramón AB - This article introduces two new item selection methods, the modified posterior-weighted Kullback–Leibler index (MPWKL) and the generalized deterministic inputs, noisy “and” gate (G-DINA) model discrimination index (GDI), that can be used in cognitive diagnosis computerized adaptive testing. The efficiency of the new methods is compared with the posterior-weighted Kullback–Leibler (PWKL) item selection index using a simulation study in the context of the G-DINA model. The impact of item quality, generating models, and test termination rules on attribute classification accuracy or test length is also investigated. The results of the study show that the MPWKL and GDI perform very similarly, and have higher correct attribute classification rates or shorter mean test lengths compared with the PWKL. In addition, the GDI has the shortest implementation time among the three indices. The proportion of item usage with respect to the required attributes across the different conditions is also tracked and discussed.
VL - 39 UR - http://apm.sagepub.com/content/39/3/167.abstract ER - TY - JOUR T1 - Online Item Calibration for Q-Matrix in CD-CAT JF - Applied Psychological Measurement Y1 - 2015 A1 - Chen, Yunxiao A1 - Liu, Jingchen A1 - Ying, Zhiliang AB -

Item replenishment is important for maintaining a large-scale item bank. In this article, the authors consider calibrating new items based on pre-calibrated operational items under the deterministic inputs, noisy-and-gate model, the specification of which includes the so-called Q-matrix, as well as the slipping and guessing parameters. Making use of the maximum likelihood and Bayesian estimators for the latent knowledge states, the authors propose two methods for the calibration. These methods are applicable to both traditional paper-and-pencil-based tests, for which the selection of operational items is prefixed, and computerized adaptive tests, for which the selection of operational items is sequential and random. Extensive simulations are done to assess and compare the performance of these approaches. Extensions to other diagnostic classification models are also discussed.

VL - 39 UR - http://apm.sagepub.com/content/39/1/5.abstract ER - TY - JOUR T1 - On-the-Fly Assembled Multistage Adaptive Testing JF - Applied Psychological Measurement Y1 - 2015 A1 - Zheng, Yi A1 - Chang, Hua-Hua AB -

Recently, multistage testing (MST) has been adopted by several important large-scale testing programs and become popular among practitioners and researchers. Stemming from the decades of history of computerized adaptive testing (CAT), the rapidly growing MST alleviates several major problems of earlier CAT applications. Nevertheless, MST is only one among all possible solutions to these problems. This article presents a new adaptive testing design, “on-the-fly assembled multistage adaptive testing” (OMST), which combines the benefits of CAT and MST and offsets their limitations. Moreover, OMST also provides some unique advantages over both CAT and MST. A simulation study was conducted to compare OMST with MST and CAT, and the results demonstrated the promising features of OMST. Finally, the “Discussion” section provides suggestions on possible future adaptive testing designs based on the OMST framework, which could provide great flexibility for adaptive tests in the digital future and open an avenue for all types of hybrid designs based on the different needs of specific tests.

VL - 39 UR - http://apm.sagepub.com/content/39/2/104.abstract ER - TY - JOUR T1 - Stochastic Curtailment in Adaptive Mastery Testing: Improving the Efficiency of Confidence Interval–Based Stopping Rules JF - Applied Psychological Measurement Y1 - 2015 A1 - Sie, Haskell A1 - Finkelman, Matthew D. A1 - Bartroff, Jay A1 - Thompson, Nathan A. AB - A well-known stopping rule in adaptive mastery testing is to terminate the assessment once the examinee’s ability confidence interval lies entirely above or below the cut-off score. This article proposes new procedures that seek to improve such a variable-length stopping rule by coupling it with curtailment and stochastic curtailment. Under the new procedures, test termination can occur earlier if the probability is high enough that the current classification decision remains the same should the test continue. Computation of this probability utilizes normality of an asymptotically equivalent version of the maximum likelihood ability estimate. In two simulation sets, the new procedures showed a substantial reduction in average test length while maintaining similar classification accuracy to the original method. VL - 39 UR - http://apm.sagepub.com/content/39/4/278.abstract ER - TY - JOUR T1 - Using Out-of-Level Items in Computerized Adaptive Testing JF - International Journal of Testing Y1 - 2015 A1 - Wei,H. A1 - Lin,J. AB - Out-of-level testing refers to the practice of assessing a student with a test that is intended for students at a higher or lower grade level. Although the appropriateness of out-of-level testing for accountability purposes has been questioned by educators and policymakers, incorporating out-of-level items in formative assessments for accurate feedback is recommended. This study made use of a commercial item bank with vertically scaled items across grades and simulated student responses in a computerized adaptive testing (CAT) environment. 
Results of the study suggested that administration of out-of-level items improved measurement accuracy and test efficiency for students who perform significantly above or below their grade-level peers. This study has direct implications with regard to the relevance, applicability, and benefits of using out-of-level items in CAT. VL - 15 IS - 1 ER - TY - JOUR T1 - Utilizing Response Times in Computerized Classification Testing JF - Applied Psychological Measurement Y1 - 2015 A1 - Sie, Haskell A1 - Finkelman, Matthew D. A1 - Riley, Barth A1 - Smits, Niels AB - A well-known approach in computerized mastery testing is to combine the Sequential Probability Ratio Test (SPRT) stopping rule with item selection to maximize Fisher information at the mastery threshold. This article proposes a new approach in which a time limit is defined for the test and examinees’ response times are considered in both item selection and test termination. Item selection is performed by maximizing Fisher information per time unit, rather than Fisher information itself. The test is terminated once the SPRT makes a classification decision, the time limit is exceeded, or there is no remaining item that has a high enough probability of being answered before the time limit. In a simulation study, the new procedure showed a substantial reduction in average testing time while slightly improving classification accuracy compared with the original method. In addition, the new procedure reduced the percentage of examinees who exceeded the time limit. VL - 39 UR - http://apm.sagepub.com/content/39/5/389.abstract ER - TY - JOUR T1 - Variable-Length Computerized Adaptive Testing Using the Higher Order DINA Model JF - Journal of Educational Measurement Y1 - 2015 A1 - Hsu, Chia-Ling A1 - Wang, Wen-Chung AB - Cognitive diagnosis models provide profile information about a set of latent binary attributes, whereas item response models yield a summary report on a latent continuous trait.
To utilize the advantages of both models, higher order cognitive diagnosis models were developed in which information about both latent binary attributes and latent continuous traits is available. To facilitate the utility of cognitive diagnosis models, corresponding computerized adaptive testing (CAT) algorithms were developed. Most of them adopt the fixed-length rule to terminate CAT and are limited to ordinary cognitive diagnosis models. In this study, the higher order deterministic-input, noisy-and-gate (DINA) model was used as an example, and three criteria based on the minimum-precision termination rule were implemented: one for the latent class, one for the latent trait, and the other for both. The simulation results demonstrated that all of the termination criteria were successful when items were selected according to the Kullback-Leibler information and the posterior-weighted Kullback-Leibler information, and the minimum-precision rule outperformed the fixed-length rule with a similar test length in recovering the latent attributes and the latent trait. VL - 52 UR - http://dx.doi.org/10.1111/jedm.12069 ER - TY - JOUR T1 - Cognitive Diagnostic Models and Computerized Adaptive Testing: Two New Item-Selection Methods That Incorporate Response Times JF - Journal of Computerized Adaptive Testing Y1 - 2014 A1 - Finkelman, M. D. A1 - Kim, W. A1 - Weissman, A. A1 - Cook, R.J. VL - 2 UR - http://www.iacat.org/jcat/index.php/jcat/article/view/43/21 IS - 4 ER - TY - JOUR T1 - A Comparison of Four Item-Selection Methods for Severely Constrained CATs JF - Educational and Psychological Measurement Y1 - 2014 A1 - He, Wei A1 - Diao, Qi A1 - Hauser, Carl AB -

This study compared four item-selection procedures developed for use with severely constrained computerized adaptive tests (CATs). Severely constrained CATs refer to those adaptive tests that seek to meet a complex set of constraints that are often not mutually exclusive (i.e., an item may contribute to the satisfaction of several constraints at the same time). The procedures examined in the study included the weighted deviation model (WDM), the weighted penalty model (WPM), the maximum priority index (MPI), and the shadow test approach (STA). In addition, two modified versions of the MPI procedure were introduced to deal with an edge case condition that results in the item selection procedure becoming dysfunctional during a test. The results suggest that the STA worked best among all candidate methods in terms of measurement accuracy and constraint management. The other three heuristic approaches did not differ significantly in measurement accuracy and constraint management at the lower bound level. However, the WPM method appears to perform considerably better in overall constraint management than either the WDM or MPI method. Limitations and future research directions are also discussed.

VL - 74 UR - http://epm.sagepub.com/content/74/4/677.abstract ER - TY - JOUR T1 - A Comparison of Multi-Stage and Linear Test Designs for Medium-Size Licensure and Certification Examinations JF - Journal of Computerized Adaptive Testing Y1 - 2014 A1 - Brossman, Bradley. G. A1 - Guille, R.A. VL - 2 IS - 2 ER - TY - JOUR T1 - Computerized Adaptive Testing for the Random Weights Linear Logistic Test Model JF - Applied Psychological Measurement Y1 - 2014 A1 - Crabbe, Marjolein A1 - Vandebroek, Martina AB -

This article discusses four item selection rules to design efficient individualized tests for the random weights linear logistic test model (RWLLTM): minimum posterior-weighted D-error, minimum expected posterior-weighted D-error, maximum expected Kullback–Leibler divergence between subsequent posteriors (KLP), and maximum mutual information (MUI). The RWLLTM decomposes test items into a set of subtasks or cognitive features and assumes individual-specific effects of the features on the difficulty of the items. The model extends and improves the well-known linear logistic test model in which feature effects are only estimated at the aggregate level. Simulations show that the efficiencies of the designs obtained with the different criteria appear to be equivalent. However, KLP and MUI are given preference over the two D-error criteria due to their lower complexity, which significantly reduces the computational burden.

VL - 38 UR - http://apm.sagepub.com/content/38/6/415.abstract ER - TY - BOOK T1 - Computerized multistage testing: Theory and applications Y1 - 2014 A1 - Duanli Yan A1 - Alina A von Davier A1 - Charles Lewis PB - CRC Press CY - Boca Raton FL SN - 13-978-1-4665-0577-3 ER - TY - JOUR T1 - Detecting Item Preknowledge in Computerized Adaptive Testing Using Information Theory and Combinatorial Optimization JF - Journal of Computerized Adaptive Testing Y1 - 2014 A1 - Belov, D. I. KW - combinatorial optimization KW - hypothesis testing KW - item preknowledge KW - Kullback-Leibler divergence KW - simulated annealing. KW - test security VL - 2 UR - http://www.iacat.org/jcat/index.php/jcat/article/view/36/18 IS - 3 ER - TY - JOUR T1 - Determining the Overall Impact of Interruptions During Online Testing JF - Journal of Educational Measurement Y1 - 2014 A1 - Sinharay, Sandip A1 - Wan, Ping A1 - Whitaker, Mike A1 - Kim, Dong-In A1 - Zhang, Litong A1 - Choi, Seung W. AB -

With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees’ scores. There is a lack of research on this topic due to the novelty of the problem. This article is an attempt to fill that void. Several methods, primarily based on propensity score matching, linear regression, and item response theory, were suggested to determine the overall impact of the interruptions on the examinees’ scores. A realistic simulation study shows that the suggested methods have satisfactory Type I error rate and power. Then the methods were applied to data from the Indiana Statewide Testing for Educational Progress-Plus (ISTEP+) test that experienced interruptions in 2013. The results indicate that the interruptions did not have a significant overall impact on the student scores for the ISTEP+ test.

VL - 51 UR - http://dx.doi.org/10.1111/jedm.12052 ER - TY - JOUR T1 - An Enhanced Approach to Combine Item Response Theory With Cognitive Diagnosis in Adaptive Testing JF - Journal of Educational Measurement Y1 - 2014 A1 - Wang, Chun A1 - Zheng, Chanjin A1 - Chang, Hua-Hua AB -

Computerized adaptive testing offers the possibility of gaining information on both the overall ability and cognitive profile in a single assessment administration. Some algorithms aiming for these dual purposes have been proposed, including the shadow test approach, the dual information method (DIM), and the constraint weighted method. The current study proposed two new methods, aggregate ranked information index (ARI) and aggregate standardized information index (ASI), which appropriately addressed the noncompatibility issue inherent in the original DIM method. More flexible weighting schemes that put different emphasis on information about general ability (i.e., θ in item response theory) and information about cognitive profile (i.e., α in cognitive diagnostic modeling) were also explored. Two simulation studies were carried out to investigate the effectiveness of the new methods and weighting schemes. Results showed that the new methods with the flexible weighting schemes could produce more accurate estimation of both overall ability and cognitive profile than the original DIM. Among them, the ASI with both empirical and theoretical weights is recommended, and attribute-level weighting scheme is preferred if some attributes are considered more important from a substantive perspective.

VL - 51 UR - http://dx.doi.org/10.1111/jedm.12057 ER - TY - JOUR T1 - Enhancing Pool Utilization in Constructing the Multistage Test Using Mixed-Format Tests JF - Applied Psychological Measurement Y1 - 2014 A1 - Park, Ryoungsun A1 - Kim, Jiseon A1 - Chung, Hyewon A1 - Dodd, Barbara G. AB -

This study investigated a new pool utilization method of constructing multistage tests (MST) using the mixed-format test based on the generalized partial credit model (GPCM). MST simulations of a classification test were performed to evaluate the MST design. A linear programming (LP) model was applied to perform MST reassemblies based on the initial MST construction. Three subsequent MST reassemblies were performed. For each reassembly, three test unit replacement ratios (TRRs; 0.22, 0.44, and 0.66) were investigated. The conditions of the three passing rates (30%, 50%, and 70%) were also considered in the classification testing. The results demonstrated that various MST reassembly conditions increased the overall pool utilization rates, while maintaining the desired MST construction. All MST testing conditions performed equally well in terms of the precision of the classification decision.

VL - 38 UR - http://apm.sagepub.com/content/38/4/268.abstract ER - TY - JOUR T1 - General Test Overlap Control: Improved Algorithm for CAT and CCT JF - Applied Psychological Measurement Y1 - 2014 A1 - Chen, Shu-Ying A1 - Lei, Pui-Wa A1 - Chen, Jyun-Hong A1 - Liu, Tzu-Chen AB -

This article proposed a new online test overlap control algorithm that improves on Chen’s algorithm in controlling the general test overlap rate for item pooling among a group of examinees. Chen’s algorithm is inefficient in that it controls not only item pooling between the current examinee and prior examinees but also item pooling among previous examinees, which was already controlled for when those examinees were current. The proposed improvement increases efficiency by considering only item pooling between the current examinee and previous examinees, and its improved performance over Chen’s algorithm is demonstrated in a simulated computerized adaptive testing (CAT) environment. Moreover, the proposed algorithm is adapted for computerized classification testing (CCT) using the sequential probability ratio test procedure and is evaluated against some existing exposure control procedures. The proposed algorithm appears to work best in controlling general test overlap rate among the exposure control procedures examined without sacrificing much classification precision, though longer tests might be required for more stringent control of item pooling among larger groups. Given the capability of the proposed algorithm in controlling item pooling among a group of examinees of any size and its ease of implementation, it appears to be a good test overlap control method.

VL - 38 UR - http://apm.sagepub.com/content/38/3/229.abstract ER - TY - JOUR T1 - Improving Measurement Precision of Hierarchical Latent Traits Using Adaptive Testing JF - Journal of Educational and Behavioral Statistics Y1 - 2014 A1 - Wang, Chun AB -

Many latent traits in social sciences display a hierarchical structure, such as intelligence, cognitive ability, or personality. Usually a second-order factor is linearly related to a group of first-order factors (also called domain abilities in cognitive ability measures), and the first-order factors directly govern the actual item responses. Because only a subtest of items is used to measure each domain, the lack of sufficient reliability becomes the primary impediment for generating and reporting domain abilities. In recent years, several item response theory (IRT) models have been proposed to account for hierarchical factor structures, and these models are also shown to alleviate the low reliability issue by using in-test collateral information to improve measurement precision. This article advocates using adaptive item selection together with a higher order IRT model to further increase the reliability of hierarchical latent trait estimation. Two item selection algorithms are proposed—the constrained D-optimal method and the sequencing domain method. Both are shown to yield improved measurement precision as compared to the unidimensional item selection (by treating each dimension separately). The improvement is more prominent when the test length is short and when the correlation between dimensions is high (e.g., higher than .64). Moreover, two reliability indices for hierarchical latent traits are discussed and their use for quantifying the reliability of hierarchical traits measured by adaptive testing is demonstrated.

VL - 39 UR - http://jeb.sagepub.com/cgi/content/abstract/39/6/452 ER - TY - JOUR T1 - Item Pool Design for an Operational Variable-Length Computerized Adaptive Test JF - Educational and Psychological Measurement Y1 - 2014 A1 - He, Wei A1 - Reckase, Mark D. AB -

For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good-quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution and item exposure issues. Yet, there is little research on how to design item pools to have those desirable features. The research reported in this article provided step-by-step, hands-on guidance on the item pool design process by applying the bin-and-union method to design item pools for a large-scale licensure CAT employing a complex adaptive testing algorithm with variable test length, a decision-based stopping rule, content balancing, and exposure control. The design process involved extensive simulations to identify several alternative item pool designs and evaluate their performance against a series of criteria. The design output included the desired item pool size and item parameter distribution. The results indicate that the mechanism used to identify the desirable item pool features functions well and that two recommended item pool designs would support satisfactory performance of the operational testing program.

VL - 74 UR - http://epm.sagepub.com/content/74/3/473.abstract ER - TY - JOUR T1 - Item Selection Methods Based on Multiple Objective Approaches for Classifying Respondents Into Multiple Levels JF - Applied Psychological Measurement Y1 - 2014 A1 - van Groen, Maaike M. A1 - Eggen, Theo J. H. M. A1 - Veldkamp, Bernard P. AB -

Computerized classification tests classify examinees into two or more levels while maximizing accuracy and minimizing test length. The majority of currently available item selection methods maximize information at one point on the ability scale, but in a test with multiple cutting points, selection methods could take all these points into account simultaneously. If one objective is specified for each cutting point, the objectives can be combined into one optimization function using multiple objective approaches. Simulation studies were used to compare the efficiency and accuracy of eight selection methods in a test based on the sequential probability ratio test. Small differences were found in accuracy and efficiency between different methods depending on the item pool and settings of the classification method. The size of the indifference region had little influence on accuracy but considerable influence on efficiency. Content and exposure control had little influence on accuracy and efficiency.

VL - 38 UR - http://apm.sagepub.com/content/38/3/187.abstract ER - TY - JOUR T1 - Multidimensional CAT Item Selection Methods for Domain Scores and Composite Scores With Item Exposure Control and Content Constraints JF - Journal of Educational Measurement Y1 - 2014 A1 - Yao, Lihua AB -

The intent of this research was to find an item selection procedure in the multidimensional computer adaptive testing (CAT) framework that yielded higher precision for both the domain and composite abilities, had a higher usage of the item pool, and controlled the exposure rate. Five multidimensional CAT item selection procedures (minimum angle; volume; minimum error variance of the linear combination; minimum error variance of the composite score with optimized weight; and Kullback-Leibler information) were studied and compared with two methods for item exposure control (the Sympson-Hetter procedure and the fixed-rate procedure, the latter simply refers to putting a limit on the item exposure rate) using simulated data. The maximum priority index method was used for the content constraints. Results showed that the Sympson-Hetter procedure yielded better precision than the fixed-rate procedure but had much lower item pool usage and took more time. The five item selection procedures performed similarly under Sympson-Hetter. For the fixed-rate procedure, there was a trade-off between the precision of the ability estimates and the item pool usage: the five procedures had different patterns. It was found that (1) Kullback-Leibler had better precision but lower item pool usage; (2) minimum angle and volume had balanced precision and item pool usage; and (3) the two methods minimizing the error variance had the best item pool usage and comparable overall score recovery but less precision for certain domains. The priority index for content constraints and item exposure was implemented successfully.

VL - 51 UR - http://dx.doi.org/10.1111/jedm.12032 ER - TY - JOUR T1 - A Numerical Investigation of the Recovery of Point Patterns With Minimal Information JF - Applied Psychological Measurement Y1 - 2014 A1 - Cox, M. A. A. AB -

A method has been proposed (Tsogo et al., 2001) to reconstruct the geometrical configuration of a large point set using minimal information. This paper employs numerical examples to investigate the proposed procedure. The suggested method has two great advantages: it reduces the volume of the data collection exercise, and it eases the computational effort involved in analyzing the data. It is suggested, however, that the method, while possibly providing a useful starting point for a solution, is not a panacea.

VL - 38 UR - http://apm.sagepub.com/content/38/4/329.abstract ER - TY - JOUR T1 - The Sequential Probability Ratio Test and Binary Item Response Models JF - Journal of Educational and Behavioral Statistics Y1 - 2014 A1 - Nydick, Steven W. AB -

The sequential probability ratio test (SPRT) is a common method for terminating item response theory (IRT)-based adaptive classification tests. To decide whether a classification test should stop, the SPRT compares a simple log-likelihood ratio, based on the classification bound separating two categories, to prespecified critical values. As has been previously noted (Spray & Reckase, 1994), the SPRT test statistic is not necessarily monotonic with respect to the classification bound when item response functions have nonzero lower asymptotes. Because of nonmonotonicity, several researchers (including Spray & Reckase, 1994) have recommended selecting items at the classification bound rather than the current ability estimate when terminating SPRT-based classification tests. Unfortunately, this well-worn advice is a bit too simplistic. Items yielding optimal evidence for classification depend on the IRT model, item parameters, and location of an examinee with respect to the classification bound. The current study illustrates, in depth, the relationship between the SPRT test statistic and classification evidence in binary IRT models. Unlike earlier studies, we examine the form of the SPRT-based log-likelihood ratio while altering the classification bound and item difficulty. These investigations motivate a novel item selection algorithm based on optimizing the expected SPRT criterion given the current ability estimate. The new expected log-likelihood ratio algorithm results in test lengths noticeably shorter than current, commonly used algorithms, and with no loss in classification accuracy.

VL - 39 UR - http://jeb.sagepub.com/cgi/content/abstract/39/3/203 ER - TY - JOUR T1 - A Sequential Procedure for Detecting Compromised Items in the Item Pool of a CAT System JF - Applied Psychological Measurement Y1 - 2014 A1 - Zhang, Jinming AB -

To maintain the validity of a continuous testing system, such as computerized adaptive testing (CAT), items should be monitored to ensure that their performance has not gone through any significant changes during their lifetime in an item pool. In this article, the author developed a sequential monitoring procedure based on a series of statistical hypothesis tests to examine whether the statistical characteristics of individual items have changed significantly during test administration. Simulation studies show that under the simulated setting, by choosing an appropriate cutoff point, the procedure can control the rate of Type I errors at any reasonable significance level while maintaining a very low rate of Type II errors.

VL - 38 UR - http://apm.sagepub.com/content/38/2/87.abstract ER - TY - JOUR T1 - Stratified Item Selection and Exposure Control in Unidimensional Adaptive Testing in the Presence of Two-Dimensional Data JF - Applied Psychological Measurement Y1 - 2014 A1 - Kalinowski, Kevin E. A1 - Natesan, Prathiba A1 - Henson, Robin K. AB -

It is not uncommon to use unidimensional item response theory models to estimate ability in multidimensional data with computerized adaptive testing (CAT). The current Monte Carlo study investigated the penalty of this model misspecification in CAT implementations using different item selection methods and exposure control strategies. Three item selection methods—maximum information (MAXI), a-stratification (STRA), and a-stratification with b-blocking (STRB), each with and without the Sympson–Hetter (SH) exposure control strategy—were investigated. Calibrating multidimensional items as unidimensional items resulted in inaccurate item parameter estimates. Therefore, MAXI performed better than STRA and STRB in estimating the ability parameters. However, all three methods had relatively large standard errors. SH exposure control had no impact on the number of overexposed items. Existing unidimensional CAT implementations might consider using MAXI only if recalibration with a multidimensional model is too expensive. Otherwise, building a CAT pool by calibrating multidimensional data as unidimensional is not recommended.

VL - 38 UR - http://apm.sagepub.com/content/38/7/563.abstract ER - TY - JOUR T1 - Using Multidimensional CAT to Administer a Short, Yet Precise, Screening Test JF - Applied Psychological Measurement Y1 - 2014 A1 - Yao, Lihua A1 - Pommerich, Mary A1 - Segall, Daniel O. AB -

Multidimensional computerized adaptive testing (MCAT) provides a mechanism by which the simultaneous goals of accurate prediction and minimal testing time for a screening test could both be met. This article demonstrates the use of MCAT to administer a screening test for the Computerized Adaptive Testing–Armed Services Vocational Aptitude Battery (CAT-ASVAB) under a variety of manipulated conditions. CAT-ASVAB is a test battery administered via unidimensional CAT (UCAT) that is used to qualify applicants for entry into the U.S. military and assign them to jobs. The primary research question being evaluated is whether the use of MCAT to administer a screening test can lead to significant reductions in testing time from the full-length selection test, without significant losses in score precision. Different stopping rules, item selection methods, content constraints, time constraints, and population distributions for the MCAT administration are evaluated through simulation, and compared with results from a regular full-length UCAT administration.

VL - 38 UR - http://apm.sagepub.com/content/38/8/614.abstract ER - TY - JOUR T1 - The Utility of Adaptive Testing in Addressing the Problem of Unmotivated Examinees JF - Journal of Computerized Adaptive Testing Y1 - 2014 A1 - Steven L. Wise VL - 2 IS - 1 ER - TY - JOUR T1 - The Applicability of Multidimensional Computerized Adaptive Testing for Cognitive Ability Measurement in Organizational Assessment JF - International Journal of Testing Y1 - 2013 A1 - Makransky, Guido A1 - Glas, Cees A. W. VL - 13 UR - http://www.tandfonline.com/doi/abs/10.1080/15305058.2012.672352 ER - TY - JOUR T1 - The applicability of multidimensional computerized adaptive testing to cognitive ability measurement in organizational assessment JF - International Journal of Testing Y1 - 2013 A1 - Makransky, G. A1 - Glas, C. A. W. VL - 13 IS - 2 ER - TY - JOUR T1 - The Application of the Monte Carlo Approach to Cognitive Diagnostic Computerized Adaptive Testing With Content Constraints JF - Applied Psychological Measurement Y1 - 2013 A1 - Mao, Xiuzhen A1 - Xin, Tao AB -

The Monte Carlo approach, which has previously been implemented in traditional computerized adaptive testing (CAT), is applied here to cognitive diagnostic CAT to test the ability of this approach to address multiple content constraints. The performance of the Monte Carlo approach is compared with the performance of the modified maximum global discrimination index (MMGDI) method on simulations in which the only content constraint is on the number of items that measure each attribute. The results of the two simulation experiments show that (a) the Monte Carlo method fulfills all the test requirements and produces satisfactory measurement precision and item exposure results, and (b) the Monte Carlo method outperforms the MMGDI method when the Monte Carlo method applies either the posterior-weighted Kullback–Leibler algorithm or the hybrid Kullback–Leibler information as the item selection index. Overall, the recovery rate of the knowledge states, the distribution of the item exposure, and the utilization rate of the item bank are improved when the Monte Carlo method is used.

VL - 37 UR - http://apm.sagepub.com/content/37/6/482.abstract ER - TY - JOUR T1 - Comparing the Performance of Five Multidimensional CAT Selection Procedures With Different Stopping Rules JF - Applied Psychological Measurement Y1 - 2013 A1 - Yao, Lihua AB -

Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error (SE) and predicted standard error reduction (PSER), are proposed; each MCAT selection process is stopped if either the required precision has been achieved or the selected number of items has reached the maximum limit. The five procedures are as follows: minimum angle (Ag), volume (Vm), minimize the error variance of the linear combination (V1), minimize the error variance of the composite score with the optimized weight (V2), and Kullback–Leibler (KL) information. The recovery for the domain scores or content scores and their overall score, test length, and test reliability are compared across the five MCAT procedures and between the two stopping rules. It is found that the two stopping rules are implemented successfully and that KL uses the fewest items to reach the same precision level, followed by Vm; Ag uses the largest number of items. On average, to reach a precision of SE = .35, the SE stopping rule requires 40, 55, 63, 63, and 82 items for KL, Vm, V1, V2, and Ag, respectively; PSER yields 38, 45, 53, 58, and 68 items for KL, Vm, V1, V2, and Ag, respectively. PSER yields only slightly worse results than SE, but with far fewer items. Overall, KL is recommended for varying-length MCAT.

VL - 37 UR - http://apm.sagepub.com/content/37/1/3.abstract ER - TY - JOUR T1 - A Comparison of Computerized Classification Testing and Computerized Adaptive Testing in Clinical Psychology JF - Journal of Computerized Adaptive Testing Y1 - 2013 A1 - Smits, N. A1 - Finkelman, M. D. VL - 1 IS - 2 ER - TY - JOUR T1 - A Comparison of Exposure Control Procedures in CAT Systems Based on Different Measurement Models for Testlets JF - Applied Measurement in Education Y1 - 2013 A1 - Boyd, Aimee M. A1 - Dodd, Barbara A1 - Fitzpatrick, Steven VL - 26 UR - http://www.tandfonline.com/doi/abs/10.1080/08957347.2013.765434 ER - TY - JOUR T1 - A Comparison of Exposure Control Procedures in CATs Using the 3PL Model JF - Educational and Psychological Measurement Y1 - 2013 A1 - Leroux, Audrey J. A1 - Lopez, Myriam A1 - Hembry, Ian A1 - Dodd, Barbara G. AB -

This study compares the progressive-restricted standard error (PR-SE) exposure control procedure to three commonly used procedures in computerized adaptive testing, the randomesque, Sympson–Hetter (SH), and no exposure control methods. The performance of these four procedures is evaluated using the three-parameter logistic model under the manipulated conditions of item pool size (small vs. large) and stopping rules (fixed-length vs. variable-length). PR-SE provides the advantage of similar constraints to SH, without the need for a preceding simulation study to execute it. Overall for the large and small item banks, the PR-SE method administered almost all of the items from the item pool, whereas the other procedures administered about 52% or less of the large item bank and 80% or less of the small item bank. The PR-SE yielded the smallest amount of item overlap between tests across conditions and administered fewer items on average than SH. PR-SE obtained these results with similar, and acceptable, measurement precision compared to the other exposure control procedures while vastly improving on item pool usage.

VL - 73 UR - http://epm.sagepub.com/content/73/5/857.abstract ER - TY - JOUR T1 - A Comparison of Four Methods for Obtaining Information Functions for Scores From Computerized Adaptive Tests With Normally Distributed Item Difficulties and Discriminations JF - Journal of Computerized Adaptive Testing Y1 - 2013 A1 - Ito, K. A1 - Segall, D.O. VL - 1 IS - 5 ER - TY - JOUR T1 - Deriving Stopping Rules for Multidimensional Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2013 A1 - Wang, Chun A1 - Chang, Hua-Hua A1 - Boughton, Keith A. AB -

Multidimensional computerized adaptive testing (MCAT) is able to provide a vector of ability estimates for each examinee, which could be used to provide a more informative profile of an examinee’s performance. The current literature on MCAT focuses on fixed-length tests, which can generate less accurate results for those examinees whose abilities are quite different from the average difficulty level of the item bank when there are only a limited number of items in the item bank. Therefore, instead of stopping the test with a predetermined fixed test length, the authors use a more informative stopping criterion that is directly related to measurement accuracy. Specifically, this research derives four stopping rules that either quantify the measurement precision of the ability vector (i.e., minimum determinant rule [D-rule], minimum eigenvalue rule [E-rule], and maximum trace rule [T-rule]) or quantify the amount of available information carried by each item (i.e., maximum Kullback–Leibler divergence rule [K-rule]). The simulation results showed that all four stopping rules successfully terminated the test when the mean squared error of ability estimation was within a desired range, regardless of examinees’ true abilities. It was found that when using the D-, E-, or T-rule, examinees with extreme abilities tended to have tests that were twice as long as the tests received by examinees with moderate abilities. However, the test length difference with the K-rule is not very dramatic, indicating that the K-rule may not be very sensitive to measurement precision. In all cases, the cutoff value for each stopping rule needs to be adjusted on a case-by-case basis to find an optimal solution.

VL - 37 UR - http://apm.sagepub.com/content/37/2/99.abstract ER - TY - JOUR T1 - Estimating Measurement Precision in Reduced-Length Multi-Stage Adaptive Testing JF - Journal of Computerized Adaptive Testing Y1 - 2013 A1 - Crotts, K.M. A1 - Zenisky, A. L. A1 - Sireci, S.G. A1 - Li, X. VL - 1 IS - 4 ER - TY - JOUR T1 - The Influence of Item Calibration Error on Variable-Length Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2013 A1 - Patton, Jeffrey M. A1 - Cheng, Ying A1 - Yuan, Ke-Hai A1 - Diao, Qi AB -

Variable-length computerized adaptive testing (VL-CAT) allows both items and test length to be “tailored” to examinees, thereby achieving the measurement goal (e.g., scoring precision or classification) with as few items as possible. Several popular test termination rules depend on the standard error of the ability estimate, which in turn depends on the item parameter values. However, items are chosen on the basis of their parameter estimates, and capitalization on chance may occur. In this article, the authors investigated the effects of capitalization on chance on test length and classification accuracy in several VL-CAT simulations. The results confirm that capitalization on chance occurs in VL-CAT and has complex effects on test length, ability estimation, and classification accuracy. These results have important implications for the design and implementation of VL-CATs.

VL - 37 UR - http://apm.sagepub.com/content/37/1/24.abstract ER - TY - JOUR T1 - Integrating Test-Form Formatting Into Automated Test Assembly JF - Applied Psychological Measurement Y1 - 2013 A1 - Diao, Qi A1 - van der Linden, Wim J. AB -

Automated test assembly uses the methodology of mixed integer programming to select an optimal set of items from an item bank. Automated test-form generation uses the same methodology to optimally order the items and format the test form. From an optimization point of view, production of fully formatted test forms directly from the item pool using a simultaneous optimization model is more attractive than any of the current, more time-consuming two-stage processes. The goal of this study was to provide such simultaneous models both for computer-delivered and paper forms, as well as explore their performances relative to two-stage optimization. Empirical examples are presented to show that it is possible to automatically produce fully formatted optimal test forms directly from item pools up to some 2,000 items on a regular PC in realistic times.

VL - 37 UR - http://apm.sagepub.com/content/37/5/361.abstract ER - TY - JOUR T1 - Item Ordering in Stochastically Curtailed Health Questionnaires With an Observable Outcome JF - Journal of Computerized Adaptive Testing Y1 - 2013 A1 - Finkelman, M. D. A1 - Kim, W. A1 - He, Y. A1 - Lai, A.M. VL - 1 IS - 3 ER - TY - JOUR T1 - Item Pocket Method to Allow Response Review and Change in Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2013 A1 - Han, Kyung T. AB -

Most computerized adaptive testing (CAT) programs do not allow test takers to review and change their responses because it could seriously deteriorate the efficiency of measurement and make tests vulnerable to manipulative test-taking strategies. Several modified testing methods have been developed that provide restricted review options while limiting the trade-off in CAT efficiency. The extent to which these methods provided test takers with options to review test items, however, still was quite limited. This study proposes the item pocket (IP) method, a new testing approach that allows test takers greater flexibility in changing their responses by eliminating restrictions that prevent them from moving across test sections to review their answers. A series of simulations were conducted to evaluate the robustness of the IP method against various manipulative test-taking strategies. Findings and implications of the study suggest that the IP method may be an effective solution for many CAT programs when the IP size and test time limit are properly set.

VL - 37 UR - http://apm.sagepub.com/content/37/4/259.abstract ER - TY - JOUR T1 - Longitudinal Multistage Testing JF - Journal of Educational Measurement Y1 - 2013 A1 - Pohl, Steffi AB -

This article introduces longitudinal multistage testing (lMST), a special form of multistage testing (MST), as a method for adaptive testing in longitudinal large-scale studies. In lMST designs, test forms of different difficulty levels are used, whereas the values on a pretest determine the routing to these test forms. Since lMST allows for testing in paper and pencil mode, lMST may represent an alternative to conventional testing (CT) in assessments for which other adaptive testing designs are not applicable. In this article the performance of lMST is compared to CT in terms of test targeting as well as bias and efficiency of ability and change estimates. Using a simulation study, the effect of the stability of ability across waves, the difficulty level of the different test forms, and the number of link items between the test forms were investigated.

VL - 50 UR - http://dx.doi.org/10.1111/jedm.12028 ER - TY - JOUR T1 - Mutual Information Item Selection Method in Cognitive Diagnostic Computerized Adaptive Testing With Short Test Length JF - Educational and Psychological Measurement Y1 - 2013 A1 - Wang, Chun AB -

Cognitive diagnostic computerized adaptive testing (CD-CAT) purports to combine the strengths of both CAT and cognitive diagnosis. Cognitive diagnosis models aim at classifying examinees into the correct mastery profile group so as to pinpoint the strengths and weaknesses of each examinee, whereas CAT algorithms choose items to determine those strengths and weaknesses as efficiently as possible. Most of the existing CD-CAT item selection algorithms are evaluated when test length is relatively long, whereas several applications of CD-CAT, such as in interim assessment, require an item selection algorithm that is able to accurately recover examinees’ mastery profiles with a short test length. In this article, we introduce the mutual information item selection method in the context of CD-CAT and then provide a computationally easier formula to make the method more amenable to real-time use. Mutual information is then evaluated against common item selection methods, such as Kullback–Leibler information, posterior-weighted Kullback–Leibler information, and Shannon entropy. Based on our simulations, mutual information consistently results in nearly the highest attribute and pattern recovery rates in more than half of the conditions. We conclude by discussing how the number of attributes, Q-matrix structure, correlations among the attributes, and item quality affect estimation accuracy.

VL - 73 UR - http://epm.sagepub.com/content/73/6/1017.abstract ER - TY - JOUR T1 - The Philosophical Aspects of IRT Equating: Modeling Drift to Evaluate Cohort Growth in Large-Scale Assessments JF - Educational Measurement: Issues and Practice Y1 - 2013 A1 - Taherbhai, Husein A1 - Seo, Daeryong KW - cohort growth KW - construct-relevant drift KW - evaluation of scale drift KW - philosophical aspects of IRT equating VL - 32 UR - http://dx.doi.org/10.1111/emip.12000 ER - TY - JOUR T1 - The Random-Threshold Generalized Unfolding Model and Its Application of Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2013 A1 - Wang, Wen-Chung A1 - Liu, Chen-Wei A1 - Wu, Shiu-Lien AB -

The random-threshold generalized unfolding model (RTGUM) was developed by treating the thresholds in the generalized unfolding model as random effects rather than fixed effects to account for the subjective nature of the selection of categories in Likert items. The parameters of the new model can be estimated with the JAGS (Just Another Gibbs Sampler) freeware, which adopts a Bayesian approach for estimation. A series of simulations was conducted to evaluate the parameter recovery of the new model and the consequences of ignoring the randomness in thresholds. The results showed that the parameters of RTGUM were recovered fairly well and that ignoring the randomness in thresholds led to biased estimates. Computerized adaptive testing was also implemented on RTGUM, where the Fisher information criterion was used for item selection and the maximum a posteriori method was used for ability estimation. The simulation study showed that the longer the test length, the smaller the randomness in thresholds, and the more categories in an item, the more precise the ability estimates would be.

VL - 37 UR - http://apm.sagepub.com/content/37/3/179.abstract ER - TY - CHAP T1 - Reporting differentiated literacy results in PISA by using multidimensional adaptive testing. T2 - Research on PISA. Y1 - 2013 A1 - Frey, A. A1 - Seitz, N-N. A1 - Kröhne, U. JF - Research on PISA. PB - Dodrecht: Springer ER - TY - JOUR T1 - A Semiparametric Model for Jointly Analyzing Response Times and Accuracy in Computerized Testing JF - Journal of Educational and Behavioral Statistics Y1 - 2013 A1 - Wang, Chun A1 - Fan, Zhewen A1 - Chang, Hua-Hua A1 - Douglas, Jeffrey A. AB -

The item response times (RTs) collected from computerized testing represent an underutilized type of information about items and examinees. In addition to knowing the examinees’ responses to each item, we can investigate the amount of time examinees spend on each item. Current models for RTs mainly focus on parametric models, which have the advantage of conciseness, but may suffer from reduced flexibility to fit real data. We propose a semiparametric approach, specifically, the Cox proportional hazards model with a latent speed covariate to model the RTs, embedded within the hierarchical framework proposed by van der Linden to model the RTs and response accuracy simultaneously. This semiparametric approach combines the flexibility of nonparametric modeling and the brevity and interpretability of the parametric modeling. A Markov chain Monte Carlo method for parameter estimation is given and may be used with sparse data obtained by computerized adaptive testing. Both simulation studies and real data analysis are carried out to demonstrate the applicability of the new model.

VL - 38 UR - http://jeb.sagepub.com/cgi/content/abstract/38/4/381 ER - TY - JOUR T1 - Speededness and Adaptive Testing JF - Journal of Educational and Behavioral Statistics Y1 - 2013 A1 - van der Linden, Wim J. A1 - Xiong, Xinhui AB -

Two simple constraints on the item parameters in a response–time model are proposed to control the speededness of an adaptive test. As the constraints are additive, they can easily be included in the constraint set for a shadow-test approach (STA) to adaptive testing. Alternatively, a simple heuristic is presented to control speededness in plain adaptive testing without any constraints. Both types of control are easy to implement and do not require any other real-time parameter estimation during the test than the regular update of the test taker’s ability estimate. Evaluation of the two approaches using simulated adaptive testing showed that the STA was especially effective. It guaranteed testing times that differed less than 10 seconds from a reference test across a variety of conditions.

VL - 38 UR - http://jeb.sagepub.com/cgi/content/abstract/38/4/418 ER - TY - JOUR T1 - Uncertainties in the Item Parameter Estimates and Robust Automated Test Assembly JF - Applied Psychological Measurement Y1 - 2013 A1 - Veldkamp, Bernard P. A1 - Matteucci, Mariagiulia A1 - de Jong, Martijn G. AB -

Item response theory parameters have to be estimated, and because of the estimation process, they do have uncertainty in them. In most large-scale testing programs, the parameters are stored in item banks, and automated test assembly algorithms are applied to assemble operational test forms. These algorithms treat item parameters as fixed values, and uncertainty is not taken into account. As a consequence, resulting tests might be off target or less informative than expected. In this article, the process of parameter estimation is described to provide insight into the causes of uncertainty in the item parameters. The consequences of uncertainty are studied. In addition, an alternative automated test assembly algorithm is presented that is robust against uncertainties in the data. Several numerical examples demonstrate the performance of the robust test assembly algorithm, and illustrate the consequences of not taking this uncertainty into account. Finally, some recommendations about the use of robust test assembly and some directions for further research are given.

VL - 37 UR - http://apm.sagepub.com/content/37/2/123.abstract ER - TY - JOUR T1 - Variable-Length Computerized Adaptive Testing Based on Cognitive Diagnosis Models JF - Applied Psychological Measurement Y1 - 2013 A1 - Hsu, Chia-Ling A1 - Wang, Wen-Chung A1 - Chen, Shu-Ying AB -

Interest in developing computerized adaptive testing (CAT) under cognitive diagnosis models (CDMs) has increased recently. CAT algorithms that use a fixed-length termination rule frequently lead to different degrees of measurement precision for different examinees. Fixed precision, in which the examinees receive the same degree of measurement precision, is a major advantage of CAT over nonadaptive testing. In addition to the precision issue, test security is another important issue in practical CAT programs. In this study, the authors implemented two termination criteria for the fixed-precision rule and evaluated their performance under two popular CDMs using simulations. The results showed that using the two criteria with the posterior-weighted Kullback–Leibler information procedure for selecting items could achieve the prespecified measurement precision. A control procedure was developed to control item exposure and test overlap simultaneously among examinees. The simulation results indicated that in contrast to no method of controlling exposure, the control procedure developed in this study could maintain item exposure and test overlap at the prespecified level at the expense of only a few more items.

VL - 37 UR - http://apm.sagepub.com/content/37/7/563.abstract ER - TY - CHAP T1 - Adaptives Testen [Adaptive testing]. T2 - Testtheorie und Fragebogenkonstruktion Y1 - 2012 A1 - Frey, A. JF - Testtheorie und Fragebogenkonstruktion PB - Heidelberg: Springer CY - Berlin ER - TY - JOUR T1 - Balancing Flexible Constraints and Measurement Precision in Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2012 A1 - Moyer, Eric L. A1 - Galindo, Jennifer L. A1 - Dodd, Barbara G. AB -

Managing test specifications—both multiple nonstatistical constraints and flexibly defined constraints—has become an important part of designing item selection procedures for computerized adaptive tests (CATs) in achievement testing. This study compared the effectiveness of three procedures: constrained CAT, flexible modified constrained CAT, and the weighted penalty model in balancing multiple flexible constraints and maximizing measurement precision in a fixed-length CAT. The study also addressed the effect of two different test lengths—25 items and 50 items—and of including or excluding the randomesque item exposure control procedure with the three methods, all of which were found effective in selecting items that met flexible test constraints when used in the item selection process for longer tests. When the randomesque method was included to control for item exposure, the weighted penalty model and the flexible modified constrained CAT models performed better than did the constrained CAT procedure in maintaining measurement precision. When no item exposure control method was used in the item selection process, no practical difference was found in the measurement precision of each balancing method.

VL - 72 UR - http://epm.sagepub.com/content/72/4/629.abstract ER - TY - JOUR T1 - Comparison Between Dichotomous and Polytomous Scoring of Innovative Items in a Large-Scale Computerized Adaptive Test JF - Educational and Psychological Measurement Y1 - 2012 A1 - Jiao, H. A1 - Liu, J. A1 - Haynie, K. A1 - Woo, A. A1 - Gorham, J. AB -

This study explored the impact of partial credit scoring of one type of innovative items (multiple-response items) in a computerized adaptive version of a large-scale licensure pretest and operational test settings. The impacts of partial credit scoring on the estimation of the ability parameters and classification decisions in operational test settings were explored in one real data analysis and two simulation studies when two different polytomous scoring algorithms, automated polytomous scoring and rater-generated polytomous scoring, were applied. For the real data analyses, the ability estimates from dichotomous and polytomous scoring were highly correlated; the classification consistency between different scoring algorithms was nearly perfect. Information distribution changed slightly in the operational item bank. In the two simulation studies comparing each polytomous scoring with dichotomous scoring, the ability estimates resulting from polytomous scoring had slightly higher measurement precision than those resulting from dichotomous scoring. The practical impact related to classification decision was minor because of the extremely small number of items that could be scored polytomously in this current study.

VL - 72 ER - TY - JOUR T1 - Comparison of Exposure Controls, Item Pool Characteristics, and Population Distributions for CAT Using the Partial Credit Model JF - Educational and Psychological Measurement Y1 - 2012 A1 - Lee, HwaYoung A1 - Dodd, Barbara G. AB -

This study investigated item exposure control procedures under various combinations of item pool characteristics and ability distributions in computerized adaptive testing based on the partial credit model. Three variables were manipulated: item pool characteristics (120 items for each of easy, medium, and hard item pools), two ability distributions (normally distributed and negatively skewed data), and three exposure control procedures (randomesque procedure, progressive–restricted procedure, and maximum information procedure). A number of measurement precision indexes such as descriptive statistics, correlations between known and estimated ability levels, bias, root mean squared error, and average absolute difference, exposure rates, item usage, and item overlap were computed to assess the impact of matched or nonmatched item pool and ability distributions on the accuracy of ability estimation and the performance of exposure control procedures. As expected, the medium item pool produced better precision of measurement than both the easy and hard item pools. The progressive–restricted procedure performed better in terms of maximum exposure rates, item average overlap, and pool utilization than both the randomesque procedure and the maximum information procedure. The easy item pool with the negatively skewed data as a mismatched condition produced the worst performance.

VL - 72 UR - http://epm.sagepub.com/content/72/1/159.abstract ER - TY - JOUR T1 - Comparison of two Bayesian methods to detect mode effects between paper-based and computerized adaptive assessments: a preliminary Monte Carlo study. JF - BMC Med Res Methodol Y1 - 2012 A1 - Riley, Barth B A1 - Carle, Adam C KW - Bayes Theorem KW - Data Interpretation, Statistical KW - Humans KW - Mathematical Computing KW - Monte Carlo Method KW - Outcome Assessment (Health Care) AB -

BACKGROUND: Computerized adaptive testing (CAT) is being applied to health outcome measures developed as paper-and-pencil (P&P) instruments. Differences in how respondents answer items administered by CAT vs. P&P can increase error in CAT-estimated measures if not identified and corrected.

METHOD: Two methods for detecting item-level mode effects are proposed using Bayesian estimation of posterior distributions of item parameters: (1) a modified robust Z (RZ) test, and (2) 95% credible intervals (CrI) for the CAT-P&P difference in item difficulty. A simulation study was conducted under the following conditions: (1) data-generating model (one- vs. two-parameter IRT model); (2) moderate vs. large DIF sizes; (3) percentage of DIF items (10% vs. 30%), and (4) mean difference in θ estimates across modes of 0 vs. 1 logits. This resulted in a total of 16 conditions with 10 generated datasets per condition.

RESULTS: Both methods evidenced good to excellent false positive control, with RZ providing better control of false positives and CrI providing slightly higher power, irrespective of measurement model. False positives increased when items were very easy to endorse and when there were mode differences in mean trait level. True positives were predicted by CAT item usage, absolute item difficulty, and item discrimination. RZ outperformed CrI, due to better control of false positive DIF.

CONCLUSIONS: Whereas false positives were well controlled, particularly for RZ, power to detect DIF was suboptimal. Research is needed to examine the robustness of these methods under varying prior assumptions concerning the distribution of item and person parameters and when data fail to conform to prior assumptions. False identification of DIF when items were very easy to endorse is a problem warranting additional investigation.

VL - 12 ER - TY - JOUR T1 - Computerized Adaptive Testing for Student Selection to Higher Education JF - Journal of Higher Education Y1 - 2012 A1 - Kalender, I. AB -

The purpose of the present study is to discuss the applicability of the computerized adaptive testing format as an alternative to the current student selection examinations for higher education in Turkey. First, problems associated with the current student selection system are given: these problems exert pressure on students that results in test anxiety, produce measurement experiences that can be criticized, and lessen the credibility of the student selection system. Next, computerized adaptive tests are introduced and the advantages they provide are presented. Then, the results of a study that used two research designs (simulation and live testing) are presented. Results revealed that (i) the computerized adaptive format provided a reduction of up to 80% in the number of items given to students compared with the paper-and-pencil format of the student selection examination, and (ii) ability estimates have high reliabilities; correlations between ability estimates obtained from the simulation and the traditional format were higher than 0.80. At the end of the study, the solutions that a computerized adaptive testing implementation provides to the current problems are discussed, along with some issues concerning the application of the CAT format to student selection examinations in Turkey.

ER - TY - THES T1 - Computerized adaptive testing in industrial and organizational psychology Y1 - 2012 A1 - Makransky, G. PB - University of Twente CY - Twente, The Netherlands VL - Ph.D. ER - TY - JOUR T1 - Computerized Adaptive Testing Using a Class of High-Order Item Response Theory Models JF - Applied Psychological Measurement Y1 - 2012 A1 - Huang, Hung-Yu A1 - Chen, Po-Hsi A1 - Wang, Wen-Chung AB -

In the human sciences, a common assumption is that latent traits have a hierarchical structure. Higher order item response theory models have been developed to account for this hierarchy. In this study, computerized adaptive testing (CAT) algorithms based on these kinds of models were implemented, and their performance under a variety of situations was examined using simulations. The results showed that the CAT algorithms were very effective. The progressive method for item selection, the Sympson and Hetter method with online and freeze procedure for item exposure control, and the multinomial model for content balancing can simultaneously maintain good measurement precision, item exposure control, content balance, test security, and pool usage.

VL - 36 UR - http://apm.sagepub.com/content/36/8/689.abstract ER - TY - JOUR T1 - Detecting Local Item Dependence in Polytomous Adaptive Data JF - Journal of Educational Measurement Y1 - 2012 A1 - Mislevy, Jessica L. A1 - Rupp, André A. A1 - Harring, Jeffrey R. AB -

A rapidly expanding arena for item response theory (IRT) is in attitudinal and health-outcomes survey applications, often with polytomous items. In particular, there is interest in computer adaptive testing (CAT). Meeting model assumptions is necessary to realize the benefits of IRT in this setting, however. Although local item dependence has been investigated both for polytomous items in fixed-form settings and for dichotomous items in CAT settings, there have been no publications applying local item dependence detection methodology to polytomous items in CAT despite its central importance to these applications. The current research uses a simulation study to investigate the extension of widely used pairwise statistics, Yen’s Q3 statistic and Pearson’s X2 statistic, in this context. The simulation design and results are contextualized throughout with a real item bank of this type from the Patient-Reported Outcomes Measurement Information System (PROMIS).

VL - 49 UR - http://dx.doi.org/10.1111/j.1745-3984.2012.00165.x ER - TY - JOUR T1 - Development of a computerized adaptive test for depression JF - Archives of General Psychiatry Y1 - 2012 A1 - Robert D. Gibbons A1 - David J. Weiss A1 - Paul A. Pilkonis A1 - Ellen Frank A1 - Tara Moore A1 - Jong Bae Kim A1 - David J. Kupfer VL - 69 UR - WWW.ARCHGENPSYCHIATRY.COM IS - 11 ER - TY - JOUR T1 - An Efficiency Balanced Information Criterion for Item Selection in Computerized Adaptive Testing JF - Journal of Educational Measurement Y1 - 2012 A1 - Han, Kyung T. AB -

Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation and long-term quality control of CAT. This study proposed a new item selection method using the “efficiency balanced information” criterion to address issues with the maximum Fisher information method and stratification methods. According to the simulation results, the new efficiency balanced information method had desirable advantages over the other studied item selection methods in terms of improving the optimality of CAT assembly and utilizing items with low a-values while eliminating the need for item pool stratification.

VL - 49 UR - http://dx.doi.org/10.1111/j.1745-3984.2012.00173.x ER - TY - JOUR T1 - An Empirical Evaluation of the Slip Correction in the Four Parameter Logistic Models With Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2012 A1 - Yen, Yung-Chin A1 - Ho, Rong-Guey A1 - Laio, Wen-Wei A1 - Chen, Li-Ju A1 - Kuo, Ching-Chin AB -

In a selected response test, aberrant responses such as careless errors and lucky guesses might cause error in ability estimation because these responses do not actually reflect the knowledge that examinees possess. In a computerized adaptive test (CAT), these aberrant responses could further cause serious estimation error due to dynamic item administration. To enhance the robust performance of CAT against aberrant responses, Barton and Lord proposed the four-parameter logistic (4PL) item response theory (IRT) model. However, most studies relevant to the 4PL IRT model were conducted based on simulation experiments. This study attempts to investigate the performance of the 4PL IRT model as a slip-correction mechanism with an empirical experiment. The results showed that the 4PL IRT model could not only reduce the problematic underestimation of the examinees’ ability introduced by careless mistakes in practical situations but also improve measurement efficiency.

VL - 36 UR - http://apm.sagepub.com/content/36/2/75.abstract ER - TY - JOUR T1 - Improving personality facet scores with multidimensional computerized adaptive testing: An illustration with the NEO PI-R JF - Assessment Y1 - 2012 A1 - Makransky, G. A1 - Mortensen, E. L. A1 - Glas, C. A. W. ER - TY - JOUR T1 - Investigating the Effect of Item Position in Computer-Based Tests JF - Journal of Educational Measurement Y1 - 2012 A1 - Li, Feiming A1 - Cohen, Allan A1 - Shen, Linjun AB -

Computer-based tests (CBTs) often use random ordering of items in order to minimize item exposure and reduce the potential for answer copying. Little research has been done, however, to examine item position effects for these tests. In this study, different versions of a Rasch model and different response time models were examined and applied to data from a CBT administration of a medical licensure examination. The models specifically were used to investigate whether item position affected item difficulty and item intensity estimates. Results indicated that the position effect was negligible.

VL - 49 UR - http://dx.doi.org/10.1111/j.1745-3984.2012.00181.x ER - TY - JOUR T1 - Item Overexposure in Computerized Classification Tests Using Sequential Item Selection JF - Practical Assessment, Research & Evaluation Y1 - 2012 A1 - Huebner, A. AB -

Computerized classification tests (CCTs) often use sequential item selection which administers items according to maximizing psychometric information at a cut point demarcating passing and failing scores. This paper illustrates why this method of item selection leads to the overexposure of a significant number of items, and the performances of three different methods for controlling maximum item exposure rates in CCTs are compared. Specifically, the Sympson-Hetter, restricted, and item eligibility methods are examined in two studies realistically simulating different types of CCTs and are evaluated based upon criteria including classification accuracy, the number of items exceeding the desired maximum exposure rate, and test overlap. The pros and cons of each method are discussed from a practical perspective.

VL - 17 IS - 12 ER - TY - JOUR T1 - Item Selection and Ability Estimation Procedures for a Mixed-Format Adaptive Test JF - Applied Measurement in Education Y1 - 2012 A1 - Ho, Tsung-Han A1 - Dodd, Barbara G. VL - 25 UR - http://www.tandfonline.com/doi/abs/10.1080/08957347.2012.714686 ER - TY - JOUR T1 - A Mixture Rasch Model–Based Computerized Adaptive Test for Latent Class Identification JF - Applied Psychological Measurement Y1 - 2012 A1 - Jiao, Hong A1 - Macready, George A1 - Liu, Junhui A1 - Cho, Youngmi AB -

This study explored a computerized adaptive test delivery algorithm for latent class identification based on the mixture Rasch model. Four item selection methods based on the Kullback–Leibler (KL) information were proposed and compared with the reversed and the adaptive KL information under simulated testing conditions. When item separation was large, all item selection methods did not differ evidently in terms of accuracy in classifying examinees into different latent classes and estimating latent ability. However, when item separation was small, two methods with class-specific ability estimates performed better than the other two methods based on a single latent ability estimate across all latent classes. The three types of KL information distributions were compared. The KL and the reversed KL information could be the same or different depending on the ability level and the item difficulty difference between latent classes. Although the KL information and the reversed KL information were different at some ability levels and item difficulty difference levels, the use of the KL, the reversed KL, or the adaptive KL information did not affect the results substantially due to the symmetric distribution of item difficulty differences between latent classes in the simulated item pools. Item pool usage and classification convergence points were examined as well.

VL - 36 UR - http://apm.sagepub.com/content/36/6/469.abstract ER - TY - JOUR T1 - Multistage Computerized Adaptive Testing With Uniform Item Exposure JF - Applied Measurement in Education Y1 - 2012 A1 - Edwards, Michael C. A1 - Flora, David B. A1 - Thissen, David VL - 25 UR - http://www.tandfonline.com/doi/abs/10.1080/08957347.2012.660363 ER - TY - JOUR T1 - Panel Design Variations in the Multistage Test Using the Mixed-Format Tests JF - Educational and Psychological Measurement Y1 - 2012 A1 - Kim, Jiseon A1 - Chung, Hyewon A1 - Dodd, Barbara G. A1 - Park, Ryoungsun AB -

This study compared various panel designs of the multistage test (MST) using mixed-format tests in the context of classification testing. Simulations varied the design of the first-stage module. The first stage was constructed according to three levels of test information functions (TIFs) with three different TIF centers. Additional computerized adaptive test (CAT) conditions provided baseline comparisons. Three passing rate conditions were also included. The various MST conditions using mixed-format tests were constructed properly and performed well. When the levels of TIFs at the first stage were higher, the simulations produced a greater number of correct classifications. CAT with the randomesque-10 procedure yielded comparable results to the MST with increased levels of TIFs. Finally, all MST conditions achieved better test security results compared with CAT’s maximum information conditions.

VL - 72 UR - http://epm.sagepub.com/content/72/4/574.abstract ER - TY - JOUR T1 - The Problem of Bias in Person Parameter Estimation in Adaptive Testing JF - Applied Psychological Measurement Y1 - 2012 A1 - Doebler, Anna AB -

It is shown that deviations of estimated from true values of item difficulty parameters, caused for example by item calibration errors, the neglect of randomness of item difficulty parameters, testlet effects, or rule-based item generation, can lead to systematic bias in point estimation of person parameters in the context of adaptive testing. This effect occurs even when the errors of the item difficulty parameters are themselves unbiased. Analytical calculations as well as simulation studies are discussed.

VL - 36 UR - http://apm.sagepub.com/content/36/4/255.abstract ER - TY - JOUR T1 - On the Reliability and Validity of a Numerical Reasoning Speed Dimension Derived From Response Times Collected in Computerized Testing JF - Educational and Psychological Measurement Y1 - 2012 A1 - Davison, Mark L. A1 - Semmes, Robert A1 - Huang, Lan A1 - Close, Catherine N. AB -

Data from 181 college students were used to assess whether math reasoning item response times in computerized testing can provide valid and reliable measures of a speed dimension. The alternate forms reliability of the speed dimension was .85. A two-dimensional structural equation model suggests that the speed dimension is related to the accuracy of speeded responses. Speed factor scores were significantly correlated with performance on the ACT math scale. Results suggest that the speed dimension underlying response times can be reliably measured and that the dimension is related to the accuracy of performance under the pressure of time limits.

VL - 72 UR - http://epm.sagepub.com/content/72/2/245.abstract ER - TY - JOUR T1 - A Stochastic Method for Balancing Item Exposure Rates in Computerized Classification Tests JF - Applied Psychological Measurement Y1 - 2012 A1 - Huebner, Alan A1 - Li, Zhushan AB -

Computerized classification tests (CCTs) classify examinees into categories such as pass/fail, master/nonmaster, and so on. This article proposes the use of stochastic methods from sequential analysis to address item overexposure, a practical concern in operational CCTs. Item overexposure is traditionally dealt with in CCTs by the Sympson-Hetter (SH) method, but this method is unable to restrict the exposure of the most informative items to the desired level. The authors’ new method of stochastic item exposure balance (SIEB) works in conjunction with the SH method and is shown to greatly reduce the number of overexposed items in a pool and improve overall exposure balance while maintaining classification accuracy comparable with using the SH method alone. The method is demonstrated using a simulation study.

VL - 36 UR - http://apm.sagepub.com/content/36/3/181.abstract ER - TY - JOUR T1 - Termination Criteria in Computerized Adaptive Tests: Do Variable-Length CATs Provide Efficient and Effective Measurement? JF - Journal of Computerized Adaptive Testing Y1 - 2012 A1 - Babcock, B. A1 - Weiss, D. J. VL - 1 IS - 1 ER - TY - CONF T1 - Adaptive Item Calibration and Norming: Unique Considerations of a Global Deployment T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Alexander Schwall A1 - Evan Sinar KW - CAT KW - common item equating KW - Figural Reasoning Test KW - item calibration KW - norming JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - Applying computerized adaptive testing to the CES-D scale: A simulation study JF - Psychiatry Research Y1 - 2011 A1 - Smits, N. A1 - Cuijpers, P. A1 - van Straten, A. AB - In this paper we studied the appropriateness of developing an adaptive version of the Center of Epidemiological Studies-Depression (CES-D, Radloff, 1977) scale. Computerized Adaptive Testing (CAT) involves the computerized administration of a test in which each item is dynamically selected from a pool of items until a pre-specified measurement precision is reached. Two types of analyses were performed using the CES-D responses of a large sample of adolescents (N=1392). First, it was shown that the items met the psychometric requirements needed for CAT. Second, CATs were simulated by using the existing item responses as if they had been collected adaptively. CATs selecting only a small number of items gave results which, in terms of depression measurement and criterion validity, were only marginally different from the results of full CES-D assessment. It was concluded that CAT is a very fruitful way of improving the efficiency of the CES-D questionnaire.
The discussion addresses the strengths and limitations of the application of CAT in mental health research. VL - 188 IS - 1 SN - 0165-1781 (Print) 0165-1781 (Linking) N1 - Psychiatry Res. 2011 Jan 3. ER - TY - JOUR T1 - Better Data From Better Measurements Using Computerized Adaptive Testing JF - Journal of Methods and Measurement in the Social Sciences Y1 - 2011 A1 - Weiss, D. J. AB - The process of constructing a fixed-length conventional test frequently focuses on maximizing internal consistency reliability by selecting test items that are of average difficulty and high discrimination (a "peaked" test). The effect of constructing such a test, when viewed from the perspective of item response theory, is test scores that are precise for examinees whose trait levels are near the point at which the test is peaked; as examinee trait levels deviate from the mean, the precision of their scores decreases substantially. Results of a small simulation study demonstrate that when peaked tests are "off target" for an examinee, their scores are biased and have spuriously high standard deviations, reflecting substantial amounts of error. These errors can reduce the correlations of these kinds of scores with other variables and adversely affect the results of standard statistical tests. By contrast, scores from adaptive tests are essentially unbiased and have standard deviations that are much closer to true values. Basic concepts of adaptive testing are introduced and fully adaptive computerized tests (CATs) based on IRT are described. Several examples of response records from CATs are discussed to illustrate how CATs function. Some operational issues, including item exposure, content balancing, and enemy items are also briefly discussed.
It is concluded that because CAT constructs a unique test for each examinee, scores from CATs will be more precise and should provide better data for social science research and applications. VL - 2 IS - 1 ER - TY - CONF T1 - Building Affordable CD-CAT Systems for Schools To Address Today's Challenges In Assessment T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Chang, Hua-Hua KW - affordability KW - CAT KW - cost JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - catR: An R Package for Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2011 A1 - Magis, D. A1 - Raîche, G. KW - computer program KW - computerized adaptive testing KW - Estimation KW - Item Response Theory AB -

Computerized adaptive testing (CAT) is an active current research field in psychometrics and educational measurement. However, there is very little software available to handle such adaptive tasks. The R package catR was developed to perform adaptive testing with as much flexibility as possible, in an attempt to provide a developmental and testing platform to the interested user. Several item-selection rules and ability estimators are implemented. The item bank can be provided by the user or randomly generated from parent distributions of item parameters. Three stopping rules are available. The output can be graphically displayed.

ER - TY - JOUR T1 - A Comment on Early Student Blunders on Computer-Based Adaptive Tests JF - Applied Psychological Measurement Y1 - 2011 A1 - Green, Bert F. AB -

This article refutes a recent claim that computer-based tests produce biased scores for very proficient test takers who make mistakes on one or two initial items and that the "bias" can be reduced by using a four-parameter IRT model. Because the same effect occurs with pattern scores on nonadaptive tests, the effect results from IRT scoring, not from adaptive testing. Because very proficient test takers rarely err on items of middle difficulty, the so-called bias is one of selective data analysis. Furthermore, the apparently large score penalty for one error on an otherwise perfect response pattern is shown to result from the relative stretching of the IRT scale at very high and very low proficiencies. The recommended use of a four-parameter IRT model is shown to have drawbacks.

VL - 35 IS - 2 UR - http://apm.sagepub.com/content/35/2/165.abstract ER - TY - JOUR T1 - Computer adaptive testing for small scale programs and instructional systems JF - Journal of Applied Testing Technology Y1 - 2011 A1 - Rudner, L. M. A1 - Guo, F. AB -

This study investigates measurement decision theory (MDT) as an underlying model for computer adaptive testing when the goal is to classify examinees into one of a finite number of groups. The first analysis compares MDT with a popular item response theory model and finds little difference in terms of the percentage of correct classifications. The second analysis examines the number of examinees needed to calibrate MDT item parameters and finds accurate classifications even with calibration sample sizes as small as 100 examinees.

VL - 12 IS - 1 ER - TY - JOUR T1 - Computerized adaptive assessment of personality disorder: Introducing the CAT–PD project JF - Journal of Personality Assessment Y1 - 2011 A1 - Simms, L. J. A1 - Goldberg, L .R. A1 - Roberts, J. E. A1 - Watson, D. A1 - Welte, J. A1 - Rotterman, J. H. AB - Assessment of personality disorders (PD) has been hindered by reliance on the problematic categorical model embodied in the most recent Diagnostic and Statistical Model of Mental Disorders (DSM), lack of consensus among alternative dimensional models, and inefficient measurement methods. This article describes the rationale for and early results from a multiyear study funded by the National Institute of Mental Health that was designed to develop an integrative and comprehensive model and efficient measure of PD trait dimensions. To accomplish these goals, we are in the midst of a 5-phase project to develop and validate the model and measure. The results of Phase 1 of the project—which was focused on developing the PD traits to be assessed and the initial item pool—resulted in a candidate list of 59 PD traits and an initial item pool of 2,589 items. Data collection and structural analyses in community and patient samples will inform the ultimate structure of the measure, and computerized adaptive testing will permit efficient measurement of the resultant traits. The resultant Computerized Adaptive Test of Personality Disorder (CAT–PD) will be well positioned as a measure of the proposed DSM–5 PD traits. Implications for both applied and basic personality research are discussed. VL - 93 SN - 0022-3891 ER - TY - JOUR T1 - Computerized Adaptive Testing with the Zinnes and Griggs Pairwise Preference Ideal Point Model JF - International Journal of Testing Y1 - 2011 A1 - Stark, Stephen A1 - Chernyshenko, Oleksandr S. 
VL - 11 UR - http://www.tandfonline.com/doi/abs/10.1080/15305058.2011.561459 ER - TY - JOUR T1 - Computerized Classification Testing Under the Generalized Graded Unfolding Model JF - Educational and Psychological Measurement Y1 - 2011 A1 - Wang, Wen-Chung A1 - Liu, Chen-Wei AB -

The generalized graded unfolding model (GGUM) has been recently developed to describe item responses to Likert items (agree-disagree) in attitude measurement. In this study, the authors (a) developed two item selection methods in computerized classification testing under the GGUM, the current estimate/ability confidence interval method and the cut score/sequential probability ratio test method, and (b) evaluated their accuracy and efficiency in classification through simulations. The results indicated that both methods were very accurate and efficient. The more points each item had and the fewer the classification categories, the more accurate and efficient the classification would be. However, the latter method may yield a very low accuracy in dichotomous items with a short maximum test length. Thus, if it is to be used to classify examinees with dichotomous items, the maximum test length should be increased.

VL - 71 UR - http://epm.sagepub.com/content/71/1/114.abstract ER - TY - JOUR T1 - Computerized Classification Testing Under the One-Parameter Logistic Response Model With Ability-Based Guessing JF - Educational and Psychological Measurement Y1 - 2011 A1 - Wang, Wen-Chung A1 - Huang, Sheng-Yun AB -

The one-parameter logistic model with ability-based guessing (1PL-AG) has been recently developed to account for effect of ability on guessing behavior in multiple-choice items. In this study, the authors developed algorithms for computerized classification testing under the 1PL-AG and conducted a series of simulations to evaluate their performances. Four item selection methods (the Fisher information, the Fisher information with a posterior distribution, the progressive method, and the adjusted progressive method) and two termination criteria (the ability confidence interval [ACI] method and the sequential probability ratio test [SPRT]) were developed. In addition, the Sympson–Hetter online method with freeze (SHOF) was implemented for item exposure control. Major results include the following: (a) when no item exposure control was made, all the four item selection methods yielded very similar correct classification rates, but the Fisher information method had the worst item bank usage and the highest item exposure rate; (b) SHOF can successfully maintain the item exposure rate at a prespecified level, without compromising substantial accuracy and efficiency in classification; (c) once SHOF was implemented, all the four methods performed almost identically; (d) ACI appeared to be slightly more efficient than SPRT; and (e) in general, a higher weight of ability in guessing led to a slightly higher accuracy and efficiency, and a lower forced classification rate.

VL - 71 UR - http://epm.sagepub.com/content/71/6/925.abstract ER - TY - JOUR T1 - Content range and precision of a computer adaptive test of upper extremity function for children with cerebral palsy JF - Physical & Occupational Therapy in Pediatrics Y1 - 2011 A1 - Montpetit, K. A1 - Haley, S. A1 - Bilodeau, N. A1 - Ni, P. A1 - Tian, F. A1 - Gorton, G., 3rd A1 - Mulcahey, M. J. AB - This article reports on the content range and measurement precision of an upper extremity (UE) computer adaptive testing (CAT) platform of physical function in children with cerebral palsy. Upper extremity items representing skills of all abilities were administered to 305 parents. These responses were compared with two traditional standardized measures: Pediatric Outcomes Data Collection Instrument and Functional Independence Measure for Children. The UE CAT correlated strongly with the upper extremity component of these measures and had greater precision when describing individual functional ability. The UE item bank has wider range with items populating the lower end of the ability spectrum. This new UE item bank and CAT have the capability to quickly assess children of all ages and abilities with good precision and, most importantly, with items that are meaningful and appropriate for their age and level of physical function. VL - 31 SN - 1541-3144 (Electronic) 0194-2638 (Linking) N1 - Montpetit, Kathleen; Haley, Stephen; Bilodeau, Nathalie; Ni, Pengsheng; Tian, Feng; Gorton, George 3rd; Mulcahey, M. J. England. Phys Occup Ther Pediatr. 2011 Feb;31(1):90-102. Epub 2010 Oct 13. JO - Phys Occup Ther Pediatr ER - TY - CONF T1 - Continuous Testing (an avenue for CAT research) T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - G. Gage Kingsbury KW - CAT KW - item filter KW - item filtration AB -

Publishing an Adaptive Test

Problems with Publishing

Research Questions

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - Creating a K-12 Adaptive Test: Examining the Stability of Item Parameter Estimates and Measurement Scales JF - Journal of Applied Testing Technology Y1 - 2011 A1 - Kingsbury, G. G. A1 - Wise, S. L. AB -

Development of adaptive tests used in K-12 settings requires the creation of stable measurement scales to measure the growth of individual students from one grade to the next, and to measure change in groups from one year to the next. Accountability systems like No Child Left Behind require stable measurement scales so that accountability has meaning across time. This study examined the stability of the measurement scales used with the Measures of Academic Progress. Difficulty estimates for test questions from the reading and mathematics scales were examined over a period ranging from 7 to 22 years. Results showed high correlations between item difficulty estimates from the time at which they were originally calibrated and the current calibration. The average drift in item difficulty estimates was less than .01 standard deviations. The average impact of change in item difficulty estimates was less than the smallest reported difference on the score scale for two actual tests. The findings of the study indicate that an IRT scale can be stable enough to allow consistent measurement of student achievement.

VL - 12 UR - http://www.testpublishers.org/journal-of-applied-testing-technology ER - TY - ABST T1 - Cross-cultural development of an item list for computer-adaptive testing of fatigue in oncological patients Y1 - 2011 A1 - Giesinger, J. M. A1 - Petersen, M. A. A1 - Groenvold, M. A1 - Aaronson, N. K. A1 - Arraras, J. I. A1 - Conroy, T. A1 - Gamper, E. M. A1 - Kemmler, G. A1 - King, M. T. A1 - Oberguggenberger, A. S. A1 - Velikova, G. A1 - Young, T. A1 - Holzner, B. A1 - Eortc-Qlg, E. O. AB - ABSTRACT: INTRODUCTION: Within an ongoing project of the EORTC Quality of Life Group, we are developing computerized adaptive test (CAT) measures for the QLQ-C30 scales. These new CAT measures are conceptualised to reflect the same constructs as the QLQ-C30 scales. Accordingly, the Fatigue-CAT is intended to capture physical and general fatigue. METHODS: The EORTC approach to CAT development comprises four phases (literature search, operationalisation, pre-testing, and field testing). Phases I-III are described in detail in this paper. A literature search for fatigue items was performed in major medical databases. After refinement through several expert panels, the remaining items were used as the basis for adapting items and/or formulating new items fitting the EORTC item style. To obtain feedback from patients with cancer, these English items were translated into Danish, French, German, and Spanish and tested in the respective countries. RESULTS: Based on the literature search a list containing 588 items was generated. After a comprehensive item selection procedure focusing on content, redundancy, item clarity and item difficulty a list of 44 fatigue items was generated. Patient interviews (n=52) resulted in 12 revisions of wording and translations. DISCUSSION: The item list developed in phases I-III will be further investigated within a field-testing phase (IV) to examine psychometric characteristics and to fit an item response theory model. 
The Fatigue CAT based on this item bank will provide scores that are backward-compatible to the original QLQ-C30 fatigue scale. JF - Health and Quality of Life Outcomes VL - 9 SN - 1477-7525 (Electronic)1477-7525 (Linking) N1 - Health Qual Life Outcomes. 2011 Mar 29;9(1):19. ER - TY - JOUR T1 - Design of a Computer-Adaptive Test to Measure English Literacy and Numeracy in the Singapore Workforce: Considerations, Benefits, and Implications JF - Journal of Applied Testing Technology Y1 - 2011 A1 - Jacobsen, J. A1 - Ackermann, R. A1 - Egüez, J. A1 - Ganguli, D. A1 - Rickard, P. A1 - Taylor, L. AB -

A computer adaptive test (CAT) is a delivery methodology that serves the larger goals of the assessment system in which it is embedded. A thorough analysis of the assessment system for which a CAT is being designed is critical to ensure that the delivery platform is appropriate and addresses all relevant complexities. As such, a CAT engine must be designed to conform to the validity and reliability of the overall system. This design takes the form of adherence to the assessment goals and objectives of the adaptive assessment system. When the assessment is adapted for use in another country, consideration must be given to any necessary revisions, including content differences. This article addresses these considerations while drawing, in part, on the process followed in the development of the CAT delivery system designed to test English language workplace skills for the Singapore Workforce Development Agency. Topics include item creation and selection, calibration of the item pool, analysis and testing of the psychometric properties, and reporting and interpretation of scores. The characteristics and benefits of the CAT delivery system are detailed as well as implications for testing programs considering the use of a CAT delivery system.

VL - 12 UR - http://www.testpublishers.org/journal-of-applied-testing-technology IS - 1 ER - TY - CONF T1 - Detecting DIF between Conventional and Computerized Adaptive Testing: A Monte Carlo Study T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Barth B. Riley A1 - Adam C. Carle KW - 95% Credible Interval KW - CAT KW - DIF KW - differential item function KW - modified robust Z statistic KW - Monte Carlo methodologies AB -

Two procedures, the Modified Robust Z statistic and the 95% Credible Interval, were compared in a Monte Carlo study. Both procedures evidenced adequate control of false-positive DIF results.

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - THES T1 - Effects of Different Computerized Adaptive Testing Strategies on Recovery of Ability T2 - THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF MIDDLE EAST TECHNICAL UNIVERSITY Y1 - 2011 A1 - Kalender, I. AB -

The purpose of the present study was to compare ability estimates obtained from a computerized adaptive testing (CAT) procedure with the paper-and-pencil administration results of the Student Selection Examination (SSE) science subtest, considering different ability estimation methods and test termination rules. The study had two phases. In the first phase, a post-hoc simulation was conducted to examine the relationships between examinee ability levels estimated by the CAT and paper-and-pencil versions of the SSE. Maximum Likelihood Estimation and Expected A Posteriori were used as ability estimation methods, and the test termination rules were a standard error threshold and a fixed number of items. In the second phase, a CAT administration was implemented with a group of examinees to investigate the performance of CAT outside a simulated environment. Findings of the post-hoc simulations indicated that CAT could be implemented for the SSE using the Expected A Posteriori estimation method with a standard error threshold of 0.30 or higher. The correlation between ability estimates obtained by CAT and the real SSE was 0.95, and the mean number of items administered by the CAT was 18.4. The correlation between live CAT and real SSE ability estimates was 0.74, and the number of items used in the CAT administration was approximately 50% of the items in the paper-and-pencil SSE science subtest. Results indicated that CAT for the SSE science subtest provided ability estimates with higher reliability using fewer items than the paper-and-pencil format.

JF - THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF MIDDLE EAST TECHNICAL UNIVERSITY VL - Ph.D. ER - TY - JOUR T1 - A framework for the development of computerized adaptive tests JF - Practical Assessment Research & Evaluation Y1 - 2011 A1 - Thompson, N. A. A1 - Weiss, D. J. AB - A substantial amount of research has been conducted over the past 40 years on technical aspects of computerized adaptive testing (CAT), such as item selection algorithms, item exposure controls, and termination criteria. However, there is little literature providing practical guidance on the development of a CAT. This paper seeks to collate some of the available research methodologies into a general framework for the development of any CAT assessment. PB - Practical Assessment Research & Evaluation VL - 16 ER - TY - CONF T1 - From Reliability to Validity: Expanding Adaptive Testing Practice to Find the Most Valid Score for Each Test Taker T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Steven L. Wise KW - CAT KW - CIV KW - construct-irrelevant variance KW - Individual Score Validity KW - ISV KW - low test taking motivation KW - Reliability KW - validity AB -

CAT is an exception to the traditional conception of validity. It is one of the few examples of individualized testing. Item difficulty is tailored to each examinee. The intent, however, is increased efficiency. Focus on reliability (reduced standard error); Equivalence with paper & pencil tests is valued; Validity is enhanced through improved reliability.

How Else Might We Individualize Testing Using CAT?

An ISV-Based View of Validity

Test Event -- An examinee encounters a series of items in a particular context.

CAT Goal: individualize testing to address CIV threats to score validity (i.e., maximize ISV).

Some Research Issues:

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - CONF T1 - A Heuristic Of CAT Item Selection Procedure For Testlets T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Yuehmei Chien A1 - David Shin A1 - Walter Denny Way KW - CAT KW - shadow test KW - testlets JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - CONF T1 - High-throughput Health Status Measurement using CAT in the Era of Personal Genomics: Opportunities and Challenges T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Eswar Krishnan KW - CAT KW - health applications KW - PROMIS JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - Hypothetical use of multidimensional adaptive testing for the assessment of student achievement in PISA. JF - Educational and Psychological Measurement Y1 - 2011 A1 - Frey, A. A1 - Seitz, N-N. VL - 71 ER - TY - CONF T1 - Impact of Item Drift on Candidate Ability Estimation T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Sarah Hagge A1 - Ada Woo A1 - Phil Dickison KW - item drift AB -

For large operational pools, candidate ability estimates appear robust to item drift, especially under conditions that may represent ‘normal’ amounts of drift. Even with ‘extreme’ conditions of drift (e.g., 20% of items drifting 1.00 logits), decision consistency was still high.

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - ABST T1 - Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®): depression, anxiety, and anger Y1 - 2011 A1 - Pilkonis, P. A. A1 - Choi, S. W. A1 - Reise, S. P. A1 - Stover, A. M. A1 - Riley, W. T. A1 - Cella, D. JF - Assessment SN - 1073-1911 ER - TY - JOUR T1 - Item Selection Criteria With Practical Constraints for Computerized Classification Testing JF - Educational and Psychological Measurement Y1 - 2011 A1 - Lin, Chuan-Ju AB -

This study compares four item selection criteria for a two-category computerized classification testing: (1) Fisher information (FI), (2) Kullback-Leibler information (KLI), (3) weighted log-odds ratio (WLOR), and (4) mutual information (MI), with respect to the efficiency and accuracy of classification decisions using the sequential probability ratio test, as well as the extent of item usage. The comparability of the four item selection criteria is examined primarily under three types of item selection conditions: (1) using only the four item selection algorithms, (2) using the four item selection algorithms and content balancing control, and (3) using the four item selection algorithms, content balancing control, and item exposure control. The comparability of the four item selection criteria is also evaluated under two types of proficiency distribution and three levels of indifference region width. The results show that the differences among the four item selection criteria are washed out as more realistic constraints are imposed. Moreover, within two-category classification testing, the use of MI does not necessarily generate greater efficiency than FI, WLOR, and KLI, although MI might seem attractive for the general form of its formula in item selection.

VL - 71 UR - http://epm.sagepub.com/content/71/1/20.abstract ER - TY - CONF T1 - Item Selection Methods based on Multiple Objective Approaches for Classification of Respondents into Multiple Levels T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Maaike van Groen A1 - Theo Eggen A1 - Bernard Veldkamp KW - adaptive classification test KW - CAT KW - item selection KW - sequential classification test AB -

Is it possible to develop new item selection methods which take advantage of the fact that we want to classify into multiple categories? New methods: Taking multiple points on the ability scale into account; Based on multiple objective approaches.

Conclusions

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - JATT Special Issue on Adaptive Testing: Welcome and Overview JF - Journal of Applied Testing Technology Y1 - 2011 A1 - Thompson, N. A. VL - 12 UR - http://www.testpublishers.org/journal-of-applied-testing-technology ER - TY - JOUR T1 - Measuring Individual Growth With Conventional and Adaptive Tests JF - Journal of Methods and Measurement in the Social Sciences Y1 - 2011 A1 - Weiss, D. J. A1 - Von Minden, S. VL - 2 IS - 1 ER - TY - CONF T1 - Moving beyond Efficiency to Allow CAT to Provide Better Diagnostic Information T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Brian D. Bontempo KW - CAT KW - dianostic information KW - MIRT KW - Multiple unidimensional scales KW - psychomagic KW - smart CAT AB -
Future CATs will provide better diagnostic information to
–Examinees
–Regulators, Educators, Employers
–Test Developers
This goal will be accomplished by
–Smart CATs which collect additional information during the test
–Psychomagic
The time is now for Reporting
JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - National Tests in Denmark – CAT as a Pedagogic Tool JF - Journal of Applied Testing Technology Y1 - 2011 A1 - Wandall, J. AB -

Testing and test results can be used in different ways. They can be used for regulation and control, but they can also be a pedagogic tool for assessment of student proficiency in order to target teaching, improve learning and facilitate local pedagogical leadership. To serve these purposes the test has to be low stakes, and to ensure this, the Danish National test results are made strictly confidential by law. The only test results that are made public are the overall national results. Because of the test design, test results are directly comparable, offering potential for monitoring added value and developing new ways of using test results in a pedagogical context. This article gives the background and status for the development of the Danish national tests, describes what is special about these tests (e.g., Information Technology [IT]-based, 3 tests in 1, adaptive), how the national tests are carried out, and what
is tested. Furthermore, it describes strategies for disseminating the results to the pupil, parents, teacher, headmaster and municipality; and how the results can be used by the teacher and headmaster.

VL - 12 IS - 1 ER - TY - JOUR T1 - A new adaptive testing algorithm for shortening health literacy assessments JF - BMC Medical Informatics and Decision Making Y1 - 2011 A1 - Kandula, S. A1 - Ancker, J.S. A1 - Kaufman, D.R. A1 - Currie, L.M. A1 - Qing, Z.-T. AB -

 

UR - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3178473/?tool=pmcentrez
VL - 11 IS - 52 ER - TY - JOUR T1 - A New Stopping Rule for Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2011 A1 - Choi, Seung W. A1 - Grady, Matthew W. A1 - Dodd, Barbara G. AB -

The goal of the current study was to introduce a new stopping rule for computerized adaptive testing (CAT). The predicted standard error reduction (PSER) stopping rule uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared with that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule in a series of simulated adaptive tests, drawn from a number of item pools. Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant.

VL - 71 UR - http://epm.sagepub.com/content/71/1/37.abstract ER - TY - CONF T1 - Optimal Calibration Designs for Computerized Adaptive Testing T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Angela Verschoor KW - balanced block design KW - CAT KW - item calibration KW - optimization KW - Rasch AB -

Optimization

How can we exploit the advantages of Balanced Block Design while keeping the logistics manageable?

Homogeneous Designs: Overlap between test booklets as regular as possible

Conclusions:

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - CONF T1 - A Paradigm for Multinational Adaptive Testing T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - A Zara KW - CAT KW - multinational adaptive testing AB -

Impact of Issues in “Exported” Adaptive Testing

Goal is construct equivalency in the new environment

Research Questions

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - Polytomous Adaptive Classification Testing: Effects of Item Pool Size, Test Termination Criterion, and Number of Cutscores JF - Educational and Psychological Measurement Y1 - 2011 A1 - Gnambs, Timo A1 - Batinic, Bernad AB -

Computer-adaptive classification tests focus on classifying respondents in different proficiency groups (e.g., for pass/fail decisions). To date, adaptive classification testing has been dominated by research on dichotomous response formats and classifications in two groups. This article extends this line of research to polytomous classification tests for two- and three-group scenarios (e.g., inferior, mediocre, and superior proficiencies). Results of two simulation experiments with generated and real responses (N = 2,000) to established personality scales of different length (12, 20, or 29 items) demonstrate that adaptive item presentations significantly reduce the number of items required to make such classification decisions while maintaining a consistent classification accuracy. Furthermore, the simulations highlight the importance of the selected test termination criterion, which has a significant impact on the average test length.

VL - 71 UR - http://epm.sagepub.com/content/71/6/1006.abstract ER - TY - CONF T1 - Practitioner’s Approach to Identify Item Drift in CAT T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Huijuan Meng A1 - Susan Steinkamp A1 - Paul Jones A1 - Joy Matthews-Lopez KW - CUSUM method KW - G2 statistic KW - IPA KW - item drift KW - item parameter drift KW - Lord's chi-square statistic KW - Raju's NCDIF JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - Restrictive Stochastic Item Selection Methods in Cognitive Diagnostic Computerized Adaptive Testing JF - Journal of Educational Measurement Y1 - 2011 A1 - Wang, Chun A1 - Chang, Hua-Hua A1 - Huebner, Alan AB -

This paper proposes two new item selection methods for cognitive diagnostic computerized adaptive testing: the restrictive progressive method and the restrictive threshold method. They are built upon the posterior weighted Kullback-Leibler (KL) information index but include additional stochastic components either in the item selection index or in the item selection procedure. Simulation studies show that both methods are successful at simultaneously suppressing overexposed items and increasing the usage of underexposed items. Compared to item selection based upon (1) pure KL information and (2) the Sympson-Hetter method, the two new methods strike a better balance between item exposure control and measurement accuracy. The two new methods are also compared with Barrada et al.'s (2008) progressive method and proportional method.

VL - 48 UR - http://dx.doi.org/10.1111/j.1745-3984.2011.00145.x ER - TY - CONF T1 - Small-Sample Shadow Testing T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Wallace Judd KW - CAT KW - shadow test JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - CONF T1 - A Test Assembly Model for MST T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Angela Verschoor A1 - Ingrid Radtke A1 - Theo Eggen KW - CAT KW - mst KW - multistage testing KW - Rasch KW - routing KW - tif AB -

This study is a short exploration of the optimization of an MST. It is extremely hard, perhaps impossible, to chart the influence of the item pool and test specifications on the optimization process. Simulations are very helpful in finding an acceptable MST.

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - Unproctored Internet test verification: Using adaptive confirmation testing JF - Organizational Research Methods Y1 - 2011 A1 - Makransky, G. A1 - Glas, C. A. W. VL - 14 ER - TY - CONF T1 - The Use of Decision Trees for Adaptive Item Selection and Score Estimation T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Barth B. Riley A1 - Rodney Funk A1 - Michael L. Dennis A1 - Richard D. Lennox A1 - Matthew Finkelman KW - adaptive item selection KW - CAT KW - decision tree AB -

Conducted post-hoc simulations comparing the relative efficiency and precision of decision trees (using CHAID and CART) vs. IRT-based CAT.

Conclusions

Decision tree methods were more efficient than CAT

But,...

Conclusions

CAT selects items based on two criteria: Item location relative to current estimate of theta, Item discrimination

Decision Trees select items that best discriminate between groups defined by the total score.

CAT is optimal only when trait level is well estimated.
Findings suggest that combining decision tree followed by CAT item selection may be advantageous.

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - Using Item Response Theory and Adaptive Testing in Online Career Assessment JF - Journal of Career Assessment Y1 - 2011 A1 - Betz, Nancy E. A1 - Turner, Brandon M. AB -

The present article describes the potential utility of item response theory (IRT) and adaptive testing for scale evaluation and for web-based career assessment. The article describes the principles of both IRT and adaptive testing and then illustrates these with reference to data analyses and simulation studies of the Career Confidence Inventory (CCI). The kinds of information provided by IRT are shown to give a more precise look at scale quality across the trait continuum and also to permit the use of adaptive testing, where the items administered are tailored to the individual being tested. Such tailoring can significantly reduce testing time while maintaining high quality of measurement. This efficiency is especially useful when multiscale inventories and/or a large number of scales are to be administered. Readers are encouraged to consider using these advances in career assessment.

VL - 19 UR - http://jca.sagepub.com/cgi/content/abstract/19/3/274 ER - TY - CONF T1 - Walking the Tightrope: Using Better Content Control to Improve CAT T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Kathleen A. Gialluca KW - CAT KW - CAT evolution KW - test content AB -

All testing involves a balance between measurement precision and content considerations. CAT item-selection algorithms have evolved to accommodate content considerations. Reviews CAT evolution including: Original/”Pure” adaptive exams, Constrained CAT, Weighted-deviations method, Shadow-Test Approach, Testlets instead of fully adapted tests, Administration of one item may preclude the administration of other item(s), and item relationships.

Research Questions

 

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - CHAP T1 - Adaptive Mastery Testing Using a Multidimensional IRT Model T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Glas, C. A. W. A1 - Vos, H. J. JF - Elements of Adaptive Testing ER - TY - CHAP T1 - Adaptive Tests for Measuring Anxiety and Depression T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Walter, O. B. JF - Elements of Adaptive Testing ER - TY - CHAP T1 - Assembling an Inventory of Multistage Adaptive Testing Systems T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Breithaupt, K A1 - Ariel, A. A1 - Hare, D. R. JF - Elements of Adaptive Testing ER - TY - JOUR T1 - An automatic online calibration design in adaptive testing JF - Journal of Applied Testing Technology Y1 - 2010 A1 - Makransky, G. A1 - Glas, C. A. W. VL - 11 ER - TY - JOUR T1 - Bayesian item selection in constrained adaptive testing JF - Psicologica Y1 - 2010 A1 - Veldkamp, B. P. KW - computerized adaptive testing AB - Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item selection process. The Shadow Test Approach is a general purpose algorithm for administering constrained CAT. In this paper it is shown how the approach can be slightly modified to handle Bayesian item selection criteria. No differences in performance were found between the shadow test approach and the modifiedapproach. In a simulation study of the LSAT, the effects of Bayesian item selection criteria are illustrated. The results are compared to item selection based on Fisher Information. General recommendations about the use of Bayesian item selection criteria are provided. 
VL - 31 ER - TY - JOUR T1 - A Comparison of Content-Balancing Procedures for Estimating Multiple Clinical Domains in Computerized Adaptive Testing: Relative Precision, Validity, and Detection of Persons With Misfitting Responses JF - Applied Psychological Measurement Y1 - 2010 A1 - Barth B. Riley A1 - Michael L. Dennis A1 - Conrad, Kendon J. AB -

This simulation study sought to compare four different computerized adaptive testing (CAT) content-balancing procedures designed for use in a multidimensional assessment with respect to measurement precision, symptom severity classification, validity of clinical diagnostic recommendations, and sensitivity to atypical responding. The four content-balancing procedures were (a) no content balancing, (b) screener-based, (c) mixed (screener plus content balancing), and (d) full content balancing. In full content balancing and in mixed content balancing following administration of the screener items, item selection was based on (a) whether the target number of items for the item’s subscale was reached and (b) the item’s information function. Mixed and full content balancing provided the best representation of items from each of the main subscales of the Internal Mental Distress Scale. These procedures also resulted in higher CAT to full-scale correlations for the Trauma and Homicidal/Suicidal Thought subscales and improved detection of atypical responding.

VL - 34 UR - http://apm.sagepub.com/content/34/6/410.abstract ER - TY - JOUR T1 - A Comparison of Item Selection Techniques for Testlets JF - Applied Psychological Measurement Y1 - 2010 A1 - Murphy, Daniel L. A1 - Dodd, Barbara G. A1 - Vaughn, Brandon K. AB -

This study examined the performance of the maximum Fisher’s information, the maximum posterior weighted information, and the minimum expected posterior variance methods for selecting items in a computerized adaptive testing system when the items were grouped in testlets. A simulation study compared the efficiency of ability estimation among the item selection techniques under varying conditions of local-item dependency when the response model was either the three-parameter-logistic item response theory or the three-parameter-logistic testlet response theory. The item selection techniques performed similarly within any particular condition, the practical implications of which are discussed within the article.

VL - 34 UR - http://apm.sagepub.com/content/34/6/424.abstract ER - TY - CONF T1 - Computerized adaptive testing based on decision trees T2 - 10th IEEE International Conference on Advanced Learning Technologies Y1 - 2010 A1 - Ueno, M. A1 - Songmuang, P. JF - 10th IEEE International Conference on Advanced Learning Technologies PB - IEEE Computer Sience CY - Sousse, Tunisia VL - 58 ER - TY - CHAP T1 - Constrained Adaptive Testing with Shadow Tests T2 - Elements of Adaptive Testing Y1 - 2010 A1 - van der Linden, W. J. JF - Elements of Adaptive Testing ER - TY - CHAP T1 - Designing and Implementing a Multistage Adaptive Test: The Uniform CPA Exam T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Melican, G.J. A1 - Breithaupt, K A1 - Zhang, Y. JF - Elements of Adaptive Testing ER - TY - CHAP T1 - Designing Item Pools for Adaptive Testing T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Veldkamp, B. P. A1 - van der Linden, W. J. JF - Elements of Adaptive Testing ER - TY - JOUR T1 - Designing item pools to optimize the functioning of a computerized adaptive test JF - Psychological Test and Assessment Modeling Y1 - 2010 A1 - Reckase, M. D. AB - Computerized adaptive testing (CAT) is a testing procedure that can result in improved precision for a specified test length or reduced test length with no loss of precision. However, these attractive psychometric features of CATs are only achieved if appropriate test items are available for administration. This set of test items is commonly called an “item pool.” This paper discusses the optimal characteristics for an item pool that will lead to the desired properties for a CAT. Then, a procedure is described for designing the statistical characteristics of the item parameters for an optimal item pool within an item response theory framework. Because true optimality is impractical, methods for achieving practical approximations to optimality are described. 
The results of this approach are shown for an operational testing program, including comparisons to the results from the item pool currently used in that testing program. VL - 52 SN - 2190-0507 ER - TY - CHAP T1 - Detecting Person Misfit in Adaptive Testing T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Meijer, R. R. A1 - van Krimpen-Stoop, E. M. L. A. JF - Elements of Adaptive Testing ER - TY - JOUR T1 - Detection of aberrant item score patterns in computerized adaptive testing: An empirical example using the CUSUM JF - Personality and Individual Differences Y1 - 2010 A1 - Egberink, I. J. L. A1 - Meijer, R. R. A1 - Veldkamp, B. P. A1 - Schakel, L. A1 - Smid, N. G. KW - CAT KW - computerized adaptive testing KW - CUSUM approach KW - person Fit AB - The scalability of individual trait scores on a computerized adaptive test (CAT) was assessed through investigating the consistency of individual item score patterns. A sample of N = 428 persons completed a personality CAT as part of a career development procedure. To detect inconsistent item score patterns, we used a cumulative sum (CUSUM) procedure. Combined information from the CUSUM, other personality measures, and interviews showed that similar estimated trait values may have a different interpretation. Implications for computer-based assessment are discussed. VL - 48 SN - 01918869 ER - TY - JOUR T1 - Deterioro de parámetros de los ítems en tests adaptativos informatizados: estudio con eCAT [Item parameter drift in computerized adaptive testing: Study with eCAT] JF - Psicothema Y1 - 2010 A1 - Abad, F. J. A1 - Olea, J. A1 - Aguado, D. A1 - Ponsoda, V. A1 - Barrada, J KW - *Software KW - Educational Measurement/*methods/*statistics & numerical data KW - Humans KW - Language AB -

This study describes the parameter drift analysis conducted on eCAT (a computerized adaptive test to assess the written English level of Spanish speakers). The original calibration of the item bank (N = 3,224), developed for the operational launch of the CAT, was compared to a new calibration obtained from the data provided by most eCAT operational administrations (N = 7,254). A Differential Item Functioning (DIF) study was conducted between the original and the new calibrations, and the impact of the parameter changes on trait level estimates was assessed by simulation. Results show that parameter drift occurs especially for the a and c parameters, that an important number of bank items show DIF, and that the parameter changes have a moderate impact on θ estimates for examinees with a high level of English. It is therefore recommended that the original item parameter estimates be replaced by the new set.

VL - 22 SN - 0214-9915 (Print)0214-9915 (Linking) N1 - Abad, Francisco JOlea, JulioAguado, DavidPonsoda, VicenteBarrada, Juan REnglish AbstractSpainPsicothemaPsicothema. 2010 May;22(2):340-7. ER - TY - JOUR T1 - Development and evaluation of a confidence-weighting computerized adaptive testing JF - Educational Technology & Society Y1 - 2010 A1 - Yen, Y. C. A1 - Ho, R. G. A1 - Chen, L. J. A1 - Chou, K. Y. A1 - Chen, Y. L. VL - 13(3) ER - TY - JOUR T1 - Development and validation of patient-reported outcome measures for sleep disturbance and sleep-related impairments JF - Sleep Y1 - 2010 A1 - Buysse, D. J. A1 - Yu, L. A1 - Moul, D. E. A1 - Germain, A. A1 - Stover, A. A1 - Dodds, N. E. A1 - Johnston, K. L. A1 - Shablesky-Cade, M. A. A1 - Pilkonis, P. A. KW - *Outcome Assessment (Health Care) KW - *Self Disclosure KW - Adult KW - Aged KW - Aged, 80 and over KW - Cross-Sectional Studies KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Middle Aged KW - Psychometrics KW - Questionnaires KW - Reproducibility of Results KW - Sleep Disorders/*diagnosis KW - Young Adult AB - STUDY OBJECTIVES: To develop an archive of self-report questions assessing sleep disturbance and sleep-related impairments (SRI), to develop item banks from this archive, and to validate and calibrate the item banks using classic validation techniques and item response theory analyses in a sample of clinical and community participants. DESIGN: Cross-sectional self-report study. SETTING: Academic medical center and participant homes. PARTICIPANTS: One thousand nine hundred ninety-three adults recruited from an Internet polling sample and 259 adults recruited from medical, psychiatric, and sleep clinics. INTERVENTIONS: None. MEASUREMENTS AND RESULTS: This study was part of PROMIS (Patient-Reported Outcomes Information System), a National Institutes of Health Roadmap initiative. 
Self-report item banks were developed through an iterative process of literature searches, collecting and sorting items, expert content review, qualitative patient research, and pilot testing. Internal consistency, convergent validity, and exploratory and confirmatory factor analysis were examined in the resulting item banks. Factor analyses identified 2 preliminary item banks, sleep disturbance and SRI. Item response theory analyses and expert content review narrowed the item banks to 27 and 16 items, respectively. Validity of the item banks was supported by moderate to high correlations with existing scales and by significant differences in sleep disturbance and SRI scores between participants with and without sleep disorders. CONCLUSIONS: The PROMIS sleep disturbance and SRI item banks have excellent measurement properties and may prove to be useful for assessing general aspects of sleep and SRI with various groups of patients and interventions. VL - 33 SN - 0161-8105 (Print) 0161-8105 (Linking) N1 - Buysse, Daniel J; Yu, Lan; Moul, Douglas E; Germain, Anne; Stover, Angela; Dodds, Nathan E; Johnston, Kelly L; Shablesky-Cade, Melissa A; Pilkonis, Paul A; AR052155/AR/NIAMS NIH HHS/United States; U01AR52155/AR/NIAMS NIH HHS/United States; U01AR52158/AR/NIAMS NIH HHS/United States; U01AR52170/AR/NIAMS NIH HHS/United States; U01AR52171/AR/NIAMS NIH HHS/United States; U01AR52177/AR/NIAMS NIH HHS/United States; U01AR52181/AR/NIAMS NIH HHS/United States; U01AR52186/AR/NIAMS NIH HHS/United States; Research Support, N.I.H., Extramural; Validation Studies; United States; Sleep; Sleep. 2010 Jun 1;33(6):781-92. U2 - 2880437 ER - TY - JOUR T1 - Development of computerized adaptive testing (CAT) for the EORTC QLQ-C30 physical functioning dimension JF - Quality of Life Research Y1 - 2010 A1 - Petersen, M. A. A1 - Groenvold, M. A1 - Aaronson, N. K. A1 - Chie, W. C. A1 - Conroy, T. A1 - Costantini, A. A1 - Fayers, P. A1 - Helbostad, J. A1 - Holzner, B. A1 - Kaasa, S. A1 - Singer, S. A1 - Velikova, G. A1 - Young, T. 
AB - PURPOSE: Computerized adaptive test (CAT) methods, based on item response theory (IRT), enable a patient-reported outcome instrument to be adapted to the individual patient while maintaining direct comparability of scores. The EORTC Quality of Life Group is developing a CAT version of the widely used EORTC QLQ-C30. We present the development and psychometric validation of the item pool for the first of the scales, physical functioning (PF). METHODS: Initial developments (including literature search and patient and expert evaluations) resulted in 56 candidate items. Responses to these items were collected from 1,176 patients with cancer from Denmark, France, Germany, Italy, Taiwan, and the United Kingdom. The items were evaluated with regard to psychometric properties. RESULTS: Evaluations showed that 31 of the items could be included in a unidimensional IRT model with acceptable fit and good content coverage, although the pool may lack items at the upper extreme (good PF). There were several findings of significant differential item functioning (DIF). However, the DIF findings appeared to have little impact on the PF estimation. CONCLUSIONS: We have established an item pool for CAT measurement of PF and believe that this CAT instrument will clearly improve the EORTC measurement of PF. VL - 20 SN - 1573-2649 (Electronic) 0962-9343 (Linking) N1 - Qual Life Res. 2010 Oct 23. ER - TY - JOUR T1 - Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms JF - Quality of Life Research Y1 - 2010 A1 - Choi, S. A1 - Reise, S. P. A1 - Pilkonis, P. A. A1 - Hays, R. D. A1 - Cella, D. VL - 19(1) ER - TY - BOOK T1 - Elements of Adaptive Testing Y1 - 2010 A1 - van der Linden, W. J. A1 - Glas, C. A. W. PB - Springer CY - New York ER - TY - CHAP T1 - Estimation of the Parameters in an Item-Cloning Model for Adaptive Testing T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Glas, C. A. W. A1 - van der Linden, W. J. 
A1 - Geerlings, H. JF - Elements of Adaptive Testing ER - TY - JOUR T1 - Features of the sampling distribution of the ability estimate in computerized adaptive testing according to two stopping rules JF - Journal of Applied Measurement Y1 - 2010 A1 - Blais, J. G. A1 - Raiche, G. AB - Whether paper and pencil or computerized adaptive, tests are usually described by a set of rules managing how they are administered: which item will be first, which should follow any given item, when to administer the last one. This article focuses on the latter and looks at the effect of two stopping rules on the estimated sampling distribution of the ability estimate in a CAT: the number of items administered and the a priori determined size of the standard error of the ability estimate. VL - 11 SN - 1529-7713 (Print) 1529-7713 (Linking) N1 - Blais, Jean-Guy; Raiche, Gilles; United States; Journal of Applied Measurement; J Appl Meas. 2010;11(4):424-31. ER - TY - CHAP T1 - Implementing the Graduate Management Admission Test Computerized Adaptive Test T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Rudner, L. M. JF - Elements of Adaptive Testing ER - TY - JOUR T1 - Improving Cognitive Diagnostic Computerized Adaptive Testing by Balancing Attribute Coverage: The Modified Maximum Global Discrimination Index Method JF - Educational and Psychological Measurement Y1 - 2010 A1 - Cheng, Ying AB -

This article proposes a new item selection method, namely, the modified maximum global discrimination index (MMGDI) method, for cognitive diagnostic computerized adaptive testing (CD-CAT). The new method captures two aspects of the appeal of an item: (a) the amount of contribution it can make toward adequate coverage of every attribute and (b) the amount of contribution it can make toward recovering the latent cognitive profile. A simulation study shows that the new method ensures adequate coverage of every attribute, which improves the validity of the test scores, and defensibility of the proposed uses of the test. Furthermore, compared with the original global discrimination index method, the MMGDI method improves the recovery rate of each attribute and of the entire cognitive profile, especially the latter. Therefore, the new method improves both the validity and reliability of the test scores from a CD-CAT program.

VL - 70 UR - http://epm.sagepub.com/content/70/6/902.abstract ER - TY - CHAP T1 - Innovative Items for Computerized Testing T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Parshall, C. G. A1 - Harmes, J. C. A1 - Davey, T. A1 - Pashley, P. J. JF - Elements of Adaptive Testing ER - TY - CHAP T1 - The Investigation of Differential Item Functioning in Adaptive Tests T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Zwick, R. JF - Elements of Adaptive Testing ER - TY - CHAP T1 - Item Parameter Estimation and Item Fit Analysis T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Glas, C. A. W. JF - Elements of Adaptive Testing ER - TY - CHAP T1 - Item Selection and Ability Estimation in Adaptive Testing T2 - Elements of Adaptive Testing Y1 - 2010 A1 - van der Linden, W. J. A1 - Pashley, P. J. JF - Elements of Adaptive Testing PB - Springer CY - New York ER - TY - JOUR T1 - Item Selection and Hypothesis Testing for the Adaptive Measurement of Change JF - Applied Psychological Measurement Y1 - 2010 A1 - Finkelman, M. D. A1 - Weiss, D. J. A1 - Kim-Kang, G. KW - change KW - computerized adaptive testing KW - individual change KW - Kullback–Leibler information KW - likelihood ratio KW - measuring change AB -

Assessing individual change is an important topic in both psychological and educational measurement. An adaptive measurement of change (AMC) method had previously been shown to exhibit greater efficiency in detecting change than conventional nonadaptive methods. However, little work had been done to compare different procedures within the AMC framework. This study introduced a new item selection criterion and two new test statistics for detecting change with AMC that were specifically designed for the paradigm of hypothesis testing. In two simulation sets, the new methods for detecting significant change improved on existing procedures by demonstrating better adherence to Type I error rates and substantially better power for detecting relatively small change. 

VL - 34 IS - 4 ER - TY - CHAP T1 - A Japanese Adaptive Test of English as a Foreign Language: Developmental and Operational Aspects T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Nogami, Y. A1 - Hayashi, N. JF - Elements of Adaptive Testing ER - TY - COMP T1 - Manual for CATSim: Comprehensive simulation of computerized adaptive testing Y1 - 2010 A1 - Weiss, D. J. A1 - Guyer, R. D. PB - Assessment Systems Corporation CY - St. Paul, MN ER - TY - JOUR T1 - Marginal likelihood inference for a model for item responses and response times JF - British Journal of Mathematical and Statistical Psychology Y1 - 2010 A1 - Glas, C. A. W. A1 - van der Linden, W. J. AB -

Marginal maximum-likelihood procedures for parameter estimation and testing the fit of a hierarchical model for speed and accuracy on test items are presented. The model is a composition of two first-level models for dichotomous responses and response times along with multivariate normal models for their item and person parameters. It is shown how the item parameters can easily be estimated using Fisher's identity. To test the fit of the model, Lagrange multiplier tests of the assumptions of subpopulation invariance of the item parameters (i.e., no differential item functioning), the shape of the response functions, and three different types of conditional independence were derived. Simulation studies were used to show the feasibility of the estimation and testing procedures and to estimate the power and Type I error rate of the latter. In addition, the procedures were applied to an empirical data set from a computerized adaptive test of language comprehension.

VL - 63 SN - 0007-1102 (Print) 0007-1102 (Linking) N1 - Glas, Cees A W; van der Linden, Wim J; Research Support, Non-U.S. Gov't; England; The British Journal of Mathematical and Statistical Psychology; Br J Math Stat Psychol. 2010 Nov;63(Pt 3):603-26. Epub 2010 Jan 28. ER - TY - CHAP T1 - MATHCAT: A Flexible Testing System in Mathematics Education for Adults T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Verschoor, Angela J. A1 - Straetmans, G. J. J. M. JF - Elements of Adaptive Testing ER - TY - JOUR T1 - A Method for the Comparison of Item Selection Rules in Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2010 A1 - Barrada, Juan Ramón A1 - Olea, Julio A1 - Ponsoda, Vicente A1 - Abad, Francisco José AB -

In a typical study comparing the relative efficiency of two item selection rules in computerized adaptive testing, the common result is that they simultaneously differ in accuracy and security, making it difficult to reach a conclusion on which is the more appropriate rule. This study proposes a strategy to conduct a global comparison of two or more selection rules. A plot showing the performance of each selection rule for several maximum exposure rates is obtained and the whole plot is compared with other rule plots. The strategy was applied in a simulation study with fixed-length CATs for the comparison of six item selection rules: the point Fisher information, Fisher information weighted by likelihood, Kullback-Leibler weighted by likelihood, maximum information stratification with blocking, progressive and proportional methods. Our results show that there is no optimal rule for any overlap value or root mean square error (RMSE). The fact that a rule, for a given level of overlap, has lower RMSE than another does not imply that this pattern holds for another overlap rate. A fair comparison of the rules requires extensive manipulation of the maximum exposure rates. The best methods were the Kullback-Leibler weighted by likelihood, the proportional method, and the maximum information stratification method with blocking.

VL - 34 UR - http://apm.sagepub.com/content/34/6/438.abstract ER - TY - JOUR T1 - A Monte Carlo Simulation Investigating the Validity and Reliability of Ability Estimation in Item Response Theory with Speeded Computer Adaptive Tests JF - International Journal of Testing Y1 - 2010 A1 - Schmitt, T. A. A1 - Sass, D. A. A1 - Sullivan, J. R. A1 - Walker, C. M. VL - 10 UR - http://www.tandfonline.com/doi/abs/10.1080/15305058.2010.488098 ER - TY - CHAP T1 - Multidimensional Adaptive Testing with Kullback–Leibler Information Item Selection T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Mulder, J. A1 - van der Linden, W. J. JF - Elements of Adaptive Testing ER - TY - JOUR T1 - Multidimensionale adaptive Kompetenzdiagnostik: Ergebnisse zur Messeffizienz [Multidimensional adaptive testing of competencies: Results regarding measurement efficiency]. JF - Zeitschrift für Pädagogik Y1 - 2010 A1 - Frey, A. A1 - Seitz, N-N. VL - 56 ER - TY - CHAP T1 - Multistage Testing: Issues, Designs, and Research T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Zenisky, A. L. A1 - Hambleton, R. K. A1 - Luecht, RM JF - Elements of Adaptive Testing ER - TY - JOUR T1 - A new stopping rule for computerized adaptive testing JF - Educational and Psychological Measurement Y1 - 2010 A1 - Choi, S. W. A1 - Grady, M. W. A1 - Dodd, B. G. AB - The goal of the current study was to introduce a new stopping rule for computerized adaptive testing. The predicted standard error reduction stopping rule (PSER) uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared to that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule in a series of simulated adaptive tests, drawn from a number of item pools. 
Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant. VL - 70 SN - 0013-1644 (Print) 0013-1644 (Linking) N1 - U01 AR052177-04/NIAMS NIH HHS; Educ Psychol Meas. 2010 Dec 1;70(6):1-17. U2 - 3028267 ER - TY - JOUR T1 - Online calibration via variable length computerized adaptive testing JF - Psychometrika Y1 - 2010 A1 - Chang, Y. I. A1 - Lu, H. Y. AB - Item calibration is an essential issue in modern item response theory-based psychological or educational testing. Due to the popularity of computerized adaptive testing, methods to efficiently calibrate new items have become more important than in the era when paper-and-pencil test administration was the norm. Many calibration processes have been proposed and discussed from both theoretical and practical perspectives. Among them, online calibration may be one of the most cost-effective processes. In this paper, under a variable length computerized adaptive testing scenario, we integrate the methods of adaptive design, sequential estimation, and measurement error models to solve online item calibration problems. The proposed sequential estimate of item parameters is shown to be strongly consistent and asymptotically normally distributed with a prechosen accuracy. Numerical results show that the proposed method is very promising in terms of both estimation accuracy and efficiency. The results of using calibrated items to estimate the latent trait levels are also reported. VL - 75 SN - 0033-3123 ER - TY - CHAP T1 - Principles of Multidimensional Adaptive Testing T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Segall, D. O. JF - Elements of Adaptive Testing ER - TY - JOUR T1 - A Procedure for Controlling General Test Overlap in Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2010 A1 - Chen, Shu-Ying AB -

To date, exposure control procedures that are designed to control test overlap in computerized adaptive tests (CATs) are based on the assumption of item sharing between pairs of examinees. However, in practice, examinees may obtain test information from more than one previous test taker. This larger scope of information sharing needs to be considered in conducting test overlap control. The purpose of this study is to propose a test overlap control method such that the proportion of overlapping items encountered by an examinee with a group of previous examinees (described as general test overlap rate) can be controlled. Results indicated that item exposure rate and general test overlap rate could be simultaneously controlled by implementing the procedure. In addition, these two indices were controlled on the fly without any iterative simulations conducted prior to operational CATs. Thus, the proposed procedure would be an efficient method for controlling both the item exposure and general test overlap in CATs.

VL - 34 UR - http://apm.sagepub.com/content/34/6/393.abstract ER - TY - CHAP T1 - Sequencing an Adaptive Test Battery T2 - Elements of Adaptive Testing Y1 - 2010 A1 - van der Linden, W. J. JF - Elements of Adaptive Testing ER - TY - COMP T1 - SimulCAT: Windows application that simulates computerized adaptive test administration Y1 - 2010 A1 - Han, K. T. UR - http://www.hantest.net/simulcat ER - TY - JOUR T1 - Stratified and Maximum Information Item Selection Procedures in Computer Adaptive Testing JF - Journal of Educational Measurement Y1 - 2010 A1 - Deng, Hui A1 - Ansley, Timothy A1 - Chang, Hua-Hua AB -

In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computer adaptive testing (CAT) (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with completely random item selection (RAN). The comparisons were with respect to error variances, reliability of ability estimates and item usage through CATs simulated under nine test conditions of various practical constraints and item selection space. The results showed that F had an apparent precision advantage over STR and USTR under unconstrained item selection, but with very poor item usage. USTR reduced error variances for STR under various conditions, with small compromises in item usage. Compared to F, USTR enhanced item usage while achieving comparable precision in ability estimates; it achieved a precision level similar to F with improved item usage when items were selected under exposure control and with limited item selection space. The results provide implications for choosing an appropriate item selection procedure in applied settings.

VL - 47 UR - http://dx.doi.org/10.1111/j.1745-3984.2010.00109.x ER - TY - CHAP T1 - Testlet-Based Adaptive Mastery Testing T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Vos, H. J. A1 - Glas, C. A. W. JF - Elements of Adaptive Testing ER - TY - JOUR T1 - Tests informatizados y otros nuevos tipos de tests [Computerized and other new types of tests] JF - Papeles del Psicólogo Y1 - 2010 A1 - Olea, J. A1 - Abad, F. J. A1 - Barrada, J AB - The paper provides a short description of some test types that are attracting considerable interest in both research and applied areas. The main feature of a computerized adaptive test is that, despite examinees receiving different sets of items, their test scores are on the same metric and can be directly compared. 
Four other test types are considered: a) model-based tests (a model or theory is available to explain the item response process and this makes the prediction of item difficulties possible), b) ipsative tests (the examinee has to select one among two or more options with similar social desirability; so, these tests can help to control faking or other examinee response biases), c) behavioral tests (personality traits are measured from non-verbal responses rather than from self-reports), and d) situational tests (the examinee faces a conflict situation and has to select the option that best describes what he or she would do). The paper evaluates these types of tests, comments on their pros and cons, and provides some specific examples. Key words: Computerized adaptive test, Situational test, Behavioral test, Ipsative test, and automatic item generation. VL - 31 ER - TY - CHAP T1 - Three-Category Adaptive Classification Testing T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Eggen, Theo JF - Elements of Adaptive Testing ER - TY - JOUR T1 - The use of PROMIS and assessment center to deliver patient-reported outcome measures in clinical research JF - Journal of Applied Measurement Y1 - 2010 A1 - Gershon, R. C. A1 - Rothrock, N. A1 - Hanrahan, R. A1 - Bass, M. A1 - Cella, D. AB - The Patient-Reported Outcomes Measurement Information System (PROMIS) was developed as one of the first projects funded by the NIH Roadmap for Medical Research Initiative to re-engineer the clinical research enterprise. The primary goal of PROMIS is to build item banks and short forms that measure key health outcome domains that are manifested in a variety of chronic diseases and that could be used as a "common currency" across research projects. To date, item banks, short forms and computerized adaptive tests (CAT) have been developed for 13 domains with relevance to pediatric and adult subjects. 
To enable easy delivery of these new instruments, PROMIS built a web-based resource (Assessment Center) for administering CATs and other self-report data, tracking item and instrument development, monitoring accrual, managing data, and storing statistical analysis results. Assessment Center can also be used to deliver custom researcher developed content, and has numerous features that support both simple and complicated accrual designs (branching, multiple arms, multiple time points, etc.). This paper provides an overview of the development of the PROMIS item banks and details Assessment Center functionality. VL - 11 SN - 1529-7713 ER - TY - ABST T1 - Validation of a computer-adaptive test to evaluate generic health-related quality of life Y1 - 2010 A1 - Rebollo, P. A1 - Castejon, I. A1 - Cuervo, J. A1 - Villa, G. A1 - Garcia-Cueto, E. A1 - Diaz-Cuervo, H. A1 - Zardain, P. C. A1 - Muniz, J. A1 - Alonso, J. AB - BACKGROUND: Health Related Quality of Life (HRQoL) is a relevant variable in the evaluation of health outcomes. Questionnaires based on Classical Test Theory typically require a large number of items to evaluate HRQoL. Computer Adaptive Testing (CAT) can be used to reduce tests length while maintaining and, in some cases, improving accuracy. This study aimed at validating a CAT based on Item Response Theory (IRT) for evaluation of generic HRQoL: the CAT-Health instrument. METHODS: Cross-sectional study of subjects aged over 18 attending Primary Care Centres for any reason. CAT-Health was administered along with the SF-12 Health Survey. Age, gender and a checklist of chronic conditions were also collected. CAT-Health was evaluated considering: 1) feasibility: completion time and test length; 2) content range coverage, Item Exposure Rate (IER) and test precision; and 3) construct validity: differences in the CAT-Health scores according to clinical variables and correlations between both questionnaires. 
RESULTS: 396 subjects answered CAT-Health and SF-12, 67.2% females, mean age (SD) 48.6 (17.7) years. 36.9% did not report any chronic condition. Median completion time for CAT-Health was 81 seconds (IQ range = 59-118) and it increased with age (p < 0.001). The median number of items administered was 8 (IQ range = 6-10). Neither ceiling nor floor effects were found for the score. None of the items in the pool had an IER of 100% and it was over 5% for 27.1% of the items. Test Information Function (TIF) peaked between levels -1 and 0 of HRQoL. Statistically significant differences were observed in the CAT-Health scores according to the number and type of conditions. CONCLUSIONS: Although domain-specific CATs exist for various areas of HRQoL, CAT-Health is one of the first IRT-based CATs designed to evaluate generic HRQoL and it has proven feasible, valid and efficient, when administered to a broad sample of individuals attending primary care settings. JF - Health and Quality of Life Outcomes VL - 8 SN - 1477-7525 (Electronic) 1477-7525 (Linking) N1 - Rebollo, Pablo; Castejon, Ignacio; Cuervo, Jesus; Villa, Guillermo; Garcia-Cueto, Eduardo; Diaz-Cuervo, Helena; Zardain, Pilar C; Muniz, Jose; Alonso, Jordi; Spanish CAT-Health Research Group; England; Health Qual Life Outcomes. 2010 Dec 3;8:147. U2 - 3022567 ER - TY - JOUR T1 - Variations on Stochastic Curtailment in Sequential Mastery Testing JF - Applied Psychological Measurement Y1 - 2010 A1 - Finkelman, Matthew David AB -

In sequential mastery testing (SMT), assessment via computer is used to classify examinees into one of two mutually exclusive categories. Unlike paper-and-pencil tests, SMT has the capability to use variable-length stopping rules. One approach to shortening variable-length tests is stochastic curtailment, which halts examination if the probability of changing classification decisions is low. The estimation of such a probability is therefore a critical component of a stochastically curtailed test. This article examines several variations on stochastic curtailment where the key probability is estimated more aggressively than the standard formulation, resulting in additional savings in average test length (ATL). In two simulation sets, the variations successfully reduced the ATL, and in many cases the average loss, compared with the standard formulation.

VL - 34 UR - http://apm.sagepub.com/content/34/1/27.abstract ER - TY - CHAP T1 - Adaptive computer-based tasks under an assessment engineering paradigm Y1 - 2009 A1 - Luecht, RM CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 289 KB} ER - TY - CHAP T1 - Adaptive item calibration: A process for estimating item parameters within a computerized adaptive test Y1 - 2009 A1 - Kingsbury, G. G. AB - The characteristics of an adaptive test change the characteristics of the field testing that is necessary to add items to an existing measurement scale. The process used to add field-test items to the adaptive test might lead to scale drift or disrupt the test by administering items of inappropriate difficulty. The current study makes use of the transitivity of examinee and item in item response theory to describe a process for adaptive item calibration. In this process an item is successively administered to examinees whose ability levels match the performance of a given field-test item. By treating the item as if it were taking an adaptive test, examinees can be selected who provide the most information about the item at its momentary difficulty level. This should provide a more efficient procedure for estimating item parameters. The process is described within the context of the one-parameter logistic IRT model. The process is then simulated to identify whether it can be more accurate and efficient than random presentation of field-test items to examinees. Results indicated that adaptive item calibration might provide a viable approach to item calibration within the context of an adaptive test. It might be most useful for expanding item pools in settings with small sample sizes or needs for large numbers of items. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. 
N1 - {PDF File, 286 KB} ER - TY - JOUR T1 - An adaptive testing system for supporting versatile educational assessment JF - Computers and Education Y1 - 2009 A1 - Huang, Y-M. A1 - Lin, Y-T. A1 - Cheng, S-C. KW - Architectures for educational technology system KW - Distance education and telelearning AB - With the rapid growth of computer and mobile technology, it is a challenge to integrate computer-based testing (CBT) with mobile learning (m-learning), especially for formative assessment and self-assessment. In terms of self-assessment, a computer adaptive test (CAT) is a proper way to enable students to evaluate themselves. In CAT, students are assessed through a process that uses item response theory (IRT), a well-founded psychometric theory. Furthermore, a large item bank is indispensable to a test, but when a CAT system has a large item bank, IRT-based test item selection becomes computationally demanding. Besides a large item bank, an item exposure mechanism is also essential to a testing system. However, IRT alone does not address these points. These reasons motivated the authors to carry out this study. This paper describes the design, development, and implementation of an adaptive testing system. The system can support several assessment functions and different devices. Moreover, the researchers apply a novel approach, particle swarm optimization (PSO), to alleviate the computational complexity and resolve the problem of item exposure. Throughout the development of the system, a formative evaluation was embedded as an integral part of the design methodology and used to improve the system. After the system was formally released onto the web, questionnaires and experiments were conducted to evaluate the usability, precision, and efficiency of the system. The results of these evaluations indicated that the system provides adaptive testing for different devices and supports versatile assessment functions. 
Moreover, the system can estimate students' ability reliably and validly and conduct an adaptive test efficiently. Furthermore, the computational complexity of the system was alleviated by the PSO approach. With this approach, the test item selection procedure becomes efficient and the average best fitness values are very close to the optimal solutions. VL - 52 SN - 0360-1315 N1 - doi: 10.1016/j.compedu.2008.06.007 ER - TY - CHAP T1 - Adequacy of an item pool measuring proficiency in English language to implement a CAT procedure Y1 - 2009 A1 - Karino, C. A. A1 - Costa, D. R. A1 - Laros, J. A. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 160 KB} ER - TY - CHAP T1 - Applications of CAT in admissions to higher education in Israel: Twenty-two years of experience Y1 - 2009 A1 - Gafni, N. A1 - Cohen, Y. A1 - Roded, K. A1 - Baumer, M. A1 - Moshinsky, A. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 326 KB} ER - TY - CHAP T1 - An approach to implementing adaptive testing using item response theory both offline and online Y1 - 2009 A1 - Padaki, M. A1 - Natarajan, V. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 172 KB} ER - TY - CHAP T1 - Assessing the equivalence of Internet-based vs. paper-and-pencil psychometric tests Y1 - 2009 A1 - Baumer, M. A1 - Roded, K. A1 - Gafni, N. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 142 K} ER - TY - CHAP T1 - An automatic online calibration design in adaptive testing Y1 - 2009 A1 - Makransky, G. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 365 KB} ER - TY - CHAP T1 - A burdened CAT: Incorporating response burden with maximum Fisher's information for item selection Y1 - 2009 A1 - Swartz, R. J. 
A1 - Choi, S. W. AB - Widely used in various educational and vocational assessment applications, computerized adaptive testing (CAT) has recently begun to be used to measure patient-reported outcomes. Although successful in reducing respondent burden, most current CAT algorithms do not formally consider respondent burden as part of the item selection process. This study used a loss function approach motivated by decision theory to develop an item selection method that incorporates respondent burden into maximum Fisher information (MFI) item selection. Several different loss functions placing varying degrees of importance on respondent burden were compared, using an item bank of 62 polytomous items measuring depressive symptoms. One dataset consisted of the real responses from the 730 subjects who responded to all the items. A second dataset consisted of simulated responses to all the items based on a grid of latent trait scores with replicates at each grid point. The algorithm enables a CAT administrator to control the respondent burden more efficiently, without severely affecting measurement precision, than when using MFI alone. In particular, the loss function incorporating respondent burden protected respondents from receiving longer tests when their estimated trait score fell in a region where there were few informative items. CY - In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 374 KB} ER - TY - CONF T1 - Comparing methods to recalibrate drifting items in computerized adaptive testing T2 - American Educational Research Association Y1 - 2009 A1 - Masters, J. S. A1 - Muckle, T. J. A1 - Bontempo, B. JF - American Educational Research Association CY - San Diego, CA ER - TY - CHAP T1 - Comparison of ability estimation and item selection methods in multidimensional computerized adaptive testing Y1 - 2009 A1 - Diao, Q. A1 - Reckase, M. CY - D. J.
Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 342 KB} ER - TY - CHAP T1 - Comparison of adaptive Bayesian estimation and weighted Bayesian estimation in multidimensional computerized adaptive testing Y1 - 2009 A1 - Chen, P. H. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 308KB} ER - TY - JOUR T1 - Comparison of CAT Item Selection Criteria for Polytomous Items JF - Applied Psychological Measurement Y1 - 2009 A1 - Choi, Seung W. A1 - Swartz, Richard J. AB -

Item selection is a core component in computerized adaptive testing (CAT). Several studies have evaluated new and classical selection methods; however, the few that have applied such methods to the use of polytomous items have reported conflicting results. To clarify these discrepancies and further investigate selection method properties, six different selection methods are compared systematically. The results showed no clear benefit from more sophisticated selection criteria and showed one method previously believed to be superior—the maximum expected posterior weighted information (MEPWI)—to be mathematically equivalent to a simpler method, the maximum posterior weighted information (MPWI).

VL - 33 UR - http://apm.sagepub.com/content/33/6/419.abstract ER - TY - JOUR T1 - Comparison of methods for controlling maximum exposure rates in computerized adaptive testing JF - Psicothema Y1 - 2009 A1 - Barrada, J. A1 - Abad, F. J. A1 - Veldkamp, B. P. KW - *Numerical Analysis, Computer-Assisted KW - Psychological Tests/*standards/*statistics & numerical data AB - This paper has two objectives: (a) to provide a clear description of three methods for controlling the maximum exposure rate in computerized adaptive testing —the Sympson-Hetter method, the restricted method, and the item-eligibility method— showing how all three can be interpreted as methods for constructing the variable sub-bank of items from which each examinee receives the items in his or her test; (b) to indicate the theoretical and empirical limitations of each method and to compare their performance. With the three methods, we obtained basically indistinguishable results in overlap rate and RMSE (differences in the third decimal place). The restricted method is the best method for controlling exposure rate, followed by the item-eligibility method. The worst method is the Sympson-Hetter method. The restricted method presents problems of sequential overlap rate. Our advice is to use the item-eligibility method, as it saves time and satisfies the goals of restricting maximum exposure.
VL - 21 SN - 0214-9915 (Print) 0214-9915 (Linking) N1 - Barrada, Juan Ramon; Abad, Francisco Jose; Veldkamp, Bernard P. Comparative Study. Spain. Psicothema. 2009 May;21(2):313-20. ER - TY - CHAP T1 - A comparison of three methods of item selection for computerized adaptive testing Y1 - 2009 A1 - Costa, D. R. A1 - Karino, C. A. A1 - Moura, F. A. S. A1 - Andrade, D. F. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 531 KB} ER - TY - CHAP T1 - Computerized adaptive testing by mutual information and multiple imputations Y1 - 2009 A1 - Thissen-Roe, A. AB - Over the years, most computerized adaptive testing (CAT) systems have used score estimation procedures from item response theory (IRT). IRT models have salutary properties for score estimation, error reporting, and next-item selection. However, some testing purposes favor scoring approaches outside IRT.
Where a criterion metric is readily available and more relevant than the assessed construct, for example in the selection of job applicants, a predictive model might be appropriate (Scarborough & Somers, 2006). In these cases, neither IRT scoring nor a unidimensional assessment structure can be assumed. Yet, the primary benefit of CAT remains desirable: shorter assessments with minimal loss of accuracy due to unasked items. In such a case, it remains possible to create a CAT system that produces an estimated score from a subset of available items, recognizes differential item information given the emerging item response pattern, and optimizes the accuracy of the score estimated at every successive item. The method of multiple imputations (Rubin, 1987) can be used to simulate plausible scores given plausible response patterns to unasked items (Thissen-Roe, 2005). Mutual information can then be calculated in order to select an optimally informative next item (or set of items). Previously observed response patterns to two complete neural network-scored assessments were resampled according to MIMI CAT item selection. The reproduced CAT scores were compared to full-length assessment scores. Approximately 95% accurate assignment of examinees to one of three score categories was achieved with a 70%-80% reduction in median test length. Several algorithmic factors influencing accuracy and computational performance were examined. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 179 KB} ER - TY - CHAP T1 - Computerized adaptive testing for cognitive diagnosis Y1 - 2009 A1 - Cheng, Y CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 308 KB} ER - TY - CONF T1 - Computerized adaptive testing using the two parameter logistic model with ability-based guessing T2 - Paper presented at the International Meeting of the Psychometric Society. 
Cambridge Y1 - 2009 A1 - Shih, H.-J. A1 - Wang, W.-C. JF - Paper presented at the International Meeting of the Psychometric Society. Cambridge ER - TY - CHAP T1 - Computerized classification testing in more than two categories by using stochastic curtailment Y1 - 2009 A1 - Wouda, J. T. A1 - Eggen, Theo CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 298 KB} ER - TY - JOUR T1 - A Conditional Exposure Control Method for Multidimensional Adaptive Testing JF - Journal of Educational Measurement Y1 - 2009 A1 - Finkelman, Matthew A1 - Nering, Michael L. A1 - Roussos, Louis A. AB -

In computerized adaptive testing (CAT), ensuring the security of test items is a crucial practical consideration. A common approach to reducing item theft is to define maximum item exposure rates, i.e., to limit the proportion of examinees to whom a given item can be administered. Numerous methods for controlling exposure rates have been proposed for tests employing the unidimensional 3-PL model. The present article explores the issues associated with controlling exposure rates when a multidimensional item response theory (MIRT) model is utilized and exposure rates must be controlled conditional upon ability. This situation is complicated by the exponentially increasing number of possible ability values in multiple dimensions. The article introduces a new procedure, called the generalized Stocking-Lewis method, that controls the exposure rate for students of comparable ability as well as with respect to the overall population. A realistic simulation set compares the new method with three other approaches: Kullback-Leibler information with no exposure control, Kullback-Leibler information with unconditional Sympson-Hetter exposure control, and random item selection.

VL - 46 UR - http://dx.doi.org/10.1111/j.1745-3984.2009.01070.x ER - TY - JOUR T1 - Considerations about expected a posteriori estimation in adaptive testing: adaptive a priori, adaptive correction for bias, and adaptive integration interval JF - Journal of Applied Measurement Y1 - 2009 A1 - Raiche, G. A1 - Blais, J. G. KW - *Bias (Epidemiology) KW - *Computers KW - Data Interpretation, Statistical KW - Models, Statistical AB - In a computerized adaptive test, we would like to obtain an acceptable precision of the proficiency level estimate using an optimal number of items. Unfortunately, decreasing the number of items is accompanied by a certain degree of bias when the true proficiency level differs significantly from the a priori estimate. The authors suggest that it is possible to reduce the bias, and even the standard error of the estimate, by applying to each provisional estimation one or a combination of the following strategies: adaptive correction for bias proposed by Bock and Mislevy (1982), adaptive a priori estimate, and adaptive integration interval. VL - 10 SN - 1529-7713 (Print) 1529-7713 (Linking) N1 - Raiche, Gilles; Blais, Jean-Guy. United States. Journal of Applied Measurement. J Appl Meas. 2009;10(2):138-56. ER - TY - CHAP T1 - Constrained item selection using a stochastically curtailed SPRT Y1 - 2009 A1 - Wouda, J. T. A1 - Eggen, Theo CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 298 KB}
ER - TY - JOUR T1 - Constraint-Weighted a-Stratification for Computerized Adaptive Testing With Nonstatistical Constraints: Balancing Measurement Efficiency and Exposure Control JF - Educational and Psychological Measurement Y1 - 2009 A1 - Cheng, Ying A1 - Chang, Hua-Hua A1 - Douglas, Jeffrey A1 - Guo, Fanmin AB -

a-stratification is a method that utilizes items with small discrimination (a) parameters early in an exam and those with higher a values when more is learned about the ability parameter. It can achieve much better item usage than the maximum information criterion (MIC). To make a-stratification more practical and more widely applicable, a method for weighting the item selection process in a-stratification as a means of satisfying multiple test constraints is proposed. This method is studied in simulation against an analogous method without stratification, as well as against a-stratification using descending- rather than ascending-a procedures. In addition, a variation of a-stratification that allows for unbalanced usage of a parameters is included in the study to examine the trade-off between efficiency and exposure control. Finally, MIC and randomized item selection are included as baseline measures. Results indicate that the weighting mechanism successfully addresses the constraints, that stratification helps to a great extent in balancing exposure rates, and that the ascending-a design improves measurement precision.

VL - 69 UR - http://epm.sagepub.com/content/69/1/35.abstract ER - TY - CHAP T1 - Criterion-related validity of an innovative CAT-based personality measure Y1 - 2009 A1 - Schneider, R. J. A1 - McLellan, R. A. A1 - Kantrowitz, T. M. A1 - Houston, J. S. A1 - Borman, W. C. AB - This paper describes development and initial criterion-related validation of the PreVisor Computer Adaptive Personality Scales (PCAPS), a computerized adaptive testing-based personality measure that uses an ideal point IRT model based on forced-choice, paired-comparison responses. Based on results from a large consortium study, a composite of six PCAPS scales identified as relevant to the population of interest (first-line supervisors) had an estimated operational validity against an overall job performance criterion of ρ = .25. Uncorrected and corrected criterion-related validity results for each of the six PCAPS scales making up the composite are also reported. Because the PCAPS algorithm computes intermediate scale scores until a stopping rule is triggered, we were able to graph the number of statement-pairs presented against criterion-related validities. Results showed generally monotonically increasing functions. However, asymptotic validity levels, or at least a reduction in the rate of increase in slope, were often reached after 5-7 statement-pairs were presented. In the case of the composite measure, there was some evidence that validities decreased after about six statement-pairs. A possible explanation for this is provided. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 163 KB} ER - TY - CHAP T1 - Developing item variants: An empirical study Y1 - 2009 A1 - Wendt, A. A1 - Kao, S. A1 - Gorham, J. A1 - Woo, A. AB - Large-scale standardized tests have been widely used for educational and licensure testing.
In computerized adaptive testing (CAT), one of the practical concerns for maintaining large-scale assessments is ensuring an adequate number of high-quality items required for item pool functioning. Developing items at specific difficulty levels and for certain areas of test plans is a well-known challenge. The purpose of this study was to investigate strategies for varying items that can effectively generate items at targeted difficulty levels and specific test plan areas. Each variant item generation model was developed by decomposing selected source items possessing ideal measurement properties and targeting the desirable content domains. In total, 341 variant items were generated from 72 source items. Data were collected from six pretest periods. Items were calibrated using the Rasch model. Initial results indicate that the variant items showed desirable measurement properties. Additionally, compared to an average of approximately 60% of items passing pretest criteria overall, an average of 84% of the variant items passed the pretest criteria. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 194 KB} ER - TY - JOUR T1 - Development and preliminary testing of a computerized adaptive assessment of chronic pain JF - Journal of Pain Y1 - 2009 A1 - Anatchkova, M. D. A1 - Saris-Baglama, R. N. A1 - Kosinski, M. A1 - Bjorner, J. B.
KW - *Computers KW - *Questionnaires KW - Activities of Daily Living KW - Adaptation, Psychological KW - Chronic Disease KW - Cohort Studies KW - Disability Evaluation KW - Female KW - Humans KW - Male KW - Middle Aged KW - Models, Psychological KW - Outcome Assessment (Health Care) KW - Pain Measurement/*methods KW - Pain, Intractable/*diagnosis/psychology KW - Psychometrics KW - Quality of Life KW - User-Computer Interface AB - The aim of this article is to report the development and preliminary testing of a prototype computerized adaptive test of chronic pain (CHRONIC PAIN-CAT) conducted in 2 stages: (1) evaluation of various item selection and stopping rules through real data-simulated administrations of CHRONIC PAIN-CAT; (2) a feasibility study of the actual prototype CHRONIC PAIN-CAT assessment system conducted in a pilot sample. Item calibrations developed from a US general population sample (N = 782) were used to program a pain severity and impact item bank (k = 45), and real data simulations were conducted to determine a CAT stopping rule. The CHRONIC PAIN-CAT was programmed on a tablet PC using QualityMetric's Dynamic Health Assessment (DYNHA) software and administered to a clinical sample of pain sufferers (n = 100). The CAT was completed in significantly less time than the static (full item bank) assessment (P < .001). On average, 5.6 items were dynamically administered by CAT to achieve a precise score. Scores estimated from the 2 assessments were highly correlated (r = .89), and both assessments discriminated across pain severity levels (P < .001, RV = .95). Patients' evaluations of the CHRONIC PAIN-CAT were favorable. PERSPECTIVE: This report demonstrates that the CHRONIC PAIN-CAT is feasible for administration in a clinic. The application has the potential to improve pain assessment and help clinicians manage chronic pain.
VL - 10 SN - 1528-8447 (Electronic) 1526-5900 (Linking) N1 - Anatchkova, Milena D; Saris-Baglama, Renee N; Kosinski, Mark; Bjorner, Jakob B. 1R43AR052251-01A1/AR/NIAMS NIH HHS/United States. Evaluation Studies. Research Support, N.I.H., Extramural. United States. The Journal of Pain: official journal of the American Pain Society. J Pain. 2009 Sep;10(9):932-43. U2 - 2763618 ER - TY - JOUR T1 - Development of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis JF - Rehabilitation Psychology Y1 - 2009 A1 - Forkmann, T. A1 - Boecker, M. A1 - Norra, C. A1 - Eberle, N. A1 - Kircher, T. A1 - Schauerte, P. A1 - Mischke, K. A1 - Westhofen, M. A1 - Gauggel, S. A1 - Wirtz, M. KW - Adaptation, Psychological KW - Adult KW - Aged KW - Depressive Disorder/*diagnosis/psychology KW - Diagnosis, Computer-Assisted KW - Female KW - Heart Diseases/*psychology KW - Humans KW - Male KW - Mental Disorders/*psychology KW - Middle Aged KW - Models, Statistical KW - Otorhinolaryngologic Diseases/*psychology KW - Personality Assessment/statistics & numerical data KW - Personality Inventory/*statistics & numerical data KW - Psychometrics/statistics & numerical data KW - Questionnaires KW - Reproducibility of Results KW - Sick Role AB - OBJECTIVE: The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. The present study aimed at developing a new item bank that allows for assessing depression in persons with mental and persons with somatic diseases. METHOD: The sample consisted of 161 participants treated for a depressive syndrome, and 206 participants with somatic illnesses (103 cardiologic, 103 otorhinolaryngologic; overall mean age = 44.1 years, SD = 14.0; 44.7% women) to allow for validation of the item bank in both groups. Persons answered a pool of 182 depression items on a 5-point Likert scale.
RESULTS: Evaluation of Rasch model fit (infit < 1.3), differential item functioning, dimensionality, local independence, item spread, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 79 items with good psychometric properties. CONCLUSIONS: The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. It might also be useful for researchers who wish to develop new fixed-length scales for the assessment of depression in specific rehabilitation settings. VL - 54 SN - 0090-5550 (Print) 0090-5550 (Linking) N1 - Forkmann, Thomas; Boecker, Maren; Norra, Christine; Eberle, Nicole; Kircher, Tilo; Schauerte, Patrick; Mischke, Karl; Westhofen, Martin; Gauggel, Siegfried; Wirtz, Markus. Research Support, Non-U.S. Gov't. United States. Rehabilitation Psychology. Rehabil Psychol. 2009 May;54(2):186-97. ER - TY - JOUR T1 - Diagnostic classification models and multidimensional adaptive testing: A commentary on Rupp and Templin JF - Measurement: Interdisciplinary Research and Perspectives Y1 - 2009 A1 - Frey, A. A1 - Carstensen, C. H. VL - 7 ER - TY - JOUR T1 - Direct and Inverse Problems of Item Pool Design for Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2009 A1 - Belov, Dmitry I. A1 - Armstrong, Ronald D. AB -

The recent literature on computerized adaptive testing (CAT) has developed methods for creating CAT item pools from a large master pool. Each CAT pool is designed as a set of nonoverlapping forms reflecting the skill levels of an assumed population of test takers. This article presents a Monte Carlo method to obtain these CAT pools and discusses its advantages over existing methods. Also, a new problem is considered that finds a population ability density function best matching the master pool. An analysis of the solution to this new problem provides testing organizations with effective guidance for maintaining their master pools. Computer experiments with a pool of Law School Admission Test items and its assembly constraints are presented.

VL - 69 UR - http://epm.sagepub.com/content/69/4/533.abstract ER - TY - CHAP T1 - Effect of early misfit in computerized adaptive testing on the recovery of theta Y1 - 2009 A1 - Guyer, R. D. A1 - Weiss, D. J. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 212 KB} ER - TY - JOUR T1 - Effekte des adaptiven Testens auf die Motivation zur Testbearbeitung [Effects of adaptive testing on test-taking motivation] JF - Diagnostica Y1 - 2009 A1 - Frey, A. A1 - Hartig, J. A1 - Moosbrugger, H. VL - 55 ER - TY - JOUR T1 - Evaluation of a computer-adaptive test for the assessment of depression (D-CAT) in clinical application JF - International Journal of Methods in Psychiatric Research Y1 - 2009 A1 - Fliege, H. A1 - Becker, J. A1 - Walter, O. B. A1 - Rose, M. A1 - Bjorner, J. B. A1 - Klapp, B. F. AB - In the past, a German computerized adaptive test, based on item response theory (IRT), was developed for purposes of assessing the construct depression [Computer-adaptive test for depression (D-CAT)]. This study aims at testing the feasibility and validity of the real computer-adaptive application. The D-CAT, supplied by a bank of 64 items, was administered on personal digital assistants (PDAs) to 423 consecutive patients suffering from psychosomatic and other medical conditions (78 with depression). Items were adaptively administered until a predetermined reliability (r ≥ 0.90) was attained. For validation purposes, the Hospital Anxiety and Depression Scale (HADS), the Centre for Epidemiological Studies Depression (CES-D) scale, and the Beck Depression Inventory (BDI) were administered. Another sample of 114 patients was evaluated using standardized diagnostic interviews [Composite International Diagnostic Interview (CIDI)]. The D-CAT was quickly completed (mean 74 seconds), well accepted by the patients, and reliable after an average administration of only six items.
In 95% of the cases, 10 items or fewer were needed for a reliable score estimate. Correlations between the D-CAT and the HADS, CES-D, and BDI ranged between r = 0.68 and r = 0.77. The D-CAT distinguished between diagnostic groups as well as established questionnaires do. The D-CAT proved to be an efficient, well-accepted, and reliable tool. Discriminative power was comparable to that of other depression measures, whereby the CAT is shorter and more precise. Item usage raises questions of balancing the item selection for content in the future. Copyright (c) 2009 John Wiley & Sons, Ltd. VL - 18 SN - 1049-8931 (Print) N1 - Journal article. International Journal of Methods in Psychiatric Research. Int J Methods Psychiatr Res. 2009 Feb 4. ER - TY - CHAP T1 - An evaluation of a new procedure for computing information functions for Bayesian scores from computerized adaptive tests Y1 - 2009 A1 - Ito, K. A1 - Pommerich, M. A1 - Segall, D. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 571 KB} ER - TY - JOUR T1 - An evaluation of patient-reported outcomes found computerized adaptive testing was efficient in assessing stress perception JF - Journal of Clinical Epidemiology Y1 - 2009 A1 - Kocalevent, R. D. A1 - Rose, M. A1 - Becker, J. A1 - Walter, O. B. A1 - Fliege, H. A1 - Bjorner, J. B. A1 - Kleiber, D. A1 - Klapp, B. F. KW - *Diagnosis, Computer-Assisted KW - Adolescent KW - Adult KW - Aged KW - Aged, 80 and over KW - Confidence Intervals KW - Female KW - Humans KW - Male KW - Middle Aged KW - Perception KW - Quality of Health Care/*standards KW - Questionnaires KW - Reproducibility of Results KW - Sickness Impact Profile KW - Stress, Psychological/*diagnosis/psychology KW - Treatment Outcome AB - OBJECTIVES: This study aimed to develop and evaluate a first computerized adaptive test (CAT) for the measurement of stress perception (Stress-CAT), in terms of the two dimensions: exposure to stress and stress reaction.
STUDY DESIGN AND SETTING: Item response theory modeling was performed using a two-parameter model (Generalized Partial Credit Model). The evaluation of the Stress-CAT comprised a simulation study and real clinical application. A total of 1,092 psychosomatic patients (N1) were studied. Two hundred simulees (N2) were generated for a simulated response data set. Then the Stress-CAT was given to n = 116 inpatients (N3), together with established stress questionnaires as validity criteria. RESULTS: The final banks included n = 38 stress exposure items and n = 31 stress reaction items. In the first simulation study, CAT scores could be estimated with a high measurement precision (SE < 0.32; rho > 0.90) using 7.0 +/- 2.3 (M +/- SD) stress reaction items and 11.6 +/- 1.7 stress exposure items. The second simulation study reanalyzed real patient data (N1) and showed an average use of 5.6 +/- 2.1 items for the dimension stress reaction and 10.0 +/- 4.9 for the dimension stress exposure. Convergent validity showed significantly high correlations. CONCLUSIONS: The Stress-CAT is short and precise, potentially lowering the response burden of patients in clinical decision making. VL - 62 SN - 1878-5921 (Electronic) 0895-4356 (Linking) N1 - Kocalevent, Ruya-Daniela; Rose, Matthias; Becker, Janine; Walter, Otto B; Fliege, Herbert; Bjorner, Jakob B; Kleiber, Dieter; Klapp, Burghard F. Evaluation Studies. United States. Journal of Clinical Epidemiology. J Clin Epidemiol. 2009 Mar;62(3):278-87, 287.e1-3. Epub 2008 Jul 18. ER - TY - CHAP T1 - An examination of decision-theory adaptive testing procedures Y1 - 2009 A1 - Rudner, L. M. AB - This research examined three ways to adaptively select items using decision theory: a traditional decision-theory sequential testing approach (expected minimum cost), information gain (modeled after Kullback-Leibler), and a maximum discrimination approach, and then compared them all against an approach using maximum IRT Fisher information.
It also examined the use of Wald's (1947) well-known sequential probability ratio test (SPRT) as a test termination rule in this context. The minimum cost approach was notably better than the best-case possibility for IRT. Information gain, which is based on entropy and comes from information theory, was almost identical to minimum cost. The simple approach using the item that best discriminates between the two most likely classifications also fared better than IRT, but not as well as information gain or minimum cost. Through Wald's SPRT, large percentages of examinees can be accurately classified with very few items. With only 25 sequentially selected items, for example, approximately 90% of the simulated NAEP examinees were classified with 86% accuracy. The advantages of the decision theory model are many—the model yields accurate mastery state classifications, can use a small item pool, is simple to implement, requires little pretesting, is applicable to criterion-referenced tests, can be used in diagnostic testing, can be adapted to yield classifications on multiple skills, and should be easy to explain to non-statisticians. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 203 KB} ER - TY - CHAP T1 - Features of J-CAT (Japanese Computerized Adaptive Test) Y1 - 2009 A1 - Imai, S. A1 - Ito, S. A1 - Nakamura, Y. A1 - Kikuchi, K. A1 - Akagi, Y. A1 - Nakasono, H. A1 - Honda, A. A1 - Hiramura, T. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 655 KB} ER - TY - JOUR T1 - Firestar: Computerized adaptive testing simulation program for polytomous IRT models JF - Applied Psychological Measurement Y1 - 2009 A1 - Choi, S. W. VL - 33 SN - 1552-3497 (Electronic) 0146-6216 (Linking) N1 - U01 AR052177-04/NIAMS NIH HHS/United States. Journal article. Applied Psychological Measurement. Appl Psychol Meas. 2009 Nov 1;33(8):644-645.
U2 - 2790213 ER - TY - RPRT T1 - Gradual maximum information ratio approach to item selection in computerized adaptive testing Y1 - 2009 A1 - Han, K. T. JF - GMAC Research Reports PB - Graduate Management Admission Council CY - McLean, VA, USA ER - TY - CHAP T1 - A gradual maximum information ratio approach to item selection in computerized adaptive testing Y1 - 2009 A1 - Han, K. T. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 391 KB} ER - TY - CHAP T1 - Guess what? Score differences with rapid replies versus omissions on a computerized adaptive test Y1 - 2009 A1 - Talento-Miller, E. A1 - Guo, F. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 215 KB} ER - TY - CHAP T1 - A hybrid simulation procedure for the development of CATs Y1 - 2009 A1 - Nydick, S. W. A1 - Weiss, D. J. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 258 KB} ER - TY - JOUR T1 - Item response theory and clinical measurement JF - Annual Review of Clinical Psychology Y1 - 2009 A1 - Reise, S. P. A1 - Waller, N. G. KW - *Psychological Theory KW - Humans KW - Mental Disorders/diagnosis/psychology KW - Psychological Tests KW - Psychometrics KW - Quality of Life KW - Questionnaires AB - In this review, we examine studies that use item response theory (IRT) to explore the psychometric properties of clinical measures. Next, we consider how IRT has been used in clinical research for scale linking, computerized adaptive testing, and differential item functioning analysis. Finally, we consider the scale properties of IRT trait scores.
We conclude that there are notable differences between cognitive and clinical measures that have relevance for IRT modeling. Future research should be directed toward a better understanding of the metric of the latent trait and the psychological processes that lead to individual differences in item response behaviors. VL - 5 SN - 1548-5951 (Electronic) N1 - Reise, Steven PWaller, Niels GU01 AR 52177/AR/NIAMS NIH HHS/United StatesResearch Support, N.I.H., ExtramuralReviewUnited StatesAnnual review of clinical psychologyAnnu Rev Clin Psychol. 2009;5:27-48. ER - TY - CHAP T1 - Item selection and hypothesis testing for the adaptive measurement of change Y1 - 2009 A1 - Finkelman, M. A1 - Weiss, D. J. A1 - Kim-Kang, G. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 228 KB} ER - TY - JOUR T1 - Item Selection in Computerized Classification Testing JF - Educational and Psychological Measurement Y1 - 2009 A1 - Thompson, Nathan A. AB -

Several alternatives for item selection algorithms based on item response theory in computerized classification testing (CCT) have been suggested, with no conclusive evidence on the substantial superiority of a single method. It is argued that the lack of sizable effect is because some of the methods actually assess items very similarly through different calculations and will usually select the same item. Consideration of methods that assess information across a wider range is often unnecessary under realistic conditions, although it might be advantageous to utilize them only early in a test. In addition, the efficiency of item selection approaches depends on the termination criteria that are used, which is demonstrated through a didactic example and Monte Carlo simulation. Item selection at the cut score, which seems conceptually appropriate for CCT, is not always the most efficient option. A broad framework for item selection in CCT is presented that incorporates these points.

VL - 69 UR - http://epm.sagepub.com/content/69/5/778.abstract ER - TY - JOUR T1 - Item selection rules in computerized adaptive testing: Accuracy and security JF - Methodology Y1 - 2009 A1 - Barrada, J A1 - Olea, J. A1 - Ponsoda, V. A1 - Abad, F. J. VL - 5 N1 - (PDF file, 445 KB) ER - TY - CHAP T1 - Item selection with biased-coin up-and-down designs Y1 - 2009 A1 - Sheng, Y. A1 - Sheng, Z. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 748 KB} ER - TY - JOUR T1 - I've Fallen and I Can't Get Up: Can High-Ability Students Recover From Early Mistakes in CAT? JF - Applied Psychological Measurement Y1 - 2009 A1 - Rulison, Kelly L. A1 - Loken, Eric AB -

A difficult result to interpret in Computerized Adaptive Tests (CATs) occurs when an ability estimate initially drops and then ascends continuously until the test ends, suggesting that the true ability may be higher than implied by the final estimate. This study explains why this asymmetry occurs and shows that early mistakes by high-ability students can lead to considerable underestimation, even in tests with 45 items. The opposite response pattern, where low-ability students start with lucky guesses, leads to much less bias. The authors show that using Barton and Lord's four-parameter model (4PM) and a less informative prior can lower bias and root mean square error (RMSE) for high-ability students with a poor start, as the CAT algorithm ascends more quickly after initial underperformance. Results also show that the 4PM slightly outperforms a CAT in which less discriminating items are initially used. The practical implications and relevance for psychological measurement more generally are discussed.

VL - 33 UR - http://apm.sagepub.com/content/33/2/83.abstract ER - TY - JOUR T1 - A Knowledge-Based Approach for Item Exposure Control in Computerized Adaptive Testing JF - Journal of Educational and Behavioral Statistics Y1 - 2009 A1 - Doong, S. H. AB -

The purpose of this study is to investigate a functional relation between item exposure parameters (IEPs) and item parameters (IPs) over parallel pools. This functional relation is approximated by a well-known tool in machine learning. Let P and Q be parallel item pools and suppose IEPs for P have been obtained via a Sympson and Hetter–type simulation. Based on these simulated parameters, a functional relation k = f_P(a, b, c) relating IPs to IEPs of P is obtained by an artificial neural network and used to estimate IEPs of Q without tedious simulation. Extensive experiments using real and synthetic pools showed that this approach worked pretty well for many variants of the Sympson and Hetter procedure. It worked excellently for the conditional Stocking and Lewis multinomial selection procedure and the Chen and Lei item exposure and test overlap control procedure. This study provides the first step in an alternative means to estimate IEPs without iterative simulation.

VL - 34 UR - http://jeb.sagepub.com/cgi/content/abstract/34/4/530 ER - TY - CHAP T1 - Kullback-Leibler information in multidimensional adaptive testing: theory and application Y1 - 2009 A1 - Wang, C. A1 - Chang, Hua-Hua AB - Built on multidimensional item response theory (MIRT), multidimensional adaptive testing (MAT) can, in principle, provide a promising choice for ensuring efficient estimation of each ability dimension in a multidimensional vector. Currently, two item selection procedures have been developed for MAT, one based on Fisher information embedded within a Bayesian framework, and the other powered by Kullback-Leibler (KL) information. It is well known that in unidimensional IRT the second derivative of KL information (also termed “global information”) is Fisher information evaluated at θ0. This paper first generalizes the relationship between these two types of information in two ways—the analytical result is given as well as the graphical representation, to enhance interpretation and understanding. Second, a KL information index is constructed for MAT, which represents the integration of KL information over all of the ability dimensions. This paper further discusses how this index correlates with the item discrimination parameters. The analytical results would lay a foundation for future development of item selection methods in MAT which can help equalize the item exposure rate. Finally, a simulation study is conducted to verify the above results. The connection between the item parameters, item KL information, and item exposure rate is demonstrated for empirical MAT delivered by an item bank calibrated under two-dimensional IRT. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 316 KB} ER - TY - CHAP T1 - Limiting item exposure for target difficulty ranges in a high-stakes CAT Y1 - 2009 A1 - Li, X. A1 - Becker, K. A1 - Gorham, J. A1 - Woo, A. CY - D. J.
Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 1 MB} ER - TY - JOUR T1 - Logistics of collecting patient-reported outcomes (PROs) in clinical practice: an overview and practical examples JF - Quality of Life Research Y1 - 2009 A1 - Rose, M. A1 - Bezjak, A. AB - PURPOSE: Interest in collecting patient-reported outcomes (PROs), such as health-related quality of life (HRQOL), health status reports, and patient satisfaction is on the rise and practical aspects of collecting PROs in clinical practice are becoming more important. The purpose of this paper is to draw the attention to a number of issues relevant for a successful integration of PRO measures into the daily work flow of busy clinical settings. METHODS: The paper summarizes the results from a breakout session held at an ISOQOL special topic conference for PRO measures in clinical practice in 2007. RESULTS: Different methodologies of collecting PROs are discussed, and the support needed for each methodology is highlighted. The discussion is illustrated by practical real-life examples from early adopters who administered paper-and-pencil or electronic PRO assessments (ePRO) for more than a decade. The paper also reports on new experiences with more recent technological developments, such as SmartPens and Computer Adaptive Tests (CATs) in daily practice. CONCLUSIONS: Methodological and logistical issues determine the resources needed for a successful integration of PRO measures into daily work flow procedures and influence significantly the usefulness of PRO data for clinical practice. VL - 18 SN - 0962-9343 (Print) N1 - Rose, MatthiasBezjak, AndreaNetherlandsQuality of life research : an international journal of quality of life aspects of treatment, care and rehabilitationQual Life Res. 2009 Feb;18(1):125-36. Epub 2009 Jan 20.
ER - TY - JOUR T1 - The maximum priority index method for severely constrained item selection in computerized adaptive testing JF - British Journal of Mathematical and Statistical Psychology Y1 - 2009 A1 - Cheng, Y A1 - Chang, Hua-Hua KW - Aptitude Tests/*statistics & numerical data KW - Diagnosis, Computer-Assisted/*statistics & numerical data KW - Educational Measurement/*statistics & numerical data KW - Humans KW - Mathematical Computing KW - Models, Statistical KW - Personality Tests/*statistics & numerical data KW - Psychometrics/*statistics & numerical data KW - Reproducibility of Results KW - Software AB - This paper introduces a new heuristic approach, the maximum priority index (MPI) method, for severely constrained item selection in computerized adaptive testing. Our simulation study shows that it is able to accommodate various non-statistical constraints simultaneously, such as content balancing, exposure control, answer key balancing, and so on. Compared with the weighted deviation modelling method, it leads to fewer constraint violations and better exposure control while maintaining the same level of measurement precision. VL - 62 SN - 0007-1102 (Print)0007-1102 (Linking) N1 - Cheng, YingChang, Hua-HuaResearch Support, Non-U.S. Gov'tEnglandThe British journal of mathematical and statistical psychologyBr J Math Stat Psychol. 2009 May;62(Pt 2):369-83. Epub 2008 Jun 2. ER - TY - JOUR T1 - Measuring global physical health in children with cerebral palsy: Illustration of a multidimensional bi-factor model and computerized adaptive testing JF - Quality of Life Research Y1 - 2009 A1 - Haley, S. M. A1 - Ni, P. A1 - Dumas, H. M. A1 - Fragala-Pinkham, M. A. A1 - Hambleton, R. K. A1 - Montpetit, K. A1 - Bilodeau, N. A1 - Gorton, G. E. A1 - Watson, K. A1 - Tucker, C. A. 
KW - *Computer Simulation KW - *Health Status KW - *Models, Statistical KW - Adaptation, Psychological KW - Adolescent KW - Cerebral Palsy/*physiopathology KW - Child KW - Child, Preschool KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Massachusetts KW - Pennsylvania KW - Questionnaires KW - Young Adult AB - PURPOSE: The purposes of this study were to apply a bi-factor model for the determination of test dimensionality and a multidimensional CAT using computer simulations of real data for the assessment of a new global physical health measure for children with cerebral palsy (CP). METHODS: Parent respondents of 306 children with cerebral palsy were recruited from four pediatric rehabilitation hospitals and outpatient clinics. We compared confirmatory factor analysis results across four models: (1) one-factor unidimensional; (2) two-factor multidimensional (MIRT); (3) bi-factor MIRT with fixed slopes; and (4) bi-factor MIRT with varied slopes. We tested whether the general and content (fatigue and pain) person score estimates could discriminate across severity and types of CP, and whether score estimates from a simulated CAT were similar to estimates based on the total item bank, and whether they correlated as expected with external measures. RESULTS: Confirmatory factor analysis suggested separate pain and fatigue sub-factors; all 37 items were retained in the analyses. From the bi-factor MIRT model with fixed slopes, the full item bank scores discriminated across levels of severity and types of CP, and compared favorably to external instruments. CAT scores based on 10- and 15-item versions accurately captured the global physical health scores. CONCLUSIONS: The bi-factor MIRT CAT application, especially the 10- and 15-item versions, yielded accurate global physical health scores that discriminated across known severity groups and types of CP, and correlated as expected with concurrent measures. 
The CATs have potential for collecting complex data on the physical health of children with CP in an efficient manner. VL - 18 SN - 0962-9343 (Print)0962-9343 (Linking) N1 - Haley, Stephen MNi, PengshengDumas, Helene MFragala-Pinkham, Maria AHambleton, Ronald KMontpetit, KathleenBilodeau, NathalieGorton, George EWatson, KyleTucker, Carole AK02 HD045354-01A1/HD/NICHD NIH HHS/United StatesK02 HD45354-01A1/HD/NICHD NIH HHS/United StatesResearch Support, N.I.H., ExtramuralResearch Support, Non-U.S. Gov'tNetherlandsQuality of life research : an international journal of quality of life aspects of treatment, care and rehabilitationQual Life Res. 2009 Apr;18(3):359-70. Epub 2009 Feb 17. U2 - 2692519 ER - TY - CHAP T1 - The MEDPRO project: An SBIR project for a comprehensive IRT and CAT software system: CAT software Y1 - 2009 A1 - Thompson, N. A. AB - Development of computerized adaptive tests (CAT) requires a number of appropriate software tools. This paper describes the development of two new CAT software programs. CATSIM has been designed specifically to conduct several different kinds of simulation studies, which are necessary for planning purposes as well as properly designing live CATs. FastCAT is a software system for banking items and publishing CAT tests as standalone files, to be administered anywhere. Both are available for public use. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 283 KB} ER - TY - CHAP T1 - The MEDPRO project: An SBIR project for a comprehensive IRT and CAT software system: IRT software Y1 - 2009 A1 - Thissen, D. AB - IRTPRO (Item Response Theory for Patient-Reported Outcomes) is an entirely new application for item calibration and test scoring using IRT. 
IRTPRO implements algorithms for maximum likelihood estimation of item parameters (item calibration) for several unidimensional and multidimensional item response theory (IRT) models for dichotomous and polytomous item responses. In addition, the software provides computation of goodness-of-fit indices, statistics for the diagnosis of local dependence and for the detection of differential item functioning (DIF), and IRT scaled scores. This paper illustrates the use, and some capabilities, of the software. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - PDF File, 817 K ER - TY - JOUR T1 - A mixed integer programming model for multiple stage adaptive testing JF - European Journal of Operational Research Y1 - 2009 A1 - Edmonds, J. A1 - Armstrong, R. D. KW - Education KW - Integer programming KW - Linear programming AB - The last decade has seen paper-and-pencil (P&P) tests being replaced by computerized adaptive tests (CATs) within many testing programs. A CAT may yield several advantages relative to a conventional P&P test. A CAT can determine the questions or test items to administer, allowing each test form to be tailored to a test taker's skill level. Subsequent items can be chosen to match the capability of the test taker. By adapting to a test taker's ability, a CAT can acquire more information about a test taker while administering fewer items. A Multiple Stage Adaptive test (MST) provides a means to implement a CAT that allows review before the administration. The MST format is a hybrid between the conventional P&P and CAT formats. This paper presents mixed integer programming models for MST assembly problems. Computational results with commercial optimization software will be given and advantages of the models evaluated. 
VL - 193 SN - 0377-2217 N1 - DOI: 10.1016/j.ejor.2007.10.047 ER - TY - JOUR T1 - Multidimensional adaptive testing in educational and psychological measurement: Current state and future challenges JF - Studies in Educational Evaluation Y1 - 2009 A1 - Frey, A. A1 - Seitz, N-N. AB - The paper gives an overview of multidimensional adaptive testing (MAT) and evaluates its applicability in educational and psychological testing. The approach of Segall (1996) is described as a general framework for MAT. The main advantage of MAT is its capability to increase measurement efficiency. In simulation studies conceptualizing situations typical to large scale assessments, the number of presented items was reduced by MAT by about 30–50% compared to unidimensional adaptive testing and by about 70% compared to fixed item testing holding measurement precision constant. Empirical results underline these findings. Before MAT is used routinely, some open questions should be answered first. After that, MAT represents a very promising approach to highly efficient simultaneous testing of multiple competencies. VL - 35 SN - 0191-491X ER - TY - JOUR T1 - Multidimensional Adaptive Testing with Optimal Design Criteria for Item Selection JF - Psychometrika Y1 - 2009 A1 - Mulder, J. A1 - van der Linden, W. J. AB - Several criteria from the optimal design literature are examined for use with item selection in multidimensional adaptive testing. In particular, it is examined what criteria are appropriate for adaptive testing in which all abilities are intentional, some should be considered as a nuisance, or the interest is in the testing of a composite of the abilities. Both the theoretical analyses and the studies of simulated data in this paper suggest that the criteria of A-optimality and D-optimality lead to the most accurate estimates when all abilities are intentional, with the former slightly outperforming the latter.
The criterion of E-optimality showed occasional erratic behavior for this case of adaptive testing, and its use is not recommended. If some of the abilities are nuisances, application of the criterion of A(s)-optimality (or D(s)-optimality), which focuses on the subset of intentional abilities, is recommended. For the measurement of a linear combination of abilities, the criterion of c-optimality yielded the best results. The preferences of each of these criteria for items with specific patterns of parameter values were also assessed. It was found that the criteria differed mainly in their preferences of items with different patterns of values for their discrimination parameters. VL - 74 SN - 0033-3123 (Print)0033-3123 (Linking) N1 - Journal articlePsychometrikaPsychometrika. 2009 Jun;74(2):273-296. Epub 2008 Dec 23. U2 - 2813188 ER - TY - JOUR T1 - Multiple Maximum Exposure Rates in Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2009 A1 - Barrada, Juan Ramón A1 - Veldkamp, Bernard P. A1 - Olea, Julio AB -

Computerized adaptive testing is subject to security problems, as the item bank content remains operative over long periods and administration time is flexible for examinees. Spreading the content of a part of the item bank could lead to an overestimation of the examinees' trait level. The most common way of reducing this risk is to impose a maximum exposure rate (rmax) that no item should exceed. Several methods have been proposed with this aim. All of these methods establish a single value of rmax throughout the test. This study presents a new method, the multiple-rmax method, that defines as many values of rmax as the number of items presented in the test. In this way, it is possible to impose a high degree of randomness in item selection at the beginning of the test, leaving the administration of items with the best psychometric properties to the moment when the trait level estimation is most accurate. The implementation of the multiple-rmax method is described and is tested in simulated item banks and in an operative bank. Compared with a single maximum exposure method, the new method has a more balanced usage of the item bank and delays the possible distortion of trait estimation due to security problems, with either no or only slight decrements of measurement accuracy.

VL - 33 UR - http://apm.sagepub.com/content/33/1/58.abstract ER - TY - CHAP T1 - The nine lives of CAT-ASVAB: Innovations and revelations Y1 - 2009 A1 - Pommerich, M A1 - Segall, D. O. A1 - Moreno, K. E. AB - The Armed Services Vocational Aptitude Battery (ASVAB) is administered annually to more than one million military applicants and high school students. ASVAB scores are used to determine enlistment eligibility, assign applicants to military occupational specialties, and aid students in career exploration. The ASVAB is administered as both a paper-and-pencil (P&P) test and a computerized adaptive test (CAT). CAT-ASVAB holds the distinction of being the first large-scale adaptive test battery to be administered in a high-stakes setting. Approximately two-thirds of military applicants currently take CAT-ASVAB; long-term plans are to replace P&P-ASVAB with CAT-ASVAB at all test sites. Given CAT-ASVAB’s pedigree—approximately 20 years in development and 20 years in operational administration—much can be learned from revisiting some of the major highlights of CAT-ASVAB history. This paper traces the progression of CAT-ASVAB through nine major phases of development, including research and development of the CAT-ASVAB prototype, the initial development of psychometric procedures and item pools, initial and full-scale operational implementation, the introduction of new item pools, the introduction of Windows administration, the introduction of Internet administration, and research and development of the next-generation CAT-ASVAB. A background and history is provided for each phase, including discussions of major research and operational issues, innovative approaches and practices, and lessons learned. CY - In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 169 KB} ER - TY - CHAP T1 - Obtaining reliable diagnostic information through constrained CAT Y1 - 2009 A1 - Wang, C. A1 - Chang, Hua-Hua A1 - Douglas, J.
CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 252 KB} ER - TY - CHAP T1 - Optimizing item exposure control algorithms for polytomous computerized adaptive tests with restricted item banks Y1 - 2009 A1 - Chajewski, M. A1 - Lewis, C. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 923 KB} ER - TY - JOUR T1 - A posteriori estimation in adaptive testing: Adaptive a priori, adaptive correction for bias, and adaptive integration interval JF - Journal of Applied Measurement Y1 - 2009 A1 - Raîche, G. A1 - Blais, J-G. VL - 10(2) ER - TY - CHAP T1 - Practical issues concerning the application of the DINA model to CAT data Y1 - 2009 A1 - Huebner, A. A1 - Wang, B. A1 - Lee, S. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 139 KB} ER - TY - JOUR T1 - Predictive Control of Speededness in Adaptive Testing JF - Applied Psychological Measurement Y1 - 2009 A1 - van der Linden, Wim J. AB -

An adaptive testing method is presented that controls the speededness of a test using predictions of the test takers' response times on the candidate items in the pool. Two different types of predictions are investigated: posterior predictions given the actual response times on the items already administered and posterior predictions that use the responses on these items as an additional source of information. In a simulation study with an adaptive test modeled after a test from the Armed Services Vocational Aptitude Battery, the effectiveness of the methods in removing differential speededness from the test was evaluated.

VL - 33 UR - http://apm.sagepub.com/content/33/1/25.abstract ER - TY - JOUR T1 - Progress in assessing physical function in arthritis: PROMIS short forms and computerized adaptive testing JF - Journal of Rheumatology Y1 - 2009 A1 - Fries, J.F. A1 - Cella, D. A1 - Rose, M. A1 - Krishnan, E. A1 - Bruce, B. KW - *Disability Evaluation KW - *Outcome Assessment (Health Care) KW - Arthritis/diagnosis/*physiopathology KW - Health Surveys KW - Humans KW - Prognosis KW - Reproducibility of Results AB - OBJECTIVE: Assessing self-reported physical function/disability with the Health Assessment Questionnaire Disability Index (HAQ) and other instruments has become central in arthritis research. Item response theory (IRT) and computerized adaptive testing (CAT) techniques can increase reliability and statistical power. IRT-based instruments can improve measurement precision substantially over a wider range of disease severity. These modern methods were applied and the magnitude of improvement was estimated. METHODS: A 199-item physical function/disability item bank was developed by distilling 1865 items to 124, including Legacy Health Assessment Questionnaire (HAQ) and Physical Function-10 items, and improving precision through qualitative and quantitative evaluation in over 21,000 subjects, which included about 1500 patients with rheumatoid arthritis and osteoarthritis. Four new instruments, (A) Patient-Reported Outcomes Measurement Information (PROMIS) HAQ, which evolved from the original (Legacy) HAQ; (B) "best" PROMIS 10; (C) 20-item static (short) forms; and (D) simulated PROMIS CAT, which sequentially selected the most informative item, were compared with the HAQ. RESULTS: Online and mailed administration modes yielded similar item and domain scores. The HAQ and PROMIS HAQ 20-item scales yielded greater information content versus other scales in patients with more severe disease. 
The "best" PROMIS 20-item scale outperformed the other 20-item static forms over a broad range of 4 standard deviations. The 10-item simulated PROMIS CAT outperformed all other forms. CONCLUSION: Improved items and instruments yielded better information. The PROMIS HAQ is currently available and considered validated. The new PROMIS short forms, after validation, are likely to represent further improvement. CAT-based physical function/disability assessment offers superior performance over static forms of equal length. VL - 36 SN - 0315-162X (Print)0315-162X (Linking) N1 - Fries, James FCella, DavidRose, MatthiasKrishnan, EswarBruce, BonnieU01 AR052158/AR/NIAMS NIH HHS/United StatesU01 AR52177/AR/NIAMS NIH HHS/United StatesConsensus Development ConferenceResearch Support, N.I.H., ExtramuralCanadaThe Journal of rheumatologyJ Rheumatol. 2009 Sep;36(9):2061-6. ER - TY - ABST T1 - Proposta para a construção de um Teste Adaptativo Informatizado baseado na Teoria da Resposta ao Item (Proposal for the construction of a computerized adaptive test based on item response theory) Y1 - 2009 A1 - Moreira Junior, F. J. A1 - Andrade, D. F. CY - Poster session presented at the Congresso Brasileiro de Teoria da Resposta ao Item, Florianópolis, SC, Brazil N1 - (In Portuguese) ER - TY - CHAP T1 - Quantifying the impact of compromised items in CAT Y1 - 2009 A1 - Guo, F. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 438 KB} ER - TY - JOUR T1 - Reduction in patient burdens with graphical computerized adaptive testing on the ADL scale: tool development and simulation JF - Health and Quality of Life Outcomes Y1 - 2009 A1 - Chien, T. W. A1 - Wu, H. M. A1 - Wang, W-C. A1 - Castillo, R. V. A1 - Chou, W.
KW - *Activities of Daily Living KW - *Computer Graphics KW - *Computer Simulation KW - *Diagnosis, Computer-Assisted KW - Female KW - Humans KW - Male KW - Point-of-Care Systems KW - Reproducibility of Results KW - Stroke/*rehabilitation KW - Taiwan KW - United States AB - BACKGROUND: The aim of this study was to verify the effectiveness and efficacy of saving time and reducing burden for patients, nurses, and even occupational therapists through computer adaptive testing (CAT). METHODS: Based on an item bank of the Barthel Index (BI) and the Frenchay Activities Index (FAI) for assessing comprehensive activities of daily living (ADL) function in stroke patients, we developed a Visual Basic for Applications (VBA) Excel CAT module, and (1) investigated whether the averaged test length via CAT is shorter than that of the traditional all-item-answered non-adaptive testing (NAT) approach through simulation, (2) illustrated the CAT multimedia on a tablet PC showing data collection and response errors of ADL clinical functional measures in stroke patients, and (3) demonstrated the quality control of endorsing scale with fit statistics to detect responding errors, which will then be immediately reconfirmed by technicians once the patient ends the CAT assessment. RESULTS: The results show that the number of endorsed items could be smaller on CAT (M = 13.42) than on NAT (M = 23), a 41.64% efficiency gain in test length. However, averaged ability estimations reveal insignificant differences between CAT and NAT. CONCLUSION: This study found that mobile nursing services, placed at the bedsides of patients, could, through the programmed VBA-Excel CAT module, reduce the burden to patients and save time, more so than the traditional NAT paper-and-pencil testing appraisals.
VL - 7 SN - 1477-7525 (Electronic); 1477-7525 (Linking) N1 - Chien, Tsair-Wei; Wu, Hing-Man; Wang, Weng-Chung; Castillo, Roberto Vasquez; Chou, Willy. Comparative Study; Validation Studies. England. Health and Quality of Life Outcomes. Health Qual Life Outcomes. 2009 May 5;7:39. U2 - 2688502 ER - TY - JOUR T1 - Replenishing a computerized adaptive test of patient-reported daily activity functioning JF - Quality of Life Research Y1 - 2009 A1 - Haley, S. M. A1 - Ni, P. A1 - Jette, A. M. A1 - Tao, W. A1 - Moed, R. A1 - Meyers, D. A1 - Ludlow, L. H. KW - *Activities of Daily Living KW - *Disability Evaluation KW - *Questionnaires KW - *User-Computer Interface KW - Adult KW - Aged KW - Cohort Studies KW - Computer-Assisted Instruction KW - Female KW - Humans KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods AB - PURPOSE: Computerized adaptive testing (CAT) item banks may need to be updated, but before new items can be added, they must be linked to the previous CAT. The purpose of this study was to evaluate 41 pretest items prior to including them in an operational CAT. METHODS: We recruited 6,882 patients with spine, lower extremity, upper extremity, and nonorthopedic impairments who received outpatient rehabilitation in one of 147 clinics across 13 states of the USA. Forty-one new Daily Activity (DA) items were administered along with the Activity Measure for Post-Acute Care Daily Activity CAT (DA-CAT-1) in five separate waves. We compared the scoring consistency with the full item bank, test information function (TIF), person standard errors (SEs), and content range of the DA-CAT-1 to the new CAT (DA-CAT-2) with the pretest items by real data simulations. RESULTS: We retained 29 of the 41 pretest items. Scores from the DA-CAT-2 were more consistent (ICC = 0.90 versus 0.96) than DA-CAT-1 when compared with the full item bank. TIF and person SEs were improved for persons with higher levels of DA functioning, and ceiling effects were reduced from 16.1% to 6.1%.
CONCLUSIONS: Item response theory and online calibration methods were valuable in improving the DA-CAT. VL - 18 SN - 0962-9343 (Print); 0962-9343 (Linking) N1 - Haley, Stephen M; Ni, Pengsheng; Jette, Alan M; Tao, Wei; Moed, Richard; Meyers, Doug; Ludlow, Larry H. K02 HD45354-01/HD/NICHD NIH HHS/United States. Research Support, N.I.H., Extramural. Netherlands. Quality of Life Research: an international journal of quality of life aspects of treatment, care and rehabilitation. Qual Life Res. 2009 May;18(4):461-71. Epub 2009 Mar 14. ER - TY - JOUR T1 - Studying the Equivalence of Computer-Delivered and Paper-Based Administrations of the Raven Standard Progressive Matrices Test JF - Educational and Psychological Measurement Y1 - 2009 A1 - Arce-Ferrer, Alvaro J. A1 - Martínez Guzmán, Elvira AB -

This study investigates the effect of mode of administration of the Raven Standard Progressive Matrices test on distribution, accuracy, and meaning of raw scores. A random sample of high school students takes counterbalanced paper-and-pencil and computer-based administrations of the test and answers a questionnaire surveying preferences for computer-delivered test administrations. Administration mode effect is studied with repeated measures multivariate analysis of variance, internal consistency reliability estimates, and confirmatory factor analysis approaches. Results show a lack of test mode effect on distribution, accuracy, and meaning of raw scores. Participants indicate their preference for the computer-delivered administration of the test. The article discusses findings in light of previous studies of the Raven Standard Progressive Matrices test.

VL - 69 UR - http://epm.sagepub.com/content/69/5/855.abstract ER - TY - CHAP T1 - Termination criteria in computerized adaptive tests: Variable-length CATs are not biased. Y1 - 2009 A1 - Babcock, B. A1 - Weiss, D. J. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. ER - TY - CHAP T1 - Test overlap rate and item exposure rate as indicators of test security in CATs Y1 - 2009 A1 - Barrada, J. A1 - Olea, J. A1 - Ponsoda, V. A1 - Abad, F. J. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 261 KB} ER - TY - CHAP T1 - Using automatic item generation to address item demands for CAT Y1 - 2009 A1 - Lai, H. A1 - Alves, C. A1 - Gierl, M. J. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 320 KB} ER - TY - CHAP T1 - Utilizing the generalized likelihood ratio as a termination criterion Y1 - 2009 A1 - Thompson, N. A. AB - Computer-based testing can be used to classify examinees into mutually exclusive groups. Currently, the predominant psychometric algorithm for designing computerized classification tests (CCTs) is the sequential probability ratio test (SPRT; Reckase, 1983) based on item response theory (IRT). The SPRT has been shown to be more efficient than confidence intervals around θ estimates as a method for CCT delivery (Spray & Reckase, 1996; Rudner, 2002). More recently, it was demonstrated that the SPRT, which only uses fixed values, is less efficient than a generalized form which tests whether a given examinee’s θ is below θ1 or above θ2 (Thompson, 2007). This formulation allows the indifference region to vary based on observed data. Moreover, this composite hypothesis formulation better represents the conceptual purpose of the test, which is to test whether θ is above or below the cutscore.
The purpose of this study was to explore the specifications of the new generalized likelihood ratio (GLR; Huang, 2004). As with the SPRT, the efficiency of the procedure depends on the nominal error rates and the distance between θ1 and θ2 (Eggen, 1999). This study utilized a Monte Carlo approach, with 10,000 examinees simulated under each condition, to evaluate differences in efficiency and accuracy due to hypothesis structure, nominal error rate, and indifference region size. The GLR was always at least as efficient as the fixed-point SPRT while maintaining equivalent levels of accuracy. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 194 KB} ER - TY - JOUR T1 - Validation of the MMPI-2 computerized adaptive version (MMPI-2-CA) in a correctional intake facility JF - Psychological Services Y1 - 2009 A1 - Forbey, J. D. A1 - Ben-Porath, Y. S. A1 - Gartland, D. AB - Computerized adaptive testing in personality assessment can improve efficiency by significantly reducing the number of items administered to answer an assessment question. The time savings afforded by this technique could be of particular benefit in settings where large numbers of psychological screenings are conducted, such as correctional facilities. In the current study, item and time savings, as well as the test–retest and extratest correlations associated with an audio augmented administration of all the scales of the Minnesota Multiphasic Personality Inventory (MMPI)-2 Computerized Adaptive (MMPI-2-CA) are reported. Participants include 366 men, ages 18 to 62 years (M = 33.04, SD = 10.40), undergoing intake into a large Midwestern state correctional facility. Results of the current study indicate considerable item and corresponding time savings for the MMPI-2-CA compared to conventional administration of the test, as well as comparability in terms of test–retest correlations and correlations with external measures.
Future directions of adaptive personality testing are discussed. VL - 6 SN - 1939-148X ER - TY - JOUR T1 - When cognitive diagnosis meets computerized adaptive testing: CD-CAT JF - Psychometrika Y1 - 2009 A1 - Cheng, Y. VL - 74 ER - TY - JOUR T1 - Adaptive measurement of individual change JF - Zeitschrift für Psychologie / Journal of Psychology Y1 - 2008 A1 - Kim-Kang, G. A1 - Weiss, D. J. VL - 216 N1 - {PDF file, 568 KB} ER - TY - JOUR T1 - Adaptive Models of Psychological Testing JF - Zeitschrift für Psychologie / Journal of Psychology Y1 - 2008 A1 - van der Linden, W. J. VL - 216 IS - 1 ER - TY - JOUR T1 - Adaptive short forms for outpatient rehabilitation outcome assessment JF - American Journal of Physical Medicine and Rehabilitation Y1 - 2008 A1 - Jette, A. M. A1 - Haley, S. M. A1 - Ni, P. A1 - Moed, R. KW - *Activities of Daily Living KW - *Ambulatory Care Facilities KW - *Mobility Limitation KW - *Treatment Outcome KW - Disabled Persons/psychology/*rehabilitation KW - Female KW - Humans KW - Male KW - Middle Aged KW - Questionnaires KW - Rehabilitation Centers AB - OBJECTIVE: To develop outpatient Adaptive Short Forms for the Activity Measure for Post-Acute Care item bank for use in outpatient therapy settings. DESIGN: A convenience sample of 11,809 adults with spine, lower limb, upper limb, and miscellaneous orthopedic impairments who received outpatient rehabilitation in 1 of 127 outpatient rehabilitation clinics in the United States. We identified optimal items for use in developing outpatient Adaptive Short Forms based on the Basic Mobility and Daily Activities domains of the Activity Measure for Post-Acute Care item bank. Patient scores were derived from the Activity Measure for Post-Acute Care computerized adaptive testing program.
Items were selected for inclusion on the Adaptive Short Forms based on functional content, range of item coverage, measurement precision, item exposure rate, and data collection burden. RESULTS: Two outpatient Adaptive Short Forms were developed: (1) an 18-item Basic Mobility Adaptive Short Form and (2) a 15-item Daily Activities Adaptive Short Form, derived from the same item bank used to develop the Activity Measure for Post-Acute Care computerized adaptive testing program. Both Adaptive Short Forms achieved acceptable psychometric properties. CONCLUSIONS: In outpatient postacute care settings where computerized adaptive testing outcome applications are currently not feasible, item response theory-derived Adaptive Short Forms provide the efficient capability to monitor patients' functional outcomes. The development of Adaptive Short Form functional outcome instruments linked by a common, calibrated item bank has the potential to create a bridge to outcome monitoring across postacute care settings and can make the eventual transition from Adaptive Short Forms to computerized adaptive testing applications easier and more acceptable to the rehabilitation community. VL - 87 SN - 1537-7385 (Electronic) N1 - Jette, Alan M; Haley, Stephen M; Ni, Pengsheng; Moed, Richard. K02 HD45354-01/HD/NICHD NIH HHS/United States; R01 HD43568/HD/NICHD NIH HHS/United States. Research Support, N.I.H., Extramural; Research Support, U.S. Gov't, Non-P.H.S.; Research Support, U.S. Gov't, P.H.S. United States. American Journal of Physical Medicine & Rehabilitation / Association of Academic Physiatrists. Am J Phys Med Rehabil. 2008 Oct;87(10):842-52. ER - TY - JOUR T1 - Are we ready for computerized adaptive testing? JF - Psychiatric Services Y1 - 2008 A1 - Unick, G. J. A1 - Shumway, M. A1 - Hargreaves, W.
KW - *Attitude of Health Personnel KW - *Diagnosis, Computer-Assisted/instrumentation KW - Humans KW - Mental Disorders/*diagnosis KW - Software VL - 59 SN - 1075-2730 (Print); 1075-2730 (Linking) N1 - Unick, George J; Shumway, Martha; Hargreaves, William. Comment. United States. Psychiatric Services (Washington, D.C.). Psychiatr Serv. 2008 Apr;59(4):369. ER - TY - JOUR T1 - Assessing self-care and social function using a computer adaptive testing version of the pediatric evaluation of disability inventory JF - Archives of Physical Medicine and Rehabilitation Y1 - 2008 A1 - Coster, W. J. A1 - Haley, S. M. A1 - Ni, P. A1 - Dumas, H. M. A1 - Fragala-Pinkham, M. A. KW - *Disability Evaluation KW - *Social Adjustment KW - Activities of Daily Living KW - Adolescent KW - Age Factors KW - Child KW - Child, Preschool KW - Computer Simulation KW - Cross-Over Studies KW - Disabled Children/*rehabilitation KW - Female KW - Follow-Up Studies KW - Humans KW - Infant KW - Male KW - Outcome Assessment (Health Care) KW - Reference Values KW - Reproducibility of Results KW - Retrospective Studies KW - Risk Factors KW - Self Care/*standards/trends KW - Sex Factors KW - Sickness Impact Profile AB - OBJECTIVE: To examine score agreement, validity, precision, and response burden of a prototype computer adaptive testing (CAT) version of the self-care and social function scales of the Pediatric Evaluation of Disability Inventory compared with the full-length version of these scales. DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics; community-based day care, preschool, and children's homes. PARTICIPANTS: Children with disabilities (n=469) and 412 children with no disabilities (analytic sample); 38 children with disabilities and 35 children without disabilities (cross-validation sample).
INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from prototype CAT applications of each scale using 15-, 10-, and 5-item stopping rules; scores from the full-length self-care and social function scales; time (in seconds) to complete assessments and respondent ratings of burden. RESULTS: Scores from both computer simulations and field administration of the prototype CATs were highly consistent with scores from full-length administration (r range, .94-.99). Using computer simulation of retrospective data, discriminant validity and sensitivity to change of the CATs closely approximated those of the full-length scales, especially when the 15- and 10-item stopping rules were applied. In the cross-validation study the time to administer both CATs was 4 minutes, compared with over 16 minutes to complete the full-length scales. CONCLUSIONS: Self-care and social function score estimates from CAT administration are highly comparable with those obtained from full-length scale administration, with small losses in validity and precision and substantial decreases in administration time. VL - 89 SN - 1532-821X (Electronic); 0003-9993 (Linking) N1 - Coster, Wendy J; Haley, Stephen M; Ni, Pengsheng; Dumas, Helene M; Fragala-Pinkham, Maria A. K02 HD45354-01A1/HD/NICHD NIH HHS/United States; R41 HD052318-01A1/HD/NICHD NIH HHS/United States; R43 HD42388-01/HD/NICHD NIH HHS/United States. Comparative Study; Research Support, N.I.H., Extramural. United States. Archives of Physical Medicine and Rehabilitation. Arch Phys Med Rehabil. 2008 Apr;89(4):622-9. U2 - 2666276 ER - TY - CONF T1 - An Automated Decision System for Computer Adaptive Testing Using Genetic Algorithms T2 - Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2008. SNPD'08. Ninth ACIS International Conference on Y1 - 2008 A1 - Phankokkruad, M. A1 - Woraratpanya, K.
AB - This paper proposes an approach to solving the triangle decision tree problem for computer adaptive testing (CAT) using genetic algorithms (GAs). In this approach, item response theory (IRT) parameters composed of discrimination, difficulty, and guessing are first obtained and stored in an item bank. Then a fitness function of the GA, based on the IRT parameters, is set up for obtaining an optimal solution. Finally, the GA is applied to the parameters of the item bank so that an optimal decision tree is generated. Based on a six-level triangle decision tree for examination items, the experimental results show that the optimal decision tree can be generated correctly when compared with the standard patterns. JF - Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2008. SNPD'08. Ninth ACIS International Conference on PB - IEEE ER - TY - JOUR T1 - Binary items and beyond: a simulation of computer adaptive testing using the Rasch partial credit model JF - Journal of Applied Measurement Y1 - 2008 A1 - Lange, R. KW - *Data Interpretation, Statistical KW - *User-Computer Interface KW - Educational Measurement/*statistics & numerical data KW - Humans KW - Illinois KW - Models, Statistical AB - Past research on Computer Adaptive Testing (CAT) has focused almost exclusively on the use of binary items and minimizing the number of items to be administered. To address this situation, extensive computer simulations were performed using partial credit items with two, three, four, and five response categories. Other variables manipulated include the number of available items, the number of respondents used to calibrate the items, and various manipulations of respondents' true locations. Three item selection strategies were used, and the theoretically optimal Maximum Information method was compared to random item selection and Bayesian Maximum Falsification approaches.
The Rasch partial credit model proved to be quite robust to various imperfections, and systematic distortions occurred mainly in the absence of sufficient numbers of items located near the trait or performance levels of interest. The findings further indicate that having small numbers of items is more problematic in practice than having small numbers of respondents to calibrate these items. Most importantly, increasing the number of response categories consistently improved CAT's efficiency as well as the general quality of the results. In fact, increasing the number of response categories proved to have a greater positive impact than did the choice of item selection method, as the Maximum Information approach performed only slightly better than the Maximum Falsification approach. Accordingly, issues related to the efficiency of item selection methods are far less important than is commonly suggested in the literature. However, being based on computer simulations only, the preceding presumes that actual respondents behave according to the Rasch model. CAT research could thus benefit from empirical studies aimed at determining whether, and if so, how, selection strategies impact performance. VL - 9 SN - 1529-7713 (Print); 1529-7713 (Linking) N1 - Lange, Rense. United States. Journal of Applied Measurement. J Appl Meas. 2008;9(1):81-104. ER - TY - JOUR T1 - CAT-MD: Computerized adaptive testing on mobile devices JF - International Journal of Web-Based Learning and Teaching Technologies Y1 - 2008 A1 - Triantafillou, E. A1 - Georgiadou, E. A1 - Economides, A. A. VL - 3 ER - TY - JOUR T1 - Combining computer adaptive testing technology with cognitively diagnostic assessment JF - Behavior Research Methods Y1 - 2008 A1 - McGlohen, M.
A1 - Chang, Hua-Hua KW - *Cognition KW - *Computers KW - *Models, Statistical KW - *User-Computer Interface KW - Diagnosis, Computer-Assisted/*instrumentation KW - Humans AB - A major advantage of computerized adaptive testing (CAT) is that it allows the test to home in on an examinee's ability level in an interactive manner. The aim of the new area of cognitive diagnosis is to provide information about specific content areas in which an examinee needs help. The goal of this study was to combine the benefit of specific feedback from cognitively diagnostic assessment with the advantages of CAT. In this study, three approaches to combining these were investigated: (1) item selection based on the traditional ability level estimate (theta), (2) item selection based on the attribute mastery feedback provided by cognitively diagnostic assessment (alpha), and (3) item selection based on both the traditional ability level estimate (theta) and the attribute mastery feedback provided by cognitively diagnostic assessment (alpha). The results from these three approaches were compared for theta estimation accuracy, attribute mastery estimation accuracy, and item exposure control. The theta- and alpha-based condition outperformed the alpha-based condition regarding theta estimation, attribute mastery pattern estimation, and item exposure control. Both the theta-based condition and the theta- and alpha-based condition performed similarly with regard to theta estimation, attribute mastery estimation, and item exposure control, but the theta- and alpha-based condition has an additional advantage in that it uses the shadow test method, which allows the administrator to incorporate additional constraints in the item selection process, such as content balancing, item type constraints, and so forth, and also to select items on the basis of both the current theta and alpha estimates, which can be built on top of existing 3PL testing programs. 
VL - 40 SN - 1554-351X (Print) N1 - McGlohen, Meghan; Chang, Hua-Hua. United States. Behavior Research Methods. Behav Res Methods. 2008 Aug;40(3):808-21. ER - TY - JOUR T1 - Comparability of Computer-Based and Paper-and-Pencil Testing in K–12 Reading Assessments JF - Educational and Psychological Measurement Y1 - 2008 A1 - Wang, Shudong A1 - Jiao, Hong A1 - Young, Michael J. A1 - Brooks, Thomas A1 - Olson, John AB -

In recent years, computer-based testing (CBT) has grown in popularity, is increasingly being implemented across the United States, and will likely become the primary mode for delivering tests in the future. Although CBT offers many advantages over traditional paper-and-pencil testing, assessment experts, researchers, practitioners, and users have expressed concern about the comparability of scores between the two test administration modes. To help provide an answer to this issue, a meta-analysis was conducted to synthesize the administration mode effects of CBTs and paper-and-pencil tests on K–12 student reading assessments. Findings indicate that the administration mode had no statistically significant effect on K–12 student reading achievement scores. Four moderator variables (study design, sample size, computer delivery algorithm, and computer practice) made statistically significant contributions to predicting effect size. Three moderator variables (grade level, type of test, and computer delivery method) did not affect the differences in reading scores between test modes.

VL - 68 UR - http://epm.sagepub.com/content/68/1/5.abstract ER - TY - JOUR T1 - Computer Adaptive-Attribute Testing: A New Approach to Cognitive Diagnostic Assessment JF - Zeitschrift für Psychologie / Journal of Psychology Y1 - 2008 A1 - Gierl, M. J. A1 - Zhou, J. KW - cognition and assessment KW - cognitive diagnostic assessment KW - computer adaptive testing AB -

The influence of interdisciplinary forces stemming from developments in cognitive science, mathematical statistics, educational psychology, and computing science is beginning to appear in educational and psychological assessment. Computer adaptive-attribute testing (CA-AT) is one example. The concepts and procedures in CA-AT can be found at the intersection between computer adaptive testing and cognitive diagnostic assessment. CA-AT allows us to fuse the administrative benefits of computer adaptive testing with the psychological benefits of cognitive diagnostic assessment to produce an innovative psychologically-based adaptive testing approach. We describe the concepts behind CA-AT as well as illustrate how it can be used to promote formative, computer-based, classroom assessment.

VL - 216 IS - 1 ER - TY - JOUR T1 - Computer-Based and Paper-and-Pencil Administration Mode Effects on a Statewide End-of-Course English Test JF - Educational and Psychological Measurement Y1 - 2008 A1 - Kim, Do-Hong A1 - Huynh, Huynh AB -

The current study compared student performance between paper-and-pencil testing (PPT) and computer-based testing (CBT) on a large-scale statewide end-of-course English examination. Analyses were conducted at both the item and test levels. The overall results suggest that scores obtained from PPT and CBT were comparable. However, at the content domain level, a rather large difference in the reading comprehension section suggests that the reading comprehension test may be more affected by the test administration mode. Results from the confirmatory factor analysis suggest that the administration mode did not alter the construct of the test.

VL - 68 UR - http://epm.sagepub.com/content/68/4/554.abstract ER - TY - JOUR T1 - Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes JF - Archives of Physical Medicine and Rehabilitation Y1 - 2008 A1 - Haley, S. M. A1 - Gandek, B. A1 - Siebens, H. A1 - Black-Schaffer, R. M. A1 - Sinclair, S. J. A1 - Tao, W. A1 - Coster, W. J. A1 - Ni, P. A1 - Jette, A. M. KW - *Activities of Daily Living KW - *Adaptation, Physiological KW - *Computer Systems KW - *Questionnaires KW - Adult KW - Aged KW - Aged, 80 and over KW - Chi-Square Distribution KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Longitudinal Studies KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Patient Discharge KW - Prospective Studies KW - Rehabilitation/*standards KW - Subacute Care/*standards AB - OBJECTIVES: To measure participation outcomes with a computerized adaptive test (CAT) and compare CAT and traditional fixed-length surveys in terms of score agreement, respondent burden, discriminant validity, and responsiveness. DESIGN: Longitudinal, prospective cohort study of patients interviewed approximately 2 weeks after discharge from inpatient rehabilitation and 3 months later. SETTING: Follow-up interviews conducted in patient's home setting. PARTICIPANTS: Adults (N=94) with diagnoses of neurologic, orthopedic, or medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Participation domains of mobility, domestic life, and community, social, & civic life, measured using a CAT version of the Participation Measure for Postacute Care (PM-PAC-CAT) and a 53-item fixed-length survey (PM-PAC-53). RESULTS: The PM-PAC-CAT showed substantial agreement with PM-PAC-53 scores (intraclass correlation coefficient, model 3,1, .71-.81). On average, the PM-PAC-CAT was completed in 42% of the time and with only 48% of the items as compared with the PM-PAC-53. 
Both formats discriminated across functional severity groups. The PM-PAC-CAT had modest reductions in sensitivity and responsiveness to patient-reported change over a 3-month interval as compared with the PM-PAC-53. CONCLUSIONS: Although continued evaluation is warranted, accurate estimates of participation status and responsiveness to change for group-level analyses can be obtained from CAT administrations, with a sizeable reduction in respondent burden. VL - 89 SN - 1532-821X (Electronic); 0003-9993 (Linking) N1 - Haley, Stephen M; Gandek, Barbara; Siebens, Hilary; Black-Schaffer, Randie M; Sinclair, Samuel J; Tao, Wei; Coster, Wendy J; Ni, Pengsheng; Jette, Alan M. K02 HD045354-01A1/HD/NICHD NIH HHS/United States; K02 HD45354-01/HD/NICHD NIH HHS/United States; R01 HD043568/HD/NICHD NIH HHS/United States; R01 HD043568-01/HD/NICHD NIH HHS/United States. Research Support, N.I.H., Extramural. United States. Archives of Physical Medicine and Rehabilitation. Arch Phys Med Rehabil. 2008 Feb;89(2):275-83. U2 - 2666330 ER - TY - JOUR T1 - Computerized adaptive testing for patients with knee impairments produced valid and responsive measures of function JF - Journal of Clinical Epidemiology Y1 - 2008 A1 - Hart, D. L. A1 - Wang, Y-C. A1 - Stratford, P. W. A1 - Mioduski, J. E. VL - 61 ER - TY - JOUR T1 - Computerized adaptive testing in back pain: Validation of the CAT-5D-QOL JF - Spine Y1 - 2008 A1 - Kopec, J. A. A1 - Badii, M. A1 - McKenna, M. A1 - Lima, V. D. A1 - Sayre, E. C. A1 - Dvorak, M. KW - *Disability Evaluation KW - *Health Status Indicators KW - *Quality of Life KW - Adult KW - Aged KW - Algorithms KW - Back Pain/*diagnosis/psychology KW - British Columbia KW - Diagnosis, Computer-Assisted/*standards KW - Feasibility Studies KW - Female KW - Humans KW - Internet KW - Male KW - Middle Aged KW - Predictive Value of Tests KW - Questionnaires/*standards KW - Reproducibility of Results AB - STUDY DESIGN: We have conducted an outcome instrument validation study.
OBJECTIVE: Our objective was to develop a computerized adaptive test (CAT) to measure 5 domains of health-related quality of life (HRQL) and assess its feasibility, reliability, validity, and efficiency. SUMMARY OF BACKGROUND DATA: Kopec and colleagues have recently developed item response theory based item banks for 5 domains of HRQL relevant to back pain and suitable for CAT applications. The domains are Daily Activities (DAILY), Walking (WALK), Handling Objects (HAND), Pain or Discomfort (PAIN), and Feelings (FEEL). METHODS: An adaptive algorithm was implemented in a web-based questionnaire administration system. The questionnaire included CAT-5D-QOL (5 scales), Modified Oswestry Disability Index (MODI), Roland-Morris Disability Questionnaire (RMDQ), SF-36 Health Survey, and standard clinical and demographic information. Participants were outpatients treated for mechanical back pain at a referral center in Vancouver, Canada. RESULTS: A total of 215 patients completed the questionnaire and 84 completed a retest. On average, patients answered 5.2 items per CAT-5D-QOL scale. Reliability ranged from 0.83 (FEEL) to 0.92 (PAIN) and was 0.92 for the MODI, RMDQ, and Physical Component Summary (PCS-36). The ceiling effect was 0.5% for PAIN compared with 2% for MODI and 5% for RMDQ. The CAT-5D-QOL scales correlated as anticipated with other measures of HRQL and discriminated well according to the level of satisfaction with current symptoms, duration of the last episode, sciatica, and disability compensation. The average relative discrimination index was 0.87 for PAIN, 0.67 for DAILY and 0.62 for WALK, compared with 0.89 for MODI, 0.80 for RMDQ, and 0.59 for PCS-36. CONCLUSION: The CAT-5D-QOL is feasible, reliable, valid, and efficient in patients with back pain. This methodology can be recommended for use in back pain research and should improve outcome assessment, facilitate comparisons across studies, and reduce patient burden.
VL - 33 SN - 1528-1159 (Electronic); 0362-2436 (Linking) N1 - Kopec, Jacek A; Badii, Maziar; McKenna, Mario; Lima, Viviane D; Sayre, Eric C; Dvorak, Marcel. Research Support, Non-U.S. Gov't; Validation Studies. United States. Spine. Spine (Phila Pa 1976). 2008 May 20;33(12):1384-90. ER - TY - JOUR T1 - Computerized Adaptive Testing of Personality Traits JF - Zeitschrift für Psychologie / Journal of Psychology Y1 - 2008 A1 - Hol, A. M. A1 - Vorst, H. C. M. A1 - Mellenbergh, G. J. KW - Adaptive Testing KW - computer-assisted testing KW - Item Response Theory KW - Likert scales KW - Personality Measures AB -

A computerized adaptive testing (CAT) procedure was simulated with ordinal polytomous personality data collected using a conventional paper-and-pencil testing format. An adapted Dutch version of the dominance scale of Gough and Heilbrun’s Adjective Check List (ACL) was used. This version contained Likert response scales with five categories. Item parameters were estimated using Samejima’s graded response model from the responses of 1,925 subjects. The CAT procedure was simulated using the responses of 1,517 other subjects. The value of the required standard error in the stopping rule of the CAT was manipulated. The relationship between CAT latent trait estimates and estimates based on all dominance items was studied. Additionally, the pattern of relationships between the CAT latent trait estimates and the other ACL scales was compared to that between latent trait estimates based on the entire item pool and the other ACL scales. The CAT procedure resulted in latent trait estimates qualitatively equivalent to latent trait estimates based on all items, while a substantial reduction in the number of items used could be realized (with a stopping rule of 0.4, about 33% of the 36 items were used).

VL - 216 IS - 1 ER - TY - JOUR T1 - Controlling item exposure and test overlap on the fly in computerized adaptive testing JF - British Journal of Mathematical and Statistical Psychology Y1 - 2008 A1 - Chen, S-Y. A1 - Lei, P. W. A1 - Liao, W. H. KW - *Decision Making, Computer-Assisted KW - *Models, Psychological KW - Humans AB - This paper proposes an on-line version of the Sympson and Hetter procedure with test overlap control (SHT) that can provide item exposure control at both the item and test levels on the fly without iterative simulations. The on-line procedure is similar to the SHT procedure in that exposure parameters are used for simultaneous control of item exposure rates and test overlap rate. The exposure parameters for the on-line procedure, however, are updated sequentially on the fly, rather than through iterative simulations conducted prior to operational computerized adaptive tests (CATs). Unlike the SHT procedure, the on-line version can control item exposure rate and test overlap rate without time-consuming iterative simulations even when item pools or examinee populations have been changed. Moreover, the on-line procedure was found to perform better than the SHT procedure in controlling item exposure and test overlap for examinees who take tests earlier. Compared with two other on-line alternatives, this proposed on-line method provided the best all-around test security control. Thus, it would be an efficient procedure for controlling item exposure and test overlap in CATs. VL - 61 SN - 0007-1102 (Print)0007-1102 (Linking) N1 - Chen, Shu-YingLei, Pui-WaLiao, Wen-HanResearch Support, Non-U.S. Gov'tEnglandThe British journal of mathematical and statistical psychologyBr J Math Stat Psychol. 2008 Nov;61(Pt 2):471-92. Epub 2007 Jul 23. 
ER - TY - CONF T1 - Developing a progressive approach to using the GAIN in order to reduce the duration and cost of assessment with the GAIN short screener, Quick and computer adaptive testing T2 - Joint Meeting on Adolescent Treatment Effectiveness Y1 - 2008 A1 - Dennis, M. L. A1 - Funk, R. A1 - Titus, J. A1 - Riley, B. B. A1 - Hosman, S. A1 - Kinne, S. JF - Joint Meeting on Adolescent Treatment Effectiveness CY - Washington D.C., USA N1 - ProCite field[6]: Paper presented at the ER - TY - JOUR T1 - The D-optimality item selection criterion in the early stage of CAT: A study with the graded response model JF - Journal of Educational and Behavioral Statistics Y1 - 2008 A1 - Passos, V. L. A1 - Berger, M. P. F. A1 - Tan, F. E. S. KW - computerized adaptive testing KW - D optimality KW - item selection AB - During the early stage of computerized adaptive testing (CAT), item selection criteria based on Fisher’s information often produce less stable latent trait estimates than the Kullback-Leibler global information criterion. Robustness against early stage instability has been reported for the D-optimality criterion in a polytomous CAT with the Nominal Response Model and is shown herein to be reproducible for the Graded Response Model. For comparative purposes, the A-optimality and the global information criteria are also applied. Their item selection is investigated as a function of test progression and item bank composition. The results indicate how the selection of specific item parameters underlies the criteria performances evaluated via accuracy and precision of estimation. In addition, the criteria item exposure rates are compared, without the use of any exposure controlling measure. On the account of stability, precision, accuracy, numerical simplicity, and less evidently, item exposure rate, the D-optimality criterion can be recommended for CAT. 
VL - 33 ER - TY - BOOK T1 - Effect of early misfit in computerized adaptive testing on the recovery of theta Y1 - 2008 A1 - Guyer, R. D. CY - Unpublished Ph.D. dissertation, University of Minnesota, Minneapolis MN. N1 - {PDF file, 1,004 KB} ER - TY - JOUR T1 - Efficiency and sensitivity of multidimensional computerized adaptive testing of pediatric physical functioning JF - Disability & Rehabilitation Y1 - 2008 A1 - Allen, D. D. A1 - Ni, P. A1 - Haley, S. M. KW - *Disability Evaluation KW - Child KW - Computers KW - Disabled Children/*classification/rehabilitation KW - Efficiency KW - Humans KW - Outcome Assessment (Health Care) KW - Psychometrics KW - Reproducibility of Results KW - Retrospective Studies KW - Self Care KW - Sensitivity and Specificity AB - PURPOSE: Computerized adaptive tests (CATs) have efficiency advantages over fixed-length tests of physical functioning but may lose sensitivity when administering extremely low numbers of items. Multidimensional CATs may efficiently improve sensitivity by capitalizing on correlations between functional domains. Using a series of empirical simulations, we assessed the efficiency and sensitivity of multidimensional CATs compared to a longer fixed-length test. METHOD: Parent responses to the Pediatric Evaluation of Disability Inventory before and after intervention for 239 children at a pediatric rehabilitation hospital provided the data for this retrospective study. Reliability, effect size, and standardized response mean were compared between full-length self-care and mobility subscales and simulated multidimensional CATs with stopping rules at 40, 30, 20, and 10 items. RESULTS: Reliability was lowest in the 10-item CAT condition for the self-care (r = 0.85) and mobility (r = 0.79) subscales; all other conditions had high reliabilities (r > 0.94). All multidimensional CAT conditions had equivalent levels of sensitivity compared to the full set condition for both domains. 
CONCLUSIONS: Multidimensional CATs efficiently retain the sensitivity of longer fixed-length measures even with 5 items per dimension (10-item CAT condition). Measuring physical functioning with multidimensional CATs could enhance sensitivity following intervention while minimizing response burden. VL - 30 SN - 0963-8288 (Print)0963-8288 (Linking) N1 - Allen, Diane DNi, PengshengHaley, Stephen MK02 HD45354-01/HD/NICHD NIH HHS/United StatesNIDDR H133P0001/DD/NCBDD CDC HHS/United StatesResearch Support, N.I.H., ExtramuralEnglandDisability and rehabilitationDisabil Rehabil. 2008;30(6):479-84. ER - TY - JOUR T1 - Functioning and validity of a computerized adaptive test to measure anxiety (A CAT) JF - Depression and Anxiety Y1 - 2008 A1 - Becker, J. A1 - Fliege, H. A1 - Kocalevent, R. D. A1 - Bjorner, J. B. A1 - Rose, M. A1 - Walter, O. B. A1 - Klapp, B. F. AB - Background: The aim of this study was to evaluate the Computerized Adaptive Test to measure anxiety (A-CAT), a patient-reported outcome questionnaire that uses computerized adaptive testing to measure anxiety. Methods: The A-CAT builds on an item bank of 50 items that has been built using conventional item analyses and item response theory analyses. The A-CAT was administered on Personal Digital Assistants to n=357 patients diagnosed and treated at the department of Psychosomatic Medicine and Psychotherapy, Charité Berlin, Germany. For validation purposes, two subgroups of patients (n=110 and 125) answered the A-CAT along with established anxiety and depression questionnaires. Results: The A-CAT was fast to complete (on average in 2 min, 38 s) and a precise item response theory based CAT score (reliability>.9) could be estimated after 4–41 items. On average, the CAT displayed 6 items (SD=4.2). 
Convergent validity of the A-CAT was supported by correlations to existing tools (Hospital Anxiety and Depression Scale-A, Beck Anxiety Inventory, Berliner Stimmungs-Fragebogen A/D, and State Trait Anxiety Inventory: r=.56–.66); discriminant validity between diagnostic groups was higher for the A-CAT than for other anxiety measures. Conclusions: The German A-CAT is an efficient, reliable, and valid tool for assessing anxiety in patients suffering from anxiety disorders and other conditions with significant potential for initial assessment and long-term treatment monitoring. Future research directions are to explore content balancing of the item selection algorithm of the CAT, to norm the tool to a healthy sample, and to develop practical cutoff scores. Depression and Anxiety, 2008. © 2008 Wiley-Liss, Inc. VL - 25 SN - 1520-6394 ER - TY - JOUR T1 - ICAT: An adaptive testing procedure for the identification of idiosyncratic knowledge patterns JF - Zeitschrift für Psychologie Y1 - 2008 A1 - Kingsbury, G. G. A1 - Houser, R.L. KW - computerized adaptive testing AB -

Traditional adaptive tests provide an efficient method for estimating student achievement levels by adjusting the characteristics of the test questions to match the performance of each student. These traditional adaptive tests are not designed to identify idiosyncratic knowledge patterns. As students move through their education, they learn content in any number of different ways related to their learning style and cognitive development. This may result in a student having different achievement levels from one content area to another within a domain of content. This study investigates whether such idiosyncratic knowledge patterns exist. It discusses the differences between idiosyncratic knowledge patterns and multidimensionality. Finally, it proposes an adaptive testing procedure that can be used to identify a student’s areas of strength and weakness more efficiently than current adaptive testing approaches. The findings of the study indicate that a fairly large number of students may have test results that are influenced by their idiosyncratic knowledge patterns. The findings suggest that these patterns persist across time for a large number of students, and that the differences in student performance between content areas within a subject domain are large enough to allow them to be useful in instruction. Given the existence of idiosyncratic patterns of knowledge, the proposed testing procedure may enable us to provide more useful information to teachers. It should also allow us to differentiate between idiosyncratic patterns of knowledge and important multidimensionality in the testing data.

VL - 216 ER - TY - JOUR T1 - Impact of altering randomization intervals on precision of measurement and item exposure JF - Journal of Applied Measurement Y1 - 2008 A1 - Muckle, T. J. A1 - Bergstrom, B. A. A1 - Becker, K. A1 - Stahl, J. A. AB -

This paper reports on the use of simulation when a randomization procedure is used to control item exposure in a computerized adaptive test for certification. We present a method to determine the optimum width of the interval from which items are selected and we report on the impact of relaxing the interval width on measurement precision and item exposure. Results indicate that, if the item bank is well targeted, it may be possible to widen the randomization interval and thus reduce item exposure, without seriously impacting the error of measure for test takers whose ability estimate is near the pass point.

VL - 9 IS - 2 ER - TY - JOUR T1 - Implementing Sympson-Hetter Item-Exposure Control in a Shadow-Test Approach to Constrained Adaptive Testing JF - International Journal of Testing Y1 - 2008 A1 - Veldkamp, Bernard P. A1 - van der Linden, Wim J. VL - 8 UR - http://www.tandfonline.com/doi/abs/10.1080/15305050802262233 ER - TY - JOUR T1 - Incorporating randomness in the Fisher information for improving item-exposure control in CATs JF - British Journal of Mathematical and Statistical Psychology Y1 - 2008 A1 - Barrada, J A1 - Olea, J. A1 - Ponsoda, V. A1 - Abad, F. J. VL - 61 ER - TY - JOUR T1 - An initial application of computerized adaptive testing (CAT) for measuring disability in patients with low back pain JF - BMC Musculoskelet Disorders Y1 - 2008 A1 - Elhan, A. H. A1 - Oztuna, D. A1 - Kutlay, S. A1 - Kucukdeveci, A. A. A1 - Tennant, A. AB - ABSTRACT: BACKGROUND: Recent approaches to outcome measurement involving Computerized Adaptive Testing (CAT) offer an approach for measuring disability in low back pain (LBP) in a way that can reduce the burden upon patient and professional. The aim of this study was to explore the potential of CAT in LBP for measuring disability as defined in the International Classification of Functioning, Disability and Health (ICF) which includes impairments, activity limitation, and participation restriction. METHODS: 266 patients with low back pain answered questions from a range of widely used questionnaires. An exploratory factor analysis (EFA) was used to identify disability dimensions which were then subjected to Rasch analysis. Reliability was tested by internal consistency and person separation index (PSI). Discriminant validity of disability levels were evaluated by Spearman correlation coefficient (r), intraclass correlation coefficient [ICC(2,1)] and the Bland-Altman approach. A CAT was developed for each dimension, and the results checked against simulated and real applications from a further 133 patients. 
RESULTS: Factor analytic techniques identified two dimensions named "body functions" and "activity-participation". After deletion of some items for failure to fit the Rasch model, the remaining items were mostly free of Differential Item Functioning (DIF) for age and gender. Reliability exceeded 0.90 for both dimensions. The disability levels generated using all items and those obtained from the real CAT application were highly correlated (i.e. >0.97 for both dimensions). On average, 19 and 14 items were needed to estimate the precise disability levels using the initial CAT for the first and second dimension. However, a marginal increase in the standard error of the estimate across successive iterations substantially reduced the number of items required to make an estimate. CONCLUSIONS: Using a combination approach of EFA and Rasch analysis this study has shown that it is possible to calibrate items onto a single metric in a way that can be used to provide the basis of a CAT application. Thus there is an opportunity to obtain a wide variety of information to evaluate the biopsychosocial model in its more complex forms, without necessarily increasing the burden of information collection for patients. VL - 9 SN - 1471-2474 (Electronic) N1 - Journal articleBMC musculoskeletal disordersBMC Musculoskelet Disord. 2008 Dec 18;9(1):166. ER - TY - JOUR T1 - Investigating item exposure control on the fly in computerized adaptive testing JF - Psychological Testing Y1 - 2008 A1 - Wu, M.-L. A1 - Chen, S-Y. VL - 55 ER - TY - JOUR T1 - Item exposure control in a-stratified computerized adaptive testing JF - Psychological Testing Y1 - 2008 A1 - Jhu, Y.-J., A1 - Chen, S-Y. VL - 55 ER - TY - JOUR T1 - Letting the CAT out of the bag: Comparing computer adaptive tests and an 11-item short form of the Roland-Morris Disability Questionnaire JF - Spine Y1 - 2008 A1 - Cook, K. F. A1 - Choi, S. W. A1 - Crane, P. K. A1 - Deyo, R. A. A1 - Johnson, K. L. A1 - Amtmann, D. 
KW - *Disability Evaluation KW - *Health Status Indicators KW - Adult KW - Aged KW - Aged, 80 and over KW - Back Pain/*diagnosis/psychology KW - Calibration KW - Computer Simulation KW - Diagnosis, Computer-Assisted/*standards KW - Humans KW - Middle Aged KW - Models, Psychological KW - Predictive Value of Tests KW - Questionnaires/*standards KW - Reproducibility of Results AB - STUDY DESIGN: A post hoc simulation of a computer adaptive administration of the items of a modified version of the Roland-Morris Disability Questionnaire. OBJECTIVE: To evaluate the effectiveness of adaptive administration of back pain-related disability items compared with a fixed 11-item short form. SUMMARY OF BACKGROUND DATA: Short form versions of the Roland-Morris Disability Questionnaire have been developed. An alternative to paper-and-pencil short forms is to administer items adaptively so that items are presented based on a person's responses to previous items. Theoretically, this allows precise estimation of back pain disability with administration of only a few items. MATERIALS AND METHODS: Data were gathered from 2 previously conducted studies of persons with back pain. An item response theory model was used to calibrate scores based on all items, items of a paper-and-pencil short form, and several computer adaptive tests (CATs). RESULTS: Correlations between each CAT condition and scores based on a 23-item version of the Roland-Morris Disability Questionnaire ranged from 0.93 to 0.98. Compared with an 11-item short form, an 11-item CAT produced scores that were significantly more highly correlated with scores based on the 23-item scale. CATs with even fewer items also produced scores that were highly correlated with scores based on all items. For example, scores from a 5-item CAT had a correlation of 0.93 with full scale scores. Seven- and 9-item CATs correlated at 0.95 and 0.97, respectively. 
A CAT with a standard-error-based stopping rule produced scores that correlated at 0.95 with full scale scores. CONCLUSION: A CAT-based back pain-related disability measure may be a valuable tool for use in clinical and research contexts. Use of CAT for other common measures in back pain research, such as other functional scales or measures of psychological distress, may offer similar advantages. VL - 33 SN - 1528-1159 (Electronic) N1 - Cook, Karon FChoi, Seung WCrane, Paul KDeyo, Richard AJohnson, Kurt LAmtmann, Dagmar5 P60-AR48093/AR/United States NIAMS5U01AR052171-03/AR/United States NIAMSComparative StudyResearch Support, N.I.H., ExtramuralUnited StatesSpineSpine. 2008 May 20;33(12):1378-83. ER - TY - JOUR T1 - Local Dependence in an Operational CAT: Diagnosis and Implications JF - Journal of Educational Measurement Y1 - 2008 A1 - Pommerich, Mary A1 - Segall, Daniel O. AB -

The accuracy of CAT scores can be negatively affected by local dependence if the CAT utilizes parameters that are misspecified due to the presence of local dependence and/or fails to control for local dependence in responses during the administration stage. This article evaluates the existence and effect of local dependence in a test of Mathematics Knowledge. Diagnostic tools were first used to evaluate the existence of local dependence in items that were calibrated under a 3PL model. A simulation study was then used to evaluate the effect of local dependence on the precision of examinee CAT scores when the 3PL model was used for selection and scoring. The diagnostic evaluation showed strong evidence for local dependence. The simulation suggested that local dependence in parameters had a minimal effect on CAT score precision, while local dependence in responses had a substantial effect on score precision, depending on the degree of local dependence present.

VL - 45 UR - http://dx.doi.org/10.1111/j.1745-3984.2008.00061.x ER - TY - JOUR T1 - Measuring physical functioning in children with spinal impairments with computerized adaptive testing JF - Journal of Pediatric Orthopedics Y1 - 2008 A1 - Mulcahey, M. J. A1 - Haley, S. M. A1 - Duffy, T. A1 - Pengsheng, N. A1 - Betz, R. R. KW - *Disability Evaluation KW - Adolescent KW - Child KW - Child, Preschool KW - Computer Simulation KW - Cross-Sectional Studies KW - Disabled Children/*rehabilitation KW - Female KW - Humans KW - Infant KW - Kyphosis/*diagnosis/rehabilitation KW - Male KW - Prospective Studies KW - Reproducibility of Results KW - Scoliosis/*diagnosis/rehabilitation AB - BACKGROUND: The purpose of this study was to assess the utility of measuring current physical functioning status of children with scoliosis and kyphosis by applying computerized adaptive testing (CAT) methods. Computerized adaptive testing uses a computer interface to administer the most optimal items based on previous responses, reducing the number of items needed to obtain a scoring estimate. METHODS: This was a prospective study of 77 subjects (0.6-19.8 years) who were seen by a spine surgeon during a routine clinic visit for progressive spine deformity. Using a multidimensional version of the Pediatric Evaluation of Disability Inventory CAT program (PEDI-MCAT), we evaluated content range, accuracy and efficiency, known-group validity, concurrent validity with the Pediatric Outcomes Data Collection Instrument, and test-retest reliability in a subsample (n = 16) within a 2-week interval. RESULTS: We found the PEDI-MCAT to have sufficient item coverage in both self-care and mobility content for this sample, although most patients tended to score at the higher ends of both scales.
Both the accuracy of PEDI-MCAT scores as compared with a fixed format of the PEDI (r = 0.98 for both mobility and self-care) and test-retest reliability were very high [self-care: intraclass correlation (3,1) = 0.98, mobility: intraclass correlation (3,1) = 0.99]. The PEDI-MCAT took an average of 2.9 minutes for the parents to complete. The PEDI-MCAT detected expected differences between patient groups, and scores on the PEDI-MCAT correlated in expected directions with scores from the Pediatric Outcomes Data Collection Instrument domains. CONCLUSIONS: Use of the PEDI-MCAT to assess the physical functioning status, as perceived by parents of children with complex spinal impairments, seems to be feasible and achieves accurate and efficient estimates of self-care and mobility function. Additional item development will be needed at the higher functioning end of the scale to avoid ceiling effects for older children. LEVEL OF EVIDENCE: This is a level II prospective study designed to establish the utility of computer adaptive testing as an evaluation method in a busy pediatric spine practice. VL - 28 SN - 0271-6798 (Print)0271-6798 (Linking) N1 - Mulcahey, M JHaley, Stephen MDuffy, TheresaPengsheng, NiBetz, Randal RK02 HD045354-01A1/HD/NICHD NIH HHS/United StatesUnited StatesJournal of pediatric orthopedicsJ Pediatr Orthop. 2008 Apr-May;28(3):330-5. U2 - 2696932 ER - TY - JOUR T1 - Modern sequential analysis and its application to computerized adaptive testing JF - Psychometrika Y1 - 2008 A1 - Bartroff, J. A1 - Finkelman, M. A1 - Lai, T. L. 
AB - After a brief review of recent advances in sequential analysis involving sequential generalized likelihood ratio tests, we discuss their use in psychometric testing and extend the asymptotic optimality theory of these sequential tests to the case of sequentially generated experiments, of particular interest in computerized adaptive testing.We then show how these methods can be used to design adaptive mastery tests, which are asymptotically optimal and are also shown to provide substantial improvements over currently used sequential and fixed length tests. VL - 73 ER - TY - JOUR T1 - A Monte Carlo Approach for Adaptive Testing With Content Constraints JF - Applied Psychological Measurement Y1 - 2008 A1 - Belov, Dmitry I. A1 - Armstrong, Ronald D. A1 - Weissman, Alexander AB -

This article presents a new algorithm for computerized adaptive testing (CAT) when content constraints are present. The algorithm is based on shadow CAT methodology to meet content constraints but applies Monte Carlo methods and provides the following advantages over shadow CAT: (a) lower maximum item exposure rates, (b) higher utilization of the item pool, and (c) more robust ability estimates. Computer simulations with Law School Admission Test items demonstrated that the new algorithm (a) produces similar ability estimates as shadow CAT but with half the maximum item exposure rate and 100% pool utilization and (b) produces more robust estimates when a high- (or low-) ability examinee performs poorly (or well) at the beginning of the test.

VL - 32 UR - http://apm.sagepub.com/content/32/6/431.abstract ER - TY - JOUR T1 - A Monte Carlo Approach to the Design, Assembly, and Evaluation of Multistage Adaptive Tests JF - Applied Psychological Measurement Y1 - 2008 A1 - Belov, Dmitry I. A1 - Armstrong, Ronald D. AB -

This article presents an application of Monte Carlo methods for developing and assembling multistage adaptive tests (MSTs). A major advantage of the Monte Carlo assembly over other approaches (e.g., integer programming or enumerative heuristics) is that it provides a uniform sampling from all MSTs (or MST paths) available from a given item pool. The uniform sampling allows a statistically valid analysis for MST design and evaluation. Given an item pool, MST model, and content constraints for test assembly, three problems are addressed in this study. They are (a) the construction of item response theory (IRT) targets for each MST path, (b) the assembly of an MST such that each path satisfies content constraints and IRT constraints, and (c) an analysis of the pool and constraints to increase the number of nonoverlapping MSTs that can be assembled from the pool. The primary intent is to produce reliable measurements and enhance pool utilization.

VL - 32 UR - http://apm.sagepub.com/content/32/2/119.abstract ER - TY - JOUR T1 - The NAPLEX: evolution, purpose, scope, and educational implications JF - American Journal of Pharmaceutical Education Y1 - 2008 A1 - Newton, D. W. A1 - Boyle, M. A1 - Catizone, C. A. KW - *Educational Measurement KW - Education, Pharmacy/*standards KW - History, 20th Century KW - History, 21st Century KW - Humans KW - Licensure, Pharmacy/history/*legislation & jurisprudence KW - North America KW - Pharmacists/*legislation & jurisprudence KW - Software AB - Since 2004, passing the North American Pharmacist Licensure Examination (NAPLEX) has been a requirement for earning initial pharmacy licensure in all 50 United States. The creation and evolution from 1952-2005 of the particular pharmacy competency testing areas and quantities of questions are described for the former paper-and-pencil National Association of Boards of Pharmacy Licensure Examination (NABPLEX) and the current candidate-specific computer adaptive NAPLEX pharmacy licensure examinations. A 40% increase in the weighting of NAPLEX Blueprint Area 2 in May 2005, compared to that in the preceding 1997-2005 Blueprint, has implications for candidates' NAPLEX performance and associated curricular content and instruction. New pharmacy graduates' scores on the NAPLEX are neither intended nor validated to serve as a criterion for assessing or judging the quality or effectiveness of pharmacy curricula and instruction. The newest cycle of NAPLEX Blueprint revision, a continual process to ensure representation of nationwide contemporary practice, began in early 2008. It may take up to 2 years, including surveying several thousand national pharmacists, to complete.
VL - 72 SN - 1553-6467 (Electronic)0002-9459 (Linking) N1 - Newton, David WBoyle, MariaCatizone, Carmen AHistorical ArticleUnited StatesAmerican journal of pharmaceutical educationAm J Pharm Educ. 2008 Apr 15;72(2):33. U2 - 2384208 ER - TY - JOUR T1 - Predicting item exposure parameters in computerized adaptive testing JF - British Journal of Mathematical and Statistical Psychology Y1 - 2008 A1 - Chen, S-Y. A1 - Doong, S. H. KW - *Algorithms KW - *Artificial Intelligence KW - Aptitude Tests/*statistics & numerical data KW - Diagnosis, Computer-Assisted/*statistics & numerical data KW - Humans KW - Models, Statistical KW - Psychometrics/statistics & numerical data KW - Reproducibility of Results KW - Software AB - The purpose of this study is to find a formula that describes the relationship between item exposure parameters and item parameters in computerized adaptive tests by using genetic programming (GP) - a biologically inspired artificial intelligence technique. Based on the formula, item exposure parameters for new parallel item pools can be predicted without conducting additional iterative simulations. Results show that an interesting formula between item exposure parameters and item parameters in a pool can be found by using GP. The item exposure parameters predicted based on the found formula were close to those observed from the Sympson and Hetter (1985) procedure and performed well in controlling item exposure rates. Similar results were observed for the Stocking and Lewis (1998) multinomial model for item selection and the Sympson and Hetter procedure with content balancing. The proposed GP approach has provided a knowledge-based solution for finding item exposure parameters. VL - 61 SN - 0007-1102 (Print)0007-1102 (Linking) N1 - Chen, Shu-YingDoong, Shing-HwangResearch Support, Non-U.S. Gov'tEnglandThe British journal of mathematical and statistical psychologyBr J Math Stat Psychol. 2008 May;61(Pt 1):75-91. 
ER - TY - ABST T1 - Preparing the implementation of computerized adaptive testing for high-stakes examinations Y1 - 2008 A1 - Huh, S. JF - Journal of Educational Evaluation for Health Professions VL - 5 SN - 1975-5937 (Electronic) N1 - Huh, SunEditorialKorea (South)Journal of educational evaluation for health professionsJ Educ Eval Health Prof. 2008;5:1. Epub 2008 Dec 22. U2 - 2631196 ER - TY - JOUR T1 - Rotating item banks versus restriction of maximum exposure rates in computerized adaptive testing JF - Spanish Journal of Psychology Y1 - 2008 A1 - Barrada, J A1 - Olea, J. A1 - Abad, F. J. KW - *Character KW - *Databases KW - *Software Design KW - Aptitude Tests/*statistics & numerical data KW - Bias (Epidemiology) KW - Computing Methodologies KW - Diagnosis, Computer-Assisted/*statistics & numerical data KW - Educational Measurement/*statistics & numerical data KW - Humans KW - Mathematical Computing KW - Psychometrics/statistics & numerical data AB -

If examinees were to know, beforehand, part of the content of a computerized adaptive test, their estimated trait levels would then have a marked positive bias. One of the strategies to avoid this consists of dividing a large item bank into several sub-banks and rotating the sub-bank employed (Ariel, Veldkamp & van der Linden, 2004). This strategy permits substantial improvements in exposure control at little cost to measurement accuracy. However, we do not know whether this option provides better results than using the master bank with greater restriction in the maximum exposure rates (Sympson & Hetter, 1985). In order to investigate this issue, we worked with several simulated banks of 2100 items, comparing them, for RMSE and overlap rate, with the same banks divided in two, three... up to seven sub-banks. By means of extensive manipulation of the maximum exposure rate in each bank, we found that the option of rotating banks slightly outperformed the option of restricting maximum exposure rate of the master bank by means of the Sympson-Hetter method.

VL - 11 SN - 1138-7416 N1 - Barrada, Juan RamonOlea, JulioAbad, Francisco JoseResearch Support, Non-U.S. Gov'tSpainThe Spanish journal of psychologySpan J Psychol. 2008 Nov;11(2):618-25. ER - TY - JOUR T1 - Severity of Organized Item Theft in Computerized Adaptive Testing: A Simulation Study JF - Applied Psychological Measurement Y1 - 2008 A1 - Yi, Qing A1 - Zhang, Jinming A1 - Chang, Hua-Hua AB -

Criteria had been proposed for assessing the severity of possible test security violations for computerized tests with high-stakes outcomes. However, these criteria resulted from theoretical derivations that assumed uniformly randomized item selection. This study investigated potential damage caused by organized item theft in computerized adaptive testing (CAT) for two realistic item selection methods, maximum item information and a-stratified with content blocking, using the randomized method as a baseline for comparison. Damage caused by organized item theft was evaluated by the number of compromised items each examinee could encounter and the impact of the compromised items on examinees' ability estimates. Severity of test security violation was assessed under self-organized and organized item theft simulation scenarios. Results indicated that though item theft could cause severe damage to CAT with either item selection method, the maximum item information method was more vulnerable to the organized item theft simulation than was the a-stratified method.

VL - 32 UR - http://apm.sagepub.com/content/32/7/543.abstract ER - TY - JOUR T1 - Some new developments in adaptive testing technology JF - Zeitschrift für Psychologie Y1 - 2008 A1 - van der Linden, W. J. KW - computerized adaptive testing AB -

In an ironic twist of history, modern psychological testing has returned to an adaptive format quite common when testing was not yet standardized. Important stimuli to the renewed interest in adaptive testing have been the development of item-response theory in psychometrics, which models the responses on test items using separate parameters for the items and test takers, and the use of computers in test administration, which enables us to estimate the parameter for a test taker and select the items in real time. This article reviews a selection from the latest developments in the technology of adaptive testing, such as constrained adaptive item selection, adaptive testing using rule-based item generation, multidimensional adaptive testing, adaptive use of test batteries, and the use of response times in adaptive testing.

VL - 216 ER - TY - JOUR T1 - Strategies for controlling item exposure in computerized adaptive testing with the partial credit model JF - Journal of Applied Measurement Y1 - 2008 A1 - Davis, L. L. A1 - Dodd, B. G. KW - *Algorithms KW - *Computers KW - *Educational Measurement/statistics & numerical data KW - Humans KW - Questionnaires/*standards KW - United States AB - Exposure control research with polytomous item pools has determined that randomization procedures can be very effective for controlling test security in computerized adaptive testing (CAT). The current study investigated the performance of four procedures for controlling item exposure in a CAT under the partial credit model. In addition to a no exposure control baseline condition, the Kingsbury-Zara, modified-within-.10-logits, Sympson-Hetter, and conditional Sympson-Hetter procedures were implemented to control exposure rates. The Kingsbury-Zara and the modified-within-.10-logits procedures were implemented with 3 and 6 item candidate conditions. The results show that the Kingsbury-Zara and modified-within-.10-logits procedures with 6 item candidates performed as well as the conditional Sympson-Hetter in terms of exposure rates, overlap rates, and pool utilization. These two procedures are strongly recommended for use with partial credit CATs due to their simplicity and the strength of their results. VL - 9 SN - 1529-7713 (Print); 1529-7713 (Linking) N1 - Davis, Laurie Laughlin; Dodd, Barbara G. United States. Journal of applied measurement. J Appl Meas. 2008;9(1):1-17. ER - TY - JOUR T1 - A Strategy for Controlling Item Exposure in Multidimensional Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2008 A1 - Lee, Yi-Hsuan A1 - Ip, Edward H. A1 - Fuh, Cheng-Der AB -

Although computerized adaptive tests have enjoyed tremendous growth, solutions for important problems remain unavailable. One problem is the control of item exposure rate. Because adaptive algorithms are designed to select optimal items, they choose items with high discriminating power. Thus, these items are selected more often than others, leading to both overexposure and underutilization of some parts of the item pool. Overused items are often compromised, creating a security problem that could threaten the validity of a test. Building on a previously proposed stratification scheme to control the exposure rate for one-dimensional tests, the authors extend their method to multidimensional tests. A strategy is proposed based on stratification in accordance with a functional of the vector of the discrimination parameter, which can be implemented with minimal computational overhead. Both theoretical and empirical validation studies are provided. Empirical results indicate significant improvement over the commonly used method of controlling exposure rate, requiring only a reasonable sacrifice in efficiency.

VL - 68 UR - http://epm.sagepub.com/content/68/2/215.abstract ER - TY - JOUR T1 - To Weight Or Not To Weight? Balancing Influence Of Initial Items In Adaptive Testing JF - Psychometrika Y1 - 2008 A1 - Chang, H.-H. A1 - Ying, Z. AB -

It has been widely reported that in computerized adaptive testing some examinees may get much lower scores than they normally would if an alternative paper-and-pencil version were given. The main purpose of this investigation is to quantitatively reveal the cause of the underestimation phenomenon. The logistic models, including the 1PL, 2PL, and 3PL models, are used to demonstrate our assertions. Our analytical derivation shows that, under the maximum information item selection strategy, if an examinee fails a few items at the beginning of the test, easy but more discriminating items are likely to be administered. Such items are ineffective in moving the estimate close to the true theta, unless the test is sufficiently long or a variable-length test is used. Our results also indicate that a certain weighting mechanism is necessary to make the algorithm rely less on the items administered at the beginning of the test.

VL - 73 IS - 3 ER - TY - JOUR T1 - Transitioning from fixed-length questionnaires to computer-adaptive versions JF - Zeitschrift für Psychologie \ Journal of Psychology Y1 - 2008 A1 - Walter, O. B. A1 - Holling, H. VL - 216(1) ER - TY - JOUR T1 - Using computerized adaptive testing to reduce the burden of mental health assessment JF - Psychiatric Services Y1 - 2008 A1 - Gibbons, R. D. A1 - Weiss, D. J. A1 - Kupfer, D. J. A1 - Frank, E. A1 - Fagiolini, A. A1 - Grochocinski, V. J. A1 - Bhaumik, D. K. A1 - Stover, A. A1 - Bock, R. D. A1 - Immekus, J. C. KW - *Diagnosis, Computer-Assisted KW - *Questionnaires KW - Adolescent KW - Adult KW - Aged KW - Agoraphobia/diagnosis KW - Anxiety Disorders/diagnosis KW - Bipolar Disorder/diagnosis KW - Female KW - Humans KW - Male KW - Mental Disorders/*diagnosis KW - Middle Aged KW - Mood Disorders/diagnosis KW - Obsessive-Compulsive Disorder/diagnosis KW - Panic Disorder/diagnosis KW - Phobic Disorders/diagnosis KW - Reproducibility of Results KW - Time Factors AB - OBJECTIVE: This study investigated the combination of item response theory and computerized adaptive testing (CAT) for psychiatric measurement as a means of reducing the burden of research and clinical assessments. METHODS: Data were from 800 participants in outpatient treatment for a mood or anxiety disorder; they completed 616 items of the 626-item Mood and Anxiety Spectrum Scales (MASS) at two times. The first administration was used to design and evaluate a CAT version of the MASS by using post hoc simulation. The second confirmed the functioning of CAT in live testing. RESULTS: Tests of competing models based on item response theory supported the scale's bifactor structure, consisting of a primary dimension and four group factors (mood, panic-agoraphobia, obsessive-compulsive, and social phobia). 
Both simulated and live CAT showed a 95% average reduction (585 items) in items administered (24 and 30 items, respectively) compared with administration of the full MASS. The correlation between scores on the full MASS and the CAT version was .93. For the mood disorder subscale, differences in scores between two groups of depressed patients--one with bipolar disorder and one without--on the full scale and on the CAT showed effect sizes of .63 (p<.003) and 1.19 (p<.001) standard deviation units, respectively, indicating better discriminant validity for CAT. CONCLUSIONS: Instead of using small fixed-length tests, clinicians can create item banks with a large item pool, and a small set of the items most relevant for a given individual can be administered with no loss of information, yielding a dramatic reduction in administration time and patient and clinician burden. VL - 59 SN - 1075-2730 (Print) N1 - Gibbons, Robert D; Weiss, David J; Kupfer, David J; Frank, Ellen; Fagiolini, Andrea; Grochocinski, Victoria J; Bhaumik, Dulal K; Stover, Angela; Bock, R Darrell; Immekus, Jason C. R01-MH-30915/MH/United States NIMH; R01-MH-66302/MH/United States NIMH. Research Support, N.I.H., Extramural. United States. Psychiatric services (Washington, D.C.). Psychiatr Serv. 2008 Apr;59(4):361-8. ER - TY - JOUR T1 - Using item banks to construct measures of patient reported outcomes in clinical trials: investigator perceptions JF - Clinical Trials Y1 - 2008 A1 - Flynn, K. E. A1 - Dombeck, C. B. A1 - DeWitt, E. M. A1 - Schulman, K. A. A1 - Weinfurt, K. P. AB - BACKGROUND: Item response theory (IRT) promises more sensitive and efficient measurement of patient-reported outcomes (PROs) than traditional approaches; however, the selection and use of PRO measures from IRT-based item banks differ from current methods of using PRO measures. PURPOSE: To anticipate barriers to the adoption of IRT item banks into clinical trials.
METHODS: We conducted semistructured telephone or in-person interviews with 42 clinical researchers who published results from clinical trials in the Journal of the American Medical Association, the New England Journal of Medicine, or other leading clinical journals from July 2005 through May 2006. Interviews included a brief tutorial on IRT item banks. RESULTS: After the tutorial, 39 of 42 participants understood the novel products available from an IRT item bank, namely customized short forms and computerized adaptive testing. Most participants (38/42) thought that item banks could be useful in their clinical trials, but they mentioned several potential barriers to adoption, including economic and logistical constraints, concerns about whether item banks are better than current PRO measures, concerns about how to convince study personnel or statisticians to use item banks, concerns about FDA or sponsor acceptance, and the lack of availability of item banks validated in specific disease populations. LIMITATIONS: Selection bias might have led to more positive responses to the concept of item banks in clinical trials. CONCLUSIONS: Clinical investigators are open to a new method of PRO measurement offered in IRT item banks, but bank developers must address investigator and stakeholder concerns before widespread adoption can be expected. VL - 5 SN - 1740-7745 (Print) N1 - Flynn, Kathryn E; Dombeck, Carrie B; DeWitt, Esi Morgan; Schulman, Kevin A; Weinfurt, Kevin P. 5U01AR052186/AR/NIAMS NIH HHS/United States. Research Support, N.I.H., Extramural. England. Clinical trials (London, England). Clin Trials. 2008;5(6):575-86. ER - TY - JOUR T1 - Using response times for item selection in adaptive testing JF - Journal of Educational and Behavioral Statistics Y1 - 2008 A1 - van der Linden, W. J. VL - 33 ER - TY - JOUR T1 - On using stochastic curtailment to shorten the SPRT in sequential mastery testing JF - Journal of Educational and Behavioral Statistics Y1 - 2008 A1 - Finkelman, M. D.
VL - 33 ER - TY - JOUR T1 - Utilizing Rasch measurement models to develop a computer adaptive self-report of walking, climbing, and running JF - Disability and Rehabilitation Y1 - 2008 A1 - Velozo, C. A. A1 - Wang, Y. A1 - Lehman, L. A. A1 - Wang, J. H. AB - Purpose.The purpose of this paper is to show how the Rasch model can be used to develop a computer adaptive self-report of walking, climbing, and running.Method.Our instrument development work on the walking/climbing/running construct of the ICF Activity Measure was used to show how to develop a computer adaptive test (CAT). Fit of the items to the Rasch model and validation of the item difficulty hierarchy was accomplished using Winsteps software. Standard error was used as a stopping rule for the CAT. Finally, person abilities were connected to items difficulties using Rasch analysis ‘maps’.Results.All but the walking one mile item fit the Rasch measurement model. A CAT was developed which selectively presented items based on the last calibrated person ability measure and was designed to stop when standard error decreased to a pre-set criterion. Finally, person ability measures were connected to the ability to perform specific walking/climbing/running activities using Rasch maps.Conclusions.Rasch measurement models can be useful in developing CAT measures for rehabilitation and disability. In addition to CATs reducing respondent burden, the connection of person measures to item difficulties may be important for the clinical interpretation of measures.Read More: http://informahealthcare.com/doi/abs/10.1080/09638280701617317 VL - 30 ER - TY - JOUR T1 - The Wald–Wolfowitz Theorem Is Violated in Sequential Mastery Testing JF - Sequential Analysis Y1 - 2008 A1 - Finkelman, M. VL - 27 ER - TY - CHAP T1 - Adaptive estimators of trait level in adaptive testing: Some proposals Y1 - 2007 A1 - Raîche, G. A1 - Blais, J. G. A1 - Magis, D. CY - D. J. 
Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 125 KB} ER - TY - CHAP T1 - Adaptive testing with the multi-unidimensional pairwise preference model Y1 - 2007 A1 - Stark, S. A1 - Chernyshenko, O. S. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 145 KB} ER - TY - JOUR T1 - Applying item response theory and computer adaptive testing: The challenges for health outcomes assessment JF - Quality of Life Research Y1 - 2007 A1 - Fayers, P. M. AB - OBJECTIVES: We review the papers presented at the NCI/DIA conference, to identify areas of controversy and uncertainty, and to highlight those aspects of item response theory (IRT) and computer adaptive testing (CAT) that require theoretical or empirical research in order to justify their application to patient reported outcomes (PROs). BACKGROUND: IRT and CAT offer exciting potential for the development of a new generation of PRO instruments. However, most of the research into these techniques has been in non-healthcare settings, notably in education. Educational tests are very different from PRO instruments, and consequently problematic issues arise when adapting IRT and CAT to healthcare research. RESULTS: Clinical scales differ appreciably from educational tests, and symptoms have characteristics distinctly different from examination questions. This affects the transferring of IRT technology. Particular areas of concern when applying IRT to PROs include inadequate software, difficulties in selecting models and communicating results, insufficient testing of local independence and other assumptions, and a need of guidelines for estimating sample size requirements. Similar concerns apply to differential item functioning (DIF), which is an important application of IRT. Multidimensional IRT is likely to be advantageous only for closely related PRO dimensions. 
CONCLUSIONS: Although IRT and CAT provide appreciable potential benefits, there is a need for circumspection. Not all PRO scales are necessarily appropriate targets for this methodology. Traditional psychometric methods, and especially qualitative methods, continue to have an important role alongside IRT. Research should be funded to address the specific concerns that have been identified. VL - 16 SN - 0962-9343 (Print) N1 - Fayers, Peter M. Netherlands. Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation. Qual Life Res. 2007;16 Suppl 1:187-94. Epub 2007 Apr 7. ER - TY - JOUR T1 - Automated Simultaneous Assembly of Multistage Testlets for a High-Stakes Licensing Examination JF - Educational and Psychological Measurement Y1 - 2007 A1 - Breithaupt, Krista A1 - Hare, Donovan R. AB -

Many challenges exist for high-stakes testing programs offering continuous computerized administration. The automated assembly of test questions to exactly meet content and other requirements, provide uniformity, and control item exposure can be modeled and solved by mixed-integer programming (MIP) methods. A case study of the computerized licensing examination of the American Institute of Certified Public Accountants is offered as one application of MIP techniques for test assembly. The solution illustrates assembly for a computer-adaptive multistage testing design. However, the general form of the constraint-based solution can be modified to generate optimal test designs for paper-based or computerized administrations, regardless of the specific psychometric model. An extension of this methodology allows for long-term planning for the production and use of test content on the basis of exact psychometric test designs and administration schedules.

VL - 67 UR - http://epm.sagepub.com/content/67/1/5.abstract ER - TY - JOUR T1 - A “Rearrangement Procedure” For Scoring Adaptive Tests with Review Options JF - International Journal of Testing Y1 - 2007 A1 - Papanastasiou, Elena C. A1 - Reckase, Mark D. VL - 7 UR - http://www.tandfonline.com/doi/abs/10.1080/15305050701632262 ER - TY - CHAP T1 - Bundle models for computerized adaptive testing in e-learning assessment Y1 - 2007 A1 - Scalise, K. A1 - Wilson, M. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 426 KB} ER - TY - CHAP T1 - CAT Security: A practitioner’s perspective Y1 - 2007 A1 - Guo, F. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 104 KB} ER - TY - CHAP T1 - Choices in CAT models in the context of educational testing Y1 - 2007 A1 - Theo Eggen CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 123 KB} ER - TY - CONF T1 - Choices in CAT models in the context of educational testing T2 - GMAC Conference on Computerized Adaptive Testing Y1 - 2007 A1 - Theo Eggen JF - GMAC Conference on Computerized Adaptive Testing PB - Graduate Management Admission Council CY - St. Paul, MN ER - TY - CHAP T1 - Comparison of computerized adaptive testing and classical methods for measuring individual change Y1 - 2007 A1 - Kim-Kang, G. A1 - Weiss, D. J. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 347 KB} ER - TY - JOUR T1 - The comparison of maximum likelihood estimation and expected a posteriori in CAT using the graded response model JF - Journal of Elementary Education Y1 - 2007 A1 - Chen, S-K. VL - 19 ER - TY - BOOK T1 - A comparison of two methods of polytomous computerized classification testing for multiple cutscores Y1 - 2007 A1 - Thompson, N. A.
CY - Unpublished doctoral dissertation, University of Minnesota N1 - {PDF file, 363 KB} ER - TY - JOUR T1 - Computerized adaptive personality testing: A review and illustration with the MMPI-2 Computerized Adaptive Version JF - Psychological Assessment Y1 - 2007 A1 - Forbey, J. D. A1 - Ben-Porath, Y. S. KW - Adolescent KW - Adult KW - Diagnosis, Computer-Assisted/*statistics & numerical data KW - Female KW - Humans KW - Male KW - MMPI/*statistics & numerical data KW - Personality Assessment/*statistics & numerical data KW - Psychometrics/statistics & numerical data KW - Reference Values KW - Reproducibility of Results AB - Computerized adaptive testing in personality assessment can improve efficiency by significantly reducing the number of items administered to answer an assessment question. Two approaches have been explored for adaptive testing in computerized personality assessment: item response theory and the countdown method. In this article, the authors review the literature on each and report the results of an investigation designed to explore the utility, in terms of item and time savings, and validity, in terms of correlations with external criterion measures, of an expanded countdown method-based research version of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2), the MMPI-2 Computerized Adaptive Version (MMPI-2-CA). Participants were 433 undergraduate college students (170 men and 263 women). Results indicated considerable item savings and corresponding time savings for the adaptive testing modalities compared with a conventional computerized MMPI-2 administration. Furthermore, computerized adaptive administration yielded comparable results to computerized conventional administration of the MMPI-2 in terms of both test scores and their validity. Future directions for computerized adaptive personality testing are discussed. VL - 19 SN - 1040-3590 (Print) N1 - Forbey, Johnathan DBen-Porath, Yossef SResearch Support, Non-U.S. 
Gov't. United States. Psychological assessment. Psychol Assess. 2007 Mar;19(1):14-24. ER - TY - JOUR T1 - Computerized adaptive testing for measuring development of young children JF - Statistics in Medicine Y1 - 2007 A1 - Jacobusse, G. A1 - Buuren, S. KW - *Child Development KW - *Models, Statistical KW - Child, Preschool KW - Diagnosis, Computer-Assisted/*statistics & numerical data KW - Humans KW - Netherlands AB - Developmental indicators that are used for routine measurement in The Netherlands are usually chosen to optimally identify delayed children. Measurements on the majority of children without problems are therefore quite imprecise. This study explores the use of computerized adaptive testing (CAT) to monitor the development of young children. CAT is expected to improve the measurement precision of the instrument. We do two simulation studies - one with real data and one with simulated data - to evaluate the usefulness of CAT. It is shown that CAT selects developmental indicators that maximally match the individual child, so that all children can be measured to the same precision. VL - 26 SN - 0277-6715 (Print) N1 - Jacobusse, Gert; Buuren, Stef van. England. Statistics in medicine. Stat Med. 2007 Jun 15;26(13):2629-38. ER - TY - JOUR T1 - Computerized Adaptive Testing for Polytomous Motivation Items: Administration Mode Effects and a Comparison With Short Forms JF - Applied Psychological Measurement Y1 - 2007 A1 - Hol, A. Michiel A1 - Vorst, Harrie C. M. A1 - Mellenbergh, Gideon J. AB -

In a randomized experiment (n = 515), a conventional computerized test and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although the items are carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible consequences of model misfit. CAT efficiency was studied by a systematic comparison of the CAT with two types of conventional fixed-length short forms, which were created to be good CAT competitors. Results showed no essential administration mode effects. Efficiency analyses show that the CAT outperformed the short forms in almost all respects when results are aggregated along the latent trait scale. The real and simulated data results are very similar, which indicates that the real data results are not affected by model misfit.

VL - 31 UR - http://apm.sagepub.com/content/31/5/412.abstract ER - TY - JOUR T1 - Computerized adaptive testing for polytomous motivation items: Administration mode effects and a comparison with short forms JF - Applied Psychological Measurement Y1 - 2007 A1 - Hol, A. M. A1 - Vorst, H. C. M. A1 - Mellenbergh, G. J. KW - 2220 Tests & Testing KW - Adaptive Testing KW - Attitude Measurement KW - computer adaptive testing KW - Computer Assisted Testing KW - items KW - Motivation KW - polytomous motivation KW - Statistical Validity KW - Test Administration KW - Test Forms KW - Test Items AB - In a randomized experiment (n=515), a computerized and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although items are carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible consequences of model misfit. CAT efficiency was studied by a systematic comparison of the CAT with two types of conventional fixed length short forms, which are created to be good CAT competitors. Results showed no essential administration mode effects. Efficiency analyses show that CAT outperformed the short forms in almost all aspects when results are aggregated along the latent trait scale. The real and the simulated data results are very similar, which indicate that the real data results are not affected by model misfit. (PsycINFO Database Record (c) 2007 APA ) (journal abstract) VL - 31 SN - 0146-6216 N1 - 10.1177/0146621606297314Journal; Peer Reviewed Journal; Journal Article ER - TY - CHAP T1 - Computerized adaptive testing with the bifactor model Y1 - 2007 A1 - Weiss, D. J. A1 - Gibbons, R. D. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. 
N1 - {PDF file, 159 KB} ER - TY - CHAP T1 - Computerized attribute-adaptive testing: A new computerized adaptive testing approach incorporating cognitive psychology Y1 - 2007 A1 - Zhou, J. A1 - Gierl, M. J. A1 - Cui, Y. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 296 KB} ER - TY - Generic T1 - Computerized classification testing with composite hypotheses T2 - GMAC Conference on Computerized Adaptive Testing Y1 - 2007 A1 - Thompson, N. A. A1 - Ro, S. KW - computerized adaptive testing JF - GMAC Conference on Computerized Adaptive Testing PB - Graduate Management Admissions Council CY - St. Paul, MN N1 - Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. Retrieved [date] from www. psych. umn. edu/psylabs/CATCentral ER - TY - CHAP T1 - Computerized classification testing with composite hypotheses Y1 - 2007 A1 - Thompson, N. A. A1 - Ro, S. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 96 KB} ER - TY - JOUR T1 - Computerizing Organizational Attitude Surveys JF - Educational and Psychological Measurement Y1 - 2007 A1 - Mueller, Karsten A1 - Liebig, Christian A1 - Hattrup, Keith AB -

Two quasi-experimental field studies were conducted to evaluate the psychometric equivalence of computerized and paper-and-pencil job satisfaction measures. The present research extends previous work in the area by providing better control of common threats to validity in quasi-experimental research on test mode effects and by evaluating a more comprehensive measurement model for job attitudes. Results of both studies demonstrated substantial equivalence of the computerized measure with the paper-and-pencil version. Implications for the practical use of computerized organizational attitude surveys are discussed.

VL - 67 UR - http://epm.sagepub.com/content/67/4/658.abstract ER - TY - JOUR T1 - Conditional Item-Exposure Control in Adaptive Testing Using Item-Ineligibility Probabilities JF - Journal of Educational and Behavioral Statistics Y1 - 2007 A1 - van der Linden, Wim J. A1 - Veldkamp, Bernard P. AB -

Two conditional versions of the exposure-control method with item-ineligibility constraints for adaptive testing in van der Linden and Veldkamp (2004) are presented. The first version is for unconstrained item selection, the second for item selection with content constraints imposed by the shadow-test approach. In both versions, the exposure rates of the items are controlled using probabilities of item ineligibility given θ that adapt the exposure rates automatically to a goal value for the items in the pool. In an extensive empirical study with an adaptive version of the Law School Admission Test, the authors show how the method can be used to drive conditional exposure rates below goal values as low as 0.025. Obviously, the price to be paid for minimal exposure rates is a decrease in the accuracy of the ability estimates. This trend is illustrated with empirical data.

VL - 32 UR - http://jeb.sagepub.com/cgi/content/abstract/32/4/398 ER - TY - CONF T1 - Cutscore location and classification accuracy in computerized classification testing T2 - Paper presented at the international meeting of the Psychometric Society Y1 - 2007 A1 - Ro, S. A1 - Thompson, N. A. JF - Paper presented at the international meeting of the Psychometric Society CY - Tokyo, Japan N1 - {PDF file, 94 KB} ER - TY - JOUR T1 - The design and evaluation of a computerized adaptive test on mobile devices JF - Computers & Education Y1 - 2007 A1 - Triantafillou, E. A1 - Georgiadou, E. A1 - Economides, A. A. VL - 49 ER - TY - CHAP T1 - The design of p-optimal item banks for computerized adaptive tests Y1 - 2007 A1 - Reckase, M. D. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 211 KB} ER - TY - CHAP T1 - Designing optimal item pools for computerized adaptive tests with Sympson-Hetter exposure control Y1 - 2007 A1 - Gu, L. A1 - Reckase, M. D. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing N1 - {PDF file, 3 MB} ER - TY - CHAP T1 - Designing templates based on a taxonomy of innovative items Y1 - 2007 A1 - Parshall, C. G. A1 - Harmes, J. C. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 149 KB} ER - TY - JOUR T1 - Detecting Differential Speededness in Multistage Testing JF - Journal of Educational Measurement Y1 - 2007 A1 - van der Linden, Wim J. A1 - Breithaupt, Krista A1 - Chuah, Siang Chee A1 - Zhang, Yanwei AB -

A potential undesirable effect of multistage testing is differential speededness, which happens if some of the test takers run out of time because they receive subtests with items that are more time intensive than others. This article shows how a probabilistic response-time model can be used for estimating differences in time intensities and speed between subtests and test takers and detecting differential speededness. An empirical data set for a multistage test in the computerized CPA Exam was used to demonstrate the procedures. Although the more difficult subtests appeared to have items that were more time intensive than the easier subtests, an analysis of the residual response times did not reveal any significant differential speededness because the time limit appeared to be appropriate. In a separate analysis, within each of the subtests, we found minor but consistent patterns of residual times that are believed to be due to a warm-up effect, that is, use of more time on the initial items than they actually need.

VL - 44 UR - http://dx.doi.org/10.1111/j.1745-3984.2007.00030.x ER - TY - JOUR T1 - Developing tailored instruments: item banking and computerized adaptive assessment JF - Quality of Life Research Y1 - 2007 A1 - Bjorner, J. B. A1 - Chang, C-H. A1 - Thissen, D. A1 - Reeve, B. B. KW - *Health Status KW - *Health Status Indicators KW - *Mental Health KW - *Outcome Assessment (Health Care) KW - *Quality of Life KW - *Questionnaires KW - *Software KW - Algorithms KW - Factor Analysis, Statistical KW - Humans KW - Models, Statistical KW - Psychometrics AB - Item banks and Computerized Adaptive Testing (CAT) have the potential to greatly improve the assessment of health outcomes. This review describes the unique features of item banks and CAT and discusses how to develop item banks. In CAT, a computer selects the items from an item bank that are most relevant for and informative about the particular respondent; thus optimizing test relevance and precision. Item response theory (IRT) provides the foundation for selecting the items that are most informative for the particular respondent and for scoring responses on a common metric. The development of an item bank is a multi-stage process that requires a clear definition of the construct to be measured, good items, a careful psychometric analysis of the items, and a clear specification of the final CAT. The psychometric analysis needs to evaluate the assumptions of the IRT model such as unidimensionality and local independence; that the items function the same way in different subgroups of the population; and that there is an adequate fit between the data and the chosen item response models. Also, interpretation guidelines need to be established to help the clinical application of the assessment. Although medical research can draw upon expertise from educational testing in the development of item banks and CAT, the medical field also encounters unique opportunities and challenges. 
VL - 16 SN - 0962-9343 (Print) N1 - Bjorner, Jakob Bue; Chang, Chih-Hung; Thissen, David; Reeve, Bryce B; 1R43NS047763-01/NS/United States NINDS; AG015815/AG/United States NIA; Research Support, N.I.H., Extramural; Netherlands; Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation; Qual Life Res. 2007;16 Suppl 1:95-108. Epub 2007 Feb 15. ER - TY - JOUR T1 - Development and evaluation of a computer adaptive test for “Anxiety” (Anxiety-CAT) JF - Quality of Life Research Y1 - 2007 A1 - Walter, O. B. A1 - Becker, J. A1 - Bjorner, J. B. A1 - Fliege, H. A1 - Klapp, B. F. A1 - Rose, M. VL - 16 ER - TY - CHAP T1 - The development of a computerized adaptive test for integrity Y1 - 2007 A1 - Egberink, I. J. L. A1 - Veldkamp, B. P. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 290 KB} ER - TY - CHAP T1 - Development of a multiple-component CAT for measuring foreign language proficiency (SIMTEST) Y1 - 2007 A1 - Sumbling, M. A1 - Sanz, P. A1 - Viladrich, M. C. A1 - Doval, E. A1 - Riera, L. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 258 KB} ER - TY - JOUR T1 - The effect of including pretest items in an operational computerized adaptive test: Do different ability examinees spend different amounts of time on embedded pretest items? JF - Educational Assessment Y1 - 2007 A1 - Ferdous, A. A. A1 - Plake, B. S. A1 - Chang, S-R. KW - ability KW - operational computerized adaptive test KW - pretest items KW - time AB - The purpose of this study was to examine the effect of pretest items on response time in an operational, fixed-length, time-limited computerized adaptive test (CAT). These pretest items are embedded within the CAT, but unlike the operational items, are not tailored to the examinee's ability level.
If examinees with higher ability levels need less time to complete these items than do their counterparts with lower ability levels, they will have more time to devote to the operational test questions. Data were from a graduate admissions test that was administered worldwide. Data from both quantitative and verbal sections of the test were considered. For the verbal section, examinees in the lower ability groups spent systematically more time on their pretest items than did those in the higher ability groups, though for the quantitative section the differences were less clear. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Lawrence Erlbaum: US VL - 12 SN - 1062-7197 (Print); 1532-6977 (Electronic) ER - TY - ABST T1 - The effect of using item parameters calibrated from paper administrations in computer adaptive test administrations Y1 - 2007 A1 - Pommerich, M KW - Mode effects AB - Computer administered tests are becoming increasingly prevalent as computer technology becomes more readily available on a large scale. For testing programs that utilize both computer and paper administrations, mode effects are problematic in that they can result in examinee scores that are artificially inflated or deflated. As such, researchers have engaged in extensive studies of whether scores differ across paper and computer presentations of the same tests. The research generally seems to indicate that the more complicated it is to present or take a test on computer, the greater the possibility of mode effects. In a computer adaptive test, mode effects may be a particular concern if items are calibrated using item responses obtained from one administration mode (i.e., paper), and those parameters are then used operationally in a different administration mode (i.e., computer). 
This paper studies the suitability of using parameters calibrated from a paper administration for item selection and scoring in a computer adaptive administration, for two tests with lengthy passages that required navigation in the computer administration. The results showed that the use of paper calibrated parameters versus computer calibrated parameters in computer adaptive administrations had small to moderate effects on the reliability of examinee scores, at fairly short test lengths. This effect was generally diminished for longer test lengths. However, the results suggest that in some cases, some loss in reliability might be inevitable if paper-calibrated parameters are used in computer adaptive administrations. JF - Journal of Technology, Learning, and Assessment VL - 5 ER - TY - JOUR T1 - Estimating the Standard Error of the Maximum Likelihood Ability Estimator in Adaptive Testing Using the Posterior-Weighted Test Information Function JF - Educational and Psychological Measurement Y1 - 2007 A1 - Penfield, Randall D. AB -

The standard error of the maximum likelihood ability estimator is commonly estimated by evaluating the test information function at an examinee's current maximum likelihood estimate (a point estimate) of ability. Because the test information function evaluated at the point estimate may differ from the test information function evaluated at an examinee's true ability value, the estimated standard error may be biased under certain conditions. This is of particular concern in adaptive testing because the height of the test information function is expected to be higher at the current estimate of ability than at the actual value of ability. This article proposes using the posterior-weighted test information function in computing the standard error of the maximum likelihood ability estimator for adaptive test sessions. A simulation study showed that the proposed approach provides standard error estimates that are less biased and more efficient than those provided by the traditional point estimate approach.

VL - 67 UR - http://epm.sagepub.com/content/67/6/958.abstract ER - TY - JOUR T1 - Evaluation of computer adaptive testing systems JF - International Journal of Web-Based Learning and Teaching Technologies Y1 - 2007 A1 - Economides, A. A. A1 - Roupas, C KW - computer adaptive testing systems KW - examination organizations KW - systems evaluation AB - Many educational organizations are trying to reduce the cost of the exams, the workload and delay of scoring, and the human errors. Also, they try to increase the accuracy and efficiency of the testing. Recently, most examination organizations use computer adaptive testing (CAT) as the method for large scale testing. This article investigates the current state of CAT systems and identifies their strengths and weaknesses. It evaluates 10 CAT systems using an evaluation framework of 15 domains categorized into three dimensions: educational, technical, and economical. The results show that the majority of the CAT systems give priority to security, reliability, and maintainability. However, they do not offer to the examinee any advanced support and functionalities. Also, the feedback to the examinee is limited and the presentation of the items is poor. Recommendations are made in order to enhance the overall quality of a CAT system. For example, alternative multimedia items should be available so that the examinee would choose a preferred media type. Feedback could be improved by providing more information to the examinee or providing information anytime the examinee wished. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - IGI Global: US VL - 2 SN - 1548-1093 (Print); 1548-1107 (Electronic) ER - TY - JOUR T1 - An exploration and realization of computerized adaptive testing with cognitive diagnosis JF - Acta Psychologica Sinica Y1 - 2007 A1 - Haijing, L. A1 - Shuliang, D. 
AB - Increased attention to “cognitive bug behavior” has led to growing research interest in diagnostic testing based on Item Response Theory (IRT), which combines cognitive psychology and psychometrics. Cognitive diagnosis has been applied mainly to paper-and-pencil (P&P) testing and rarely to computerized adaptive testing (CAT); to our knowledge, no research on CAT with cognitive diagnosis has been conducted in China. Because CAT is more efficient and accurate than P&P testing, it is important to develop cognitive diagnosis techniques suitable for CAT. This study attempts to construct a preliminary CAT system for cognitive diagnosis. Following a “diagnosis first, ability estimation second” approach, a knowledge-state conversion diagram was used at the diagnosis stage to describe all possible knowledge states in a domain of interest and the relations among them, and a new item selection strategy based on a depth-first search algorithm was proposed. Items containing attributes that the examinee had not mastered were excluded from ability estimation. At the accurate ability estimation stage, the items administered to each examinee not only matched his or her current ability estimate but were also limited to items whose attributes the examinee had mastered. Monte Carlo simulation was used to generate data for three cognitive attribute structures: tree-shaped, forest-shaped, and isolated vertices (corresponding to a simple Q-matrix). The tree-shaped and isolated-vertices structures were derived from actual cases, while the forest-shaped structure was a generalized simulation. The experiments used 3,000 examinees and 3,000 items for the tree-shaped structure, 2,550 examinees and 3,100 items for the forest-shaped structure, and 2,000 examinees and 2,500 items for the isolated-vertices structure.
The maximum test length was set at 30 items in all experiments. The difficulty parameters and the logarithm of the discrimination parameters were drawn from the standard normal distribution N(0,1). There were 100 examinees per attribute pattern in the tree-shaped experiment and 50 per attribute pattern in the forest-shaped experiment; in the isolated-vertices experiment, the 2,000 examinees were students from an actual case. Three indices were used to assess the proposed diagnostic approach: the attribute pattern classification agreement rate (APCAR), the recovery (the average absolute deviation between estimated and true values), and the average test length. For the tree-shaped structure, APCAR was 84.27%, recovery 0.17, and average length 24.80; for the forest-shaped structure, APCAR was 84.02%, recovery 0.172, and average length 23.47; for the isolated-vertices structure, APCAR was 99.16%, recovery 0.256, and average length 27.32. These results are favorable: cognitive diagnosis accuracy exceeded 80% in every experiment, and recovery was also good. It therefore appears feasible to construct an initial CAT system for cognitive diagnosis using the “diagnosis first, ability estimation second” approach together with the knowledge-state conversion diagram and the depth-first-search item selection strategy. VL - 39 ER - TY - CHAP T1 - Exploring potential designs for multi-form structure computerized adaptive tests with uniform item exposure Y1 - 2007 A1 - Edwards, M. C. A1 - Thissen, D. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing.
N1 - {PDF file, 295 KB} ER - TY - JOUR T1 - The future of outcomes measurement: item banking, tailored short-forms, and computerized adaptive assessment JF - Quality of Life Research Y1 - 2007 A1 - Cella, D. A1 - Gershon, R. C. A1 - Lai, J-S. A1 - Choi, S. W. AB - The use of item banks and computerized adaptive testing (CAT) begins with clear definitions of important outcomes, and references those definitions to specific questions gathered into large and well-studied pools, or “banks” of items. Items can be selected from the bank to form customized short scales, or can be administered in a sequence and length determined by a computer programmed for precision and clinical relevance. Although far from perfect, such item banks can form a common definition and understanding of human symptoms and functional problems such as fatigue, pain, depression, mobility, social function, sensory function, and many other health concepts that we can only measure by asking people directly. The support of the National Institutes of Health (NIH), as witnessed by its cooperative agreement with measurement experts through the NIH Roadmap Initiative known as PROMIS (www.nihpromis.org), is a big step in that direction. Our approach to item banking and CAT is practical; as focused on application as it is on science or theory. From a practical perspective, we frequently must decide whether to re-write and retest an item, add more items to fill gaps (often at the ceiling of the measure), re-test a bank after some modifications, or split up a bank into units that are more unidimensional, yet less clinically relevant or complete. These decisions are not easy, and yet they are rarely unforgiving. 
We encourage people to build practical tools that are capable of producing multiple short form measures and CAT administrations from common banks, and to further our understanding of these banks with various clinical populations and ages, so that with time the scores that emerge from these many activities begin to have not only a common metric and range, but a shared meaning and understanding across users. In this paper, we provide an overview of item banking and CAT, discuss our approach to item banking and its byproducts, describe testing options, discuss an example of CAT for fatigue, and discuss models for long term sustainability of an entity such as PROMIS. Some barriers to success include limitations in the methods themselves, controversies and disagreements across approaches, and end-user reluctance to move away from the familiar. VL - 16 SN - 0962-9343 ER - TY - JOUR T1 - Hypothetischer Einsatz adaptiven Testens bei der Messung von Bildungsstandards in Mathematik [Hypothetical use of adaptive testing for the measurement of educational standards in mathematics] . JF - Zeitschrift für Erziehungswissenschaft Y1 - 2007 A1 - Frey, A. A1 - Ehmke, T. VL - 8 ER - TY - CHAP T1 - ICAT: An adaptive testing procedure to allow the identification of idiosyncratic knowledge patterns Y1 - 2007 A1 - Kingsbury, G. G. A1 - Houser, R.L. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 161 KB} ER - TY - CHAP T1 - Implementing the Graduate Management Admission Test® computerized adaptive test Y1 - 2007 A1 - Rudner, L. M. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 135 KB} ER - TY - JOUR T1 - Improving patient reported outcomes using item response theory and computerized adaptive testing JF - Journal of Rheumatology Y1 - 2007 A1 - Chakravarty, E. F. A1 - Bjorner, J. B. A1 - Fries, J.F. 
KW - *Rheumatic Diseases/physiopathology/psychology KW - Clinical Trials KW - Data Interpretation, Statistical KW - Disability Evaluation KW - Health Surveys KW - Humans KW - International Cooperation KW - Outcome Assessment (Health Care)/*methods KW - Patient Participation/*methods KW - Research Design/*trends KW - Software AB - OBJECTIVE: Patient reported outcomes (PRO) are considered central outcome measures for both clinical trials and observational studies in rheumatology. More sophisticated statistical models, including item response theory (IRT) and computerized adaptive testing (CAT), will enable critical evaluation and reconstruction of currently utilized PRO instruments to improve measurement precision while reducing item burden on the individual patient. METHODS: We developed a domain hierarchy encompassing the latent trait of physical function/disability from the more general to most specific. Items collected from 165 English-language instruments were evaluated by a structured process including trained raters, modified Delphi expert consensus, and then patient evaluation. Each item in the refined data bank will undergo extensive analysis using IRT to evaluate response functions and measurement precision. CAT will allow for real-time questionnaires of potentially smaller numbers of questions tailored directly to each individual's level of physical function. RESULTS: The physical function/disability domain comprises 4 subdomains: upper extremity, trunk, lower extremity, and complex activities. Expert and patient review led to consensus favoring use of present-tense "capability" questions using a 4- or 5-item Likert response construct over past-tense "performance" items. Floor and ceiling effects, attribution of disability, and standardization of response categories were also addressed.
CONCLUSION: By applying statistical techniques of IRT through use of CAT, existing PRO instruments may be improved to reduce questionnaire burden on the individual patients while increasing measurement precision that may ultimately lead to reduced sample size requirements for costly clinical trials. VL - 34 SN - 0315-162X (Print) N1 - Chakravarty, Eliza F; Bjorner, Jakob B; Fries, James F; Ar052158/ar/niams; Consensus Development Conference; Research Support, N.I.H., Extramural; Canada; The Journal of rheumatology; J Rheumatol. 2007 Jun;34(6):1426-31. ER - TY - JOUR T1 - The initial development of an item bank to assess and screen for psychological distress in cancer patients JF - Psycho-Oncology Y1 - 2007 A1 - Smith, A. B. A1 - Rush, R. A1 - Velikova, G. A1 - Wall, L. A1 - Wright, E. P. A1 - Stark, D. A1 - Selby, P. A1 - Sharpe, M. KW - 3293 Cancer KW - cancer patients KW - Distress KW - initial development KW - Item Response Theory KW - Models KW - Neoplasms KW - Patients KW - Psychological KW - psychological distress KW - Rasch KW - Stress AB - Psychological distress is a common problem among cancer patients. Despite the large number of instruments that have been developed to assess distress, their utility remains disappointing. This study aimed to use Rasch models to develop an item-bank which would provide the basis for better means of assessing psychological distress in cancer patients. An item bank was developed from eight psychological distress questionnaires using Rasch analysis to link common items. Items from the questionnaires were added iteratively with common items as anchor points and misfitting items (infit mean square > 1.3) removed, and unidimensionality assessed. A total of 4914 patients completed the questionnaires providing an initial pool of 83 items. Twenty items were removed resulting in a final pool of 63 items. Good fit was demonstrated and no additional factor structure was evident from the residuals.
However, there was little overlap between item locations and person measures, since items mainly targeted higher levels of distress. The Rasch analysis allowed items to be pooled and generated a unidimensional instrument for measuring psychological distress in cancer patients. Additional items are required to more accurately assess patients across the whole continuum of psychological distress. (PsycINFO Database Record (c) 2007 APA) (journal abstract) VL - 16 SN - 1057-9249 N1 - 10.1002/pon.1117; Journal; Peer Reviewed Journal; Journal Article ER - TY - CHAP T1 - Investigating CAT designs to achieve comparability with a paper test Y1 - 2007 A1 - Thompson, T. A1 - Way, W. D. CY - In D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 141 KB} ER - TY - JOUR T1 - IRT health outcomes data analysis project: an overview and summary JF - Quality of Life Research Y1 - 2007 A1 - Cook, K. F. A1 - Teal, C. R. A1 - Bjorner, J. B. A1 - Cella, D. A1 - Chang, C-H. A1 - Crane, P. K. A1 - Gibbons, L. E. A1 - Hays, R. D. A1 - McHorney, C. A. A1 - Ocepek-Welikson, K. A1 - Raczek, A. E. A1 - Teresi, J. A. A1 - Reeve, B. B. KW - *Data Interpretation, Statistical KW - *Health Status KW - *Quality of Life KW - *Questionnaires KW - *Software KW - Female KW - HIV Infections/psychology KW - Humans KW - Male KW - Neoplasms/psychology KW - Outcome Assessment (Health Care)/*methods KW - Psychometrics KW - Stress, Psychological AB - BACKGROUND: In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, "Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment." A component of the conference was presentation of a psychometric and content analysis of a secondary dataset.
OBJECTIVES: A thorough psychometric and content analysis was conducted of two primary domains within a cancer health-related quality of life (HRQOL) dataset. RESEARCH DESIGN: HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared. SUBJECTS: The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites. MEASURES: Items from 4 HRQOL instruments were evaluated: Cancer Rehabilitation Evaluation System-Short Form, European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, Functional Assessment of Cancer Therapy and Medical Outcomes Study Short-Form Health Survey. RESULTS AND CONCLUSIONS: Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance of construct definition in the measurement of HRQOL. With respect to these lessons, areas for future research are suggested. The feasibility of developing item banks for broad definitions of health is discussed. VL - 16 SN - 0962-9343 (Print) N1 - Cook, Karon F; Teal, Cayla R; Bjorner, Jakob B; Cella, David; Chang, Chih-Hung; Crane, Paul K; Gibbons, Laura E; Hays, Ron D; McHorney, Colleen A; Ocepek-Welikson, Katja; Raczek, Anastasia E; Teresi, Jeanne A; Reeve, Bryce B; 1U01AR52171-01/AR/United States NIAMS; R01 (CA60068)/CA/United States NCI; Y1-PC-3028-01/PC/United States NCI; Research Support, N.I.H., Extramural; Netherlands; Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation; Qual Life Res. 2007;16 Suppl 1:121-32. Epub 2007 Mar 10. 
ER - TY - CONF T1 - Item selection in computerized classification testing T2 - Paper presented at the Conference on High Stakes Testing Y1 - 2007 A1 - Thompson, N. A. JF - Paper presented at the Conference on High Stakes Testing CY - University of Nebraska N1 - {PDF file, 87 KB} ER - TY - JOUR T1 - Methodological issues for building item banks and computerized adaptive scales JF - Quality of Life Research Y1 - 2007 A1 - Thissen, D. A1 - Reeve, B. B. A1 - Bjorner, J. B. A1 - Chang, C-H. AB - This paper reviews important methodological considerations for developing item banks and computerized adaptive scales (commonly called computerized adaptive tests in the educational measurement literature, yielding the acronym CAT), including issues of the reference population, dimensionality, dichotomous versus polytomous response scales, differential item functioning (DIF) and conditional scoring, mode effects, the impact of local dependence, and innovative approaches to assessment using CATs in health outcomes research. VL - 16 SN - 0962-9343; 1573-2649 ER - TY - JOUR T1 - Methods for restricting maximum exposure rate in computerized adaptative testing JF - Methodology: European Journal of Research Methods for the Behavioral and Social Sciences Y1 - 2007 A1 - Barrada, J A1 - Olea, J. A1 - Ponsoda, V. KW - computerized adaptive testing KW - item bank security KW - item exposure control KW - overlap rate KW - Sympson-Hetter method AB - The Sympson-Hetter (1985) method provides a means of controlling maximum exposure rate of items in Computerized Adaptive Testing. Through a series of simulations, control parameters are set that mark the probability of administration of an item on being selected. This method presents two main problems: it requires a long computation time for calculating the parameters and the maximum exposure rate is slightly above the fixed limit. Van der Linden (2003) presented two alternatives which appear to solve both of the problems.
The impact of these methods on measurement accuracy has not yet been tested. We show that these methods over-restrict the exposure of some highly discriminating items, thereby decreasing accuracy. It is also shown that, when the desired maximum exposure rate is near the minimum possible value, these methods yield an empirical maximum exposure rate clearly above the goal. A new method, based on the initial estimation of the probability of administration and the probability of selection of the items with the restricted method (Revuelta & Ponsoda, 1998), is presented in this paper. It can be used with the Sympson-Hetter method and with the two van der Linden methods. This option, when used with Sympson-Hetter, speeds the convergence of the control parameters without decreasing accuracy. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Hogrefe & Huber Publishers GmbH: Germany VL - 3 SN - 1614-1881 (Print); 1614-2241 (Electronic) ER - TY - CHAP T1 - The modified maximum global discrimination index method for cognitive diagnostic computerized adaptive testing Y1 - 2007 A1 - Cheng, Y A1 - Chang, Hua-Hua CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 172 KB} ER - TY - RPRT T1 - A multiple objective test assembly approach for exposure control problems in computerized adaptive testing Y1 - 2007 A1 - Veldkamp, B. P. A1 - Verschoor, Angela J. A1 - Theo Eggen JF - Measurement and Research Department Reports PB - Cito CY - Arnhem, The Netherlands ER - TY - JOUR T1 - Mutual Information Item Selection in Adaptive Classification Testing JF - Educational and Psychological Measurement Y1 - 2007 A1 - Weissman, Alexander AB -

A general approach for item selection in adaptive multiple-category classification tests is provided. The approach uses mutual information (MI), a special case of the Kullback-Leibler distance, or relative entropy. MI works efficiently with the sequential probability ratio test and alleviates the difficulties encountered with using other local- and global-information measures in the multiple-category classification setting. Results from simulation studies using three item selection methods, Fisher information (FI), posterior-weighted FI (FIP), and MI, are provided for an adaptive four-category classification test. Both across and within the four classification categories, it is shown that in general, MI item selection classifies the highest proportion of examinees correctly and yields the shortest test lengths. The next best performance is observed for FIP item selection, followed by FI.

VL - 67 UR - http://epm.sagepub.com/content/67/1/41.abstract ER - TY - JOUR T1 - An NCME instructional module on multistage testing JF - Educational Measurement: Issues and Practice Y1 - 2007 A1 - Hendrickson, A. VL - 26 IS - 2 N1 - #HE07044 ER - TY - CHAP T1 - A new delivery system for CAT Y1 - 2007 A1 - Park, J. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 248 KB} ER - TY - CHAP T1 - Nonparametric online item calibration Y1 - 2007 A1 - Samejima, F. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 8 MB} ER - TY - CHAP T1 - Partial order knowledge structures for CAT applications Y1 - 2007 A1 - Desmarais, M. C. A1 - Pu, X. A1 - Blais, J-G. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 475 KB} ER - TY - CHAP T1 - Patient-reported outcomes measurement and computerized adaptive testing: An application of post-hoc simulation to a diagnostic screening instrument Y1 - 2007 A1 - Immekus, J. C. A1 - Gibbons, R. D. A1 - Rush, J. A. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 203 KB} ER - TY - JOUR T1 - Patient-reported outcomes measurement and management with innovative methodologies and technologies JF - Quality of Life Research Y1 - 2007 A1 - Chang, C-H. KW - *Health Status KW - *Outcome Assessment (Health Care) KW - *Quality of Life KW - *Software KW - Computer Systems/*trends KW - Health Insurance Portability and Accountability Act KW - Humans KW - Patient Satisfaction KW - Questionnaires KW - United States AB - Successful integration of modern psychometrics and advanced informatics in patient-reported outcomes (PRO) measurement and management can potentially maximize the value of health outcomes research and optimize the delivery of quality patient care.
Unlike the traditional labor-intensive paper-and-pencil data collection method, item response theory-based computerized adaptive testing methodologies coupled with novel technologies provide an integrated environment to collect, analyze and present ready-to-use PRO data for informed and shared decision-making. This article describes the needs, challenges and solutions for accurate, efficient and cost-effective PRO data acquisition and dissemination means in order to provide critical and timely PRO information necessary to actively support and enhance routine patient care in busy clinical settings. VL - 16 Suppl 1 SN - 0962-9343 (Print); 0962-9343 (Linking) N1 - Chang, Chih-Hung; R21CA113191/CA/NCI NIH HHS/United States; Research Support, N.I.H., Extramural; Netherlands; Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation; Qual Life Res. 2007;16 Suppl 1:157-66. Epub 2007 May 26. ER - TY - JOUR T1 - The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years JF - Medical Care Y1 - 2007 A1 - Cella, D. A1 - Yount, S. A1 - Rothrock, N. A1 - Gershon, R. C. A1 - Cook, K. F. A1 - Reeve, B. A1 - Ader, D. A1 - Fries, J.F. A1 - Bruce, B. A1 - Rose, M. AB - The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) Roadmap initiative (www.nihpromis.org) is a 5-year cooperative group program of research designed to develop, validate, and standardize item banks to measure patient-reported outcomes (PROs) relevant across common medical conditions. In this article, we will summarize the organization and scientific activity of the PROMIS network during its first 2 years. VL - 45 ER - TY - JOUR T1 - A Practitioner’s Guide for Variable-length Computerized Classification Testing JF - Practical Assessment Research and Evaluation Y1 - 2007 A1 - Thompson, N. A. 
VL - 12 IS - 1 ER - TY - Generic T1 - A practitioner's guide to variable-length computerized classification testing Y1 - 2007 A1 - Thompson, N. A. KW - CAT KW - classification KW - computer adaptive testing KW - computerized adaptive testing KW - Computerized classification testing AB - Variable-length computerized classification tests, CCTs, (Lin & Spray, 2000; Thompson, 2006) are a powerful and efficient approach to testing for the purpose of classifying examinees into groups. CCTs are designed by the specification of at least five technical components: psychometric model, calibrated item bank, starting point, item selection algorithm, and termination criterion. Several options exist for each of these CCT components, creating a myriad of possible designs. Confusion among designs is exacerbated by the lack of a standardized nomenclature. This article outlines the components of a CCT, common options for each component, and the interaction of options for different components, so that practitioners may more efficiently design CCTs. It also offers a suggestion of nomenclature. JF - Practical Assessment, Research and Evaluation VL - 12 ER - TY - JOUR T1 - Prospective evaluation of the am-pac-cat in outpatient rehabilitation settings JF - Physical Therapy Y1 - 2007 A1 - Jette, A., A1 - Haley, S. A1 - Tao, W. A1 - Ni, P. A1 - Moed, R. A1 - Meyers, D. A1 - Zurek, M. VL - 87 ER - TY - JOUR T1 - Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS) JF - Medical Care Y1 - 2007 A1 - Reeve, B. B. A1 - Hays, R. D. A1 - Bjorner, J. B. A1 - Cook, K. F. A1 - Crane, P. K. A1 - Teresi, J. A. A1 - Thissen, D. A1 - Revicki, D. A. A1 - Weiss, D. J. A1 - Hambleton, R. K. A1 - Liu, H. A1 - Gershon, R. C. A1 - Reise, S. P. A1 - Lai, J. S. A1 - Cella, D. 
KW - *Health Status KW - *Information Systems KW - *Quality of Life KW - *Self Disclosure KW - Adolescent KW - Adult KW - Aged KW - Calibration KW - Databases as Topic KW - Evaluation Studies as Topic KW - Female KW - Humans KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Psychometrics KW - Questionnaires/standards KW - United States AB - BACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. ANALYSES: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment. VL - 45 SN - 0025-7079 (Print) N1 - Reeve, Bryce B; Hays, Ron D; Bjorner, Jakob B; Cook, Karon F; Crane, Paul K; Teresi, Jeanne A; Thissen, David; Revicki, Dennis A; Weiss, David J; Hambleton, Ronald K; Liu, Honghu; Gershon, Richard; Reise, Steven P; Lai, Jin-shei; Cella, David; PROMIS Cooperative Group; AG015815/AG/United States NIA; Research Support, N.I.H., Extramural; United States; Medical care; Med Care. 2007 May;45(5 Suppl 1):S22-31.
ER - TY - JOUR T1 - Psychometric properties of an emotional adjustment measure: An application of the graded response model JF - European Journal of Psychological Assessment Y1 - 2007 A1 - Rubio, V. J. A1 - Aguado, D. A1 - Hontangas, P. M. A1 - Hernández, J. M. KW - computerized adaptive tests KW - Emotional Adjustment KW - Item Response Theory KW - Personality Measures KW - personnel recruitment KW - Psychometrics KW - Samejima's graded response model KW - test reliability KW - validity AB - Item response theory (IRT) provides valuable methods for the analysis of the psychometric properties of a psychological measure. However, IRT has been mainly used for assessing achievements and ability rather than personality factors. This paper presents an application of IRT to a personality measure. Thus, the psychometric properties of a new emotional adjustment measure, consisting of 28 items with six graded response categories, are shown. Classical test theory (CTT) analyses as well as IRT analyses are carried out. Samejima's (1969) graded-response model has been used for estimating item parameters. Results show that the bank of items fulfills model assumptions and fits the data reasonably well, demonstrating the suitability of the IRT models for the description and use of data originating from personality measures. In this sense, the model fulfills the expectations that IRT has undoubted advantages: (1) The invariance of the estimated parameters, (2) the treatment given to the standard error of measurement, and (3) the possibilities offered for the construction of computerized adaptive tests (CAT). The bank of items shows good reliability. It also shows convergent validity compared to the Eysenck Personality Inventory (EPQ-A; Eysenck & Eysenck, 1975) and the Big Five Questionnaire (BFQ; Caprara, Barbaranelli, & Borgogni, 1993).
(PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Hogrefe & Huber Publishers GmbH: Germany VL - 23 SN - 1015-5759 (Print) ER - TY - JOUR T1 - Relative precision, efficiency and construct validity of different starting and stopping rules for a computerized adaptive test: The GAIN Substance Problem Scale JF - Journal of Applied Measurement Y1 - 2007 A1 - Riley, B. B. A1 - Conrad, K. J. A1 - Bezruczko, N. A1 - Dennis, M. L. AB - Substance abuse treatment programs are being pressed to measure and make clinical decisions more efficiently about an increasing array of problems. This computerized adaptive testing (CAT) simulation examined the relative efficiency, precision and construct validity of different starting and stopping rules used to shorten the Global Appraisal of Individual Needs’ (GAIN) Substance Problem Scale (SPS) and facilitate diagnosis based on it. Data came from 1,048 adolescents and adults referred to substance abuse treatment centers in 5 sites. CAT performance was evaluated using: (1) average standard errors, (2) average number of items, (3) bias in person measures, (4) root mean squared error of person measures, (5) Cohen’s kappa to evaluate CAT classification compared to clinical classification, (6) correlation between CAT and full-scale measures, and (7) construct validity of CAT classification vs. clinical classification using correlations with five theoretically associated instruments. Results supported both CAT efficiency and validity. VL - 8 ER - TY - JOUR T1 - A review of item exposure control strategies for computerized adaptive testing developed from 1983 to 2005 JF - Journal of Technology, Learning, and Assessment Y1 - 2007 A1 - Georgiadou, E. A1 - Triantafillou, E. A1 - Economides, A. A. AB - Since researchers acknowledged the numerous advantages of computerized adaptive testing (CAT) over traditional linear test administration, the issue of item exposure control has received increased attention.
Due to CAT’s underlying philosophy, particular items in the item pool may be presented too often and become overexposed, while other items are rarely selected by the CAT algorithm and thus become underexposed. Several item exposure control strategies have been presented in the literature aiming to prevent overexposure of some items and to increase the use rate of rarely or never selected items. This paper reviews such strategies that appeared in the relevant literature from 1983 to 2005. The focus of this paper is on studies that have been conducted in order to evaluate the effectiveness of item exposure control strategies for dichotomous scoring, polytomous scoring and testlet-based CAT systems. In addition, the paper discusses the strengths and weaknesses of each strategy group using examples from simulation studies. No new research is presented but rather a compendium of models is reviewed with an overall objective of providing researchers of this field, especially newcomers, a wide view of item exposure control strategies. VL - 5(8) N1 - http://www.jtla.org. {PDF file, 326 KB} ER - TY - CHAP T1 - The shadow-test approach: A universal framework for implementing adaptive testing Y1 - 2007 A1 - van der Linden, W. J. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 461 KB} ER - TY - CHAP T1 - Some thoughts on controlling item exposure in adaptive testing Y1 - 2007 A1 - Lewis, C. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 357 KB} ER - TY - CHAP T1 - Statistical aspects of adaptive testing Y1 - 2007 A1 - van der Linden, W. J. A1 - Glas, C. A. W. CY - C. R. Rao and S. Sinharay (Eds.), Handbook of statistics (Vol. 27: Psychometrics) (pp. 801-838). Amsterdam: North-Holland. ER - TY - JOUR T1 - A system for interactive assessment and management in palliative care JF - Journal of Pain Symptom Management Y1 - 2007 A1 - Chang, C-H.
A1 - Boni-Saenz, A. A. A1 - Durazo-Arvizu, R. A. A1 - DesHarnais, S. A1 - Lau, D. T. A1 - Emanuel, L. L. KW - *Needs Assessment KW - Humans KW - Medical Informatics/*organization & administration KW - Palliative Care/*organization & administration AB - The availability of psychometrically sound and clinically relevant screening, diagnosis, and outcome evaluation tools is essential to high-quality palliative care assessment and management. Such data will enable us to improve patient evaluations, prognoses, and treatment selections, and to increase patient satisfaction and quality of life. To accomplish these goals, medical care needs more precise, efficient, and comprehensive tools for data acquisition, analysis, interpretation, and management. We describe a system for interactive assessment and management in palliative care (SIAM-PC), which is patient centered, model driven, database derived, evidence based, and technology assisted. The SIAM-PC is designed to reliably measure the multiple dimensions of patients' needs for palliative care, and then to provide information to clinicians, patients, and the patients' families to achieve optimal patient care, while improving our capacity for doing palliative care research. This system is innovative in its application of the state-of-the-science approaches, such as item response theory and computerized adaptive testing, to many of the significant clinical problems related to palliative care. VL - 33 SN - 0885-3924 (Print) N1 - Chang, Chih-Hung; Boni-Saenz, Alexander A; Durazo-Arvizu, Ramon A; DesHarnais, Susan; Lau, Denys T; Emanuel, Linda L; R21CA113191/CA/United States NCI; Research Support, N.I.H., Extramural; Review; United States; Journal of pain and symptom management; J Pain Symptom Manage. 2007 Jun;33(6):745-55. Epub 2007 Mar 23. ER - TY - JOUR T1 - Test design optimization in CAT early stage with the nominal response model JF - Applied Psychological Measurement Y1 - 2007 A1 - Passos, V. L. A1 - Berger, M. P. F. A1 - Tan, F. E.
KW - computerized adaptive testing KW - nominal response model KW - robust performance KW - test design optimization AB - The early stage of computerized adaptive testing (CAT) refers to the phase of the trait estimation during the administration of only a few items. This phase can be characterized by bias and instability of estimation. In this study, an item selection criterion is introduced in an attempt to lessen this instability: the D-optimality criterion. A polytomous unconstrained CAT simulation is carried out to evaluate this criterion's performance under different test premises. The simulation shows that the extent of early stage instability depends primarily on the quality of the item pool information and its size and secondarily on the item selection criteria. The efficiency of the D-optimality criterion is similar to the efficiency of other known item selection criteria. Yet, it often yields estimates that, at the beginning of CAT, display a more robust performance against instability. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Sage Publications: US VL - 31 SN - 0146-6216 (Print) ER - TY - JOUR T1 - Two-Phase Item Selection Procedure for Flexible Content Balancing in CAT JF - Applied Psychological Measurement Y1 - 2007 A1 - Cheng, Ying A1 - Chang, Hua-Hua A1 - Yi, Qing AB -

Content balancing is an important issue in the design and implementation of computerized adaptive testing (CAT). Content-balancing techniques that have been applied in fixed content balancing, where the number of items from each content area is fixed, include constrained CAT (CCAT), the modified multinomial model (MMM), modified constrained CAT (MCCAT), and others. In this article, four methods are proposed to address the flexible content-balancing issue with the a-stratification design, named STR_C. The four methods are MMM+, an extension of MMM; MCCAT+, an extension of MCCAT; the TPM method, a two-phase content-balancing method using MMM in both phases; and the TPF method, a two-phase content-balancing method using MMM in the first phase and MCCAT in the second. Simulation results show that all of the methods work well in content balancing, and TPF performs the best in item exposure control and item pool utilization while maintaining measurement precision.

VL - 31 UR - http://apm.sagepub.com/content/31/6/467.abstract ER - TY - CHAP T1 - Up-and-down procedures for approximating optimal designs using person-response functions Y1 - 2007 A1 - Sheng, Y. A1 - Flournoy, N. A1 - Osterlind, S. J. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 1,042 KB} ER - TY - CHAP T1 - Use of CAT in dynamic testing Y1 - 2007 A1 - De Beer, M. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 133 KB} ER - TY - Generic T1 - The use of computerized adaptive testing to assess psychopathology using the Global Appraisal of Individual Needs T2 - American Evaluation Association Y1 - 2007 A1 - Conrad, K. J. A1 - Riley, B. B. A1 - Dennis, M. L. JF - American Evaluation Association PB - American Evaluation Association CY - Portland, OR USA ER - TY - CHAP T1 - Validity and decision issues in selecting a CAT measurement model Y1 - 2007 A1 - Olsen, J. B. A1 - Bunderson, C. V CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 977 KB} ER - TY - JOUR T1 - Adaptive success control in computerized adaptive testing JF - Psychology Science Y1 - 2006 A1 - Häusler, Joachim KW - adaptive success control KW - computerized adaptive testing KW - Psychometrics AB - In computerized adaptive testing (CAT) procedures within the framework of probabilistic test theory the difficulty of an item is adjusted to the ability of the respondent, with the aim of maximizing the amount of information generated per item, thereby also increasing test economy and test reasonableness. However, earlier research indicates that respondents might feel over-challenged by a constant success probability of p = 0.5 and therefore cannot come to a sufficiently high answer certainty within a reasonable timeframe. 
Consequently, response time per item increases, which -- depending on the test material -- can outweigh the benefit of administering optimally informative items. Instead of a benefit, the result of using CAT procedures could be a loss of test economy. To address this problem, an adaptive success control algorithm was designed and tested, adapting the success probability to the working style of the respondent. Persons who need higher answer certainty in order to come to a decision are detected and receive a higher success probability, in order to minimize the test duration (not the number of items as in classical CAT). The method is validated through re-analysis of data from the Adaptive Matrices Test (AMT, Hornke, Etzel & Rettig, 1999) and by the comparison between an AMT version using classical CAT and an experimental version using Adaptive Success Control. The results are discussed in the light of psychometric and psychological aspects of test quality. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Pabst Science Publishers: Germany VL - 48 SN - 0033-3018 (Print) ER - TY - JOUR T1 - Applying Bayesian item selection approaches to adaptive tests using polytomous items JF - Applied Measurement in Education Y1 - 2006 A1 - Penfield, R. D. KW - adaptive tests KW - Bayesian item selection KW - computer adaptive testing KW - maximum expected information KW - polytomous items KW - posterior weighted information AB - This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability estimation using the MEI and MPI approaches to the traditional maximal item information (MII) approach.
The results of the simulation study indicated that the MEI and MPI approaches led to a superior efficiency of ability estimation compared with the MII approach. The superiority of the MEI and MPI approaches over the MII approach was greatest when the bank contained items having a relatively peaked information function. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Lawrence Erlbaum: US VL - 19 SN - 0895-7347 (Print); 1532-4818 (Electronic) ER - TY - JOUR T1 - Assembling a computerized adaptive testing item pool as a set of linear tests JF - Journal of Educational and Behavioral Statistics Y1 - 2006 A1 - van der Linden, W. J. A1 - Ariel, A. A1 - Veldkamp, B. P. KW - Algorithms KW - computerized adaptive testing KW - item pool KW - linear tests KW - mathematical models KW - statistics KW - Test Construction KW - Test Items AB - Test-item writing efforts typically results in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information, that violate the content constraints, and/or have unfavorable exposure rates. Although at first sight somewhat counterintuitive, it is shown that if the CAT pool is assembled as a set of linear test forms, undesirable correlations can be broken down effectively. It is proposed to assemble such pools using a mixed integer programming model with constraints that guarantee that each test meets all content specifications and an objective function that requires them to have maximal information at a well-chosen set of ability values. An empirical example with a previous master pool from the Law School Admission Test (LSAT) yielded a CAT with nearly uniform bias and mean-squared error functions for the ability estimator and item-exposure rates that satisfied the target for all items in the pool. 
PB - Sage Publications: US VL - 31 SN - 1076-9986 (Print) ER - TY - JOUR T1 - Assessing CAT Test Security Severity JF - Applied Psychological Measurement Y1 - 2006 A1 - Yi, Q. A1 - Zhang, J. A1 - Chang, Hua-Hua VL - 30(1) ER - TY - ABST T1 - A CAT with personality and attitude Y1 - 2006 A1 - Hol, A. M. CY - Enschede, The Netherlands: PrintPartners Ipskamp B.V. N1 - #HO06-01. ER - TY - JOUR T1 - Comparing methods of assessing differential item functioning in a computerized adaptive testing environment JF - Journal of Educational Measurement Y1 - 2006 A1 - Lei, P-W. A1 - Chen, S-Y. A1 - Yu, L. KW - computerized adaptive testing KW - educational testing KW - item response theory likelihood ratio test KW - logistic regression KW - trait estimation KW - unidirectional & non-unidirectional differential item functioning AB - Mantel-Haenszel and SIBTEST, which have known difficulty in detecting non-unidirectional differential item functioning (DIF), have been adapted with some success for computerized adaptive testing (CAT). This study adapts logistic regression (LR) and the item-response-theory-likelihood-ratio test (IRT-LRT), capable of detecting both unidirectional and non-unidirectional DIF, to the CAT environment in which pretest items are assumed to be seeded in CATs but not used for trait estimation. The proposed adaptation methods were evaluated with simulated data under different sample size ratios and impact conditions in terms of Type I error, power, and specificity in identifying the form of DIF. The adapted LR and IRT-LRT procedures are more powerful than the CAT version of SIBTEST for non-unidirectional DIF detection. The good Type I error control provided by IRT-LRT under extremely unequal sample sizes and large impact is encouraging. Implications of these and other findings are discussed.
all rights reserved) PB - Blackwell Publishing: United Kingdom VL - 43 SN - 0022-0655 (Print) UR - http://dx.doi.org/10.1111/j.1745-3984.2006.00015.x ER - TY - JOUR T1 - The comparison among item selection strategies of CAT with multiple-choice items JF - Acta Psychologica Sinica Y1 - 2006 A1 - Hai-qi, D. A1 - De-zhi, C. A1 - Shuliang, D. A1 - Taiping, D. KW - CAT KW - computerized adaptive testing KW - graded response model KW - item selection strategies KW - multiple choice items AB - The initial purpose of comparing item selection strategies for CAT was to increase the efficiency of tests. As studies continued, however, it was found that increasing the efficiency of item bank usage was also an important goal of comparing item selection strategies. These two goals often conflicted. The key solution was to find a strategy with which both goals could be accomplished. The item selection strategies for the graded response model in this study included: the average of the difficulty orders matching with the ability; the medium of the difficulty orders matching with the ability; maximum information; a-stratified (average); and a-stratified (medium). The evaluation indexes used for comparison included: the bias of ability estimates from the true values; the standard error of ability estimates; the average number of items the examinees were administered; the standard deviation of the frequency of items selected; and the sum of the indices weighted. Using the Monte Carlo simulation method, we obtained data and iterated the simulation 20 times under conditions in which the item difficulty parameters followed either a normal or a uniform distribution. The results indicated that, whether difficulty parameters followed a normal or a uniform distribution, every item selection strategy designed in this research had its strong and weak points. In general evaluation, under the condition that items were stratified appropriately, a-stratified (medium) (ASM) had the best effect.
(PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Science Press: China VL - 38 SN - 0439-755X (Print) ER - TY - CONF T1 - A comparison of online calibration methods for a CAT T2 - Presented at the National Council on Measurement in Education Y1 - 2006 A1 - Morgan, D. L. A1 - Way, W. D. A1 - Augemberg, K. E. JF - Presented at the National Council on Measurement in Education CY - San Francisco, CA ER - TY - JOUR T1 - Comparison of the Psychometric Properties of Several Computer-Based Test Designs for Credentialing Exams With Multiple Purposes JF - Applied Measurement in Education Y1 - 2006 A1 - Jodoin, Michael G. A1 - Zenisky, April A1 - Hambleton, Ronald K. VL - 19 UR - http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1903_3 ER - TY - JOUR T1 - Computer adaptive testing improved accuracy and precision of scores over random item selection in a physical functioning item bank JF - Journal of Clinical Epidemiology Y1 - 2006 A1 - Haley, S. M. A1 - Ni, P. A1 - Hambleton, R. K. A1 - Slavin, M. D. A1 - Jette, A. M. KW - *Recovery of Function KW - Activities of Daily Living KW - Adolescent KW - Adult KW - Aged KW - Aged, 80 and over KW - Confidence Intervals KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Rehabilitation/*standards KW - Reproducibility of Results KW - Software AB - BACKGROUND AND OBJECTIVE: Measuring physical functioning (PF) within and across postacute settings is critical for monitoring outcomes of rehabilitation; however, most current instruments lack sufficient breadth and feasibility for widespread use. Computer adaptive testing (CAT), in which item selection is tailored to the individual patient, holds promise for reducing response burden, yet maintaining measurement precision.
We calibrated a PF item bank via item response theory (IRT), administered items with a post hoc CAT design, and determined whether CAT would improve accuracy and precision of score estimates over random item selection. METHODS: 1,041 adults were interviewed during postacute care rehabilitation episodes in either hospital or community settings. Responses for 124 PF items were calibrated using IRT methods to create a PF item bank. We examined the accuracy and precision of CAT-based scores compared to a random selection of items. RESULTS: CAT-based scores had higher correlations with the IRT-criterion scores, especially with short tests, and resulted in narrower confidence intervals than scores based on a random selection of items; gains, as expected, were especially large for low and high performing adults. CONCLUSION: The CAT design may have important precision and efficiency advantages for point-of-care functional assessment in rehabilitation practice settings. VL - 59 SN - 0895-4356 (Print) N1 - Haley, Stephen M; Ni, Pengsheng; Hambleton, Ronald K; Slavin, Mary D; Jette, Alan M; K02 hd45354-01/hd/nichd; R01 hd043568/hd/nichd; Comparative Study; Research Support, N.I.H., Extramural; Research Support, U.S. Gov't, Non-P.H.S.; England; Journal of clinical epidemiology; J Clin Epidemiol. 2006 Nov;59(11):1174-82. Epub 2006 Jul 11. ER - TY - CHAP T1 - Computer-based testing T2 - Handbook of multimethod measurement in psychology Y1 - 2006 A1 - Drasgow, F. A1 - Chuah, S. C.
KW - Adaptive Testing KW - computerized adaptive testing KW - Computer Assisted Testing KW - Experimentation KW - Psychometrics KW - Theories AB - (From the chapter) There has been a proliferation of research designed to explore and exploit opportunities provided by computer-based assessment. This chapter provides an overview of the diverse efforts by researchers in this area. It begins by describing how paper-and-pencil tests can be adapted for administration by computers. Computerization provides the important advantage that items can be selected so they are of appropriate difficulty for each examinee. Some of the psychometric theory needed for computerized adaptive testing is reviewed. Then research on innovative computerized assessments is summarized. These assessments go beyond multiple-choice items by using formats made possible by computerization. Then some hardware and software issues are described, and finally, directions for future work are outlined. (PsycINFO Database Record (c) 2006 APA) JF - Handbook of multimethod measurement in psychology PB - American Psychological Association CY - Washington D.C. USA VL - xiv N1 - Using Smart Source Parsing; Handbook of multimethod measurement in psychology (pp. 87-100). Washington, DC: American Psychological Association [URL: http://www.apa.org/books]. xiv, 553 pp ER - TY - JOUR T1 - Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes JF - Archives of Physical Medicine and Rehabilitation Y1 - 2006 A1 - Haley, S. M. A1 - Siebens, H. A1 - Coster, W. J. A1 - Tao, W. A1 - Black-Schaffer, R. M. A1 - Gandek, B. A1 - Sinclair, S. J. A1 - Ni, P.
KW - *Activities of Daily Living KW - *Adaptation, Physiological KW - *Computer Systems KW - *Questionnaires KW - Adult KW - Aged KW - Aged, 80 and over KW - Chi-Square Distribution KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Longitudinal Studies KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Patient Discharge KW - Prospective Studies KW - Rehabilitation/*standards KW - Subacute Care/*standards AB - OBJECTIVE: To examine score agreement, precision, validity, efficiency, and responsiveness of a computerized adaptive testing (CAT) version of the Activity Measure for Post-Acute Care (AM-PAC-CAT) in a prospective, 3-month follow-up sample of inpatient rehabilitation patients recently discharged home. DESIGN: Longitudinal, prospective 1-group cohort study of patients followed approximately 2 weeks after hospital discharge and then 3 months after the initial home visit. SETTING: Follow-up visits conducted in patients' home setting. PARTICIPANTS: Ninety-four adults who were recently discharged from inpatient rehabilitation, with diagnoses of neurologic, orthopedic, and medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from AM-PAC-CAT, including 3 activity domains of movement and physical, personal care and instrumental, and applied cognition were compared with scores from a traditional fixed-length version of the AM-PAC with 66 items (AM-PAC-66). RESULTS: AM-PAC-CAT scores were in good agreement (intraclass correlation coefficient model 3,1 range, .77-.86) with scores from the AM-PAC-66. On average, the CAT programs required 43% of the time and 33% of the items compared with the AM-PAC-66. Both formats discriminated across functional severity groups. The standardized response mean (SRM) was greater for the movement and physical fixed form than the CAT; the effect size and SRM of the 2 other AM-PAC domains showed similar sensitivity between CAT and fixed formats. 
Using patients' own report as an anchor-based measure of change, the CAT and fixed-length formats were comparable in responsiveness to patient-reported change over a 3-month interval. CONCLUSIONS: Accurate estimates for functional activity group-level changes can be obtained from CAT administrations, with a considerable reduction in administration time. VL - 87 SN - 0003-9993 (Print) N1 - Haley, Stephen M; Siebens, Hilary; Coster, Wendy J; Tao, Wei; Black-Schaffer, Randie M; Gandek, Barbara; Sinclair, Samuel J; Ni, Pengsheng. K02 45354-01/phs; R01 hd043568/hd/nichd. Research Support, N.I.H., Extramural. United States. Archives of Physical Medicine and Rehabilitation. Arch Phys Med Rehabil. 2006 Aug;87(8):1033-42. ER - TY - JOUR T1 - Computerized adaptive testing of diabetes impact: a feasibility study of Hispanics and non-Hispanics in an active clinic population JF - Quality of Life Research Y1 - 2006 A1 - Schwartz, C. A1 - Welch, G. A1 - Santiago-Kelley, P. A1 - Bode, R. A1 - Sun, X. KW - *Computers KW - *Hispanic Americans KW - *Quality of Life KW - Adult KW - Aged KW - Data Collection/*methods KW - Diabetes Mellitus/*psychology KW - Feasibility Studies KW - Female KW - Humans KW - Language KW - Male KW - Middle Aged AB - BACKGROUND: Diabetes is a leading cause of death and disability in the US and is twice as common among Hispanic Americans as non-Hispanics. The societal costs of diabetes provide an impetus for developing tools that can improve patient care and delay or prevent diabetes complications. METHODS: We implemented a feasibility study of a Computerized Adaptive Test (CAT) to measure diabetes impact using a sample of 103 English- and 97 Spanish-speaking patients (mean age = 56.5, 66.5% female) in a community medical center with a high proportion of minority patients (28% African-American). The 37 items of the Diabetes Impact Survey were translated using forward-backward translation and cognitive debriefing.
Participants were randomized to receive either the full-length tool or the Diabetes-CAT first, in the patient's native language. RESULTS: The number of items and the amount of time to complete the survey for the CAT was reduced to one-sixth the amount for the full-length tool in both languages, across disease severity. Confirmatory Factor Analysis confirmed that the Diabetes Impact Survey is unidimensional. The Diabetes-CAT demonstrated acceptable internal consistency reliability, construct validity, and discriminant validity in the overall sample, although subgroup analyses suggested that the English sample data evidenced higher levels of reliability and validity than the Spanish sample and issues with discriminant validity in the Spanish sample. Differential Item Functioning analysis revealed differences in response tendencies by language group in 3 of the 37 items. Participant interviews suggested that the Spanish-speaking patients generally preferred the paper survey to the computer-assisted tool, and were twice as likely to experience difficulties understanding the items. CONCLUSIONS: While the Diabetes-CAT demonstrated clear advantages in reducing respondent burden as compared to the full-length tool, simplifying the item bank will be necessary for enhancing the feasibility of the Diabetes-CAT for use with low-literacy patients. VL - 15 SN - 0962-9343 (Print) N1 - Schwartz, Carolyn; Welch, Garry; Santiago-Kelley, Paula; Bode, Rita; Sun, Xiaowu. 1 R43 DK066874-01/dk/niddk. Research Support, N.I.H., Extramural. Netherlands. Quality of Life Research. Qual Life Res. 2006 Nov;15(9):1503-18. Epub 2006 Sep 26. ER - TY - JOUR T1 - Computerized adaptive testing under nonparametric IRT models JF - Psychometrika Y1 - 2006 A1 - Xu, X. A1 - Douglas, J.
VL - 71 ER - TY - CONF T1 - Constraints-weighted information method for item selection of severely constrained computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2006 A1 - Cheng, Y. A1 - Chang, Hua-Hua A1 - Wang, X. B. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco ER - TY - CHAP T1 - Designing computerized adaptive tests Y1 - 2006 A1 - Davey, T. A1 - Pitoniak, M. J. CY - In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development. Mahwah, NJ: Lawrence Erlbaum Associates. ER - TY - JOUR T1 - Effects of Estimation Bias on Multiple-Category Classification With an IRT-Based Adaptive Classification Procedure JF - Educational and Psychological Measurement Y1 - 2006 A1 - Yang, Xiangdong A1 - Poggio, John C. A1 - Glasnapp, Douglas R. AB -

The effects of five ability estimators, that is, maximum likelihood estimator, weighted likelihood estimator, maximum a posteriori, expected a posteriori, and Owen's sequential estimator, on the performances of the item response theory–based adaptive classification procedure on multiple categories were studied via simulations. The following results were found. (a) The Bayesian estimators were more likely to misclassify examinees into an inward category because of their inward biases, when a fixed start value of zero was assigned to every examinee. (b) When moderately accurate start values were available, however, Bayesian estimators produced classifications that were slightly more accurate than was the maximum likelihood estimator or weighted likelihood estimator. Expected a posteriori was the procedure that produced the most accurate results among the three Bayesian methods. (c) All five estimators produced equivalent efficiencies in terms of number of items required, which was 50 or more items except for abilities that were less than -2.00 or greater than 2.00.

VL - 66 UR - http://epm.sagepub.com/content/66/4/545.abstract ER - TY - JOUR T1 - Equating scores from adaptive to linear tests JF - Applied Psychological Measurement Y1 - 2006 A1 - van der Linden, W. J. KW - computerized adaptive testing KW - equipercentile equating KW - local equating KW - score reporting KW - test characteristic function AB - Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test for a population of test takers. The two local methods were generally best. Surprisingly, the TCF method performed slightly worse than the equipercentile method. Both methods showed strong bias and uniformly large inaccuracy, but the TCF method suffered from extra error due to the lower asymptote of the test characteristic function. It is argued that the worse performances of the two methods are a consequence of the fact that they use a single equating transformation for an entire population of test takers and therefore have to compromise between the individual score distributions. PB - Sage Publications: US VL - 30 SN - 0146-6216 (Print) ER - TY - JOUR T1 - Estimation of an examinee's ability in the web-based computerized adaptive testing program IRT-CAT JF - J Educ Eval Health Prof Y1 - 2006 A1 - Lee, Y. H. A1 - Park, J. H. A1 - Park, I. Y. AB - We developed a program to estimate an examinee's ability in order to provide freely available access to a web-based computerized adaptive testing (CAT) program. We used PHP and JavaScript as the programming languages, PostgreSQL as the database management system on an Apache web server and Linux as the operating system. A system which allows for user input and searching within inputted items and creates tests was constructed.
We performed an ability estimation on each test based on a Rasch model and 2- or 3-parameter logistic models. Our system provides an algorithm for a web-based CAT, replacing previous personal computer-based ones, and makes it possible to estimate an examinee's ability immediately at the end of the test. VL - 3 SN - 1975-5937 (Electronic) N1 - Lee, Yoon-Hwan; Park, Jung-Ho; Park, In-Yong. Korea (South). Journal of Educational Evaluation for Health Professions. J Educ Eval Health Prof. 2006;3:4. Epub 2006 Nov 22. U2 - 2631187 ER - TY - JOUR T1 - An evaluation of patient-reported outcomes found computerized adaptive testing was efficient in assessing osteoarthritis impact JF - Journal of Clinical Epidemiology Y1 - 2006 A1 - Kosinski, M. A1 - Bjorner, J. A1 - Ware, J. E., Jr. A1 - Sullivan, E. A1 - Straus, W. AB - BACKGROUND AND OBJECTIVES: Evaluate a patient-reported outcomes questionnaire that uses computerized adaptive testing (CAT) to measure the impact of osteoarthritis (OA) on functioning and well-being. MATERIALS AND METHODS: OA patients completed 37 questions about the impact of OA on physical, social and role functioning, emotional well-being, and vitality. Questionnaire responses were calibrated and scored using item response theory, and two scores were estimated: a Total-OA score based on patients' responses to all 37 questions, and a simulated CAT-OA score where the computer selected and scored the five most informative questions for each patient. Agreement between Total-OA and CAT-OA scores was assessed using correlations. Discriminant validity of Total-OA and CAT-OA scores was assessed with analysis of variance. Criterion measures included OA pain and severity, patient global assessment, and missed work days. RESULTS: Simulated CAT-OA and Total-OA scores correlated highly (r = 0.96). Both Total-OA and simulated CAT-OA scores discriminated significantly between patients differing on the criterion measures.
F-statistics across criterion measures ranged from 39.0 (P < .001) to 225.1 (P < .001) for the Total-OA score, and from 40.5 (P < .001) to 221.5 (P < .001) for the simulated CAT-OA score. CONCLUSIONS: CAT methods produce valid and precise estimates of the impact of OA on functioning and well-being with significant reduction in response burden. VL - 59 SN - 0895-4356 ER - TY - JOUR T1 - Evaluation parameters for computer adaptive testing JF - British Journal of Educational Technology Y1 - 2006 A1 - Georgiadou, E. A1 - Triantafillou, E. A1 - Economides, A. A. VL - 37 IS - 2 ER - TY - JOUR T1 - Expansion of a physical function item bank and development of an abbreviated form for clinical research JF - Journal of Applied Measurement Y1 - 2006 A1 - Bode, R. K. A1 - Lai, J-S. A1 - Dineen, K. A1 - Heinemann, A. W. A1 - Shevrin, D. A1 - Von Roenn, J. A1 - Cella, D. KW - clinical research KW - computerized adaptive testing KW - performance levels KW - physical function item bank KW - Psychometrics KW - test reliability KW - Test Validity AB - We expanded an existing 33-item physical function (PF) item bank with a sufficient number of items to enable computerized adaptive testing (CAT). Ten items were written to expand the bank and the new item pool was administered to 295 people with cancer. For this analysis of the new pool, seven poorly performing items were identified for further examination. This resulted in a bank with items that define an essentially unidimensional PF construct, cover a wide range of that construct, reliably measure the PF of persons with cancer, and distinguish differences in self-reported functional performance levels. We also developed a 5-item (static) assessment form ("BriefPF") that can be used in clinical research to express scores on the same metric as the overall bank. The BriefPF was compared to the PF-10 from the Medical Outcomes Study SF-36. Both short forms significantly differentiated persons across functional performance levels.
While the entire bank was more precise across the PF continuum than either short form, there were differences in the area of the continuum in which each short form was more precise: the BriefPF was more precise than the PF-10 at the lower functional levels and the PF-10 was more precise than the BriefPF at the higher levels. Future research on this bank will include the development of a CAT version, the PF-CAT. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Richard M Smith: US VL - 7 SN - 1529-7713 (Print) ER - TY - JOUR T1 - Factor analysis techniques for assessing sufficient unidimensionality of cancer related fatigue JF - Quality of Life Research Y1 - 2006 A1 - Lai, J-S. A1 - Crane, P. K. A1 - Cella, D. KW - *Factor Analysis, Statistical KW - *Quality of Life KW - Aged KW - Chicago KW - Fatigue/*etiology KW - Female KW - Humans KW - Male KW - Middle Aged KW - Neoplasms/*complications KW - Questionnaires AB - BACKGROUND: Fatigue is the most common unrelieved symptom experienced by people with cancer. The purpose of this study was to examine whether cancer-related fatigue (CRF) can be summarized using a single score, that is, whether CRF is sufficiently unidimensional for measurement approaches that require or assume unidimensionality. We evaluated this question using factor analysis techniques including the theory-driven bi-factor model. METHODS: Five hundred and fifty five cancer patients from the Chicago metropolitan area completed a 72-item fatigue item bank, covering a range of fatigue-related concerns including intensity, frequency and interference with physical, mental, and social activities. Dimensionality was assessed using exploratory and confirmatory factor analysis (CFA) techniques. RESULTS: Exploratory factor analysis (EFA) techniques identified from 1 to 17 factors. The bi-factor model suggested that CRF was sufficiently unidimensional. 
CONCLUSIONS: CRF can be considered sufficiently unidimensional for applications that require unidimensionality. One such application, item response theory (IRT), will facilitate the development of short-form and computer-adaptive testing. This may further enable practical and accurate clinical assessment of CRF. VL - 15 N1 - 0962-9343 (Print). Journal Article. Research Support, N.I.H., Extramural. ER - TY - JOUR T1 - A Feedback Control Strategy for Enhancing Item Selection Efficiency in Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2006 A1 - Weissman, Alexander AB -

A computerized adaptive test (CAT) may be modeled as a closed-loop system, where item selection is influenced by trait level (θ) estimation and vice versa. When discrepancies exist between an examinee's estimated and true θ levels, nonoptimal item selection is a likely result. Nevertheless, examinee response behavior consistent with optimal item selection can be predicted using item response theory (IRT), without knowledge of an examinee's true θ level, yielding a specific reference point for applying an internal correcting or feedback control mechanism. Incorporating such a mechanism in a CAT is shown to be an effective strategy for increasing item selection efficiency. Results from simulation studies using maximum likelihood (ML) and modal a posteriori (MAP) trait-level estimation and Fisher information (FI) and Fisher interval information (FII) item selection are provided.

VL - 30 UR - http://apm.sagepub.com/content/30/2/84.abstract ER - TY - JOUR T1 - How Big Is Big Enough? Sample Size Requirements for CAST Item Parameter Estimation JF - Applied Measurement in Education Y1 - 2006 A1 - Chuah, Siang Chee A1 - Drasgow, F. A1 - Luecht, Richard VL - 19 UR - http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1903_5 ER - TY - JOUR T1 - An Introduction to Multistage Testing JF - Applied Measurement in Education Y1 - 2006 A1 - Mead, Alan D. VL - 19 UR - http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1903_1 ER - TY - JOUR T1 - Item banks and their potential applications to health status assessment in diverse populations JF - Medical Care Y1 - 2006 A1 - Hahn, E. A. A1 - Cella, D. A1 - Bode, R. K. A1 - Gershon, R. C. A1 - Lai, J. S. AB - In the context of an ethnically diverse, aging society, attention is increasingly turning to health-related quality of life measurement to evaluate healthcare and treatment options for chronic diseases. When evaluating and treating symptoms and concerns such as fatigue, pain, or physical function, reliable and accurate assessment is a priority. Modern psychometric methods have enabled us to move from long, static tests that provide inefficient and often inaccurate assessment of individual patients, to computerized adaptive tests (CATs) that can precisely measure individuals on health domains of interest. These modern methods, collectively referred to as item response theory (IRT), can produce calibrated "item banks" from larger pools of questions. From these banks, CATs can be conducted on individuals to produce their scores on selected domains. Item banks allow for comparison of patients across different question sets because the patient's score is expressed on a common scale.
Other advantages of using item banks include flexibility in terms of the degree of precision desired; interval measurement properties under most circumstances; realistic capability for accurate individual assessment over time (using CAT); and measurement equivalence across different patient populations. This work summarizes the process used in the creation and evaluation of item banks and reviews their potential contributions and limitations regarding outcome assessment and patient care, particularly when they are applied across people of different cultural backgrounds. VL - 44 N1 - 0025-7079 (Print). Journal Article. Research Support, N.I.H., Extramural. Research Support, Non-U.S. Gov't. ER - TY - JOUR T1 - [Item Selection Strategies of Computerized Adaptive Testing Based on the Graded Response Model] JF - Acta Psychologica Sinica Y1 - 2006 A1 - Ping, Chen A1 - Shuliang, Ding A1 - Haijing, Lin A1 - Jie, Zhou KW - computerized adaptive testing KW - item selection strategy AB - Item selection strategy (ISS) is an important component of Computerized Adaptive Testing (CAT). Its performance directly affects the security, efficiency, and precision of the test. Thus, the ISS has become one of the central issues in CATs based on the Graded Response Model (GRM). The goal of an ISS is to administer the next unused item remaining in the item bank that best fits the examinee's current ability estimate. In dichotomous IRT models, every item has only one difficulty parameter, and the item whose difficulty matches the examinee's current ability estimate is considered the best-fitting item. However, in GRM, each item has more than two ordered categories and no single value to represent the item difficulty. Consequently, some researchers have employed the average or the median difficulty value across categories as the difficulty estimate for the item. Using the average value and the median value in effect introduced two corresponding ISSs.
In this study, we used computer simulations to compare four ISSs based on GRM. We also discussed the effect of a "shadow pool" on the uniformity of pool usage, as well as the influence of different item parameter distributions and different ability estimation methods on the evaluation criteria of CAT. In the simulation process, the Monte Carlo method was adopted to simulate the entire CAT process; 1,000 examinees drawn from a standard normal distribution and four 1,000-item pools with different item parameter distributions were also simulated. The simulation assumed that each polytomous item comprised six ordered categories. In addition, ability estimates were derived using two methods: expected a posteriori Bayesian (EAP) and maximum likelihood estimation (MLE). In MLE, the Newton-Raphson iteration method and the Fisher scoring iteration method were employed, respectively, to solve the likelihood equation. Moreover, the CAT process was simulated 30 times for each examinee to eliminate random error. The ISSs were evaluated by four indices usually used in CAT, covering the accuracy of ability estimation, the stability of the ISS, the usage of the item pool, and test efficiency. Simulation results favored the ISS that matched the examinee's current trait estimate with the difficulty values across categories. Setting a "shadow pool" in the ISS improved the uniformity of pool utilization. Finally, different item parameter distributions and different ability estimation methods affected the evaluation indices of CAT. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Science Press: China VL - 38 SN - 0439-755X (Print) ER - TY - RPRT T1 - Kernel-smoothed DIF detection procedure for computerized adaptive tests (Computerized testing report 00-08) Y1 - 2006 A1 - Nandakumar, R. A1 - Banks, J. C. A1 - Roussos, L. A.
PB - Law School Admission Council CY - Newtown, PA ER - TY - JOUR T1 - Maximum information stratification method for controlling item exposure in computerized adaptive testing JF - Psicothema Y1 - 2006 A1 - Barrada, J A1 - Mazuela, P. A1 - Olea, J. KW - *Artificial Intelligence KW - *Microcomputers KW - *Psychological Tests KW - *Software Design KW - Algorithms KW - Chi-Square Distribution KW - Humans KW - Likelihood Functions AB - The proposal for increasing the security in Computerized Adaptive Tests that has received most attention in recent years is the a-stratified method (AS - Chang and Ying, 1999): at the beginning of the test only items with low discrimination parameters (a) can be administered, with the values of the a parameters increasing as the test goes on. With this method, distribution of the exposure rates of the items is less skewed, while efficiency is maintained in trait-level estimation. The pseudo-guessing parameter (c), present in the three-parameter logistic model, is considered irrelevant, and is not used in the AS method. The Maximum Information Stratified (MIS) model incorporates the c parameter in the stratification of the bank and in the item-selection rule, improving accuracy by comparison with the AS, for item banks with a and b parameters correlated and uncorrelated. For both kinds of banks, the blocking b methods (Chang, Qian and Ying, 2001) improve the security of the item bank.
VL - 18 SN - 0214-9915 (Print) N1 - Barrada, Juan Ramon; Mazuela, Paloma; Olea, Julio. Research Support, Non-U.S. Gov't. Spain. Psicothema. 2006 Feb;18(1):156-9. ER - TY - JOUR T1 - Measurement precision and efficiency of multidimensional computer adaptive testing of physical functioning using the pediatric evaluation of disability inventory JF - Archives of Physical Medicine and Rehabilitation Y1 - 2006 A1 - Haley, S. M. A1 - Ni, P. A1 - Ludlow, L. H. A1 - Fragala-Pinkham, M. A. KW - *Disability Evaluation KW - *Pediatrics KW - Adolescent KW - Child KW - Child, Preschool KW - Computers KW - Disabled Persons/*classification/rehabilitation KW - Efficiency KW - Humans KW - Infant KW - Outcome Assessment (Health Care) KW - Psychometrics KW - Self Care AB - OBJECTIVE: To compare the measurement efficiency and precision of a multidimensional computer adaptive testing (M-CAT) application with that of a unidimensional CAT (U-CAT), using item bank data from 2 of the functional skills scales of the Pediatric Evaluation of Disability Inventory (PEDI).
DESIGN: Using existing PEDI mobility and self-care item banks, we compared the stability of item calibrations and model fit between unidimensional and multidimensional Rasch models and compared the efficiency and precision of the U-CAT- and M-CAT-simulated assessments to a random draw of items. SETTING: Pediatric rehabilitation hospital and clinics. PARTICIPANTS: Clinical and normative samples. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Not applicable. RESULTS: The M-CAT had greater levels of precision and efficiency than the separate mobility and self-care U-CAT versions when using a similar number of items for each PEDI subdomain. Equivalent estimation of mobility and self-care scores can be achieved with a 25% to 40% item reduction with the M-CAT compared with the U-CAT. CONCLUSIONS: M-CAT applications appear to have both precision and efficiency advantages compared with separate U-CAT assessments when content subdomains have a high correlation. Practitioners may also realize interpretive advantages of reporting test score information for each subdomain when separate clinical inferences are desired. VL - 87 SN - 0003-9993 (Print) N1 - Haley, Stephen M; Ni, Pengsheng; Ludlow, Larry H; Fragala-Pinkham, Maria A. K02 hd45354-01/hd/nichd. Research Support, N.I.H., Extramural. Research Support, Non-U.S. Gov't. United States. Archives of Physical Medicine and Rehabilitation. Arch Phys Med Rehabil. 2006 Sep;87(9):1223-9. ER - TY - JOUR T1 - Multidimensional computerized adaptive testing of the EORTC QLQ-C30: basic developments and evaluations JF - Quality of Life Research Y1 - 2006 A1 - Petersen, M. A. A1 - Groenvold, M. A1 - Aaronson, N. K. A1 - Fayers, P. A1 - Sprangers, M. A1 - Bjorner, J. B.
KW - *Quality of Life KW - *Self Disclosure KW - Adult KW - Female KW - Health Status KW - Humans KW - Male KW - Middle Aged KW - Questionnaires/*standards KW - User-Computer Interface AB - OBJECTIVE: Self-report questionnaires are widely used to measure health-related quality of life (HRQOL). Ideally, such questionnaires should be adapted to the individual patient and at the same time scores should be directly comparable across patients. This may be achieved using computerized adaptive testing (CAT). Usually, CAT is carried out for a single domain at a time. However, many HRQOL domains are highly correlated. Multidimensional CAT may utilize these correlations to improve measurement efficiency. We investigated the possible advantages and difficulties of multidimensional CAT. STUDY DESIGN AND SETTING: We evaluated multidimensional CAT of three scales from the EORTC QLQ-C30: the physical functioning, emotional functioning, and fatigue scales. Analyses utilised a database with 2958 European cancer patients. RESULTS: It was possible to obtain scores for the three domains with five to seven items administered using multidimensional CAT that were very close to the scores obtained using all 12 items and with no or little loss of measurement precision. CONCLUSION: The findings suggest that multidimensional CAT may significantly improve measurement precision and efficiency and encourage further research into multidimensional CAT. Particularly, the estimation of the model underlying the multidimensional CAT and the conceptual aspects need further investigations. VL - 15 SN - 0962-9343 (Print) N1 - Petersen, Morten Aa; Groenvold, Mogens; Aaronson, Neil; Fayers, Peter; Sprangers, Mirjam; Bjorner, Jakob B; European Organisation for Research and Treatment of Cancer Quality of Life Group. Research Support, Non-U.S. Gov't. Netherlands. Quality of Life Research. Qual Life Res. 2006 Apr;15(3):315-29.
ER - TY - CONF T1 - Multiple maximum exposure rates in computerized adaptive testing T2 - Paper presented at the SMABS-EAM Conference Y1 - 2006 A1 - Barrada, J A1 - Veldkamp, B. P. A1 - Olea, J. JF - Paper presented at the SMABS-EAM Conference CY - Budapest, Hungary ER - TY - JOUR T1 - Multistage Testing: Widely or Narrowly Applicable? JF - Applied Measurement in Education Y1 - 2006 A1 - Stark, Stephen A1 - Chernyshenko, Oleksandr S. VL - 19 UR - http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1903_6 ER - TY - JOUR T1 - Optimal and nonoptimal computer-based test designs for making pass-fail decisions JF - Applied Measurement in Education Y1 - 2006 A1 - Hambleton, R. K. A1 - Xing, D. KW - adaptive test KW - credentialing exams KW - Decision Making KW - Educational Measurement KW - multistage tests KW - optimal computer-based test designs KW - test form AB - Now that many credentialing exams are being routinely administered by computer, new computer-based test designs, along with item response theory models, are being aggressively researched to identify specific designs that can increase the decision consistency and accuracy of pass-fail decisions. The purpose of this study was to investigate the impact of optimal and nonoptimal multistage test (MST) designs, linear parallel-form test designs (LPFT), and computer adaptive test (CAT) designs on the decision consistency and accuracy of pass-fail decisions. Realistic testing situations matching those of one of the large credentialing agencies were simulated to increase the generalizability of the findings. 
The conclusions were clear: (a) With the LPFTs, matching test information functions (TIFs) to the mean of the proficiency distribution produced slightly better results than matching them to the passing score; (b) all of the test designs worked better than test construction using random selection of items, subject to content constraints only; (c) CAT performed better than the other test designs; and (d) if matching a TIF to the passing score, the MST design produced slightly better results than the LPFT design. If an argument for the MST design is to be made, it can be made on the basis of slight improvements over the LPFT design and better expected item bank utilization, candidate preference, and the potential for improved diagnostic feedback, compared with the feedback that is possible with fixed linear test forms. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Lawrence Erlbaum: US VL - 19 SN - 0895-7347 (Print); 1532-4818 (Electronic) ER - TY - JOUR T1 - Optimal Testing With Easy or Difficult Items in Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2006 A1 - Theo Eggen A1 - Verschoor, Angela J. AB -

Computerized adaptive tests (CATs) are individualized tests that, from a measurement point of view, are optimal for each individual, possibly under some practical conditions. In the present study, it is shown that maximum information item selection in CATs using an item bank that is calibrated with the one- or the two-parameter logistic model results in each individual answering about 50% of the items correctly. Two item selection procedures giving easier (or more difficult) tests for students are presented and evaluated. Item selection on probability points of items yields good results only with the one-parameter logistic model and not with the two-parameter logistic model. An alternative selection procedure, based on maximum information at a shifted ability level, gives satisfactory results with both models. Index terms: computerized adaptive testing, item selection, item response theory

VL - 30 UR - http://apm.sagepub.com/content/30/5/379.abstract ER - TY - JOUR T1 - Optimal testing with easy or difficult items in computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2006 A1 - Theo Eggen A1 - Verschoor, Angela J. KW - computer adaptive tests KW - individualized tests KW - Item Response Theory KW - item selection KW - Measurement AB - Computerized adaptive tests (CATs) are individualized tests that, from a measurement point of view, are optimal for each individual, possibly under some practical conditions. In the present study, it is shown that maximum information item selection in CATs using an item bank that is calibrated with the one- or the two-parameter logistic model results in each individual answering about 50% of the items correctly. Two item selection procedures giving easier (or more difficult) tests for students are presented and evaluated. Item selection on probability points of items yields good results only with the one-parameter logistic model and not with the two-parameter logistic model. An alternative selection procedure, based on maximum information at a shifted ability level, gives satisfactory results with both models. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Sage Publications: US VL - 30 SN - 0146-6216 (Print) ER - TY - JOUR T1 - Optimal Testlet Pool Assembly for Multistage Testing Designs JF - Applied Psychological Measurement Y1 - 2006 A1 - Ariel, Adelaide A1 - Veldkamp, Bernard P. A1 - Breithaupt, Krista AB -

Computerized multistage testing (MST) designs require sets of test questions (testlets) to be assembled to meet strict, often competing criteria. Rules that govern testlet assembly may dictate the number of questions on a particular subject or may describe desirable statistical properties for the test, such as measurement precision. In an MST design, testlets of differing difficulty levels must be created. Statistical properties for assembly of the testlets can be expressed using item response theory (IRT) parameters. The testlet test information function (TIF) value can be maximized at a specific point on the IRT ability scale. In practical MST designs, parallel versions of testlets are needed, so sets of testlets with equivalent properties are built according to equivalent specifications. In this project, the authors study the use of a mathematical programming technique to simultaneously assemble testlets to ensure equivalence and fairness to candidates who may be administered different testlets.

VL - 30 UR - http://apm.sagepub.com/content/30/3/204.abstract ER - TY - JOUR T1 - Overview of quantitative measurement methods. Equivalence, invariance, and differential item functioning in health applications JF - Medical Care Y1 - 2006 A1 - Teresi, J. A. KW - *Cross-Cultural Comparison KW - Data Interpretation, Statistical KW - Factor Analysis, Statistical KW - Guidelines as Topic KW - Humans KW - Models, Statistical KW - Psychometrics/*methods KW - Statistics as Topic/*methods KW - Statistics, Nonparametric AB - BACKGROUND: Reviewed in this article are issues relating to the study of invariance and differential item functioning (DIF). The aim of factor analyses and DIF, in the context of invariance testing, is the examination of group differences in item response conditional on an estimate of disability. Discussed are parameters and statistics that are not invariant and cannot be compared validly in cross-cultural studies with varying distributions of disability, in contrast to those that can be compared (if the model assumptions are met) because they are produced by models such as linear and nonlinear regression. OBJECTIVES: The purpose of this overview is to provide an integrated approach to the quantitative methods used in this special issue to examine measurement equivalence. The methods include classical test theory (CTT), factor analytic, and parametric and nonparametric approaches to DIF detection. Also included in the quantitative section is a discussion of item banking and computerized adaptive testing (CAT). METHODS: Factorial invariance and the articles discussing this topic are introduced. A brief overview of the DIF methods presented in the quantitative section of the special issue is provided together with a discussion of ways in which DIF analyses and examination of invariance using factor models may be complementary. 
CONCLUSIONS: Although factor analytic and DIF detection methods share features, they provide unique information and can be viewed as complementary in informing about measurement equivalence. VL - 44 SN - 0025-7079 (Print)0025-7079 (Linking) N1 - Teresi, Jeanne AAG15294/AG/NIA NIH HHS/United StatesResearch Support, N.I.H., ExtramuralResearch Support, Non-U.S. Gov'tReviewUnited StatesMedical careMed Care. 2006 Nov;44(11 Suppl 3):S39-49. ER - TY - JOUR T1 - Sensitivity of a computer adaptive assessment for measuring functional mobility changes in children enrolled in a community fitness programme JF - Clin Rehabil Y1 - 2006 A1 - Haley, S. M. A1 - Fragala-Pinkham, M. A. A1 - Ni, P. VL - 20 ER - TY - JOUR T1 - Sequential Computerized Mastery Tests—Three Simulation Studies JF - International Journal of Testing Y1 - 2006 A1 - Wiberg, Marie VL - 6 UR - http://www.tandfonline.com/doi/abs/10.1207/s15327574ijt0601_3 ER - TY - JOUR T1 - SIMCAT 1.0: A SAS computer program for simulating computer adaptive testing JF - Applied Psychological Measurement Y1 - 2006 A1 - Raîche, G. A1 - Blais, J-G. KW - computer adaptive testing KW - computer program KW - estimated proficiency level KW - Monte Carlo methodologies KW - Rasch logistic model AB - Monte Carlo methodologies are frequently applied to study the sampling distribution of the estimated proficiency level in adaptive testing. These methods eliminate real situational constraints. However, these Monte Carlo methodologies are not currently supported by the available software programs, and when these programs are available, their flexibility is limited. SIMCAT 1.0 is aimed at the simulation of adaptive testing sessions under different adaptive expected a posteriori (EAP) proficiency-level estimation methods (Blais & Raîche, 2005; Raîche & Blais, 2005) based on the one-parameter Rasch logistic model. 
These methods are all adaptive in the a priori proficiency-level estimation, the proficiency-level estimation bias correction, the integration interval, or a combination of these factors. The use of these adaptive EAP estimation methods diminishes considerably the shrinking, and therefore biasing, effect of the estimated a priori proficiency level encountered when this a priori is fixed at a constant value independently of the computed previous value of the proficiency level. SIMCAT 1.0 also computes empirical and estimated skewness and kurtosis coefficients, such as the standard error, of the estimated proficiency-level sampling distribution. In this way, the program allows one to compare empirical and estimated properties of the estimated proficiency-level sampling distribution under different variations of the EAP estimation method: standard error and bias, like the skewness and kurtosis coefficients. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Sage Publications: US VL - 30 SN - 0146-6216 (Print) ER - TY - JOUR T1 - Simulated computerized adaptive test for patients with lumbar spine impairments was efficient and produced valid measures of function JF - Journal of Clinical Epidemiology Y1 - 2006 A1 - Hart, D. A1 - Mioduski, J. A1 - Werenke, M. A1 - Stratford, P. VL - 59 ER - TY - JOUR T1 - Simulated computerized adaptive test for patients with lumbar spine impairments was efficient and produced valid measures of function JF - Journal of Clinical Epidemiology Y1 - 2006 A1 - Hart, D. L. A1 - Mioduski, J. E. A1 - Werneke, M. W. A1 - Stratford, P. W. 
KW - Back Pain Functional Scale KW - computerized adaptive testing KW - Item Response Theory KW - Lumbar spine KW - Rehabilitation KW - True-score equating AB - Objective: To equate physical functioning (PF) items with Back Pain Functional Scale (BPFS) items, develop a computerized adaptive test (CAT) designed to assess lumbar spine functional status (LFS) in people with lumbar spine impairments, and compare discriminant validity of LFS measures (θIRT) generated using all items analyzed with a rating scale Item Response Theory model (RSM) and measures generated using the simulated CAT (θCAT). Methods: We performed a secondary analysis of retrospective intake rehabilitation data. Results: Unidimensionality and local independence of 25 BPFS and PF items were supported. Differential item functioning was negligible for levels of symptom acuity, gender, age, and surgical history. The RSM fit the data well. A lumbar spine specific CAT was developed that was 72% more efficient than using all 25 items to estimate LFS measures. θIRT and θCAT measures did not discriminate patients by symptom acuity, age, or gender, but discriminated patients by surgical history in similar clinically logical ways. θCAT measures were as precise as θIRT measures. Conclusion: A body part specific simulated CAT developed from an LFS item bank was efficient and produced precise measures of LFS without eroding discriminant validity. VL - 59 ER - TY - JOUR T1 - Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function JF - Journal of Clinical Epidemiology Y1 - 2006 A1 - Hart, D. L. A1 - Cook, K. F. A1 - Mioduski, J. E. A1 - Teal, C. R. A1 - Crane, P. K. KW - computerized adaptive testing KW - Flexilevel Scale of Shoulder Function KW - Item Response Theory KW - Rehabilitation AB -

Background and Objective: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items, develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (θIRT) and measures generated using the simulated CAT (θCAT). Study Design and Setting: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients with shoulder impairments who completed 60 SFS items. Results: Factor analytic techniques supported that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The θIRT and θCAT measures were highly correlated (r = .96) and resulted in similar classifications of patients. Conclusion: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability. 

VL - 59 IS - 3 ER - TY - JOUR T1 - Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function JF - Journal of Clinical Epidemiology Y1 - 2006 A1 - Hart, D. L. A1 - Cook, K. F. A1 - Mioduski, J. E. A1 - Teal, C. R. A1 - Crane, P. K. KW - *Computer Simulation KW - *Range of Motion, Articular KW - Activities of Daily Living KW - Adult KW - Aged KW - Aged, 80 and over KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Middle Aged KW - Prospective Studies KW - Reproducibility of Results KW - Research Support, N.I.H., Extramural KW - Research Support, U.S. Gov't, Non-P.H.S. KW - Shoulder Dislocation/*physiopathology/psychology/rehabilitation KW - Shoulder Pain/*physiopathology/psychology/rehabilitation KW - Shoulder/*physiopathology KW - Sickness Impact Profile KW - Treatment Outcome AB - BACKGROUND AND OBJECTIVE: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items, develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (theta(IRT)) and measures generated using the simulated CAT (theta(CAT)). STUDY DESIGN AND SETTING: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients with shoulder impairments who completed 60 SFS items. RESULTS: Factor analytic techniques supported that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The theta(IRT) and theta(CAT) measures were highly correlated (r = .96) and resulted in similar classifications of patients. 
CONCLUSION: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability. VL - 59 N1 - 0895-4356 (Print)Journal ArticleValidation Studies ER - TY - JOUR T1 - Técnicas para detectar patrones de respuesta atípicos [Aberrant patterns detection methods] JF - Anales de Psicología Y1 - 2006 A1 - Núñez, R. M. N. A1 - Pina, J. A. L. KW - aberrant patterns detection KW - Classical Test Theory KW - generalizability theory KW - Item Response KW - Item Response Theory KW - Mathematics KW - methods KW - person-fit KW - Psychometrics KW - psychometry KW - Test Validity KW - test validity analysis KW - Theory AB - La identificación de patrones de respuesta atípicos es de gran utilidad para la construcción de tests y de bancos de ítems con propiedades psicométricas así como para el análisis de validez de los mismos. En este trabajo de revisión se han recogido los más relevantes y novedosos métodos de ajuste de personas que se han elaborado dentro de cada uno de los principales ámbitos de trabajo de la Psicometría: el escalograma de Guttman, la Teoría Clásica de Tests (TCT), la Teoría de la Generalizabilidad (TG), la Teoría de Respuesta al Ítem (TRI), los Modelos de Respuesta al Ítem No Paramétricos (MRINP), los Modelos de Clase Latente de Orden Restringido (MCL-OR) y el Análisis de Estructura de Covarianzas (AEC).Aberrant patterns detection has a great usefulness in order to make tests and item banks with psychometric characteristics and validity analysis of tests and items. The most relevant and newest person-fit methods have been reviewed. All of them have been made in each one of main areas of Psychometry: Guttman's scalogram, Classical Test Theory (CTT), Generalizability Theory (GT), Item Response Theory (IRT), Non-parametric Response Models (NPRM), Order-Restricted Latent Class Models (OR-LCM) and Covariance Structure Analysis (CSA). 
VL - 22 SN - 0212-9728 N1 - Spain: Universidad de Murcia ER - TY - JOUR T1 - A testlet assembly design for the uniform CPA Examination JF - Applied Measurement in Education Y1 - 2006 A1 - Luecht, Richard A1 - Brumfield, Terry A1 - Breithaupt, Krista VL - 19 UR - http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1903_2 ER - TY - THES T1 - Validitätssteigerungen durch adaptives Testen [Increasing validity by adaptive testing]. Y1 - 2006 A1 - Frey, A. ER - TY - CONF T1 - A variant of the progressive restricted item exposure control procedure in computerized adaptive testing systems based on the 3PL and the partial credit model T2 - Paper presented at the annual meetings of the American Educational Research Association Y1 - 2006 A1 - McClarty, L. K. A1 - Sperling, R. A1 - Dodd, B. G. JF - Paper presented at the annual meetings of the American Educational Research Association CY - San Francisco ER - TY - CHAP T1 - Adaptive orientation methods in computer adaptive testing Y1 - 2005 A1 - Economides, A. A. CY - Proceedings E-Learn 2005 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, pp. 1290-1295, Vancouver, Canada, AACE, October 2005. ER - TY - BOOK T1 - Adaptive selection of personality items to inform a neural network predicting job performance Y1 - 2005 A1 - Thissen-Roe, A. CY - Unpublished doctoral dissertation, University of Washington N1 - {PDF file, 488 KB} ER - TY - CHAP T1 - Applications of item response theory to improve health outcomes assessment: Developing item banks, linking instruments, and computer-adaptive testing T2 - Outcomes assessment in cancer Y1 - 2005 A1 - Hambleton, R. K. ED - C. C. Gotay ED - C. 
Snyder KW - Computer Assisted Testing KW - Health KW - Item Response Theory KW - Measurement KW - Test Construction KW - Treatment Outcomes AB - (From the chapter) The current chapter builds on Reise's introduction to the basic concepts, assumptions, popular models, and important features of IRT and discusses the applications of item response theory (IRT) modeling to health outcomes assessment. In particular, we highlight the critical role of IRT modeling in: developing an instrument to match a study's population; linking two or more instruments measuring similar constructs on a common metric; and creating item banks that provide the foundation for tailored short-form instruments or for computerized adaptive assessments. (PsycINFO Database Record (c) 2005 APA ) JF - Outcomes assessment in cancer PB - Cambridge University Press CY - Cambridge, UK N1 - Using Smart Source ParsingOutcomes assessment in cancer: Measures, methods, and applications. (pp. 445-464). New York, NY : Cambridge University Press. xiv, 662 pp ER - TY - JOUR T1 - Assessing mobility in children using a computer adaptive testing version of the pediatric evaluation of disability inventory JF - Archives of Physical Medicine and Rehabilitation Y1 - 2005 A1 - Haley, S. M. A1 - Raczek, A. E. A1 - Coster, W. J. A1 - Dumas, H. M. A1 - Fragala-Pinkham, M. A. KW - *Computer Simulation KW - *Disability Evaluation KW - Adolescent KW - Child KW - Child, Preschool KW - Cross-Sectional Studies KW - Disabled Children/*rehabilitation KW - Female KW - Humans KW - Infant KW - Male KW - Outcome Assessment (Health Care)/*methods KW - Rehabilitation Centers KW - Rehabilitation/*standards KW - Sensitivity and Specificity AB - OBJECTIVE: To assess score agreement, validity, precision, and response burden of a prototype computerized adaptive testing (CAT) version of the Mobility Functional Skills Scale (Mob-CAT) of the Pediatric Evaluation of Disability Inventory (PEDI) as compared with the full 59-item version (Mob-59). 
DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; and cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics, community-based day care, preschool, and children's homes. PARTICIPANTS: Four hundred sixty-nine children with disabilities and 412 children with no disabilities (analytic sample); 41 children without disabilities and 39 with disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from a prototype Mob-CAT application and versions using 15-, 10-, and 5-item stopping rules; scores from the Mob-59; and number of items and time (in seconds) to administer assessments. RESULTS: Mob-CAT scores from both computer simulations (intraclass correlation coefficient [ICC] range, .94-.99) and field administrations (ICC=.98) were in high agreement with scores from the Mob-59. Using computer simulations of retrospective data, discriminant validity, and sensitivity to change of the Mob-CAT closely approximated that of the Mob-59, especially when using the 15- and 10-item stopping rule versions of the Mob-CAT. The Mob-CAT used no more than 15% of the items for any single administration, and required 20% of the time needed to administer the Mob-59. CONCLUSIONS: Comparable score estimates for the PEDI mobility scale can be obtained from CAT administrations, with losses in validity and precision for shorter forms, but with a considerable reduction in administration time. VL - 86 SN - 0003-9993 (Print) N1 - Haley, Stephen MRaczek, Anastasia ECoster, Wendy JDumas, Helene MFragala-Pinkham, Maria AK02 hd45354-01a1/hd/nichdR43 hd42388-01/hd/nichdResearch Support, N.I.H., ExtramuralResearch Support, U.S. Gov't, P.H.S.United StatesArchives of physical medicine and rehabilitationArch Phys Med Rehabil. 2005 May;86(5):932-9. 
ER - TY - JOUR T1 - Assessing Mobility in Children Using a Computer Adaptive Testing Version of the Pediatric Evaluation of Disability Inventory JF - Archives of Physical Medicine and Rehabilitation Y1 - 2005 A1 - Haley, S. A1 - Raczek, A. A1 - Coster, W. A1 - Dumas, H. A1 - Fragalapinkham, M. VL - 86 SN - 00039993 ER - TY - JOUR T1 - An Authoring Environment for Adaptive Testing JF - Educational Technology & Society Y1 - 2005 A1 - Guzmán, E A1 - Conejo, R A1 - García-Hervás, E KW - Adaptability KW - Adaptive Testing KW - Authoring environment KW - Item Response Theory AB -

SIETTE is a web-based adaptive testing system. It implements computerized adaptive tests: tailor-made, theory-based tests in which the questions shown to students, the finalization of the test, and the estimation of student knowledge are accomplished adaptively. To construct these tests, SIETTE has an authoring environment comprising a suite of tools that helps teachers create questions and tests properly, and analyze students' performance after taking a test. In this paper, we present this authoring environment in the framework of adaptive testing. As will be shown, this set of visual tools, which contain some adaptable features, can be useful for teachers lacking skills in this kind of testing. Additionally, other systems that implement adaptive testing will be studied. 

VL - 8 IS - 3 ER - TY - JOUR T1 - Automated Simultaneous Assembly for Multistage Testing JF - International Journal of Testing Y1 - 2005 A1 - Breithaupt, Krista A1 - Ariel, Adelaide A1 - Veldkamp, Bernard P. VL - 5 UR - http://www.tandfonline.com/doi/abs/10.1207/s15327574ijt0503_8 ER - TY - JOUR T1 - A Bayesian student model without hidden nodes and its comparison with item response theory JF - International Journal of Artificial Intelligence in Education Y1 - 2005 A1 - Desmarais, M. C. A1 - Pu, X. KW - Bayesian Student Model KW - computer adaptive testing KW - hidden nodes KW - Item Response Theory AB - The Bayesian framework offers a number of techniques for inferring an individual's knowledge state from evidence of mastery of concepts or skills. A typical application where such a technique can be useful is Computer Adaptive Testing (CAT). A Bayesian modeling scheme, POKS, is proposed and compared to the traditional Item Response Theory (IRT), which has been the prevalent CAT approach for the last three decades. POKS is based on the theory of knowledge spaces and constructs item-to-item graph structures without hidden nodes. It aims to offer an effective knowledge assessment method with an efficient algorithm for learning the graph structure from data. We review the different Bayesian approaches to modeling student ability assessment and discuss how POKS relates to them. The performance of POKS is compared to the IRT two parameter logistic model. Experimental results over a 34 item Unix test and a 160 item French language test show that both approaches can classify examinees as master or non-master effectively and efficiently, with relatively comparable performance. However, more significant differences are found in favor of POKS for a second task that consists in predicting individual question item outcome. 
Implications of these results for adaptive testing and student modeling are discussed, as well as the limitations and advantages of POKS, namely the issue of integrating concepts into its structure. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - IOS Press: Netherlands VL - 15 SN - 1560-4292 (Print); 1560-4306 (Electronic) ER - TY - JOUR T1 - A closer look at using judgments of item difficulty to change answers on computerized adaptive tests JF - Journal of Educational Measurement Y1 - 2005 A1 - Vispoel, W. P. A1 - Clough, S. J. A1 - Bleiler, T. VL - 42 ER - TY - BOOK T1 - A comparison of adaptive mastery testing using testlets with the 3-parameter logistic model Y1 - 2005 A1 - Jacobs-Cassuto, M.S. CY - Unpublished doctoral dissertation, University of Minnesota, Minneapolis, MN ER - TY - JOUR T1 - A comparison of item-selection methods for adaptive tests with content constraints JF - Journal of Educational Measurement Y1 - 2005 A1 - van der Linden, W. J. VL - 42 ER - TY - JOUR T1 - A comparison of item-selection methods for adaptive tests with content constraints JF - Journal of Educational Measurement Y1 - 2005 A1 - van der Linden, W. J. KW - Adaptive Testing KW - Algorithms KW - content constraints KW - item selection method KW - shadow test approach KW - spiraling method KW - weighted deviations method AB - In test assembly, a fundamental difference exists between algorithms that select a test sequentially or simultaneously. Sequential assembly allows us to optimize an objective function at the examinee's ability estimate, such as the test information function in computerized adaptive testing. But it leads to the non-trivial problem of how to realize a set of content constraints on the test—a problem more naturally solved by a simultaneous item-selection method. Three main item-selection methods in adaptive testing offer solutions to this dilemma. 
The spiraling method moves item selection across categories of items in the pool proportionally to the numbers needed from them. Item selection by the weighted-deviations method (WDM) and the shadow test approach (STA) is based on projections of the future consequences of selecting an item. These two methods differ in that the former calculates a projection of a weighted sum of the attributes of the eventual test and the latter a projection of the test itself. The pros and cons of these methods are analyzed. An empirical comparison between the WDM and STA was conducted for an adaptive version of the Law School Admission Test (LSAT), which showed equally good item-exposure rates but violations of some of the constraints and larger bias and inaccuracy of the ability estimator for the WDM. PB - Blackwell Publishing: United Kingdom VL - 42 SN - 0022-0655 (Print) ER - TY - JOUR T1 - Computer adaptive testing JF - Journal of Applied Measurement Y1 - 2005 A1 - Gershon, R. C. KW - *Internet KW - *Models, Statistical KW - *User-Computer Interface KW - Certification KW - Health Surveys KW - Humans KW - Licensure KW - Microcomputers KW - Quality of Life AB - The creation of item response theory (IRT) and Rasch models, inexpensive accessibility to high speed desktop computers, and the growth of the Internet, has led to the creation and growth of computerized adaptive testing or CAT. This form of assessment is applicable for both high stakes tests such as certification or licensure exams, as well as health related quality of life surveys. This article discusses the historical background of CAT including its many advantages over conventional (typically paper and pencil) alternatives. The process of CAT is then described including descriptions of the specific differences of using CAT based upon 1-, 2- and 3-parameter IRT and various Rasch models. 
Numerous specific topics describing CAT in practice are described including: initial item selection, content balancing, test difficulty, test length and stopping rules. The article concludes with the author's reflections regarding the future of CAT. VL - 6 SN - 1529-7713 (Print) N1 - Gershon, Richard CReviewUnited StatesJournal of applied measurementJ Appl Meas. 2005;6(1):109-27. ER - TY - JOUR T1 - Computer adaptive testing JF - Journal of Applied Measurement Y1 - 2005 A1 - Gershon, R. C. VL - 6 ER - TY - JOUR T1 - A computer adaptive testing approach for assessing physical functioning in children and adolescents JF - Developmental Medicine and Child Neuropsychology Y1 - 2005 A1 - Haley, S. M. A1 - Ni, P. A1 - Fragala-Pinkham, M. A. A1 - Skrinar, A. M. A1 - Corzo, D. KW - *Computer Systems KW - Activities of Daily Living KW - Adolescent KW - Age Factors KW - Child KW - Child Development/*physiology KW - Child, Preschool KW - Computer Simulation KW - Confidence Intervals KW - Demography KW - Female KW - Glycogen Storage Disease Type II/physiopathology KW - Health Status Indicators KW - Humans KW - Infant KW - Infant, Newborn KW - Male KW - Motor Activity/*physiology KW - Outcome Assessment (Health Care)/*methods KW - Reproducibility of Results KW - Self Care KW - Sensitivity and Specificity AB - The purpose of this article is to demonstrate: (1) the accuracy and (2) the reduction in amount of time and effort in assessing physical functioning (self-care and mobility domains) of children and adolescents using computer-adaptive testing (CAT). A CAT algorithm selects questions directly tailored to the child's ability level, based on previous responses. Using a CAT algorithm, a simulation study was used to determine the number of items necessary to approximate the score of a full-length assessment. 
We built simulated CAT (5-, 10-, 15-, and 20-item versions) for self-care and mobility domains and tested their accuracy in a normative sample (n=373; 190 males, 183 females; mean age 6y 11mo [SD 4y 2m], range 4mo to 14y 11mo) and a sample of children and adolescents with Pompe disease (n=26; 21 males, 5 females; mean age 6y 1mo [SD 3y 10mo], range 5mo to 14y 10mo). Results indicated that comparable score estimates (based on computer simulations) to the full-length tests can be achieved in a 20-item CAT version for all age ranges and for normative and clinical samples. No more than 13 to 16% of the items in the full-length tests were needed for any one administration. These results support further consideration of using CAT programs for accurate and efficient clinical assessments of physical functioning. VL - 47 SN - 0012-1622 (Print) N1 - Haley, Stephen MNi, PengshengFragala-Pinkham, Maria ASkrinar, Alison MCorzo, DeyaniraComparative StudyResearch Support, Non-U.S. Gov'tEnglandDevelopmental medicine and child neurologyDev Med Child Neurol. 2005 Feb;47(2):113-20. ER - TY - CHAP T1 - Computer adaptive testing quality requirements Y1 - 2005 A1 - Economides, A. A. CY - Proceedings E-Learn 2005 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, pp. 288-295, Vancouver, Canada, AACE, October 2005. ER - TY - JOUR T1 - A computer-assisted test design and diagnosis system for use by classroom teachers JF - Journal of Computer Assisted Learning Y1 - 2005 A1 - He, Q. A1 - Tymms, P. KW - Computer Assisted Testing KW - Computer Software KW - Diagnosis KW - Educational Measurement KW - Teachers AB - Computer-assisted assessment (CAA) has become increasingly important in education in recent years. A variety of computer software systems have been developed to help assess the performance of students at various levels. 
However, such systems are primarily designed to provide objective assessment of students and analysis of test items, and focus has been mainly placed on higher and further education. Although there are commercial professional systems available for use by primary and secondary educational institutions, such systems are generally expensive and require skilled expertise to operate. In view of the rapid progress made in the use of computer-based assessment for primary and secondary students by education authorities here in the UK and elsewhere, there is a need to develop systems which are economic and easy to use and can provide the necessary information that can help teachers improve students' performance. This paper presents the development of a software system that provides a range of functions including generating items and building item banks, designing tests, conducting tests on computers and analysing test results. Specifically, the system can generate information on the performance of students and test items that can be easily used to identify curriculum areas where students are under performing. A case study based on data collected from five secondary schools in Hong Kong involved in the Curriculum, Evaluation and Management Centre's Middle Years Information System Project, Durham University, UK, has been undertaken to demonstrate the use of the system for diagnostic and performance analysis. (PsycINFO Database Record (c) 2006 APA ) (journal abstract) VL - 21 ER - TY - JOUR T1 - Computerized adaptive testing: a mixture item selection approach for constrained situations JF - British Journal of Mathematical and Statistical Psychology Y1 - 2005 A1 - Leung, C. K. A1 - Chang, Hua-Hua A1 - Hau, K. T. 
KW - *Computer-Aided Design KW - *Educational Measurement/methods KW - *Models, Psychological KW - Humans KW - Psychometrics/methods AB - In computerized adaptive testing (CAT), traditionally the most discriminating items are selected to provide the maximum information so as to attain the highest efficiency in trait (theta) estimation. The maximum information (MI) approach typically results in unbalanced item exposure and hence high item-overlap rates across examinees. Recently, Yi and Chang (2003) proposed the multiple stratification (MS) method to remedy the shortcomings of MI. In MS, items are first sorted according to content, then difficulty and finally discrimination parameters. As discriminating items are used strategically, MS offers a better utilization of the entire item pool. However, for testing with imposed non-statistical constraints, this new stratification approach may not maintain its high efficiency. Through a series of simulation studies, this research explored the possible benefits of a mixture item selection approach (MS-MI), integrating the MS and MI approaches, in testing with non-statistical constraints. In all simulation conditions, MS consistently outperformed the other two competing approaches in item pool utilization, while the MS-MI and the MI approaches yielded higher measurement efficiency and offered better conformity to the constraints. Furthermore, the MS-MI approach was shown to perform better than MI on all evaluation criteria when control of item exposure was imposed. VL - 58 SN - 0007-1102 (Print); 0007-1102 (Linking) N1 - Leung, Chi-Keung; Chang, Hua-Hua; Hau, Kit-Tai. England. Br J Math Stat Psychol. 2005 Nov;58(Pt 2):239-57. ER - TY - JOUR T1 - Computerized Adaptive Testing With the Partial Credit Model: Estimation Procedures, Population Distributions, and Item Pool Characteristics JF - Applied Psychological Measurement Y1 - 2005 A1 - Gorin, Joanna S. A1 - Dodd, Barbara G. A1 - Fitzpatrick, Steven J. A1 - Shieh, Yann Yann AB -

The primary purpose of this research is to examine the impact of estimation methods, actual latent trait distributions, and item pool characteristics on the performance of a simulated computerized adaptive testing (CAT) system. In this study, three estimation procedures are compared for accuracy of estimation: maximum likelihood estimation (MLE), expected a posteriori (EAP), and Warm's weighted likelihood estimation (WLE). Some research has shown that MLE and EAP perform equally well under certain conditions in polytomous CAT systems, such that they match the actual latent trait distribution. However, little research has compared these methods when prior estimates of theta distributions are extremely poor. In general, it appears that MLE, EAP, and WLE procedures perform equally well when using an optimal item pool. However, the use of EAP procedures may be advantageous under nonoptimal testing conditions when the item pool is not appropriately matched to the examinees.

VL - 29 UR - http://apm.sagepub.com/content/29/6/433.abstract ER - TY - Generic T1 - Computerizing statewide assessments in Minnesota: A report on the feasibility of converting the Minnesota Comprehensive Assessments to a computerized adaptive format Y1 - 2005 A1 - Peterson, K.A. A1 - Davison, M. L. A1 - Hjelseth, L. PB - Office of Educational Accountability, College of Education and Human Development, University of Minnesota ER - TY - ABST T1 - Constraining item exposure in computerized adaptive testing with shadow tests Y1 - 2005 A1 - van der Linden, W. J. A1 - Veldkamp, B. P. CY - Law School Admission Council Computerized Testing Report 02-03 ER - TY - JOUR T1 - Constructing a Computerized Adaptive Test for University Applicants With Disabilities JF - Applied Measurement in Education Y1 - 2005 A1 - Moshinsky, Avital A1 - Kazin, Cathrael VL - 18 UR - http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1804_3 ER - TY - JOUR T1 - Contemporary measurement techniques for rehabilitation outcomes assessment JF - Journal of Rehabilitation Medicine Y1 - 2005 A1 - Jette, A. M. A1 - Haley, S. M. KW - *Disability Evaluation KW - Activities of Daily Living/classification KW - Disabled Persons/classification/*rehabilitation KW - Health Status Indicators KW - Humans KW - Outcome Assessment (Health Care)/*methods/standards KW - Recovery of Function KW - Research Support, N.I.H., Extramural KW - Research Support, U.S. Gov't, Non-P.H.S. 
KW - Sensitivity and Specificity KW - computerized adaptive testing AB - In this article, we review the limitations of traditional rehabilitation functional outcome instruments currently in use within the rehabilitation field to assess Activity and Participation domains as defined by the International Classification of Function, Disability, and Health. These include a narrow scope of functional outcomes, data incompatibility across instruments, and the precision vs feasibility dilemma. Following this, we illustrate how contemporary measurement techniques, such as item response theory methods combined with computer adaptive testing methodology, can be applied in rehabilitation to design functional outcome instruments that are comprehensive in scope, accurate, allow for compatibility across instruments, and are sensitive to clinically important change without sacrificing their feasibility. Finally, we present some of the pressing challenges that need to be overcome to provide effective dissemination and training assistance to ensure that current and future generations of rehabilitation professionals are familiar with and skilled in the application of contemporary outcomes measurement. VL - 37 N1 - 1650-1977 (Print); Journal Article; Review ER - TY - JOUR T1 - Controlling Item Exposure and Test Overlap in Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2005 A1 - Chen, Shu-Ying A1 - Lei, Pui-Wa KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Content (Test) KW - computerized adaptive testing AB -

This article proposes an item exposure control method, which is the extension of the Sympson and Hetter procedure and can provide item exposure control at both the item and test levels. Item exposure rate and test overlap rate are two indices commonly used to track item exposure in computerized adaptive tests. By considering both indices, item exposure can be monitored at both the item and test levels. To control the item exposure rate and test overlap rate simultaneously, the modified procedure attempted to control not only the maximum value but also the variance of item exposure rates. Results indicated that the item exposure rate and test overlap rate could be controlled simultaneously by implementing the modified procedure. Item exposure control was improved and precision of trait estimation decreased when a prespecified maximum test overlap rate was stringent.

VL - 29 UR - http://apm.sagepub.com/content/29/3/204.abstract ER - TY - JOUR T1 - Data pooling and analysis to build a preliminary item bank: an example using bowel function in prostate cancer JF - Evaluation and the Health Professions Y1 - 2005 A1 - Eton, D. T. A1 - Lai, J. S. A1 - Cella, D. A1 - Reeve, B. B. A1 - Talcott, J. A. A1 - Clark, J. A. A1 - McPherson, C. P. A1 - Litwin, M. S. A1 - Moinpour, C. M. KW - *Quality of Life KW - *Questionnaires KW - Adult KW - Aged KW - Data Collection/methods KW - Humans KW - Intestine, Large/*physiopathology KW - Male KW - Middle Aged KW - Prostatic Neoplasms/*physiopathology KW - Psychometrics KW - Research Support, Non-U.S. Gov't KW - Statistics, Nonparametric AB - Assessing bowel function (BF) in prostate cancer can help determine therapeutic trade-offs. We determined the components of BF commonly assessed in prostate cancer studies as an initial step in creating an item bank for clinical and research application. We analyzed six archived data sets representing 4,246 men with prostate cancer. Thirty-one items from validated instruments were available for analysis. Items were classified into domains (diarrhea, rectal urgency, pain, bleeding, bother/distress, and other) then subjected to conventional psychometric and item response theory (IRT) analyses. Items fit the IRT model if the ratio between observed and expected item variance was between 0.60 and 1.40. Four of 31 items had inadequate fit in at least one analysis. Poorly fitting items included bleeding (2), rectal urgency (1), and bother/distress (1). A fifth item assessing hemorrhoids was poorly correlated with other items. Our analyses supported four related components of BF: diarrhea, rectal urgency, pain, and bother/distress. VL - 28 N1 - 0163-2787 (Print)Journal Article ER - TY - JOUR T1 - Design and evaluation of an XML-based platform-independent computerized adaptive testing system JF - IEEE Transactions on Education Y1 - 2005 A1 - Ho, R.-G., A1 - Yen, Y.-C. 
VL - 48 IS - 2 ER - TY - JOUR T1 - Development of a computer-adaptive test for depression (D-CAT) JF - Quality of Life Research Y1 - 2005 A1 - Fliege, H. A1 - Becker, J. A1 - Walter, O. B. A1 - Bjorner, J. B. A1 - Klapp, B. F. A1 - Rose, M. VL - 14 ER - TY - CHAP T1 - The development of the adaptive item language assessment (AILA) for mixed-ability students Y1 - 2005 A1 - Giouroglou, H. A1 - Economides, A. A. CY - Proceedings E-Learn 2005 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, 643-650, Vancouver, Canada, AACE, October 2005. ER - TY - JOUR T1 - Dynamic assessment of health outcomes: Time to let the CAT out of the bag? JF - Health Services Research Y1 - 2005 A1 - Cook, K. F. A1 - O'Malley, K. J. A1 - Roddey, T. S. KW - computer adaptive testing KW - Item Response Theory KW - self reported health outcomes AB - Background: The use of item response theory (IRT) to measure self-reported outcomes has burgeoned in recent years. Perhaps the most important application of IRT is computer-adaptive testing (CAT), a measurement approach in which the selection of items is tailored for each respondent. Objective: To provide an introduction to the use of CAT in the measurement of health outcomes, describe several IRT models that can be used as the basis of CAT, and discuss practical issues associated with the use of adaptive scaling in research settings. Principal Points: The development of a CAT requires several steps that are not required in the development of a traditional measure including identification of "starting" and "stopping" rules. CAT's most attractive advantage is its efficiency. Greater measurement precision can be achieved with fewer items. Disadvantages of CAT include the high cost and level of technical expertise required to develop a CAT. Conclusions: Researchers, clinicians, and patients benefit from the availability of psychometrically rigorous measures that are not burdensome. 
CAT outcome measures hold substantial promise in this regard, but their development is not without challenges. PB - Blackwell Publishing: United Kingdom VL - 40 SN - 0017-9124 (Print); 1475-6773 (Electronic) ER - TY - CONF T1 - The effectiveness of using multiple item pools in computerized adaptive testing T2 - Annual meeting of the National Council on Measurement in Education Y1 - 2005 A1 - Zhang, J. A1 - Chang, H. JF - Annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CHAP T1 - Features of the estimated sampling distribution of the ability estimate in computerized adaptive testing according to two stopping rules Y1 - 2005 A1 - Blais, J-G. A1 - Raîche, G. CY - D. G. Englehard (Eds.), Objective measurement: Theory into practice. Volume 6. ER - TY - CONF T1 - Identifying practical indices for enhancing item pool security T2 - Paper presented at the annual meeting of the National Council on Measurement in Education (NCME) Y1 - 2005 A1 - Yi, Q. A1 - Zhang, J. A1 - Chang, Hua-Hua JF - Paper presented at the annual meeting of the National Council on Measurement in Education (NCME) CY - Montreal, Canada ER - TY - CHAP T1 - An implemented theoretical framework for a common European foreign language adaptive assessment Y1 - 2005 A1 - Giouroglou, H. A1 - Economides, A. A. CY - Proceedings ICODL 2005, 3rd International Conference on Open and Distance Learning 'Applications of Pedagogy and Technology', 339-350, Greek Open University, Patra, Greece ER - TY - ABST T1 - Implementing content constraints in alpha-stratified adaptive testing using a shadow test approach Y1 - 2005 A1 - van der Linden, W. J. 
A1 - Chang, Hua-Hua CY - Law School Admission Council, Computerized Testing Report 01-09 ER - TY - JOUR T1 - Increasing the homogeneity of CAT's item-exposure rates by minimizing or maximizing varied target functions while assembling shadow tests JF - Journal of Educational Measurement Y1 - 2005 A1 - Li, Y. H. A1 - Schafer, W. D. KW - algorithm KW - computerized adaptive testing KW - item exposure rate KW - shadow test KW - varied target function AB - A computerized adaptive testing (CAT) algorithm that has the potential to increase the homogeneity of CATs item-exposure rates without significantly sacrificing the precision of ability estimates was proposed and assessed in the shadow-test (van der Linden & Reese, 1998) CAT context. This CAT algorithm was formed by a combination of maximizing or minimizing varied target functions while assembling shadow tests. There were four target functions to be separately used in the first, second, third, and fourth quarter test of CAT. The elements to be used in the four functions were associated with (a) a random number assigned to each item, (b) the absolute difference between an examinee's current ability estimate and an item difficulty, (c) the absolute difference between an examinee's current ability estimate and an optimum item difficulty, and (d) item information. The results indicated that this combined CAT fully utilized all the items in the pool, reduced the maximum exposure rates, and achieved more homogeneous exposure rates. Moreover, its precision in recovering ability estimates was similar to that of the maximum item-information method. The combined CAT method resulted in the best overall results compared with the other individual CAT item-selection methods. The findings from the combined CAT are encouraging. Future uses are discussed. 
PB - Blackwell Publishing: United Kingdom VL - 42 SN - 0022-0655 (Print) ER - TY - JOUR T1 - Infeasibility in automated test assembly models: A comparison study of different methods JF - Journal of Educational Measurement Y1 - 2005 A1 - Huitzing, H. A. A1 - Veldkamp, B. P. A1 - Verschoor, A. J. KW - Algorithms KW - Item Content (Test) KW - Models KW - Test Construction AB - Several techniques exist to automatically put together a test meeting a number of specifications. In an item bank, the items are stored with their characteristics. A test is constructed by selecting a set of items that fulfills the specifications set by the test assembler. Test assembly problems are often formulated in terms of a model consisting of restrictions and an objective to be maximized or minimized. A problem arises when it is impossible to construct a test from the item pool that meets all specifications, that is, when the model is not feasible. Several methods exist to handle these infeasibility problems. In this article, test assembly models resulting from two practical testing programs were reconstructed to be infeasible. These models were analyzed using methods that forced a solution (Goal Programming, Multiple-Goal Programming, Greedy Heuristic), that analyzed the causes (Relaxed and Ordered Deletion Algorithm (RODA), Integer Randomized Deletion Algorithm (IRDA), Set Covering (SC), and Item Sampling), or that analyzed the causes and used this information to force a solution (Irreducible Infeasible Set-Solver). Specialized methods such as the IRDA and the Irreducible Infeasible Set-Solver performed best. Recommendations about the use of different methods are given. VL - 42 ER - TY - JOUR T1 - An item bank was created to improve the measurement of cancer-related fatigue JF - Journal of Clinical Epidemiology Y1 - 2005 A1 - Lai, J-S. A1 - Cella, D. A1 - Dineen, K. 
A1 - Bode, R. A1 - Von Roenn, J. A1 - Gershon, R. C. A1 - Shevrin, D. KW - Adult KW - Aged KW - Aged, 80 and over KW - Factor Analysis, Statistical KW - Fatigue/*etiology/psychology KW - Female KW - Humans KW - Male KW - Middle Aged KW - Neoplasms/*complications/psychology KW - Psychometrics KW - Questionnaires AB - OBJECTIVE: Cancer-related fatigue (CRF) is one of the most common unrelieved symptoms experienced by patients. CRF is underrecognized and undertreated due to a lack of clinically sensitive instruments that integrate easily into clinics. Modern computerized adaptive testing (CAT) can overcome these obstacles by enabling precise assessment of fatigue without requiring the administration of a large number of questions. A working item bank is essential for development of a CAT platform. The present report describes the building of an operational item bank for use in clinical settings with the ultimate goal of improving CRF identification and treatment. STUDY DESIGN AND SETTING: The sample included 301 cancer patients. Psychometric properties of items were examined by using Rasch analysis, an Item Response Theory (IRT) model. RESULTS AND CONCLUSION: The final bank includes 72 items. These 72 unidimensional items explained 57.5% of the variance, based on factor analysis results. Excellent internal consistency (alpha=0.99) and acceptable item-total correlation were found (range: 0.51-0.85). The 72 items covered a reasonable range of the fatigue continuum. No significant ceiling effects, floor effects, or gaps were found. A sample short form was created for demonstration purposes. The resulting bank is amenable to the development of a CAT platform. VL - 58 SN - 0895-4356 (Print); 0895-4356 (Linking) N1 - Lai, Jin-Shei; Cella, David; Dineen, Kelly; Bode, Rita; Von Roenn, Jamie; Gershon, Richard C; Shevrin, Daniel. England. J Clin Epidemiol. 2005 Feb;58(2):190-7. 
ER - TY - JOUR T1 - [Item characteristic curve equating under graded response models in IRT] JF - Acta Psychologica Sinica Y1 - 2005 A1 - Jun, Z. A1 - Dongming, O. A1 - Shuyuan, X. A1 - Haiqi, D. A1 - Shuqing, Q. KW - graded response models KW - item characteristic curve KW - Item Response Theory AB - In one of the largest qualification examinations, the economist test, item characteristic curve equating and an anchor-test equating design under graded response models in IRT were used to guarantee comparability across years, construct an item bank, and prepare for computerized adaptive testing. These methods equated the item and ability parameters of five years of test data and succeeded in establishing an item bank. On this basis, cut scores from different years were compared through equating, providing empirical support for setting the eligibility standard of the economist test. PB - Science Press: China VL - 37 SN - 0439-755X (Print) ER - TY - JOUR T1 - Item response theory in computer adaptive testing: implications for outcomes measurement in rehabilitation JF - Rehabil Psychol Y1 - 2005 A1 - Ware, J. E. A1 - Gandek, B. A1 - Sinclair, S. J. A1 - Bjorner, J. B. VL - 50 ER - TY - JOUR T1 - An item response theory-based pain item bank can enhance measurement precision JF - Journal of Pain and Symptom Management Y1 - 2005 A1 - Lai, J-S. A1 - Dineen, K. A1 - Reeve, B. B. A1 - Von Roenn, J. A1 - Shervin, D. A1 - McGuire, M. A1 - Bode, R. K. A1 - Paice, J. A1 - Cella, D. KW - computerized adaptive testing AB - Cancer-related pain is often under-recognized and undertreated. This is partly due to the lack of appropriate assessments, which need to be comprehensive and precise yet easily integrated into clinics. Computerized adaptive testing (CAT) can enable precise-yet-brief assessments by only selecting the most informative items from a calibrated item bank. The purpose of this study was to create such a bank. 
The sample included 400 cancer patients who were asked to complete 61 pain-related items. Data were analyzed using factor analysis and the Rasch model. The final bank consisted of 43 items which satisfied the measurement requirement of factor analysis and the Rasch model, demonstrated high internal consistency and reasonable item-total correlations, and discriminated patients with differing degrees of pain. We conclude that this bank demonstrates good psychometric properties, is sensitive to pain reported by patients, and can be used as the foundation for a CAT pain-testing platform for use in clinical practice. VL - 30 N1 - 0885-3924 (Print); Journal Article ER - TY - JOUR T1 - La Validez desde una óptica psicométrica [Validity from a psychometric perspective] JF - Acta Comportamentalia Y1 - 2005 A1 - Muñiz, J. KW - Factor Analysis KW - Measurement KW - Psychometrics KW - Scaling (Testing) KW - Statistical KW - Technology KW - Test Validity AB - The study of validity constitutes a central axis of psychometric analyses of measurement instruments. This paper presents a historical sketch of different modes of conceiving validity, with commentary on current views, and it attempts to predict future lines of research by considering the impact of new computerized technologies on measurement instruments in psychology and education. Factors such as the new multimedia format of items, distance assessment, the intercultural use of tests, the consequences of test use, or the development of computerized adaptive tests demand new ways of conceiving and evaluating validity. Some recent thoughts about the concept of validity are also critically analyzed. VL - 13 ER - TY - JOUR T1 - Measuring physical function in patients with complex medical and postsurgical conditions: a computer adaptive approach JF - American Journal of Physical Medicine and Rehabilitation Y1 - 2005 A1 - Siebens, H. A1 - Andres, P. L. A1 - Pengsheng, N. A1 - Coster, W. J. A1 - Haley, S. M. KW - Activities of Daily Living/*classification KW - Adult KW - Aged KW - Cohort Studies KW - Continuity of Patient Care KW - Disability Evaluation KW - Female KW - Health Services Research KW - Humans KW - Male KW - Middle Aged KW - Postoperative Care/*rehabilitation KW - Prognosis KW - Recovery of Function KW - Rehabilitation Centers KW - Rehabilitation/*standards KW - Sensitivity and Specificity KW - Sickness Impact Profile KW - Treatment Outcome AB - OBJECTIVE: To examine whether the range of disability in the medically complex and postsurgical populations receiving rehabilitation is adequately sampled by the new Activity Measure--Post-Acute Care (AM-PAC), and to assess whether computer adaptive testing (CAT) can derive valid patient scores using fewer questions. 
DESIGN: Observational study of 158 subjects (mean age 67.2 yrs) receiving skilled rehabilitation services in inpatient (acute rehabilitation hospitals, skilled nursing facility units) and community (home health services, outpatient departments) settings for recent-onset or worsening disability from medical (excluding neurological) and surgical (excluding orthopedic) conditions. Measures were interviewer-administered activity questions (all patients) and physical functioning portion of the SF-36 (outpatients) and standardized chart items (11 Functional Independence Measure (FIM), 19 Standardized Outcome and Assessment Information Set (OASIS) items, and 22 Minimum Data Set (MDS) items). Rasch modeling analyzed all data and the relationship between person ability estimates and average item difficulty. CAT assessed the ability to derive accurate patient scores using a sample of questions. RESULTS: The 163-item activity item pool covered the range of physical movement and personal and instrumental activities. CAT analysis showed comparable scores between estimates using 10 items or the total item pool. CONCLUSION: The AM-PAC can assess a broad range of function in patients with complex medical illness. CAT achieves valid patient scores using fewer questions. VL - 84 N1 - 0894-9115 (Print); Comparative Study; Journal Article; Research Support, N.I.H., Extramural; Research Support, U.S. Gov't, P.H.S. ER - TY - JOUR T1 - Monte Carlo Test Assembly for Item Pool Analysis and Extension JF - Applied Psychological Measurement Y1 - 2005 A1 - Belov, Dmitry I. A1 - Armstrong, Ronald D. AB -

A new test assembly algorithm based on a Monte Carlo random search is presented in this article. A major advantage of the Monte Carlo test assembly over other approaches (integer programming or enumerative heuristics) is that it performs a uniform sampling from the item pool, which provides every feasible item combination (test) with an equal chance of being built during an assembly. This allows the authors to address the following issues of pool analysis and extension: compare the strengths and weaknesses of different pools, identify the most restrictive constraint(s) for test assembly, and identify properties of the items that should be added to a pool to achieve greater usability of the pool. Computer experiments with operational pools are given.

VL - 29 UR - http://apm.sagepub.com/content/29/4/239.abstract ER - TY - CHAP T1 - Personalized feedback in CAT Y1 - 2005 A1 - Economides, A. A. CY - WSEAS Transactions on Advances in Engineering Education, Issue 3, Volume 2,174-181, July 2005. ER - TY - JOUR T1 - The promise of PROMIS: using item response theory to improve assessment of patient-reported outcomes JF - Clinical and Experimental Rheumatology Y1 - 2005 A1 - Fries, J.F. A1 - Bruce, B. A1 - Cella, D. KW - computerized adaptive testing AB - PROMIS (Patient-Reported-Outcomes Measurement Information System) is an NIH Roadmap network project intended to improve the reliability, validity, and precision of PROs and to provide definitive new instruments that will exceed the capabilities of classic instruments and enable improved outcome measurement for clinical research across all NIH institutes. Item response theory (IRT) measurement models now permit us to transition conventional health status assessment into an era of item banking and computerized adaptive testing (CAT). Item banking uses IRT measurement models and methods to develop item banks from large pools of items from many available questionnaires. IRT allows the reduction and improvement of items and assembles domains of items which are unidimensional and not excessively redundant. CAT provides a model-driven algorithm and software to iteratively select the most informative remaining item in a domain until a desired degree of precision is obtained. Through these approaches the number of patients required for a clinical trial may be reduced while holding statistical power constant. PROMIS tools, expected to improve precision and enable assessment at the individual patient level which should broaden the appeal of PROs, will begin to be available to the general medical community in 2008. 
VL - 23 ER - TY - JOUR T1 - Propiedades psicométricas de un test Adaptativo Informatizado para la medición del ajuste emocional [Psychometric properties of an Emotional Adjustment Computerized Adaptive Test] JF - Psicothema Y1 - 2005 A1 - Aguado, D. A1 - Rubio, V. J. A1 - Hontangas, P. M. A1 - Hernández, J. M. KW - Computer Assisted Testing KW - Emotional Adjustment KW - Item Response Theory KW - Personality Measures KW - Psychometrics KW - Test Validity AB - This paper describes the psychometric properties of a computerized adaptive test for measuring emotional adjustment. A review of the item response theory (IRT) literature shows that IRT has been applied more often to aptitude variables than to personality variables, although several studies have demonstrated its usefulness for the psychometric description of personality measures. Even so, few studies have examined the properties of an IRT-based computerized adaptive test for measuring a personality variable such as emotional adjustment. Our results show the efficiency of the CAT for assessing emotional adjustment: it provides valid and precise measurement while using fewer items than the emotional adjustment scales of well-established instruments. VL - 17 ER - TY - JOUR T1 - A Randomized Experiment to Compare Conventional, Computerized, and Computerized Adaptive Administration of Ordinal Polytomous Attitude Items JF - Applied Psychological Measurement Y1 - 2005 A1 - Hol, A. Michiel A1 - Vorst, Harrie C. M. A1 - Mellenbergh, Gideon J. AB -

A total of 520 high school students were randomly assigned to a paper-and-pencil test (PPT), a computerized standard test (CST), or a computerized adaptive test (CAT) version of the Dutch School Attitude Questionnaire (SAQ), consisting of ordinal polytomous items. The CST administered items in the same order as the PPT. The CAT administered all items of three SAQ subscales in adaptive order using Samejima’s graded response model, so that six different stopping rule settings could be applied afterwards. School marks were used as external criteria. Results showed significant but small multivariate administration mode effects on conventional raw scores and small to medium effects on maximum likelihood latent trait estimates. When the precision of CAT latent trait estimates decreased, correlations with grade point average in general decreased. However, the magnitude of the decrease was not very large as compared to the PPT, the CST, and the CAT without the stopping rule.

VL - 29 UR - http://apm.sagepub.com/content/29/3/159.abstract ER - TY - RPRT T1 - Recent trends in comparability studies Y1 - 2005 A1 - Paek, P. KW - computer adaptive testing KW - Computerized assessment KW - differential item functioning KW - Mode effects JF - PEM Research Report 05-05 PB - Pearson SN - 05-05 ER - TY - CONF T1 - Rescuing CAT by fixing the problems T2 - National Council on Measurement in Education Y1 - 2005 A1 - Chang, S-H. A1 - Zhang, J. 
JF - National Council on Measurement in Education CY - Montreal, Canada ER - TY - JOUR T1 - Simulated computerized adaptive tests for measuring functional status were efficient with good discriminant validity in patients with hip, knee, or foot/ankle impairments JF - Journal of Clinical Epidemiology Y1 - 2005 A1 - Hart, D. L. A1 - Mioduski, J. E. A1 - Stratford, P. W. KW - *Health Status Indicators KW - Activities of Daily Living KW - Adolescent KW - Adult KW - Aged KW - Aged, 80 and over KW - Ankle Joint/physiopathology KW - Diagnosis, Computer-Assisted/*methods KW - Female KW - Hip Joint/physiopathology KW - Humans KW - Joint Diseases/physiopathology/*rehabilitation KW - Knee Joint/physiopathology KW - Lower Extremity/*physiopathology KW - Male KW - Middle Aged KW - Research Support, N.I.H., Extramural KW - Research Support, U.S. Gov't, P.H.S. KW - Retrospective Studies AB - BACKGROUND AND OBJECTIVE: To develop computerized adaptive tests (CATs) designed to assess lower extremity functional status (FS) in people with lower extremity impairments using items from the Lower Extremity Functional Scale and compare discriminant validity of FS measures generated using all items analyzed with a rating scale Item Response Theory model (theta(IRT)) and measures generated using the simulated CATs (theta(CAT)). METHODS: Secondary analysis of retrospective intake rehabilitation data. RESULTS: Unidimensionality of items was strong, and local independence of items was adequate. Differential item functioning (DIF) affected item calibration related to body part, that is, hip, knee, or foot/ankle, but DIF did not affect item calibration for symptom acuity, gender, age, or surgical history. Therefore, patients were separated into three body part specific groups. The rating scale model fit all three data sets well. Three body part specific CATs were developed: each was 70% more efficient than using all LEFS items to estimate FS measures. 
theta(IRT) and theta(CAT) measures discriminated patients by symptom acuity, age, and surgical history in similar ways. theta(CAT) measures were as precise as theta(IRT) measures. CONCLUSION: Body part-specific simulated CATs were efficient and produced precise measures of FS with good discriminant validity. VL - 58 N1 - 0895-4356 (Print)Journal ArticleMulticenter StudyValidation Studies ER - TY - JOUR T1 - Somministrazione di test computerizzati di tipo adattivo: Un' applicazione del modello di misurazione di Rasch [Administration of computerized and adaptive tests: An application of the Rasch Model] JF - Testing Psicometria Metodologia Y1 - 2005 A1 - Miceli, R. A1 - Molinengo, G. KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Response Theory computerized adaptive testing KW - Models KW - Psychometrics AB - The aim of the present study is to describe the characteristics of a procedure for administering computerized and adaptive tests (Computer Adaptive Testing or CAT). Items to be asked to the individuals are interactively chosen and are selected from a "bank" in which they were previously calibrated and recorded on the basis of their difficulty level. The selection of items is performed by increasingly more accurate estimates of the examinees' ability. The building of an item-bank on Psychometrics and the implementation of this procedure allow a first validation through Monte Carlo simulations. (PsycINFO Database Record (c) 2006 APA ) (journal abstract) VL - 12 ER - TY - ABST T1 - Strategies for controlling item exposure in computerized adaptive testing with the partial credit model Y1 - 2005 A1 - Davis, L. L. A1 - Dodd, B. CY - Pearson Educational Measurement Research Report 05-01 ER - TY - JOUR T1 - Test construction for cognitive diagnosis JF - Applied Psychological Measurement Y1 - 2005 A1 - Henson, R. K. A1 - Douglas, J. 
KW - (Measurement) KW - Cognitive Assessment KW - Item Analysis (Statistical) KW - Profiles KW - Test Construction KW - Test Interpretation KW - Test Items AB - Although cognitive diagnostic models (CDMs) can be useful in the analysis and interpretation of existing tests, little has been developed to specify how one might construct a good test using aspects of the CDMs. This article discusses the derivation of a general CDM index based on Kullback-Leibler information that will serve as a measure of how informative an item is for the classification of examinees. The effectiveness of the index is examined for items calibrated using the deterministic input noisy "and" gate model (DINA) and the reparameterized unified model (RUM) by implementing a simple heuristic to construct a test from an item bank. When compared to randomly constructed tests from the same item bank, the heuristic shows significant improvement in classification rates. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 29 ER - TY - JOUR T1 - Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire JF - Alcoholism: Clinical & Experimental Research Y1 - 2005 A1 - Kahler, C. W. A1 - Strong, D. R. A1 - Read, J. P.
KW - Psychometrics KW - Substance-Related Disorders AB - Background: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias. Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis. An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample. Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items. Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students.
To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided. (C) 2005 Research Society on Alcoholism VL - 29 ER - TY - JOUR T1 - Trait Parameter Recovery Using Multidimensional Computerized Adaptive Testing in Reading and Mathematics JF - Applied Psychological Measurement Y1 - 2005 A1 - Li, Yuan H. A1 - Schafer, William D. AB -

Under a multidimensional item response theory (MIRT) computerized adaptive testing (CAT) testing scenario, a trait estimate (θ) in one dimension will provide clues for subsequently seeking a solution in other dimensions. This feature may enhance the efficiency of MIRT CAT’s item selection and its scoring algorithms compared with its counterpart, the unidimensional CAT (UCAT). The present study used existing Reading and Math test data to generate simulated item parameters. A confirmatory item factor analysis model was applied to the data using NOHARM to produce interpretable MIRT item parameters. Results showed that MIRT CAT, conditional on the constraints, was quite capable of producing accurate estimates on both measures. Compared with UCAT, MIRT CAT slightly increased the accuracy of both trait estimates, especially for the low-level or high-level trait examinees in both measures, and reduced the rate of unused items in the item pool.

VL - 29 SN - 0146-6216 UR - http://apm.sagepub.com/content/29/1/3.abstract ER - TY - RPRT T1 - The use of person-fit statistics in computerized adaptive testing Y1 - 2005 A1 - Meijer, R. R. A1 - van Krimpen-Stoop, E. M. L. A. JF - LSAC Research Report Series PB - Law School Admission Council CY - Newtown, PA, USA SN - Computerized Testing Report 97-14 ER - TY - JOUR T1 - Validation of a computerized adaptive testing version of the Schedule for Nonadaptive and Adaptive Personality (SNAP) JF - Psychological Assessment Y1 - 2005 A1 - Simms, L. J. A1 - Clark, L. A. AB - This is a validation study of a computerized adaptive (CAT) version of the Schedule for Nonadaptive and Adaptive Personality (SNAP) conducted with 413 undergraduates who completed the SNAP twice, 1 week apart. Participants were assigned randomly to 1 of 4 retest groups: (a) paper-and-pencil (P&P) SNAP, (b) CAT, (c) P&P/CAT, and (d) CAT/P&P. With number of items held constant, computerized administration had little effect on descriptive statistics, rank ordering of scores, reliability, and concurrent validity, but was preferred over P&P administration by most participants. CAT administration yielded somewhat lower precision and validity than P&P administration, but required 36% to 37% fewer items and 58% to 60% less time to complete. These results confirm not only key findings from previous CAT simulation studies of personality measures but extend them for the 1st time to a live assessment setting. VL - 17 ER - TY - CHAP T1 - The ABCs of Computerized Adaptive Testing Y1 - 2004 A1 - Gershon, R. C. CY - T. M. Wood and W. Zhu (Eds.), Measurement issues and practice in physical activity. Champaign, IL: Human Kinetics. ER - TY - CONF T1 - Achieving accuracy of retest calibration for a national CAT placement examination with a restricted test length T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2004 A1 - Wang, X. B. A1 - Wiley, A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA N1 - #WA04-01 {PDF file, 837 KB} ER - TY - JOUR T1 - Activity outcome measurement for postacute care JF - Medical Care Y1 - 2004 A1 - Haley, S. M. A1 - Coster, W. J. A1 - Andres, P. L. A1 - Ludlow, L. H. A1 - Ni, P. A1 - Bond, T. L. A1 - Sinclair, S. J. A1 - Jette, A. M.
KW - *Self Efficacy KW - *Sickness Impact Profile KW - Activities of Daily Living/*classification/psychology KW - Adult KW - Aftercare/*standards/statistics & numerical data KW - Aged KW - Boston KW - Cognition/physiology KW - Disability Evaluation KW - Factor Analysis, Statistical KW - Female KW - Human KW - Male KW - Middle Aged KW - Movement/physiology KW - Outcome Assessment (Health Care)/*methods/statistics & numerical data KW - Psychometrics KW - Questionnaires/standards KW - Rehabilitation/*standards/statistics & numerical data KW - Reproducibility of Results KW - Sensitivity and Specificity KW - Support, U.S. Gov't, Non-P.H.S. KW - Support, U.S. Gov't, P.H.S. AB - BACKGROUND: Efforts to evaluate the effectiveness of a broad range of postacute care services have been hindered by the lack of conceptually sound and comprehensive measures of outcomes. It is critical to determine a common underlying structure before employing current methods of item equating across outcome instruments for future item banking and computer-adaptive testing applications. OBJECTIVE: To investigate the factor structure, reliability, and scale properties of items underlying the Activity domains of the International Classification of Functioning, Disability and Health (ICF) for use in postacute care outcome measurement. METHODS: We developed a 41-item Activity Measure for Postacute Care (AM-PAC) that assessed an individual's execution of discrete daily tasks in his or her own environment across major content domains as defined by the ICF. We evaluated the reliability and discriminant validity of the prototype AM-PAC in 477 individuals in active rehabilitation programs across 4 rehabilitation settings using factor analyses, tests of item scaling, internal consistency reliability analyses, Rasch item response theory modeling, residual component analysis, and modified parallel analysis. 
RESULTS: Results from an initial exploratory factor analysis produced 3 distinct, interpretable factors that accounted for 72% of the variance: Applied Cognition (44%), Personal Care & Instrumental Activities (19%), and Physical & Movement Activities (9%); these 3 activity factors were verified by a confirmatory factor analysis. Scaling assumptions were met for each factor in the total sample and across diagnostic groups. Internal consistency reliability was high for the total sample (Cronbach alpha = 0.92 to 0.94), and for specific diagnostic groups (Cronbach alpha = 0.90 to 0.95). Rasch scaling, residual factor, differential item functioning, and modified parallel analyses supported the unidimensionality and goodness of fit of each unique activity domain. CONCLUSIONS: This 3-factor model of the AM-PAC can form the conceptual basis for common-item equating and computer-adaptive applications, leading to a comprehensive system of outcome instruments for postacute care settings. VL - 42 N1 - 0025-7079Journal ArticleMulticenter Study ER - TY - CHAP T1 - Adaptive computerized educational systems: A case study T2 - Evidence-based educational methods Y1 - 2004 A1 - Ray, R. D. ED - R. W. Malott KW - Artificial Intelligence KW - Computer Assisted Instruction KW - Computer Software KW - Higher Education KW - Individualized Instruction KW - Internet KW - Undergraduate Education AB - (Created by APA) Adaptive instruction describes adjustments typical of one-on-one tutoring as discussed in the college tutorial scenario. So computerized adaptive instruction refers to the use of computer software--almost always incorporating artificially intelligent services--which has been designed to adjust both the presentation of information and the form of questioning to meet the current needs of an individual learner. This chapter describes a system for Internet-delivered adaptive instruction.
The author attempts to demonstrate a sharp difference between the teaching that takes place outside of the classroom in universities and the kind that is at least afforded, if not taken advantage of by many, students in a more personalized educational setting such as those in the small liberal arts colleges. The author describes a computer-based technology that allows that gap to be bridged with the advantage of at least having more highly prepared learners sitting in college classrooms. A limited range of emerging research that supports that proposition is cited. (PsycINFO Database Record (c) 2005 APA ) JF - Evidence-based educational methods T3 - Educational Psychology Series PB - Elsevier Academic Press CY - San Diego, CA. USA N1 - Using Smart Source ParsingEvidence-based educational methods. A volume in the educational psychology series. (pp. 143-170). San Diego, CA : Elsevier Academic Press, [URL:http://www.academicpress.com]. xxiv, 382 pp ER - TY - JOUR T1 - Adaptive exploration of user knowledge in computer based testing JF - WSEAS Transactions on Communications Y1 - 2004 A1 - Lamboudis, D. A1 - Economides, A. A. VL - 3 (1) ER - TY - JOUR T1 - Adaptive Testing With Regression Trees in the Presence of Multidimensionality JF - Journal of Educational and Behavioral Statistics Y1 - 2004 A1 - Yan, Duanli A1 - Lewis, Charles A1 - Stocking, Martha AB -

It is unrealistic to suppose that standard item response theory (IRT) models will be appropriate for all the new and currently considered computer-based tests. In addition to developing new models, we also need to give attention to the possibility of constructing and analyzing new tests without the aid of strong models. Computerized adaptive testing currently relies heavily on IRT. Alternative, empirically based, nonparametric adaptive testing algorithms exist, but their properties are little known. This article introduces a nonparametric, tree-based algorithm for adaptive testing and shows that it may be superior to conventional, IRT-based adaptive testing in cases where the IRT assumptions are not satisfied. In particular, it shows that the tree-based approach clearly outperformed (one-dimensional) IRT when the pool was strongly two-dimensional.

VL - 29 UR - http://jeb.sagepub.com/cgi/content/abstract/29/3/293 ER - TY - ABST T1 - The AMC Linear Disability Score project in a population requiring residential care: psychometric properties Y1 - 2004 A1 - Holman, R. A1 - Lindeboom, R. A1 - Vermeulen, M. A1 - de Haan, R. J. KW - *Disability Evaluation KW - *Health Status Indicators KW - Activities of Daily Living/*classification KW - Adult KW - Aged KW - Aged, 80 and over KW - Data Collection/methods KW - Female KW - Humans KW - Logistic Models KW - Male KW - Middle Aged KW - Netherlands KW - Pilot Projects KW - Probability KW - Psychometrics/*instrumentation KW - Questionnaires/standards KW - Residential Facilities/*utilization KW - Severity of Illness Index AB - BACKGROUND: Currently there is a lot of interest in the flexible framework offered by item banks for measuring patient relevant outcomes, including functional status. However, there are few item banks, which have been developed to quantify functional status, as expressed by the ability to perform activities of daily life. METHOD: This paper examines the psychometric properties of the AMC Linear Disability Score (ALDS) project item bank using an item response theory model and full information factor analysis. Data were collected from 555 respondents on a total of 160 items. RESULTS: Following the analysis, 79 items remained in the item bank. The remaining 81 items were excluded because of: difficulties in presentation (1 item); low levels of variation in response pattern (28 items); significant differences in measurement characteristics for males and females or for respondents under or over 85 years old (26 items); or lack of model fit to the data at item level (26 items). CONCLUSIONS: It is conceivable that the item bank will have different measurement characteristics for other patient or demographic populations. 
However, these results indicate that the ALDS item bank has sound psychometric properties for respondents in residential care settings and could form a stable base for measuring functional status in a range of situations, including the implementation of computerised adaptive testing of functional status. JF - Health and Quality of Life Outcomes VL - 2 SN - 1477-7525 (Electronic)1477-7525 (Linking) N1 - Holman, RebeccaLindeboom, RobertVermeulen, Marinusde Haan, Rob JResearch Support, Non-U.S. Gov'tValidation StudiesEnglandHealth and quality of life outcomesHealth Qual Life Outcomes. 2004 Aug 3;2:42. U2 - 514531 ER - TY - BOOK T1 - The application of cognitive diagnosis and computerized adaptive testing to a large-scale assessment Y1 - 2004 A1 - McGlohen, MK CY - Unpublished doctoral dissertation, University of Texas at Austin ER - TY - JOUR T1 - Assisted self-adapted testing: A comparative study JF - European Journal of Psychological Assessment Y1 - 2004 A1 - Hontangas, P. A1 - Olea, J. A1 - Ponsoda, V. A1 - Revuelta, J. A1 - Wise, S. L. KW - Adaptive Testing KW - Anxiety KW - Computer Assisted Testing KW - Psychometrics KW - Test AB - A new type of self-adapted test (S-AT), called Assisted Self-Adapted Test (AS-AT), is presented. It differs from an ordinary S-AT in that prior to selecting the difficulty category, the computer advises examinees on their best difficulty category choice, based on their previous performance. Three tests (computerized adaptive test, AS-AT, and S-AT) were compared regarding both their psychometric (precision and efficiency) and psychological (anxiety) characteristics. Tests were applied in an actual assessment situation, in which test scores determined 20% of term grades. A sample of 173 high school students participated. Neither differences in posttest anxiety nor ability were obtained. Concerning precision, AS-AT was as precise as CAT, and both revealed more precision than S-AT. 
It was concluded that AS-AT acted as a CAT concerning precision. Some hints, but not conclusive support, of the psychological similarity between AS-AT and S-AT were also found. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 20 ER - TY - CONF T1 - Automated Simultaneous Assembly of Multi-Stage Testing for the Uniform CPA Examination T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2004 A1 - Breithaupt, K. A1 - Ariel, A. A1 - Veldkamp, B. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA N1 - {PDF file, 201 KB} ER - TY - CONF T1 - Combining computer adaptive testing technology with cognitively diagnostic assessment T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2004 A1 - McGlohen, MK A1 - Chang, Hua-Hua A1 - Wills, J. T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA N1 - {PDF file, 782 KB} ER - TY - JOUR T1 - Computer adaptive testing: a strategy for monitoring stroke rehabilitation across settings JF - Topics in Stroke Rehabilitation Y1 - 2004 A1 - Andres, P. L. A1 - Black-Schaffer, R. M. A1 - Ni, P. A1 - Haley, S. M. KW - *Computer Simulation KW - *User-Computer Interface KW - Adult KW - Aged KW - Aged, 80 and over KW - Cerebrovascular Accident/*rehabilitation KW - Disabled Persons/*classification KW - Female KW - Humans KW - Male KW - Middle Aged KW - Monitoring, Physiologic/methods KW - Severity of Illness Index KW - Task Performance and Analysis AB - Current functional assessment instruments in stroke rehabilitation are often setting-specific and lack precision, breadth, and/or feasibility. Computer adaptive testing (CAT) offers a promising potential solution by providing a quick, yet precise, measure of function that can be used across a broad range of patient abilities and in multiple settings.
CAT technology yields a precise score by selecting very few relevant items from a large and diverse item pool based on each individual's responses. We demonstrate the potential usefulness of a CAT assessment model with a cross-sectional sample of persons with stroke from multiple rehabilitation settings. VL - 11 SN - 1074-9357 (Print) N1 - Andres, Patricia L; Black-Schaffer, Randie M; Ni, Pengsheng; Haley, Stephen M. R01 hd43568/hd/nichd. Evaluation Studies. Research Support, U.S. Gov't, Non-P.H.S. Research Support, U.S. Gov't, P.H.S. United States. Topics in Stroke Rehabilitation. Top Stroke Rehabil. 2004 Spring;11(2):33-9. ER - TY - CONF T1 - Computer adaptive testing and the No Child Left Behind Act T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2004 A1 - Kingsbury, G. G. A1 - Hauser, C. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Diego CA N1 - {PDF file, 117 KB} ER - TY - CHAP T1 - Computer-adaptive testing Y1 - 2004 A1 - Luecht, RM CY - B. Everett, and D. Howell (Eds.), Encyclopedia of statistics in behavioral science. New York: Wiley. ER - TY - ABST T1 - Computer-based test designs with optimal and non-optimal tests for making pass-fail decisions Y1 - 2004 A1 - Hambleton, R. K. A1 - Xing, D. CY - Research Report, University of Massachusetts, Amherst, MA ER - TY - JOUR T1 - A computerized adaptive knowledge test as an assessment tool in general practice: a pilot study JF - Medical Teacher Y1 - 2004 A1 - Roex, A. A1 - Degryse, J. KW - *Computer Systems KW - Algorithms KW - Educational Measurement/*methods KW - Family Practice/*education KW - Humans KW - Pilot Projects AB - Although CAT (computerized adaptive testing) has proven advantageous to assessment in many fields, its use in general practice has been scarce. In adapting CAT to general practice, the basic assumptions of item response theory and the case specificity must be taken into account.
In this context, this study first evaluated the feasibility of converting written extended matching tests into CAT. Second, it questioned the content validity of CAT. A stratified sample of students was invited to participate in the pilot study. The items used in this test, together with their parameters, originated from the written test. The detailed test paths of the students were retained and analysed thoroughly. Using the predefined pass-fail standard, one student failed the test. There was a positive correlation between the number of items and the candidate's ability level. The majority of students were presented with questions in seven of the 10 existing domains. Although CAT proved to be a feasible test format, it cannot substitute for the existing high-stakes large-scale written test. It may provide a reliable instrument for identifying candidates who are at risk of failing in the written test. VL - 26 N1 - 0142-159x. Journal Article. ER - TY - JOUR T1 - Computerized adaptive measurement of depression: A simulation study JF - BMC Psychiatry Y1 - 2004 A1 - Gardner, W. A1 - Shear, K. A1 - Kelleher, K. J. A1 - Pajer, K. A. A1 - Mammen, O. A1 - Buysse, D. A1 - Frank, E. KW - *Computer Simulation KW - Adult KW - Algorithms KW - Area Under Curve KW - Comparative Study KW - Depressive Disorder/*diagnosis/epidemiology/psychology KW - Diagnosis, Computer-Assisted/*methods/statistics & numerical data KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Internet KW - Male KW - Mass Screening/methods KW - Patient Selection KW - Personality Inventory/*statistics & numerical data KW - Pilot Projects KW - Prevalence KW - Psychiatric Status Rating Scales/*statistics & numerical data KW - Psychometrics KW - Research Support, Non-U.S. Gov't KW - Research Support, U.S. Gov't, P.H.S. KW - Severity of Illness Index KW - Software AB - Background: Efficient, accurate instruments for measuring depression are increasingly important in clinical practice.
We developed a computerized adaptive version of the Beck Depression Inventory (BDI). We examined its efficiency and its usefulness in identifying Major Depressive Episodes (MDE) and in measuring depression severity. Methods: Subjects were 744 participants in research studies in which each subject completed both the BDI and the SCID. In addition, 285 patients completed the Hamilton Depression Rating Scale. Results: The adaptive BDI had an AUC as an indicator of a SCID diagnosis of MDE of 88%, equivalent to the full BDI. The adaptive BDI asked fewer questions than the full BDI (5.6 versus 21 items). The adaptive latent depression score correlated r = .92 with the BDI total score and the latent depression score correlated more highly with the Hamilton (r = .74) than the BDI total score did (r = .70). Conclusions: Adaptive testing for depression may provide greatly increased efficiency without loss of accuracy in identifying MDE or in measuring depression severity. VL - 4 ER - TY - CHAP T1 - Computerized adaptive testing Y1 - 2004 A1 - Segall, D. O. CY - Encyclopedia of social measurement. Academic Press. N1 - {PDF file, 180 KB} ER - TY - CHAP T1 - Computerized adaptive testing and item banking Y1 - 2004 A1 - Bjorner, J. B. A1 - Kosinski, M. A1 - Ware, J. E., Jr. CY - P. M. Fayers and R. D. Hays (Eds.) Assessing Quality of Life. Oxford: Oxford University Press. N1 - {PDF file 371 KB} ER - TY - JOUR T1 - Computerized Adaptive Testing for Effective and Efficient Measurement in Counseling and Education JF - Measurement and Evaluation in Counseling and Development Y1 - 2004 A1 - Weiss, D. J. VL - 37 ER - TY - JOUR T1 - Computerized adaptive testing with multiple-form structures JF - Applied Psychological Measurement Y1 - 2004 A1 - Armstrong, R. D. A1 - Jones, D. H. A1 - Koppel, N. B. A1 - Pashley, P. J.
KW - computerized adaptive testing KW - Law School Admission Test KW - multiple-form structure KW - testlets AB - A multiple-form structure (MFS) is an ordered collection or network of testlets (i.e., sets of items). An examinee's progression through the network of testlets is dictated by the correctness of an examinee's answers, thereby adapting the test to his or her trait level. The collection of paths through the network yields the set of all possible test forms, allowing test specialists the opportunity to review them before they are administered. Also, limiting the exposure of an individual MFS to a specific period of time can enhance test security. This article provides an overview of methods that have been developed to generate parallel MFSs. The approach is applied to the assembly of an experimental computerized Law School Admission Test (LSAT). (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Sage Publications: US VL - 28 SN - 0146-6216 (Print) UR - http://apm.sagepub.com/content/28/3/147.abstract ER - TY - JOUR T1 - Computers in clinical assessment: Historical developments, present status, and future challenges JF - Journal of Clinical Psychology Y1 - 2004 A1 - Butcher, J. N. A1 - Perry, J. L. A1 - Hahn, J. A. KW - clinical assessment KW - computerized testing method KW - Internet KW - psychological assessment services AB - Computerized testing methods have long been regarded as a potentially powerful asset for providing psychological assessment services. Ever since computers were first introduced and adapted to the field of assessment psychology in the 1950s, they have been a valuable aid for scoring, data processing, and even interpretation of test results. The history and status of computer-based personality and neuropsychological tests are discussed in this article. Several pertinent issues involved in providing test interpretation by computer are highlighted. Advances in computer-based test use, such as computerized adaptive testing, are described and problems noted. Today, there is great interest in expanding the availability of psychological assessment applications on the Internet. Although these applications show great promise, there are a number of problems associated with providing psychological tests on the Internet that need to be addressed by psychologists before the Internet can become a major medium for psychological service delivery. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - John Wiley & Sons: US VL - 60 SN - 0021-9762 (Print); 1097-4679 (Electronic) ER - TY - JOUR T1 - Constraining Item Exposure in Computerized Adaptive Testing With Shadow Tests JF - Journal of Educational and Behavioral Statistics Y1 - 2004 A1 - van der Linden, Wim J. A1 - Veldkamp, Bernard P. AB -

Item-exposure control in computerized adaptive testing is implemented by imposing item-ineligibility constraints on the assembly process of the shadow tests. The method resembles Sympson and Hetter’s (1985) method of item-exposure control in that the decisions to impose the constraints are probabilistic. The method does not, however, require time-consuming simulation studies to set values for control parameters before the operational use of the test. Instead, it can set the probabilities of item ineligibility adaptively during the test using the actual item-exposure rates. An empirical study using an item pool from the Law School Admission Test showed that application of the method yielded perfect control of the item-exposure rates and had negligible impact on the bias and mean-squared error functions of the ability estimator.

VL - 29 UR - http://jeb.sagepub.com/cgi/content/abstract/29/3/273 ER - TY - JOUR T1 - Constructing rotating item pools for constrained adaptive testing JF - Journal of Educational Measurement Y1 - 2004 A1 - Ariel, A. A1 - Veldkamp, B. P. A1 - van der Linden, W. J. KW - computerized adaptive tests KW - constrained adaptive testing KW - item exposure KW - rotating item pools AB - Preventing items in adaptive testing from being over- or underexposed is one of the main problems in computerized adaptive testing. Though the problem of overexposed items can be solved using a probabilistic item-exposure control method, such methods are unable to deal with the problem of underexposed items.
Using a system of rotating item pools, on the other hand, is a method that potentially solves both problems. In this method, a master pool is divided into (possibly overlapping) smaller item pools, which are required to have similar distributions of content and statistical attributes. These pools are rotated among the testing sites to realize desirable exposure rates for the items. A test assembly model, motivated by Gulliksen's matched random subtests method, was explored to help solve the problem of dividing a master pool into a set of smaller pools. Different methods to solve the model are proposed. An item pool from the Law School Admission Test was used to evaluate the performances of computerized adaptive tests from systems of rotating item pools constructed using these methods. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Blackwell Publishing: United Kingdom VL - 41 SN - 0022-0655 (Print) ER - TY - CONF T1 - The context effects of multidimensional CAT on the accuracy of multidimensional abilities and the item exposure rates T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2004 A1 - Li, Y. H. A1 - Schafer, W. D. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Diego CA N1 - {Incomplete PDF file, 202 KB} ER - TY - BOOK T1 - Contributions to the theory and practice of computerized adaptive testing Y1 - 2004 A1 - Theo Eggen CY - Arnhem, The Netherlands: Citogroep ER - TY - CONF T1 - Detecting exposed test items in computer-based testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2004 A1 - Han, N. A1 - Hambleton, R. K. 
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA N1 - {PDF file, 1.245 MB} ER - TY - CONF T1 - Developing tailored instruments: Item banking and computerized adaptive assessment T2 - Paper presented at the conference “Advances in Health Outcomes Measurement: Exploring the Current State and the Future of Item Response Theory, Item Banks, and Computer-Adaptive Testing” Y1 - 2004 A1 - Bjorner, J. B. JF - Paper presented at the conference “Advances in Health Outcomes Measurement: Exploring the Current State and the Future of Item Response Theory, Item Banks, and Computer-Adaptive Testing” CY - Bethesda MD N1 - {PDF file, 406 KB} ER - TY - CONF T1 - Developing tailored instruments: Item banking and computerized adaptive assessment T2 - Paper presented at the conference “Advances in Health Outcomes Measurement: Exploring the Current State and the Future of Item Response Theory, Item Banks, and Computer-Adaptive Testing” Y1 - 2004 A1 - Chang, C-H. JF - Paper presented at the conference “Advances in Health Outcomes Measurement: Exploring the Current State and the Future of Item Response Theory, Item Banks, and Computer-Adaptive Testing” CY - Bethesda MD N1 - {PDF file, 181 KB} ER - TY - JOUR T1 - The development and evaluation of a software prototype for computer-adaptive testing JF - Computers and Education Y1 - 2004 A1 - Lilley, M A1 - Barker, T A1 - Britton, C KW - computerized adaptive testing VL - 43 ER - TY - JOUR T1 - Effects of practical constraints on item selection rules at the early stages of computerized adaptive testing JF - Journal of Educational Measurement Y1 - 2004 A1 - Chen, S-Y. A1 - Ankenmann, R. D.
KW - computerized adaptive testing KW - item selection rules KW - practical constraints AB - The purpose of this study was to compare the effects of four item selection rules--(1) Fisher information (F), (2) Fisher information with a posterior distribution (FP), (3) Kullback-Leibler information with a posterior distribution (KP), and (4) completely randomized item selection (RN)--with respect to the precision of trait estimation and the extent of item usage at the early stages of computerized adaptive testing. The comparison of the four item selection rules was carried out under three conditions: (1) using only the item information function as the item selection criterion; (2) using both the item information function and content balancing; and (3) using the item information function, content balancing, and item exposure control. When test length was less than 10 items, FP and KP tended to outperform F at extreme trait levels in Condition 1. However, in more realistic settings, it could not be concluded that FP and KP outperformed F, especially when item exposure control was imposed. When test length was greater than 10 items, the three nonrandom item selection procedures performed similarly no matter what the condition was, while F had slightly higher item usage. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Blackwell Publishing: United Kingdom VL - 41 SN - 0022-0655 (Print) ER - TY - JOUR T1 - Estimating ability and item-selection strategy in self-adapted testing: A latent class approach JF - Journal of Educational and Behavioral Statistics Y1 - 2004 A1 - Revuelta, J. KW - estimating ability KW - item-selection strategies KW - psychometric model KW - self-adapted testing AB - This article presents a psychometric model for estimating ability and item-selection strategies in self-adapted testing. In contrast to computer adaptive testing, in self-adapted testing the examinees are allowed to select the difficulty of the items. 
The item-selection strategy is defined as the distribution of difficulty conditional on the responses given to previous items. The article shows that missing responses in self-adapted testing are missing at random and can be ignored in the estimation of ability. However, the item-selection strategy cannot always be ignored in such an estimation. An EM algorithm is presented to estimate an examinee's ability and strategies, and a model fit is evaluated using Akaike's information criterion. The article includes an application with real data to illustrate how the model can be used in practice for evaluating hypotheses, estimating ability, and identifying strategies. In the example, four strategies were identified and related to examinees' ability. It was shown that individual examinees tended not to follow a consistent strategy throughout the test. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - American Educational Research Assn: US VL - 29 SN - 1076-9986 (Print) ER - TY - RPRT T1 - Evaluating scale stability of a computer adaptive testing system Y1 - 2004 A1 - Guo, F. A1 - Wang, L. PB - GMAC CY - McLean, VA ER - TY - BOOK T1 - Evaluating the effects of several multi-stage testing design variables on selected psychometric outcomes for certification and licensure assessment Y1 - 2004 A1 - Zenisky, A. L. CY - Unpublished doctoral dissertation, University of Massachusetts, Amherst ER - TY - JOUR T1 - Évaluation et multimédia dans l'apprentissage d'une L2 [Assessment and multimedia in learning an L2] JF - ReCALL Y1 - 2004 A1 - Laurier, M. KW - Adaptive Testing KW - Computer Assisted Instruction KW - Educational KW - Foreign Language Learning KW - Program Evaluation KW - Technology computerized adaptive testing AB - In the first part of this paper different areas where technology may be used for second language assessment are described. 
First, item banking operations, which are generally based on item Response Theory but not necessarily restricted to dichotomously scored items, facilitate assessment task organization and require technological support. Second, technology may help to design more authentic assessment tasks or may be needed in some direct testing situations. Third, the assessment environment may be more adapted and more stimulating when technology is used to give the student more control. The second part of the paper presents different functions of assessment. The monitoring function (often called formative assessment) aims at adapting the classroom activities to students and to provide continuous feedback. Technology may be used to train the teachers in monitoring techniques, to organize data or to produce diagnostic information; electronic portfolios or quizzes that are built in some educational software may also be used for monitoring. The placement function is probably the one in which the application of computer adaptive testing procedures (e.g. French CAPT) is the most appropriate. Automatic scoring devices may also be used for placement purposes. Finally the certification function requires more valid and more reliable tools. Technology may be used to enhance the testing situation (to make it more authentic) or to facilitate data processing during the construction of a test. Almond et al. (2002) propose a four component model (Selection, Presentation, Scoring and Response) for designing assessment systems. Each component must be planned taking into account the assessment function. VL - 16 ER - TY - JOUR T1 - Evaluation of the CATSIB DIF procedure in a pretest setting JF - Journal of Educational and Behavioral Statistics Y1 - 2004 A1 - Nandakumar, R. A1 - Roussos, L. A. KW - computerized adaptive tests KW - differential item functioning AB - A new procedure, CATSIB, for assessing differential item functioning (DIF) on computerized adaptive tests (CATs) is proposed. 
CATSIB, a modified SIBTEST procedure, matches test takers on estimated ability and controls for impact-induced Type I error inflation by employing a CAT version of the SIBTEST "regression correction." The performance of CATSIB in terms of detection of DIF in pretest items was evaluated in a simulation study. Simulated test takers were adaptively administered 25 operational items from a pool of 1,000 and were linearly administered 16 pretest items that were evaluated for DIF. Sample size varied from 250 to 500 in each group. Simulated impact levels ranged from a 0- to 1-standard-deviation difference in mean ability levels. The results showed that CATSIB with the regression correction displayed good control over Type I error, whereas CATSIB without the regression correction displayed impact-induced Type I error inflation. With 500 test takers in each group, power rates were exceptionally high (84% to 99%) for values of DIF at the boundary between moderate and large DIF. For smaller samples of 250 test takers in each group, the corresponding power rates ranged from 47% to 95%. In addition, in all cases, CATSIB was very accurate in estimating the true values of DIF, displaying at most only minor estimation bias. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - American Educational Research Assn: US VL - 29 SN - 1076-9986 (Print) ER - TY - JOUR T1 - Impact of Test Design, Item Quality, and Item Bank Size on the Psychometric Properties of Computer-Based Credentialing Examinations JF - Educational and Psychological Measurement Y1 - 2004 A1 - Xing, Dehui A1 - Hambleton, Ronald K. AB -

Computer-based testing by credentialing agencies has become common; however, selecting a test design is difficult because several good ones are available: parallel forms, computer adaptive (CAT), and multistage (MST). In this study, three computer-based test designs under some common examination conditions were investigated. Item bank size and item quality had a practically significant impact on decision consistency and accuracy. Even in nearly ideal situations, the choice of test design was not a factor in the results. Two conclusions follow from the findings: (a) More time and resources should be committed to expanding the size and quality of item banks, and (b) designs that individualize an exam administration such as MST and CAT may not be helpful when the primary purpose of the examination is to make pass-fail decisions and conditions are present for using parallel forms with a target information function that can be centered on the passing score.

VL - 64 UR - http://epm.sagepub.com/content/64/1/5.abstract ER - TY - JOUR T1 - Implementation and Measurement Efficiency of Multidimensional Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2004 A1 - Wang, Wen-Chung A1 - Chen, Po-Hsi AB -

Multidimensional adaptive testing (MAT) procedures are proposed for the measurement of several latent traits by a single examination. Bayesian latent trait estimation and adaptive item selection are derived. Simulations were conducted to compare the measurement efficiency of MAT with those of unidimensional adaptive testing and random administration. The results showed that the higher the correlation between latent traits, the more latent traits there were, and the more scoring levels there were in the items, the more efficient MAT was than the other two procedures. For tests containing multidimensional items, only MAT is applicable, whereas unidimensional adaptive testing is not. Issues in implementing MAT are discussed.

VL - 28 UR - http://apm.sagepub.com/content/28/5/295.abstract ER - TY - CONF T1 - Investigating the effects of selected multi-stage test design alternatives on credentialing outcomes T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2004 A1 - Zenisky, A. L. A1 - Hambleton, R. K. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA N1 - {PDF file, 129 KB} ER - TY - Generic T1 - An investigation of two combination procedures of SPRT for three-category classification decisions in computerized classification test T2 - annual meeting of the American Educational Research Association Y1 - 2004 A1 - Jiao, H. A1 - Wang, S A1 - Lau, CA KW - computerized adaptive testing KW - Computerized classification testing KW - sequential probability ratio testing JF - annual meeting of the American Educational Research Association CY - San Antonio, Texas N1 - annual meeting of the American Educational Research Association, San Antonio ER - TY - ABST T1 - An investigation of two combination procedures of SPRT for three-category decisions in computerized classification test Y1 - 2004 A1 - Jiao, H. A1 - Wang, S A1 - Lau, A CY - Paper presented at the annual meeting of the American Educational Research Association, San Diego CA N1 - {PDF file, 649 KB} ER - TY - CONF T1 - Item parameter recovery with adaptive tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2004 A1 - Do, B.-R. A1 - Chuah, S. C. A1 - Drasgow, F. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA N1 - #DO04-01 {PDF file, 379 KB} ER - TY - JOUR T1 - Kann die Konfundierung von Konzentrationsleistung und Aktivierung durch adaptives Testen mit dem FAKT vermieden werden?
[Avoiding the confounding of concentration performance and activation by adaptive testing with the FACT] JF - Zeitschrift für Differentielle und Diagnostische Psychologie Y1 - 2004 A1 - Frey, A. A1 - Moosbrugger, H. KW - Adaptive Testing KW - Computer Assisted Testing KW - Concentration KW - Performance KW - Testing KW - computerized adaptive testing AB - The study investigates the effect of computerized adaptive testing strategies on the confounding of concentration performance with activation. A sample of 54 participants was administered 1 out of 3 versions (2 adaptive, 1 non-adaptive) of the computerized Frankfurt Adaptive Concentration Test FACT (Moosbrugger & Heyden, 1997) at three subsequent points in time. During the test administration changes in activation (electrodermal activity) were recorded. The results pinpoint a confounding of concentration performance with activation for the non-adaptive test version, but not for the adaptive test versions (p = .01). Thus, adaptive FACT testing strategies can remove the confounding of concentration performance with activation, thereby increasing the discriminant validity. In conclusion, an attention-focusing-hypothesis is formulated to explain the observed effect. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 25 ER - TY - CHAP T1 - A Learning Environment for English for Academic Purposes Based on Adaptive Tests and Task-Based Systems T2 - Intelligent Tutoring Systems Y1 - 2004 A1 - Gonçalves, Jean P. A1 - Aluisio, Sandra M. A1 - de Oliveira, Leandro H.M. A1 - Oliveira Jr., Osvaldo N. ED - Lester, James C. ED - Vicari, Rosa Maria ED - Paraguaçu, Fábio JF - Intelligent Tutoring Systems T3 - Lecture Notes in Computer Science PB - Springer Berlin / Heidelberg VL - 3220 SN - 978-3-540-22948-3 UR - http://dx.doi.org/10.1007/978-3-540-30139-4_1 ER - TY - JOUR T1 - Mokken Scale Analysis Using Hierarchical Clustering Procedures JF - Applied Psychological Measurement Y1 - 2004 A1 - van Abswoude, Alexandra A. H. A1 - Vermunt, Jeroen K. A1 - Hemker, Bas T. A1 - van der Ark, L. Andries AB -

Mokken scale analysis (MSA) can be used to assess and build unidimensional scales from an item pool that is sensitive to multiple dimensions. These scales satisfy a set of scaling conditions, one of which follows from the model of monotone homogeneity. An important drawback of the MSA program is that the sequential item selection and scale construction procedure may not find the dominant underlying dimensionality of the responses to a set of items. The authors investigated alternative hierarchical item selection procedures and compared the performance of four hierarchical methods and the sequential clustering method in the MSA context. The results showed that hierarchical clustering methods can improve the search process of the dominant dimensionality of a data matrix. In particular, the complete linkage and scale linkage methods were promising in finding the dimensionality of the item response data from a set of items.

VL - 28 UR - http://apm.sagepub.com/content/28/5/332.abstract ER - TY - CONF T1 - Mutual information item selection in multiple-category classification CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2004 A1 - Weissman, A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA N1 - #WE04-02 ER - TY - CONF T1 - New methods for CBT item pool evaluation T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2004 A1 - Wang, L. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Diego CA N1 - #WA04-02 {PDF file, 1.005 MB} ER - TY - ABST T1 - Optimal testing with easy items in computerized adaptive testing (Measurement and Research Department Report 2004-2) Y1 - 2004 A1 - Theo Eggen A1 - Verschoor, A. J. CY - Arnhem, The Netherlands: Cito Group ER - TY - ABST T1 - Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project Y1 - 2004 A1 - Holman, R. A1 - Glas, C. A. A1 - Lindeboom, R. A1 - Zwinderman, A. H. A1 - de Haan, R. J. KW - *Disability Evaluation KW - *Health Surveys KW - *Logistic Models KW - *Questionnaires KW - Activities of Daily Living/*classification KW - Data Interpretation, Statistical KW - Health Status KW - Humans KW - Pilot Projects KW - Probability KW - Quality of Life KW - Severity of Illness Index AB - BACKGROUND: Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. 
METHODS: The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. RESULTS: The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. CONCLUSIONS: The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. JF - Health and Quality of Life Outcomes VL - 2 SN - 1477-7525 (Electronic) 1477-7525 (Linking) N1 - Holman, Rebecca; Glas, Cees A W; Lindeboom, Robert; Zwinderman, Aeilko H; de Haan, Rob J. England. Health Qual Life Outcomes. 2004 Jun 16;2:29. U2 - 441407 ER - TY - JOUR T1 - Pre-equating: a simulation study based on a large scale assessment model JF - Journal of Applied Measurement Y1 - 2004 A1 - Taherbhai, H. M. A1 - Young, M. J.
KW - *Databases KW - *Models, Theoretical KW - Calibration KW - Human KW - Psychometrics KW - Reference Values KW - Reproducibility of Results AB - Although post-equating (PE) has proven to be an acceptable method in the scaling and equating of items and forms, there are times when the turn-around period for equating and converting raw scores to scale scores is so small that PE cannot be undertaken within the prescribed time frame. In such cases, pre-equating (PrE) could be considered as an acceptable alternative. Assessing the feasibility of using item calibrations from the item bank (as in PrE) is conditioned on the equivalency of the calibrations and the errors associated with them vis-à-vis the results obtained via PE. This paper creates item banks over three periods of item introduction into the banks and uses the Rasch model in examining data with respect to the recovery of item parameters, the measurement error, and the effect cut-points have on examinee placement in both the PrE and PE situations. Results indicate that PrE is a viable solution to PE provided the stability of the item calibrations is enhanced by using large sample sizes (perhaps as large as full-population) in populating the item bank. VL - 5 N1 - 1529-7713. Journal Article ER - TY - CONF T1 - Protecting the integrity of computer-adaptive licensure tests: Results of a legal challenge T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2004 A1 - Cizek, G. J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Diego CA N1 - {PDF file, 191 KB} ER - TY - JOUR T1 - Refining the conceptual basis for rehabilitation outcome measurement: personal care and instrumental activities domain JF - Medical Care Y1 - 2004 A1 - Coster, W. J. A1 - Haley, S. M. A1 - Andres, P. L. A1 - Ludlow, L. H. A1 - Bond, T. L. A1 - Ni, P. S.
KW - *Self Efficacy KW - *Sickness Impact Profile KW - Activities of Daily Living/*classification/psychology KW - Adult KW - Aged KW - Aged, 80 and over KW - Disability Evaluation KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods/statistics & numerical data KW - Questionnaires/*standards KW - Recovery of Function/physiology KW - Rehabilitation/*standards/statistics & numerical data KW - Reproducibility of Results KW - Research Support, U.S. Gov't, Non-P.H.S. KW - Research Support, U.S. Gov't, P.H.S. KW - Sensitivity and Specificity AB - BACKGROUND: Rehabilitation outcome measures routinely include content on performance of daily activities; however, the conceptual basis for item selection is rarely specified. These instruments differ significantly in format, number, and specificity of daily activity items and in the measurement dimensions and type of scale used to specify levels of performance. We propose that a requirement for upper limb and hand skills underlies many activities of daily living (ADL) and instrumental activities of daily living (IADL) items in current instruments, and that items selected based on this definition can be placed along a single functional continuum. OBJECTIVE: To examine the dimensional structure and content coverage of a Personal Care and Instrumental Activities item set and to examine the comparability of items from existing instruments and a set of new items as measures of this domain. METHODS: Participants (N = 477) from 3 different disability groups and 4 settings representing the continuum of postacute rehabilitation care were administered the newly developed Activity Measure for Post-Acute Care (AM-PAC), the SF-8, and an additional setting-specific measure: FIM (in-patient rehabilitation); MDS (skilled nursing facility); MDS-PAC (postacute settings); OASIS (home care); or PF-10 (outpatient clinic). 
Rasch (partial-credit model) analyses were conducted on a set of 62 items covering the Personal Care and Instrumental domain to examine item fit, item functioning, category difficulty estimates, and unidimensionality. RESULTS: After removing 6 misfitting items, the remaining 56 items fit acceptably along the hypothesized continuum. Analyses yielded different difficulty estimates for the maximum score (eg, "Independent performance") for items with comparable content from different instruments. Items showed little differential item functioning across age, diagnosis, or severity groups, and 92% of the participants fit the model. CONCLUSIONS: ADL and IADL items from existing rehabilitation outcomes instruments that depend on skilled upper limb and hand use can be located along a single continuum, along with the new personal care and instrumental items of the AM-PAC addressing gaps in content. Results support the validity of the proposed definition of the Personal Care and Instrumental Activities dimension of function as a guide for future development of rehabilitation outcome instruments, such as linked, setting-specific short forms and computerized adaptive testing approaches. VL - 42 N1 - 0025-7079. Journal Article ER - TY - JOUR T1 - Score comparability of short forms and computerized adaptive testing: Simulation study with the activity measure for post-acute care JF - Archives of Physical Medicine and Rehabilitation Y1 - 2004 A1 - Haley, S. M. A1 - Coster, W. J. A1 - Andres, P. L. A1 - Kosinski, M. A1 - Ni, P. KW - Boston KW - Factor Analysis, Statistical KW - Humans KW - Outcome Assessment (Health Care)/*methods KW - Prospective Studies KW - Questionnaires/standards KW - Rehabilitation/*standards KW - Subacute Care/*standards AB - OBJECTIVE: To compare simulated short-form and computerized adaptive testing (CAT) scores to scores obtained from complete item sets for each of the 3 domains of the Activity Measure for Post-Acute Care (AM-PAC).
DESIGN: Prospective study. SETTING: Six postacute health care networks in the greater Boston metropolitan area, including inpatient acute rehabilitation, transitional care units, home care, and outpatient services. PARTICIPANTS: A convenience sample of 485 adult volunteers who were receiving skilled rehabilitation services. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Inpatient and community-based short forms and CAT applications were developed for each of 3 activity domains (physical & mobility, personal care & instrumental, applied cognition) using item pools constructed from new items and items from existing postacute care instruments. RESULTS: Simulated CAT scores correlated highly with score estimates from the total item pool in each domain (4- and 6-item CAT r range, .90-.95; 10-item CAT r range, .96-.98). Scores on the 10-item short forms constructed for inpatient and community settings also provided good estimates of the AM-PAC item pool scores for the physical & mobility and personal care & instrumental domains, but were less consistent in the applied cognition domain. Confidence intervals around individual scores were greater in the short forms than for the CATs. CONCLUSIONS: Accurate scoring estimates for AM-PAC domains can be obtained with either the setting-specific short forms or the CATs. The strong relationship between CAT and item pool scores can be attributed to the CAT's ability to select specific items to match individual responses. The CAT may have additional advantages over short forms in practicality, efficiency, and the potential for providing more precise scoring estimates for individuals. VL - 85 SN - 0003-9993 (Print) N1 - Haley, Stephen M; Coster, Wendy J; Andres, Patricia L; Kosinski, Mark; Ni, Pengsheng. R01 hd43568/hd/nichd. Comparative Study. Multicenter Study. Research Support, U.S. Gov't, Non-P.H.S. Research Support, U.S. Gov't, P.H.S. United States. Archives of physical medicine and rehabilitation. Arch Phys Med Rehabil. 2004 Apr;85(4):661-6.
ER - TY - CONF T1 - A sequential Bayesian procedure for item calibration in multistage testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2004 A1 - van der Linden, W. J. A1 - Alan D Mead JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA ER - TY - JOUR T1 - Sequential estimation in variable length computerized adaptive testing JF - Journal of Statistical Planning and Inference Y1 - 2004 A1 - Chang, I. Y. AB - With the advent of modern computer technology, there have been growing efforts in recent years to computerize standardized tests, including the popular Graduate Record Examination (GRE), the Graduate Management Admission Test (GMAT) and the Test of English as a Foreign Language (TOEFL). Many such computer-based tests are known as computerized adaptive tests, a major feature of which is that, depending on their performance in the course of testing, different examinees may be given different sets of items (questions). In doing so, items can be efficiently utilized to yield maximum accuracy for estimation of examinees' ability traits. We consider, in this article, one type of such tests where test lengths vary with examinees to yield approximately the same predetermined accuracy for all ability traits. A comprehensive large sample theory is developed for the expected test length and the sequential point and interval estimates of the latent trait. Extensive simulations are conducted with results showing that the large sample approximations are adequate for realistic sample sizes. VL - 121 SN - 03783758 ER - TY - JOUR T1 - A sharing item response theory model for computerized adaptive testing JF - Journal of Educational and Behavioral Statistics Y1 - 2004 A1 - Segall, D. O. AB - A new sharing item response theory (SIRT) model is presented which explicitly models the effects of sharing item content between informants and test takers.
This model is used to construct adaptive item selection and scoring rules that provide increased precision and reduced score gains in instances where sharing occurs. The adaptive item selection rules are expressed as functions of the item's exposure rate in addition to other commonly used properties (characterized by difficulty, discrimination, and guessing parameters). Based on the results of simulated item responses, the new item selection and scoring algorithms compare favorably to the Sympson-Hetter exposure control method. The new SIRT approach provides higher reliability and lower score gains in instances where sharing occurs. VL - 29 N1 - References. American Educational Research Assn, US ER - TY - JOUR T1 - Siette: a web-based tool for adaptive testing JF - International Journal of Artificial Intelligence in Education Y1 - 2004 A1 - Conejo, R A1 - Guzmán, E A1 - Millán, E A1 - Trella, M A1 - Pérez-De-La-Cruz, JL A1 - Ríos, A KW - computerized adaptive testing VL - 14 ER - TY - CHAP T1 - State-of-the-art and adaptive open-closed items in adaptive foreign language assessment Y1 - 2004 A1 - Giouroglou, H. A1 - Economides, A. A. CY - Proceedings 4th Hellenic Conference with International Participation: Information and Communication Technologies in Education, Athens, 747-756 ER - TY - JOUR T1 - Statistics for detecting disclosed items in a CAT environment JF - Metodologia de Las Ciencias del Comportamiento Y1 - 2004 A1 - Lu, Y. A1 - Hambleton, R. K. VL - 5 IS - 2 ER - TY - JOUR T1 - Strategies for controlling item exposure in computerized adaptive testing with the generalized partial credit model JF - Applied Psychological Measurement Y1 - 2004 A1 - Davis, L. L. KW - computerized adaptive testing KW - generalized partial credit model KW - item exposure AB - Choosing a strategy for controlling item exposure has become an integral part of test development for computerized adaptive testing (CAT).
This study investigated the performance of six procedures for controlling item exposure in a series of simulated CATs under the generalized partial credit model. In addition to a no-exposure control baseline condition, the randomesque, modified-within-.10-logits, Sympson-Hetter, conditional Sympson-Hetter, a-stratified with multiple-stratification, and enhanced a-stratified with multiple-stratification procedures were implemented to control exposure rates. Two variations of the randomesque and modified-within-.10-logits procedures were examined, which varied the size of the item group from which the next item to be administered was randomly selected. The results indicate that although the conditional Sympson-Hetter provides somewhat lower maximum exposure rates, the randomesque and modified-within-.10-logits procedures with the six-item group variation have great utility for controlling overlap rates and increasing pool utilization and should be given further consideration. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Sage Publications: US VL - 28 SN - 0146-6216 (Print) UR - http://apm.sagepub.com/content/28/3/165.abstract ER - TY - JOUR T1 - Strategies for controlling testlet exposure rates in computerized adaptive testing systems JF - Dissertation Abstracts International: Section B: The Sciences & Engineering Y1 - 2004 A1 - Boyd, Aimee Michelle AB - Exposure control procedures in computerized adaptive testing (CAT) systems protect item pools from being compromised; however, this impacts measurement precision. Previous research indicates that exposure control procedures perform differently for dichotomously scored versus polytomously scored CAT systems. For dichotomously scored CATs, conditional selection procedures are often the optimal choice, while randomization procedures perform best for polytomously scored CATs. CAT systems modeled with testlet response theory have not been examined to determine optimal exposure control procedures. This dissertation examined various exposure control procedures in testlet-based CAT systems using the three-parameter logistic testlet response theory model and the partial credit model. The exposure control procedures were the randomesque procedure, the modified within .10 logits procedure, two levels of the progressive restricted procedure, and two levels of the Sympson-Hetter procedure. Each of these was compared to a baseline no exposure control procedure, maximum information. The testlets were reading passages with six to ten multiple-choice items. The CAT systems consisted of maximum information testlet selection contingent on an exposure control procedure and content balancing for passage type and the number of items per passage; expected a posteriori ability estimation; and a fixed length stopping rule of seven testlets totaling fifty multiple-choice items. Measurement precision and exposure rates were examined to evaluate the effectiveness of the exposure control procedures for each measurement model.
The exposure control procedures yielded similar results for measurement precision within the models. The exposure rates distinguished which exposure control procedures were most effective. The Sympson-Hetter conditions, which are conditional procedures, maintained the pre-specified maximum exposure rate, but performed very poorly in terms of pool utilization. The randomization procedures, randomesque and modified within .10 logits, yielded low maximum exposure rates, but used only about 70% of the testlet pool. Surprisingly, the progressive restricted procedure, which is a combination of both a conditional and randomization procedure, yielded the best results in its ability to maintain and control the maximum exposure rate and it used the entire testlet pool. The progressive restricted conditions were the optimal procedures for both the partial credit CAT systems and the three-parameter logistic testlet response theory CAT systems. (PsycINFO Database Record (c) 2004 APA, all rights reserved). VL - 64 ER - TY - CONF T1 - A study of multiple stage adaptive test designs T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2004 A1 - Armstrong, R. D. A1 - Edmonds, J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA N1 - {PDF file, 288 KB} ER - TY - JOUR T1 - Test difficulty and stereotype threat on the GRE General Test JF - Journal of Applied Social Psychology Y1 - 2004 A1 - Stricker, L. J., A1 - Bejar, I. I. VL - 34(3) ER - TY - JOUR T1 - Testing vocabulary knowledge: Size, strength, and computer adaptiveness JF - Language Learning Y1 - 2004 A1 - Laufer, B. A1 - Goldstein, Z. 
AB - (from the journal abstract) In this article, we describe the development and trial of a bilingual computerized test of vocabulary size, the number of words the learner knows, and strength, a combination of four aspects of knowledge of meaning that are assumed to constitute a hierarchy of difficulty: passive recognition (easiest), active recognition, passive recall, and active recall (hardest). The participants were 435 learners of English as a second language. We investigated whether the above hierarchy was valid and which strength modality correlated best with classroom language performance. Results showed that the hypothesized hierarchy was present at all word frequency levels, that passive recall was the best predictor of classroom language performance, and that growth in vocabulary knowledge was different for the different strength modalities. (PsycINFO Database Record (c) 2004 APA, all rights reserved). VL - 54 N1 - References .Blackwell Publishing, United Kingdom ER - TY - CHAP T1 - Understanding computerized adaptive testing: From Robbins-Munro to Lord and beyond Y1 - 2004 A1 - Chang, Hua-Hua CY - D. Kaplan (Ed.), The Sage handbook of quantitative methodology for the social sciences (pp. 117-133). New York: Sage. ER - TY - JOUR T1 - Using patterns of summed scores in paper-and-pencil tests and computer-adaptive tests to detect misfitting item score patterns JF - Journal of Educational Measurement Y1 - 2004 A1 - Meijer, R. R. KW - Computer Assisted Testing KW - Item Response Theory KW - person Fit KW - Test Scores AB - Two new methods have been proposed to determine unexpected sum scores on subtests (testlets) both for paper-and-pencil tests and computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted ρ, was compared with a method where the probability for each score combination was calculated using a highest density region (HDR). 
Furthermore, these methods were compared with the standardized log-likelihood statistic with and without a correction for the estimated latent trait value (denoted as l-super(*)-sub(z) and l-sub(z), respectively). Data were simulated on the basis of the one-parameter logistic model, and both parametric and nonparametric logistic regression were used to obtain estimates of the latent trait. Results showed that it is important to take the trait level into account when comparing subtest scores. In a nonparametric item response theory (IRT) context, an adapted version of the HDR method was a powerful alternative to ρ. In a parametric IRT context, results showed that l-super(*)-sub(z) had the highest power when the data were simulated conditionally on the estimated latent trait level. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 41 ER - TY - JOUR T1 - Using Set Covering with Item Sampling to Analyze the Infeasibility of Linear Programming Test Assembly Models JF - Applied Psychological Measurement Y1 - 2004 A1 - Huitzing, Hiddo A. AB -

This article shows how set covering with item sampling (SCIS) methods can be used in the analysis and preanalysis of linear programming models for test assembly (LPTA). LPTA models can construct tests, fulfilling a set of constraints set by the test assembler. Sometimes, no solution to the LPTA model exists. The model is then said to be infeasible. Causes of infeasibility can be difficult to find. A method is proposed that constitutes a helpful tool for test assemblers to detect infeasibility beforehand and, in the case of infeasibility, give insight into its causes. This method is based on SCIS. Although SCIS can help to detect feasibility or infeasibility, its power lies in pinpointing causes of infeasibility such as irreducible infeasible sets of constraints. Methods to resolve infeasibility are also given, minimizing the model deviations. A simulation study is presented, offering a guide to test assemblers to analyze and solve infeasibility.

VL - 28 UR - http://apm.sagepub.com/content/28/5/355.abstract ER - TY - JOUR T1 - Validating the German computerized adaptive test for anxiety on healthy sample (A-CAT) JF - Quality of Life Research Y1 - 2004 A1 - Becker, J. A1 - Walter, O. B. A1 - Fliege, H. A1 - Bjorner, J. B. A1 - Kocalevent, R. D. A1 - Schmid, G. A1 - Klapp, B. F. A1 - Rose, M. VL - 13 ER - TY - CONF T1 - Accuracy of reading and mathematics ability estimates under the shadow-test constraint MCAT T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2003 A1 - Li, Y. H. A1 - Schafer, W. D. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL ER - TY - ABST T1 - An adaptation of stochastic curtailment to truncate Wald’s SPRT in computerized adaptive testing Y1 - 2003 A1 - Finkelman, M. AB -

Computerized adaptive testing (CAT) has been shown to increase efficiency in educational measurement. One common application of CAT is to classify students as either proficient or not proficient in ability. A truncated form of Wald's sequential probability ratio test (SPRT), in which examination is halted after a prespecified number of questions, has been proposed to provide a diagnosis of proficiency. This article studies the further truncation provided by stochastic curtailment, where an exam is stopped early if completion of the remaining questions would be unlikely to alter the classification of the examinee. In a simulation study presented, the increased truncation is shown to offer substantial improvement in test length with only a slight decrease in accuracy.

PB - National Center for Research on Evaluation, Standards, and Student Testing CY - Los Angeles ER - TY - CHAP T1 - Adaptive exploration of assessment results under uncertainty Y1 - 2003 A1 - Lamboudis, D. A1 - Economides, A. A. A1 - Papastergiou, A. CY - Proceedings 3rd IEEE International Conference on Advanced Learning Technologies, ICALT '03, 460-461, 2003 ER - TY - CONF T1 - An adaptive exposure control algorithm for computerized adaptive testing using a sharing item response theory model T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Segall, D. O. JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 191 KB} ER - TY - JOUR T1 - Alpha-stratified adaptive testing with large numbers of content constraints JF - Applied Psychological Measurement Y1 - 2003 A1 - van der Linden, W. J. A1 - Chang, Hua-Hua VL - 27 ER - TY - CONF T1 - The assembly of multiple form structures T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Armstrong, R. D. A1 - Little, J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 418 KB} ER - TY - ABST T1 - The assembly of multiple stage adaptive tests with discrete items Y1 - 2003 A1 - Armstrong, R. D. A1 - Edmonds, J. J. CY - Newtown, PA: Law School Admission Council Report ER - TY - CONF T1 - Assessing CAT security breaches by the item pooling index T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Chang, Hua-Hua A1 - Zhang, J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CHAP T1 - Assessing question banks T2 - Reusing online resources: A sustainable approach to e-learning Y1 - 2003 A1 - Bull, J. A1 - Dalziel, J. A1 - Vreeland, T.
KW - Computer Assisted Testing KW - Curriculum Based Assessment KW - Education KW - Technology KW - computerized adaptive testing AB - In Chapter 14, Joanna Bull and James Dalziel provide a comprehensive treatment of the issues surrounding the use of Question Banks and Computer Assisted Assessment, and provide a number of excellent examples of implementations. In their review of the technologies employed in Computer Assisted Assessment the authors include Computer Adaptive Testing and data generation. The authors reveal significant issues involving the impact of Intellectual Property rights and computer assisted assessment and make important suggestions for strategies to overcome these obstacles. (PsycINFO Database Record (c) 2005 APA ) http://www-jime.open.ac.uk/2003/1/ (journal abstract) JF - Reusing online resources: A sustainable approach to e-learning PB - Kogan Page Ltd. CY - London, UK ER - TY - CONF T1 - Assessing the efficiency of item selection in computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2003 A1 - Weissman, A. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL N1 - {PDF file, 96 KB} ER - TY - JOUR T1 - a-Stratified multistage CAT design with content-blocking JF - British Journal of Mathematical and Statistical Psychology Y1 - 2003 A1 - Yi, Q. A1 - Chang, H.-H. VL - 56 ER - TY - CHAP T1 - Bayesian checks on outlying response times in computerized adaptive testing Y1 - 2003 A1 - van der Linden, W. J. CY - H. Yanai, A. Okada, K. Shigemasu, Y. Kano, and J. J. Meulman (Eds.), New developments in psychometrics (pp. 215-222). New York: Springer-Verlag. ER - TY - JOUR T1 - A Bayesian method for the detection of item preknowledge in computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2003 A1 - McLeod, L. D. A1 - Lewis, C. A1 - Thissen, D.
KW - Adaptive Testing KW - Cheating KW - Computer Assisted Testing KW - Individual Differences KW - computerized adaptive testing KW - Item KW - Item Analysis (Statistical) KW - Mathematical Modeling KW - Response Theory AB - With the increased use of continuous testing in computerized adaptive testing, new concerns about test security have evolved, such as how to ensure that items in an item pool are safeguarded from theft. In this article, procedures to detect test takers using item preknowledge are explored. When test takers use item preknowledge, their item responses deviate from the underlying item response theory (IRT) model, and estimated abilities may be inflated. This deviation may be detected through the use of person-fit indices. A Bayesian posterior log odds ratio index is proposed for detecting the use of item preknowledge. In this approach to person fit, the estimated probability that each test taker has preknowledge of items is updated after each item response. These probabilities are based on the IRT parameters, a model specifying the probability that each item has been memorized, and the test taker's item responses. Simulations based on an operational computerized adaptive test (CAT) pool are used to demonstrate the use of the odds ratio index. (PsycINFO Database Record (c) 2005 APA ) VL - 27 ER - TY - CONF T1 - Calibrating CAT item pools and online pretest items using MCMC methods T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Segall, D. O.
JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 155 KB} ER - TY - CONF T1 - Calibrating CAT pools and online pretest items using marginal maximum likelihood methods T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Pommerich, M A1 - Segall, D. O. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 284 KB} ER - TY - CONF T1 - Calibrating CAT pools and online pretest items using nonparametric and adjusted marginal maximum likelihood methods T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Krass, I. A. A1 - Williams, B. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - PDF file, 128 K ER - TY - JOUR T1 - Calibration of an item pool for assessing the burden of headaches: an application of item response theory to the Headache Impact Test (HIT) JF - Quality of Life Research Y1 - 2003 A1 - Bjorner, J. B. A1 - Kosinski, M. A1 - Ware, J. E., Jr. KW - *Cost of Illness KW - *Decision Support Techniques KW - *Sickness Impact Profile KW - Adolescent KW - Adult KW - Aged KW - Comparative Study KW - Disability Evaluation KW - Factor Analysis, Statistical KW - Headache/*psychology KW - Health Surveys KW - Human KW - Longitudinal Studies KW - Middle Aged KW - Migraine/psychology KW - Models, Psychological KW - Psychometrics/*methods KW - Quality of Life/*psychology KW - Software KW - Support, Non-U.S. Gov't AB - BACKGROUND: Measurement of headache impact is important in clinical trials, case detection, and the clinical monitoring of patients. Computerized adaptive testing (CAT) of headache impact has potential advantages over traditional fixed-length tests in terms of precision, relevance, real-time quality control and flexibility. 
OBJECTIVE: To develop an item pool that can be used for a computerized adaptive test of headache impact. METHODS: We analyzed responses to four well-known tests of headache impact from a population-based sample of recent headache sufferers (n = 1016). We used confirmatory factor analysis for categorical data and analyses based on item response theory (IRT). RESULTS: In factor analyses, we found very high correlations between the factors hypothesized by the original test constructors, both within and between the original questionnaires. These results suggest that a single score of headache impact is sufficient. We established a pool of 47 items which fitted the generalized partial credit IRT model. By simulating a computerized adaptive health test we showed that an adaptive test of only five items had a very high concordance with the score based on all items and that different worst-case item selection scenarios did not lead to bias. CONCLUSION: We have established a headache impact item pool that can be used in CAT of headache impact. VL - 12 N1 - 0962-9343 Journal Article ER - TY - JOUR T1 - Can an item response theory-based pain item bank enhance measurement precision? JF - Clinical Therapeutics Y1 - 2003 A1 - Lai, J-S. A1 - Dineen, K. A1 - Cella, D. A1 - Von Roenn, J. VL - 25 JO - Clin Ther ER - TY - CONF T1 - Can We Assess Pre-K Kids With Computer-Based Tests: STAR Early Literacy Data T2 - Presentation to the 33rd Annual National Conference on Large-Scale Assessment. Y1 - 2003 A1 - J. R. McBride JF - Presentation to the 33rd Annual National Conference on Large-Scale Assessment. CY - San Antonio TX ER - TY - ABST T1 - CAT-ASVAB prototype Internet delivery system: Final report (FR-03-06) Y1 - 2003 A1 - Sticha, P. J. A1 - Barber, G. CY - Arlington VA: Human Resources Research Organization N1 - {PDF file, 393 KB} ER - TY - CONF T1 - Cognitive CAT in foreign language assessment T2 - Proceedings 11th International PEG Conference Y1 - 2003 A1 - Giouroglou, H. 
A1 - Economides, A. A. JF - Proceedings 11th International PEG Conference CY - Powerful ICT Tools for Learning and Teaching, PEG '03, CD-ROM, 2003 ER - TY - JOUR T1 - A comparative study of item exposure control methods in computerized adaptive testing JF - Journal of Educational Measurement Y1 - 2003 A1 - Chang, S-W. A1 - Ansley, T. N. KW - Adaptive Testing KW - Computer Assisted Testing KW - Educational Measurement KW - Item Analysis (Statistical) KW - Strategies KW - computerized adaptive testing AB - This study compared the properties of five methods of item exposure control within the purview of estimating examinees' abilities in a computerized adaptive testing (CAT) context. Each exposure control algorithm was incorporated into the item selection procedure and the adaptive testing progressed based on the CAT design established for this study. The merits and shortcomings of these strategies were considered under different item pool sizes and different desired maximum exposure rates and were evaluated in light of the observed maximum exposure rates, the test overlap rates, and the conditional standard errors of measurement. Each method had its advantages and disadvantages, but no one possessed all of the desired characteristics. There was a clear and logical trade-off between item exposure control and measurement precision. The M. L. Stocking and C. Lewis conditional multinomial procedure and, to a slightly lesser extent, the T. Davey and C. G. Parshall method seemed to be the most promising considering all of the factors that this study addressed. (PsycINFO Database Record (c) 2005 APA ) VL - 40 ER - TY - CONF T1 - A comparison of exposure control procedures in CAT systems based on different measurement models for testlets using the verbal reasoning section of the MCAT T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Boyd, A. M. A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. 
JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 405 KB} ER - TY - CONF T1 - A comparison of item exposure control procedures using a CAT system based on the generalized partial credit model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2003 A1 - Burt, W. M. A1 - Kim, S.-J. A1 - Davis, L. L. A1 - Dodd, B. G. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL N1 - {PDF file, 265 KB} ER - TY - CONF T1 - A comparison of learning potential results at various educational levels T2 - Paper presented at the 6th Annual Society for Industrial and Organisational Psychology of South Africa (SIOPSA) conference Y1 - 2003 A1 - De Beer, M. JF - Paper presented at the 6th Annual Society for Industrial and Organisational Psychology of South Africa (SIOPSA) conference CY - 25-27 June 2003 N1 - {PDF file, 391 KB} ER - TY - CONF T1 - Comparison of multi-stage tests with computer adaptive and paper and pencil tests T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Rotou, O. A1 - Patsula, L. A1 - Steffen, M. A1 - Rizavi, S. JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 695 KB} ER - TY - JOUR T1 - A computer adaptive testing simulation applied to the FIM instrument motor component JF - Arch Phys Med Rehabil Y1 - 2003 A1 - Dijkers, M. P. VL - 84 ER - TY - JOUR T1 - Computer-adaptive test for measuring personality factors using item response theory JF - Dissertation Abstracts International: Section B: The Sciences & Engineering Y1 - 2003 A1 - Macdonald, Paul Lawrence AB - The aim of the present research was to develop a computer adaptive test with the graded response model to measure the Five Factor Model of personality attributes. 
In the first of three studies, simulated items and simulated examinees were used to investigate systematically the impact of several variables on the accuracy and efficiency of a computer adaptive test. Item test banks containing more items, items with greater trait discrimination, and more response options resulted in increased accuracy and efficiency of the computer adaptive test. It was also found that large stopping rule values required fewer items before stopping but had less accuracy compared to smaller stopping rule values. This demonstrated a trade-off between accuracy and efficiency such that greater measurement accuracy can be obtained at a cost of decreased test efficiency. In the second study, the archival responses of 501 participants to five 30-item test banks measuring the Five Factor Model of personality were utilized in simulations of a computer adaptive personality test. The computer adaptive test estimates of participant trait scores were highly correlated with the item response theory trait estimates, and the magnitude of the correlation was related directly to the stopping rule value with higher correlations and less measurement error being associated with smaller stopping rule values. It was also noted that the performance of the computer adaptive test was dependent on the personality factor being measured whereby Conscientiousness required the largest number of items to be administered and Neuroticism required the fewest. The results confirmed that a simulated computer adaptive test using archival personality data could accurately and efficiently attain trait estimates. In the third study, 276 student participants selected response options with a click of a mouse in a computer adaptive personality test (CAPT) measuring the Big Five factors of the Five Factor Model of personality structure. Participant responses to alternative measures of the Big Five were also collected using conventional paper-and-pencil personality questionnaires. 
It was found that the CAPT obtained trait estimates that were very accurate even with very few administered items. Similarly, the CAPT trait estimates demonstrated moderate to high concurrent validity with the alternative Big Five measures, and the strength of the estimates varied as a result of the similarity of the personality items and assessment methodology. It was also found that the computer adaptive test was accurately able to detect, with relatively few items, the relations between the measured personality traits and several socially interesting variables such as smoking behavior, alcohol consumption rating, and number of dates per month. Implications of the results of this research are discussed in terms of the utility of computer adaptive testing of personality characteristics. As well, methodological limitations of the studies are noted and directions for future research are considered. (PsycINFO Database Record (c) 2004 APA, all rights reserved). VL - 64 ER - TY - JOUR T1 - Computerized adaptive rating scales for measuring managerial performance JF - International Journal of Selection and Assessment Y1 - 2003 A1 - Schneider, R. J. A1 - Goff, M. A1 - Anderson, S. A1 - Borman, W. C. KW - Adaptive Testing KW - Algorithms KW - Associations KW - Citizenship KW - Computer Assisted Testing KW - Construction KW - Contextual KW - Item Response Theory KW - Job Performance KW - Management KW - Management Personnel KW - Rating Scales KW - Test AB - Computerized adaptive rating scales (CARS) had been developed to measure contextual or citizenship performance. This rating format used a paired-comparison protocol, presenting pairs of behavioral statements scaled according to effectiveness levels, and an iterative item response theory algorithm to obtain estimates of ratees' citizenship performance (W. C. Borman et al, 2001). 
In the present research, we developed CARS to measure the entire managerial performance domain, including task and citizenship performance, thus addressing a major limitation of the earlier CARS. The paper describes this development effort, including an adjustment to the algorithm that reduces substantially the number of item pairs required to obtain almost as much precision in the performance estimates. (PsycINFO Database Record (c) 2005 APA ) VL - 11 ER - TY - CHAP T1 - Computerized adaptive testing Y1 - 2003 A1 - Ponsoda, V. A1 - Olea, J. CY - R. Fernández-Ballesteros (Ed.): Encyclopaedia of Psychological Assessment. London: Sage. ER - TY - CONF T1 - Computerized adaptive testing: A comparison of three content balancing methods T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Leung, C-K. A1 - Chang, Hua-Hua A1 - Hau, K-T. A1 - Wen, Z. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 227 KB} ER - TY - JOUR T1 - Computerized adaptive testing: A comparison of three content balancing methods JF - The Journal of Technology, Learning and Assessment Y1 - 2003 A1 - Leung, C-K. A1 - Chang, Hua-Hua A1 - Hau, K-T. AB - Content balancing is often a practical consideration in the design of computerized adaptive testing (CAT). This study compared three content balancing methods, namely, the constrained CAT (CCAT), the modified constrained CAT (MCCAT), and the modified multinomial model (MMM), under various conditions of test length and target maximum exposure rate. Results of a series of simulation studies indicate that there is no systematic effect of content balancing method in measurement efficiency and pool utilization. However, among the three methods, the MMM appears to consistently over-expose fewer items. 
VL - 2 ER - TY - JOUR T1 - Computerized adaptive testing using the nearest-neighbors criterion JF - Applied Psychological Measurement Y1 - 2003 A1 - Cheng, P. E. A1 - Liou, M. KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Analysis (Statistical) KW - Item Response Theory KW - Statistical Analysis KW - Statistical Estimation KW - computerized adaptive testing KW - Statistical Tests AB - Item selection procedures designed for computerized adaptive testing need to accurately estimate every taker's trait level (θ) and, at the same time, effectively use all items in a bank. Empirical studies showed that classical item selection procedures based on maximizing Fisher or other related information yielded highly varied item exposure rates; with these procedures, some items were frequently used whereas others were rarely selected. In the literature, methods have been proposed for controlling exposure rates; they tend to affect the accuracy in θ estimates, however. A modified version of the maximum Fisher information (MFI) criterion, coined the nearest neighbors (NN) criterion, is proposed in this study. The NN procedure improves to a moderate extent the undesirable item exposure rates associated with the MFI criterion and keeps sufficient precision in estimates. The NN criterion will be compared with a few other existing methods in an empirical study using the mean squared errors in θ estimates and plots of item exposure rates associated with different distributions. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 27 ER - TY - JOUR T1 - Computerized adaptive testing with item cloning JF - Applied Psychological Measurement Y1 - 2003 A1 - Glas, C. A. W. A1 - van der Linden, W. J. 
KW - computerized adaptive testing AB - (from the journal abstract) To increase the number of items available for adaptive testing and reduce the cost of item writing, the use of techniques of item cloning has been proposed. An important consequence of item cloning is possible variability between the item parameters. To deal with this variability, a multilevel item response theory (IRT) model is presented which allows for differences between the distributions of item parameters of families of item clones. A marginal maximum likelihood and a Bayesian procedure for estimating the hyperparameters are presented. In addition, an item-selection procedure for computerized adaptive testing with item cloning is presented which has the following two stages: First, a family of item clones is selected to be optimal at the estimate of the person parameter. Second, an item is randomly selected from the family for administration. Results from simulation studies based on an item pool from the Law School Admission Test (LSAT) illustrate the accuracy of these item pool calibration and adaptive testing procedures. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 27 N1 - References. Sage Publications, US. ER - TY - CONF T1 - Constraining item exposure in computerized adaptive testing with shadow tests T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - van der Linden, W. J. A1 - Veldkamp, B. P. JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - #vdLI03-02 ER - TY - CONF T1 - Constructing rotating item pools for constrained adaptive testing T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Ariel, A. A1 - Veldkamp, B. A1 - van der Linden, W. J. 
JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 395 KB} ER - TY - CONF T1 - Controlling item exposure and item eligibility in computerized adaptive testing Y1 - 2003 A1 - van der Linden, W. J. A1 - Veldkamp, B. P. ER - TY - CONF T1 - Criterion item characteristic curve function for evaluating the differential weight procedure adjusted to on-line item calibration T2 - Paper presented at the annual meeting of the NCME Y1 - 2003 A1 - Samejima, F. JF - Paper presented at the annual meeting of the NCME CY - Chicago IL ER - TY - JOUR T1 - Developing an initial physical function item bank from existing sources JF - Journal of Applied Measurement Y1 - 2003 A1 - Bode, R. K. A1 - Cella, D. A1 - Lai, J. S. A1 - Heinemann, A. W. KW - *Databases KW - *Sickness Impact Profile KW - Adaptation, Psychological KW - Data Collection KW - Humans KW - Neoplasms/*physiopathology/psychology/therapy KW - Psychometrics KW - Quality of Life/*psychology KW - Research Support, U.S. Gov't, P.H.S. KW - United States AB - The objective of this article is to illustrate incremental item banking using health-related quality of life data collected from two samples of patients receiving cancer treatment. The kinds of decisions one faces in establishing an item bank for computerized adaptive testing are also illustrated. Pre-calibration procedures include: identifying common items across databases; creating a new database with data from each pool; reverse-scoring "negative" items; identifying rating scales used in items; identifying pivot points in each rating scale; pivot anchoring items at comparable rating scale categories; and identifying items in each instrument that measure the construct of interest. A series of calibrations were conducted in which a small proportion of new items were added to the common core and misfitting items were identified and deleted until an initial item bank has been developed. 
VL - 4 N1 - 1529-7713 Journal Article ER - TY - JOUR T1 - Development and psychometric evaluation of the Flexilevel Scale of Shoulder Function (FLEX-SF) JF - Medical Care (in press) Y1 - 2003 A1 - Cook, K. F. A1 - Roddey, T. S. A1 - Gartsman, G. M. A1 - Olson, S. L. N1 - #CO03-01 ER - TY - CONF T1 - Development of the Learning Potential Computerised Adaptive Test (LPCAT) T2 - Unpublished manuscript. Y1 - 2003 A1 - De Beer, M. JF - Unpublished manuscript. N1 - {PDF file, 563 KB} ER - TY - JOUR T1 - Development, reliability, and validity of a computerized adaptive version of the Schedule for Nonadaptive and Adaptive Personality JF - Dissertation Abstracts International: Section B: The Sciences & Engineering Y1 - 2003 A1 - Simms, L. J. AB - Computerized adaptive testing (CAT) and Item Response Theory (IRT) techniques were applied to the Schedule for Nonadaptive and Adaptive Personality (SNAP) to create a more efficient measure with little or no cost to test reliability or validity. The SNAP includes 15 factor analytically derived and relatively unidimensional traits relevant to personality disorder. IRT item parameters were calibrated on item responses from a sample of 3,995 participants who completed the traditional paper-and-pencil (P&P) SNAP in a variety of university, community, and patient settings. Computerized simulations were conducted to test various adaptive testing algorithms, and the results informed the construction of the CAT version of the SNAP (SNAP-CAT). A validation study of the SNAP-CAT was conducted on a sample of 413 undergraduates who completed the SNAP twice, separated by one week. Participants were randomly assigned to one of four groups who completed (1) a modified P&P version of the SNAP (SNAP-PP) twice (n = 106), (2) the SNAP-PP first and the SNAP-CAT second (n = 105), (3) the SNAP-CAT first and the SNAP-PP second (n = 102), and (4) the SNAP-CAT twice (n = 100). 
Results indicated that the SNAP-CAT was 58% and 60% faster than the traditional P&P version, at Times 1 and 2, respectively, and mean item savings across scales were 36% and 37%, respectively. These savings came with minimal cost to reliability or validity, and the two test forms were largely equivalent. Descriptive statistics, rank-ordering of scores, internal factor structure, and convergent/discriminant validity were highly comparable across testing modes and methods of scoring, and very few differences between forms replicated across testing sessions. In addition, participants overwhelmingly preferred the computerized version to the P&P version. However, several specific problems were identified for the Self-harm and Propriety scales of the SNAP-CAT that appeared to be broadly related to IRT calibration difficulties. Reasons for these anomalous findings are discussed, and follow-up studies are suggested. Despite these specific problems, the SNAP-CAT appears to be a viable alternative to the traditional P&P SNAP. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 63 ER - TY - ABST T1 - Effect of extra time on GRE® Quantitative and Verbal Scores (Research Report 03-13) Y1 - 2003 A1 - Bridgeman, B. A1 - Cline, F. A1 - Hessinger, J. CY - Princeton NJ: Educational Testing Service N1 - {PDF file, 88 KB} ER - TY - CONF T1 - The effect of item selection method on the variability of CAT’s ability estimates when item parameters are contaminated with measurement errors T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Li, Y. H. A1 - Schafer, W. D. JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 275 KB} ER - TY - ABST T1 - The effects of model misfit in computerized classification test Y1 - 2003 A1 - Jiao, H. A1 - Lau, A. C. 
CY - Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago IL N1 - {PDF file, 432 KB} ER - TY - JOUR T1 - The effects of model specification error in item response theory-based computerized classification test using sequential probability ratio test JF - Dissertation Abstracts International Section A: Humanities & Social Sciences Y1 - 2003 A1 - Jiao, H. AB - This study investigated the effects of model specification error on classification accuracy, error rates, and average test length in Item Response Theory (IRT) based computerized classification test (CCT) using sequential probability ratio test (SPRT) in making binary decisions from examinees' dichotomous responses. This study consisted of three sub-studies. In each sub-study, one of the three unidimensional dichotomous IRT models, the 1-parameter logistic (1PL), the 2-parameter logistic (2PL), and the 3-parameter logistic (3PL) model was set as the true model and the other two models were treated as the misfit models. Item pool composition, test length, and stratum depth were manipulated to simulate different test conditions. To ensure the validity of the study results, the true model based CCTs using the true and the recalibrated item parameters were compared first to study the effect of estimation error in item parameters in CCTs. Then, the true model and the misfit model based CCTs were compared to accomplish the research goal. The results indicated that estimation error in item parameters did not affect classification results based on CCTs using SPRT. The effect of model specification error depended on the true model, the misfit model, and the item pool composition. When the 1PL or the 2PL IRT model was the true model, the use of another IRT model had little impact on the CCT results. When the 3PL IRT model was the true model, the use of the 1PL model raised the false positive error rates. 
The influence of using the 2PL instead of the 3PL model depended on the item pool composition. When the item discrimination parameters varied greatly from uniformity of one, the use of the 2PL IRT model raised the false negative error rates to above the nominal level. In the simulated test conditions with test length and item exposure constraints, using a misfit model in CCTs most often affected the average test length. Its effects on error rates and classification accuracy were negligible. It was concluded that in CCTs using SPRT, IRT model selection and evaluation is indispensable. (PsycINFO Database Record (c) 2004 APA, all rights reserved). VL - 64 ER - TY - CONF T1 - Effects of test administration mode on item parameter estimates T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Yi, Q. A1 - Harris, D. J. A1 - Wang, T. A1 - Ban, J-C. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 233 KB} ER - TY - CONF T1 - Evaluating a new approach to detect aberrant responses in CAT T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2003 A1 - Lu, Y. A1 - Robin, F. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL N1 - Estimation of Ability Level by Using Only Observable Quantities in Adaptive Testing. ER - TY - CONF T1 - Evaluating computer-based test security by generalized item overlap rates T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2003 A1 - Zhang, J. A1 - Lu, T. 
JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL ER - TY - CONF T1 - Evaluating computerized adaptive testing design for the MCAT with realistic simulated data T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Lu, Y. A1 - Pitoniak, M. A1 - Rizavi, S. A1 - Way, W. D. A1 - Steffen, M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 985 KB} ER - TY - CONF T1 - Evaluating stability of online item calibrations under varying conditions T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Thomasson, G. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CONF T1 - Evaluating the comparability of English- and French-speaking examinees on a science achievement test administered using two-stage testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Puhan, G. A1 - Gierl, M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 568 KB} ER - TY - CONF T1 - The evaluation of exposure control procedures for an operational CAT T2 - Paper presented at the Annual Meeting of the American Educational Research Association Y1 - 2003 A1 - French, B. F. A1 - Thompson, T. T. JF - Paper presented at the Annual Meeting of the American Educational Research Association CY - Chicago IL ER - TY - JOUR T1 - An examination of exposure control and content balancing restrictions on item selection in CATs using the partial credit model JF - Journal of Applied Measurement Y1 - 2003 A1 - Davis, L. L. A1 - Pastor, D. A. A1 - Dodd, B. G. A1 - Chiang, C. A1 - Fitzpatrick, S. J. 
KW - *Computers KW - *Educational Measurement KW - *Models, Theoretical KW - Automation KW - Decision Making KW - Humans KW - Reproducibility of Results AB - The purpose of the present investigation was to systematically examine the effectiveness of the Sympson-Hetter technique and rotated content balancing relative to no exposure control and no content rotation conditions in a computerized adaptive testing system (CAT) based on the partial credit model. A series of simulated fixed and variable length CATs were run using two data sets generated to multiple content areas for three sizes of item pools. The 2 (exposure control) X 2 (content rotation) X 2 (test length) X 3 (item pool size) X 2 (data sets) yielded a total of 48 conditions. Results show that while both procedures can be used with no deleterious effect on measurement precision, the gains in exposure control, pool utilization, and item overlap appear quite modest. Difficulties involved with setting the exposure control parameters in small item pools make questionable the utility of the Sympson-Hetter technique with similar item pools. VL - 4 N1 - 1529-7713 Journal Article ER - TY - CONF T1 - Exposure control using adaptive multi-stage item bundles T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Luecht, R. M. JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 116 KB} ER - TY - JOUR T1 - The feasibility of applying item response theory to measures of migraine impact: a re-analysis of three clinical studies JF - Quality of Life Research Y1 - 2003 A1 - Bjorner, J. B. A1 - Kosinski, M. A1 - Ware, J. E., Jr. 
KW - *Sickness Impact Profile KW - Adolescent KW - Adult KW - Aged KW - Comparative Study KW - Cost of Illness KW - Factor Analysis, Statistical KW - Feasibility Studies KW - Female KW - Human KW - Male KW - Middle Aged KW - Migraine/*psychology KW - Models, Psychological KW - Psychometrics/instrumentation/*methods KW - Quality of Life/*psychology KW - Questionnaires KW - Support, Non-U.S. Gov't AB - BACKGROUND: Item response theory (IRT) is a powerful framework for analyzing multi-item scales and is central to the implementation of computerized adaptive testing. OBJECTIVES: To explain the use of IRT to examine measurement properties and to apply IRT to a questionnaire for measuring migraine impact--the Migraine Specific Questionnaire (MSQ). METHODS: Data from three clinical studies that employed the MSQ-version 1 were analyzed by confirmatory factor analysis for categorical data and by IRT modeling. RESULTS: Confirmatory factor analyses showed very high correlations between the factors hypothesized by the original test constructors. Further, high item loadings on one common factor suggest that migraine impact may be adequately assessed by only one score. IRT analyses of the MSQ were feasible and provided several suggestions as to how to improve the items and in particular the response choices. Out of 15 items, 13 showed adequate fit to the IRT model. In general, IRT scores were strongly associated with the scores proposed by the original test developers and with the total item sum score. Analysis of response consistency showed that more than 90% of the patients answered consistently according to a unidimensional IRT model. For the remaining patients, scores on the dimension of emotional function were less strongly related to the overall IRT scores that mainly reflected role limitations. Such response patterns can be detected easily using response consistency indices. 
Analysis of test precision across score levels revealed that the MSQ was most precise at one standard deviation worse than the mean impact level for migraine patients that are not in treatment. Thus, gains in test precision can be achieved by developing items aimed at less severe levels of migraine impact. CONCLUSIONS: IRT proved useful for analyzing the MSQ. The approach warrants further testing in a more comprehensive item pool for headache impact that would enable computerized adaptive testing. VL - 12 N1 - 0962-9343 Journal Article ER - TY - JOUR T1 - A feasibility study of on-the-fly item generation in adaptive testing JF - Journal of Technology, Learning, and Assessment Y1 - 2003 A1 - Bejar, I. I. A1 - Lawless, R. R. A1 - Morley, M. E. A1 - Wagner, M. E. A1 - Bennett, R. E. A1 - Revuelta, J. VL - 2 IS - 3 N1 - {PDF file, 427 KB} ER - TY - CONF T1 - Implementing an alternative to Sympson-Hetter item-exposure control in constrained adaptive testing T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Veldkamp, B. P. A1 - van der Linden, W. J. JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - JOUR T1 - Implementing content constraints in alpha-stratified adaptive testing using a shadow test approach JF - Applied Psychological Measurement Y1 - 2003 A1 - van der Linden, W. J. A1 - Chang, Hua-Hua VL - 27 ER - TY - CONF T1 - Implementing the a-stratified method with b blocking in computerized adaptive testing with the generalized partial credit model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2003 A1 - Yi, Q. A1 - Wang, T. 
A1 - Wang, S. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL N1 - #YI03-01 {PDF file, 496 KB} ER - TY - JOUR T1 - Incorporation of Content Balancing Requirements in Stratification Designs for Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2003 A1 - Leung, C-K. A1 - Chang, Hua-Hua A1 - Hau, K-T. KW - computerized adaptive testing AB - Studied three stratification designs for computerized adaptive testing in conjunction with three well-developed content balancing methods. Simulation study results show substantial differences in item overlap rate and pool utilization among different methods. Recommends an optimal combination of stratification design and content balancing method. (SLD) VL - 63 ER - TY - JOUR T1 - Incorporation of Content Balancing Requirements in Stratification Designs for Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2003 A1 - Leung, Chi-Keung A1 - Chang, Hua-Hua A1 - Hau, Kit-Tai AB -

In computerized adaptive testing, the multistage a-stratified design advocates a new philosophy on pool management and item selection in which, contrary to common practice, less discriminating items are used first. The method is effective in reducing the item-overlap rate and enhancing pool utilization. This stratification method has been extended in different ways to deal with the practical issues of content constraints and the positive correlation between item difficulty and discrimination. Nevertheless, these modified designs on their own do not automatically satisfy content requirements. In this study, three stratification designs were examined in conjunction with three well-developed content balancing methods. The performance of each of these nine combinational methods was evaluated in terms of item security, measurement efficiency, and pool utilization. Results showed substantial differences in item-overlap rate and pool utilization among the methods. An optimal combination of stratification design and content balancing method is recommended.

VL - 63 UR - http://epm.sagepub.com/content/63/2/257.abstract ER - TY - CONF T1 - Increasing the homogeneity of CAT’s item-exposure rates by minimizing or maximizing varied target functions while assembling shadow tests T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2003 A1 - Li, Y. H. A1 - Schafer, W. D. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL N1 - {PDF file, 418 KB} ER - TY - CONF T1 - Information theoretic approaches to item selection T2 - Paper presented at the 13th international meeting of the Psychometric Society Y1 - 2003 A1 - Weissman, A. JF - Paper presented at the 13th international meeting of the Psychometric Society CY - Sardinia, Italy ER - TY - CONF T1 - Issues in maintaining scale consistency for the CAT-ASVAB T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Nicewander, W. A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - JOUR T1 - Item banking to improve, shorten and computerize self-reported fatigue: an illustration of steps to create a core item bank from the FACIT-Fatigue Scale JF - Quality of Life Research Y1 - 2003 A1 - Lai, J-S. A1 - Crane, P. K. A1 - Cella, D. A1 - Chang, C-H. A1 - Bode, R. K. A1 - Heinemann, A. W. KW - *Health Status Indicators KW - *Questionnaires KW - Adult KW - Fatigue/*diagnosis/etiology KW - Female KW - Humans KW - Male KW - Middle Aged KW - Neoplasms/complications KW - Psychometrics KW - Research Support, Non-U.S. Gov't KW - Research Support, U.S. Gov't, P.H.S. KW - Sickness Impact Profile AB - Fatigue is a common symptom among cancer patients and the general population. Due to its subjective nature, fatigue has been difficult to assess effectively and efficiently.
Modern computerized adaptive testing (CAT) can enable precise assessment of fatigue using a small number of items from a fatigue item bank. CAT enables brief assessment by selecting questions from an item bank that provide the maximum amount of information given a person's previous responses. This article illustrates steps to prepare such an item bank, using 13 items from the Functional Assessment of Chronic Illness Therapy Fatigue Subscale (FACIT-F) as the basis. Samples included 1022 cancer patients and 1010 people from the general population. An Item Response Theory (IRT)-based rating scale model, a polytomous extension of the Rasch dichotomous model, was used. Nine items demonstrating acceptable psychometric properties were selected and positioned on the fatigue continuum. The fatigue levels measured by these nine items along with their response categories covered 66.8% of the general population and 82.6% of the cancer patients. Although the operational CAT algorithms to handle polytomously scored items are still in progress, we illustrated how CAT may work by using nine core items to measure level of fatigue. Using this illustration, a fatigue measure comparable to its full-length 13-item scale administration was obtained using four items. The resulting item bank can serve as a core to which will be added a psychometrically sound and operational item bank covering the entire fatigue continuum. VL - 12 N1 - 0962-9343; Journal Article ER - TY - JOUR T1 - Item exposure constraints for testlets in the verbal reasoning section of the MCAT JF - Applied Psychological Measurement Y1 - 2003 A1 - Davis, L. L. A1 - Dodd, B. G.
KW - Adaptive Testing KW - Computer Assisted Testing KW - Entrance Examinations KW - Item Response Theory KW - Random Sampling KW - Reasoning KW - Verbal Ability KW - computerized adaptive testing AB - The current study examined item exposure control procedures for testlet scored reading passages in the Verbal Reasoning section of the Medical College Admission Test with four computerized adaptive testing (CAT) systems using the partial credit model. The first system used a traditional CAT using maximum information item selection. The second used random item selection to provide a baseline for optimal exposure rates. The third used a variation of Lunz and Stahl's randomization procedure. The fourth used Luecht and Nungester's computerized adaptive sequential testing (CAST) system. A series of simulated fixed-length CATs was run to determine the optimal item length selection procedure. Results indicated that both the randomization procedure and CAST performed well in terms of exposure control and measurement precision, with the CAST system providing the best overall solution when all variables were taken into consideration. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 27 ER - TY - CONF T1 - Item pool design for computerized adaptive tests T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Reckase, M. D. JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 135 KB} ER - TY - CHAP T1 - Item selection in polytomous CAT Y1 - 2003 A1 - Veldkamp, B. P. CY - H. Yanai, A. Okada, K. Shigemasu, Y. Kano, and J. J. Meulman (eds.), New developments in psychometrics (pp. 207-214). Tokyo, Japan: Springer-Verlag. N1 - #VE03207 {PDF file, 79 KB} ER - TY - CHAP T1 - Item selection in polytomous CAT T2 - New developments in psychometrics Y1 - 2003 A1 - Veldkamp, B. P. ED - A. Okada ED - K. Shigemasu ED - Y. Kano ED - J.
Meulman KW - computerized adaptive testing JF - New developments in psychometrics PB - Psychometric Society, Springer CY - Tokyo, Japan ER - TY - CONF T1 - Maintaining scale in computer adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Smith, R. L. A1 - Rizavi, S. A1 - Paez, R. A1 - Damiano, M. A1 - Herbert, E. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 367 KB} ER - TY - ABST T1 - A method to determine targets for multi-stage adaptive tests Y1 - 2003 A1 - Armstrong, R. D. A1 - Roussos, L. CY - Unpublished manuscript N1 - {PDF file, 207 KB} ER - TY - CONF T1 - Methods for item set selection in adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Lu, Y. A1 - Rizavi, S. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 443 KB} ER - TY - CONF T1 - Multidimensional computerized adaptive testing in recovering reading and mathematics abilities T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2003 A1 - Li, Y. H. A1 - Schafer, W. D. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago, IL N1 - {PDF file, 592 KB} ER - TY - ABST T1 - A multidimensional IRT mechanism for better understanding adaptive test behavior Y1 - 2003 A1 - Jodoin, M. CY - Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago IL ER - TY - CONF T1 - Online calibration and scale stability of a CAT program T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2003 A1 - Guo, F. A1 - Wang, G.
JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL N1 - {PDF file, 274 KB} ER - TY - JOUR T1 - An optimal design approach to criterion-referenced computerized testing JF - Journal of Educational Measurement Y1 - 2003 A1 - Wiberg, M. VL - 28 ER - TY - JOUR T1 - Optimal stratification of item pools in α-stratified computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2003 A1 - Chang, Hua-Hua A1 - van der Linden, W. J. KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Content (Test) KW - Item Response Theory KW - Mathematical Modeling KW - Test Construction KW - computerized adaptive testing AB - A method based on 0-1 linear programming (LP) is presented to stratify an item pool optimally for use in α-stratified adaptive testing. Because the 0-1 LP model belongs to the subclass of models with a network flow structure, efficient solutions are possible. The method is applied to a previous item pool from the computerized adaptive testing (CAT) version of the Graduate Record Exams (GRE) Quantitative Test. The results indicate that the new method performs well in practical situations. It improves item exposure control, reduces the mean squared error in the θ estimates, and increases test reliability. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 27 ER - TY - CONF T1 - Optimal testing with easy items in computerized adaptive testing T2 - Paper presented at the conference of the International Association for Educational Assessment Y1 - 2003 A1 - Theo Eggen A1 - Verschoor, A. JF - Paper presented at the conference of the International Association for Educational Assessment CY - Manchester UK ER - TY - CONF T1 - Predicting item exposure parameters in computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2003 A1 - Chen, S-Y. A1 - Doong, H.
JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL N1 - {PDF file, 239 KB} ER - TY - JOUR T1 - Psychometric and Psychological Effects of Item Selection and Review on Computerized Testing JF - Educational and Psychological Measurement Y1 - 2003 A1 - Revuelta, Javier A1 - Ximénez, M. Carmen A1 - Olea, Julio AB -

Psychometric properties of computerized testing, together with the anxiety and comfort of examinees, are investigated in relation to the item selection routine and the opportunity for response review. Two different hypotheses involving examinee anxiety were used to design test properties: perceived control and perceived performance. The study involved three types of administration of a computerized English test for Spanish speakers (adaptive, easy adaptive, and fixed) and four review conditions (no review, review at end, review by blocks of 5 items, and review item-by-item). These were applied to a sample of 557 first-year psychology undergraduate students to examine main and interaction effects of test type and review on psychometric and psychological variables. Statistically significant effects were found in test precision among the different types of test. Response review improved ability estimates and increased testing time. No psychological effects on anxiety were found. Examinees in all review conditions rated the possibility of review as more important than did those who were not allowed to review. These results concur with previous findings on examinees' preference for item review and raise some issues that should be addressed in the field of tests with item review.

VL - 63 UR - http://epm.sagepub.com/content/63/5/791.abstract ER - TY - JOUR T1 - Psychometric properties of several computer-based test designs with ideal and constrained item pools JF - Dissertation Abstracts International: Section B: The Sciences & Engineering Y1 - 2003 A1 - Jodoin, M. G. AB - The purpose of this study was to compare linear fixed-length test (LFT), multistage test (MST), and computer adaptive test (CAT) designs under three levels of item pool quality, two levels of match between test and item pool content specifications, two levels of test length, and several levels of exposure control expected to be practical for a number of testing programs. This design resulted in 132 conditions that were evaluated using a simulation study with 9000 examinees on several measures of overall measurement precision, including reliability and the mean error and root mean squared error between true and estimated ability levels; classification precision, including decision accuracy, false positive and false negative rates, and Kappa for cut scores corresponding to 30%, 50%, and 85% failure rates; and conditional measurement precision, with the conditional root mean squared error between true and estimated ability levels conditioned on 25 true ability levels. Test reliability, overall and conditional measurement precision, and classification precision increased with item pool quality and test length, and decreased as the match between item pool and test specifications became less adequate. In addition, as the maximum exposure rate decreased and the type of exposure control implemented became more restrictive, test reliability, overall and conditional measurement precision, and classification precision decreased.
Within item pool quality, match between test and item pool content specifications, test length, and exposure control, CAT designs showed superior psychometric properties compared to MST designs, which in turn were superior to LFT designs. However, some caution is warranted in interpreting these results, since the ability of the automated test assembly software to construct tests that met specifications was limited in conditions where pool usage was high. The practical importance of the differences between test designs on the evaluation criteria studied is discussed with respect to the inferences test users seek to make from test scores and nonpsychometric factors that may be important in some testing programs. (PsycINFO Database Record (c) 2004 APA, all rights reserved). VL - 64 ER - TY - CONF T1 - Recalibration of IRT item parameters in CAT: Sparse data matrices and missing data treatments T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Harmes, J. C. A1 - Parshall, C. G. A1 - Kromrey, J. D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 626 KB} ER - TY - JOUR T1 - The relationship between item exposure and test overlap in computerized adaptive testing JF - Journal of Educational Measurement Y1 - 2003 A1 - Chen, S. A1 - Ankenmann, R. D. A1 - Spray, J. A. VL - 40 ER - TY - JOUR T1 - The relationship between item exposure and test overlap in computerized adaptive testing JF - Journal of Educational Measurement Y1 - 2003 A1 - Chen, S-Y. A1 - Ankenmann, R. D. A1 - Spray, J. A.
KW - Adaptive Testing KW - Computer Assisted Testing KW - computerized adaptive testing KW - Human Computer Interaction KW - Item Analysis (Statistical) KW - Item Analysis (Test) KW - Test Items AB - The purpose of this article is to present an analytical derivation for the mathematical form of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CATs). This algebraic relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. The results indicate that, in fixed-length CATs, control of the average between-test overlap is achieved via the mean and variance of the item exposure rates of the items that constitute the CAT item pool. The mean of the item exposure rates is easily manipulated. Control over the variance of the item exposure rates can be achieved via the maximum item exposure rate (r-sub(max)). Therefore, item exposure control methods which implement a specification of r-sub(max) (e.g., J. B. Sympson and R. D. Hetter, 1985) provide the most direct control at both the item and test levels. (PsycINFO Database Record (c) 2005 APA ) VL - 40 ER - TY - ABST T1 - A sequential Bayes procedure for item calibration in multi-stage testing Y1 - 2003 A1 - van der Linden, W. J. A1 - Mead, Alan D. CY - Manuscript in preparation ER - TY - CONF T1 - A simulation study to compare CAT strategies for cognitive diagnosis T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Xu, X. A1 - Chang, Hua-Hua A1 - Douglas, J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 250 KB} ER - TY - JOUR T1 - Small sample estimation in dichotomous item response models: Effect of priors based on judgmental information on the accuracy of item parameter estimates JF - Applied Psychological Measurement Y1 - 2003 A1 - Swaminathan, H.
A1 - Hambleton, R. K. A1 - Sireci, S. G. A1 - Xing, D. A1 - Rizavi, S. M. AB - Large item banks with properly calibrated test items are essential for ensuring the validity of computer-based tests. At the same time, item calibrations with small samples are desirable to minimize the amount of pretesting and limit item exposure. Bayesian estimation procedures show considerable promise with small examinee samples. The purposes of the study were (a) to examine how prior information for Bayesian item parameter estimation can be specified and (b) to investigate the relationship between sample size and the specification of prior information on the accuracy of item parameter estimates. The results of the simulation study were clear: Estimation of item response theory (IRT) model item parameters can be improved considerably. Improvements in the one-parameter model were modest; considerable improvements with the two- and three-parameter models were observed. Both the study of different forms of priors and ways to improve the judgmental data used in forming the priors appear to be promising directions for future research. VL - 27 N1 - Sage Publications, US ER - TY - JOUR T1 - Some alternatives to Sympson-Hetter item-exposure control in computerized adaptive testing JF - Journal of Educational and Behavioral Statistics Y1 - 2003 A1 - van der Linden, W. J. KW - Adaptive Testing KW - Computer Assisted Testing KW - computerized adaptive testing KW - Test Items AB - The Hetter and Sympson (1985, 1997) method is a method of probabilistic item-exposure control in computerized adaptive testing. Setting its control parameters to admissible values requires an iterative process of computer simulations that has been found to be time consuming, particularly if the parameters have to be set conditional on a realistic set of values for the examinees’ ability parameter.
Formal properties of the method are identified that help explain why this iterative process can be slow and does not guarantee admissibility. In addition, some alternatives to the SH method are introduced. The behavior of these alternatives was estimated for an adaptive test based on an item pool from the Law School Admission Test (LSAT). Two of the alternatives showed attractive behavior and converged smoothly to admissibility for all items in a relatively small number of iteration steps. VL - 28 ER - TY - CONF T1 - Standard-setting issues in computerized-adaptive testing T2 - Paper Prepared for Presentation at the Annual Conference of the Canadian Society for Studies in Education Y1 - 2003 A1 - Gushta, M. M. JF - Paper Prepared for Presentation at the Annual Conference of the Canadian Society for Studies in Education CY - Halifax, Nova Scotia, May 30th, 2003 ER - TY - JOUR T1 - Statistical detection and estimation of differential item functioning in computerized adaptive testing JF - Dissertation Abstracts International: Section B: The Sciences & Engineering Y1 - 2003 A1 - Feng, X. AB - Differential item functioning (DIF) is an important issue in large-scale standardized testing. DIF refers to an unexpected difference in item performance among groups of equally proficient examinees, usually classified by ethnicity or gender. Its presence could seriously affect the validity of inferences drawn from a test. Various statistical methods have been proposed to detect and estimate DIF. This dissertation addresses DIF analysis in the context of computerized adaptive testing (CAT), whose item selection algorithm adapts to the ability level of each individual examinee. In a CAT, a DIF item may be more consequential and more detrimental because fewer items are administered in a CAT than in a traditional paper-and-pencil test and because the remaining sequence of items presented to examinees depends in part on their responses to the DIF item.
Consequently, an efficient, stable, and flexible method to detect and estimate CAT DIF becomes necessary and increasingly important. We propose simultaneous implementations of online calibration and DIF testing. The idea is to perform online calibration of an item of interest separately in the focal and reference groups. Under any specific parametric IRT model, we can use the (online) estimated latent traits as covariates and fit a nonlinear regression model to each of the two groups. Because estimated rather than true latent traits are used, the regression fit has to adjust for the covariate "measurement errors". It turns out that this situation fits nicely into the framework of nonlinear errors-in-variables modelling, which has been extensively studied in the statistical literature. We develop two bias-correction methods using asymptotic expansion and conditional score theory. After correcting the bias caused by measurement error, one can perform a significance test to detect DIF with the parameter estimates for different groups. This dissertation also discusses some general techniques to handle measurement error modelling with different IRT models, including the three-parameter normal ogive model and polytomous response models. Several methods of estimating DIF are studied as well. Large sample properties are established to justify the proposed methods. Extensive simulation studies show that the resulting methods perform well in terms of Type-I error rate control, accuracy in estimating DIF, and power against both unidirectional and crossing DIF. (PsycINFO Database Record (c) 2004 APA, all rights reserved). VL - 64 ER - TY - CONF T1 - Strategies for controlling item exposure in computerized adaptive testing with the generalized partial credit model T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Davis, L. L.
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 620 KB} ER - TY - JOUR T1 - Strategies for controlling item exposure in computerized adaptive testing with polytomously scored items JF - Dissertation Abstracts International: Section B: The Sciences & Engineering Y1 - 2003 A1 - Davis, L. L. AB - Choosing a strategy for controlling the exposure of items to examinees has become an integral part of test development for computerized adaptive testing (CAT). Item exposure can be controlled through the use of a variety of algorithms which modify the CAT item selection process. This may be done through a randomization, conditional selection, or stratification approach. The effectiveness of each procedure as well as the degree to which measurement precision is sacrificed has been extensively studied with dichotomously scored item pools. However, only recently have researchers begun to examine these procedures in polytomously scored item pools. The current study investigated the performance of six different exposure control mechanisms under three polytomous IRT models in terms of measurement precision, test security, and ease of implementation. The three models examined in the current study were the partial credit, generalized partial credit, and graded response models. In addition to a no exposure control baseline condition, the randomesque, within .10 logits, Sympson-Hetter, conditional Sympson-Hetter, a-Stratified, and enhanced a-Stratified procedures were implemented to control item exposure rates. The a-Stratified and enhanced a-Stratified procedures were not evaluated with the partial credit model. Two variations of the randomesque and within .10 logits procedures were also examined, which varied the size of the item group from which the next item to be administered was randomly selected.
The results of this study were remarkably similar for all three models and indicated that the randomesque and within .10 logits procedures, when implemented with the six-item group variation, provide the best option for controlling exposure rates when impact to measurement precision and ease of implementation are considered. The three-item group variations of the procedures were, however, ineffective in controlling exposure, overlap, and pool utilization rates to desired levels. The Sympson-Hetter and conditional Sympson-Hetter procedures were difficult and time consuming to implement, and while they did control exposure rates to the target level, their performance in terms of item overlap (for the Sympson-Hetter) and pool utilization was disappointing. The a-Stratified and enhanced a-Stratified procedures both turned in surprisingly poor performances across all variables. (PsycINFO Database Record (c) 2004 APA, all rights reserved). VL - 64 ER - TY - BOOK T1 - Strategies for controlling testlet exposure rates in computerized adaptive testing systems Y1 - 2003 A1 - Boyd, A. M. CY - Unpublished Ph.D. Dissertation, The University of Texas at Austin. N1 - {PDF file, 485 KB} ER - TY - JOUR T1 - Student modeling and ab initio language learning JF - System Y1 - 2003 A1 - Heift, T. A1 - Schulze, M. AB - Provides examples of student modeling techniques that have been employed in computer-assisted language learning over the past decade. Describes two systems for learning German: "German Tutor" and "Geroline." Shows how a student model can support computerized adaptive language testing for diagnostic purposes in a Web-based language learning environment that does not rely on parsing technology. (Author/VWL) VL - 31 ER - TY - JOUR T1 - A study of the feasibility of Internet administration of a computerized health survey: The Headache Impact Test (HIT) JF - Quality of Life Research Y1 - 2003 A1 - Bayliss, M.S. A1 - Dewey, J.E. A1 - Dunlap, I. A1 - et al.
VL - 12 ER - TY - JOUR T1 - Ten recommendations for advancing patient-centered outcomes measurement for older persons JF - Annals of Internal Medicine Y1 - 2003 A1 - McHorney, C. A. KW - *Health Status Indicators KW - Aged KW - Geriatric Assessment/*methods KW - Humans KW - Patient-Centered Care/*methods KW - Research Support, U.S. Gov't, Non-P.H.S. AB - The past 50 years have seen great progress in the measurement of patient-based outcomes for older populations. Most of the measures now used were created under the umbrella of a set of assumptions and procedures known as classical test theory. A recent alternative for health status assessment is item response theory. Item response theory is superior to classical test theory because it can eliminate test dependency and achieve more precise measurement through computerized adaptive testing. Computerized adaptive testing reduces test administration times and allows varied and precise estimates of ability. Several key challenges must be met before computerized adaptive testing becomes a productive reality. I discuss these challenges for the health assessment of older persons in the form of 10 "Ds": things we need to deliberate, debate, decide, and do. VL - 139 N1 - 1539-3704; Journal Article; Review ER - TY - CONF T1 - Test information targeting strategies for adaptive multistage testlet designs T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Luecht, R. M. A1 - Burgin, W. L. JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 179 KB} ER - TY - ABST T1 - Tests adaptativos informatizados (Computerized adaptive testing) Y1 - 2003 A1 - Olea, J. A1 - Ponsoda, V.
CY - Madrid: UNED Ediciones N1 - [In Spanish] ER - TY - CONF T1 - Test-score comparability, ability estimation, and item-exposure control in computerized adaptive testing T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Chang, Hua-Hua A1 - Ying, Z. JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - JOUR T1 - Timing behavior in computerized adaptive testing: Response times for correct and incorrect answers are not related to general fluid intelligence/Zum Zeitverhalten beim computergestützten adaptiven Testen: Antwortlatenzen bei richtigen und falschen Lösungen JF - Zeitschrift für Differentielle und Diagnostische Psychologie Y1 - 2003 A1 - Rammsayer, Thomas A1 - Brandler, Susanne KW - Adaptive Testing KW - Cognitive Ability KW - Intelligence KW - Perception KW - Reaction Time KW - computerized adaptive testing AB - Examined the effects of general fluid intelligence on item response times for correct and false responses in computerized adaptive testing. After performing the CFT3 intelligence test, 80 individuals (aged 17-44 yrs) completed perceptual and cognitive discrimination tasks. Results show that response times were related neither to the proficiency dimension reflected by the task nor to the individual level of fluid intelligence. Furthermore, the false > correct phenomenon as well as substantial positive correlations between item response times for false and correct responses were shown to be independent of intelligence levels. (PsycINFO Database Record (c) 2005 APA ) VL - 24 ER - TY - CONF T1 - To stratify or not: An investigation of CAT item selection procedures under practical constraints T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Deng, H. A1 - Ansley, T.
JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 186 KB} ER - TY - ABST T1 - Using moving averages to assess test and item security in computer-based testing (Center for Educational Assessment Research Report No 468) Y1 - 2003 A1 - Han, N. CY - Amherst, MA: University of Massachusetts, School of Education. ER - TY - JOUR T1 - Using response times to detect aberrant responses in computerized adaptive testing JF - Psychometrika Y1 - 2003 A1 - van der Linden, W. J. A1 - van Krimpen-Stoop, E. M. L. A. KW - Adaptive Testing KW - Behavior KW - Computer Assisted Testing KW - computerized adaptive testing KW - Models KW - Person Fit KW - Prediction KW - Reaction Time AB - A lognormal model for response times is used to check response times for aberrances in examinee behavior on computerized adaptive tests. Both classical procedures and Bayesian posterior predictive checks are presented. For a fixed examinee, responses and response times are independent; checks based on response times thus offer information independent of the results of checks on response patterns. Empirical examples of the use of classical and Bayesian checks for detecting two different types of aberrances in response times are presented. The detection rates for the Bayesian checks were higher than those for the classical checks, but at the cost of higher false-alarm rates. A guideline for the choice between the two types of checks is offered. VL - 68 ER - TY - CONF T1 - Accuracy of the ability estimate and the item exposure rate under multidimensional adaptive testing with item constraints T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Li, Y. H. A1 - Yu, N. Y.
JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA N1 - #LI02-01 ER - TY - ABST T1 - Adaptive testing without IRT in the presence of multidimensionality (Research Report 02-09) Y1 - 2002 A1 - Yan, D. A1 - Lewis, C. A1 - Stocking, M. CY - Princeton NJ: Educational Testing Service ER - TY - JOUR T1 - Advances in quality of life measurements in oncology patients JF - Seminars in Oncology Y1 - 2002 A1 - Cella, D. A1 - Chang, C-H. A1 - Lai, J. S. A1 - Webster, K. KW - *Quality of Life KW - *Sickness Impact Profile KW - Cross-Cultural Comparison KW - Culture KW - Humans KW - Language KW - Neoplasms/*physiopathology KW - Questionnaires AB - Accurate assessment of the quality of life (QOL) of patients can provide important clinical information to physicians, especially in the area of oncology. Changes in QOL are important indicators of the impact of a new cytotoxic therapy, can affect a patient's willingness to continue treatment, and may aid in defining response in the absence of quantifiable endpoints such as tumor regression. Because QOL is becoming an increasingly important aspect in the management of patients with malignant disease, it is vital that the instruments used to measure QOL are reliable and accurate. Assessment of QOL involves a multidimensional approach that includes physical, functional, social, and emotional well-being, and the most comprehensive instruments measure at least three of these domains. Instruments to measure QOL can be generic (eg, the Nottingham Health Profile), targeted toward specific illnesses (eg, Functional Assessment of Cancer Therapy - Lung), or be a combination of generic and targeted. Two of the most widely used examples of the combination, or hybrid, instruments are the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 Items and the Functional Assessment of Chronic Illness Therapy. 
A consequence of the increasing international collaboration in clinical trials has been the growing necessity for instruments that are valid across languages and cultures. To assure the continuing reliability and validity of QOL instruments in this regard, item response theory can be applied. Techniques such as item response theory may be used in the future to construct QOL item banks containing large sets of validated questions that represent various levels of QOL domains. As QOL becomes increasingly important in understanding and approaching the overall management of cancer patients, the tools available to clinicians and researchers to assess QOL will continue to evolve. While the instruments currently available provide reliable and valid measurement, further improvements in precision and application are anticipated. VL - 29 N1 - 0093-7754 (Print). Journal Article; Review ER - TY - JOUR T1 - Applicable adaptive testing models for school teachers JF - Educational Media International Y1 - 2002 A1 - Chang-Hwa, W. A. A1 - Chuang, C-L. AB - The purpose of this study was to investigate the attitudinal effects of an SPRT adaptive testing environment on junior high school students. Subjects were 39 eighth graders from a selected junior high school. Major instruments for the study were the Junior High School Natural Sciences Adaptive Testing System, driven by the SPRT algorithm, and a self-developed attitudinal questionnaire; factors examined included test anxiety, examinee preference, adaptability of the test, and acceptance of the test result. The major findings were that, overall, junior high school students' attitudes towards computerized adaptive tests were positive, and no significant correlations existed between test attitude and test length. The results indicated that junior high school students generally have positive attitudes towards adaptive testing.
VL - 39 ER - TY - JOUR T1 - Application of an empirical Bayes enhancement of Mantel-Haenszel differential item functioning analysis to a computerized adaptive test JF - Applied Psychological Measurement Y1 - 2002 A1 - Zwick, R. A1 - Thayer, D. T. VL - 26 ER - TY - BOOK T1 - Assessing the efficiency of item selection in computerized adaptive testing Y1 - 2002 A1 - Weissman, A. CY - Unpublished doctoral dissertation, University of Pittsburgh. ER - TY - JOUR T1 - Assessing tobacco beliefs among youth using item response theory models JF - Drug and Alcohol Dependence Y1 - 2002 A1 - Panter, A. T. A1 - Reeve, B. B. KW - *Attitude to Health KW - *Culture KW - *Health Behavior KW - *Questionnaires KW - Adolescent KW - Adult KW - Child KW - Female KW - Humans KW - Male KW - Models, Statistical KW - Smoking/*epidemiology AB - Successful intervention research programs to prevent adolescent smoking require well-chosen, psychometrically sound instruments for assessing smoking prevalence and attitudes. Twelve thousand eight hundred and ten adolescents were surveyed about their smoking beliefs as part of the Teenage Attitudes and Practices Survey project, a prospective cohort study of predictors of smoking initiation among US adolescents. Item response theory (IRT) methods are used to frame a discussion of questions that a researcher might ask when selecting an optimal item set. IRT methods are especially useful for choosing items during instrument development, trait scoring, evaluating item functioning across groups, and creating optimal item subsets for use in specialized applications such as computerized adaptive testing. Data analytic steps for IRT modeling are reviewed for evaluating item quality and differential item functioning across subgroups of gender, age, and smoking status. Implications and challenges in the use of these methods for tobacco onset research and for assessing the developmental trajectories of smoking among youth are discussed. 
VL - 68 N1 - 0376-8716. Journal Article ER - TY - JOUR T1 - Can examinees use judgments of item difficulty to improve proficiency estimates on computerized adaptive vocabulary tests? JF - Journal of Educational Measurement Y1 - 2002 A1 - Vispoel, W. P. A1 - Clough, S. J. A1 - Bleiler, T. A1 - Hendrickson, A. B. A1 - Ihrig, D. VL - 39 ER - TY - CONF T1 - Comparing three item selection approaches for computerized adaptive testing with content balancing requirement T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Leung, C-K. A1 - Chang, Hua-Hua A1 - Hau, K-T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA N1 - {PDF file, 226 KB} ER - TY - CONF T1 - A comparison of computer mastery models when pool characteristics vary T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Smith, R. L. A1 - Lewis, C. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA N1 - {PDF file, 692 KB} ER - TY - JOUR T1 - A comparison of item selection techniques and exposure control mechanisms in CATs using the generalized partial credit model JF - Applied Psychological Measurement Y1 - 2002 A1 - Pastor, D. A. A1 - Dodd, B. G. A1 - Chang, Hua-Hua KW - Adaptive Testing KW - Algorithms KW - computerized adaptive testing KW - Computer Assisted Testing KW - Item Analysis KW - Item Response Theory KW - Mathematical Modeling (Statistical) AB - The use of more performance items in large-scale testing has led to an increase in the research investigating the use of polytomously scored items in computer adaptive testing (CAT).
Because such research must be complemented with information pertaining to exposure control, the present study investigated the impact of using five different exposure control algorithms in two differently sized item pools calibrated using the generalized partial credit model. The results of the simulation study indicated that the a-stratified design, in comparison to a no-exposure control condition, could be used to reduce item exposure and overlap, increase pool utilization, and only minimally degrade measurement precision. Use of the more restrictive exposure control algorithms, such as the Sympson-Hetter and conditional Sympson-Hetter, controlled exposure to a greater extent but at the cost of measurement precision. Because convergence of the exposure control parameters was problematic for some of the more restrictive exposure control algorithms, use of the more simplistic exposure control mechanisms, particularly when the test length to item pool size ratio is large, is recommended. (PsycINFO Database Record (c) 2005 APA) (journal abstract) VL - 26 ER - TY - JOUR T1 - A comparison of non-deterministic procedures for the adaptive assessment of knowledge JF - Psychologische Beiträge Y1 - 2002 A1 - Hockemeyer, C. VL - 44 ER - TY - CONF T1 - Comparison of the psychometric properties of several computer-based test designs for credentialing exams T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Jodoin, M. A1 - Zenisky, A. L. A1 - Hambleton, R. K. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA N1 - {PDF file, 261 KB} ER - TY - JOUR T1 - Computer adaptive testing: The impact of test characteristics on perceived performance and test takers' reactions JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 2002 A1 - Tonidandel, S.
KW - computerized adaptive testing AB - This study examined the relationship between characteristics of adaptive testing and test takers' subsequent reactions to the test. Participants took a computer adaptive test in which two features, the difficulty of the initial item and the difficulty of subsequent items, were manipulated. These two features of adaptive testing determined the number of items answered correctly by examinees and their subsequent reactions to the test. The data show that the relationship between test characteristics and reactions was fully mediated by perceived performance on the test. In addition, the impact of feedback on reactions to adaptive testing was also evaluated. In general, feedback that was consistent with perceptions of performance had a positive impact on reactions to the test. Implications for adaptive test design concerning maximizing test takers' reactions are discussed. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 62 ER - TY - JOUR T1 - Computer-adaptive testing: The impact of test characteristics on perceived performance and test takers’ reactions JF - Journal of Applied Psychology Y1 - 2002 A1 - Tonidandel, S. A1 - Quiñones, M. A. A1 - Adams, A. A. VL - 87 ER - TY - JOUR T1 - Computerised adaptive testing JF - British Journal of Educational Technology Y1 - 2002 A1 - Latu, E. A1 - Chapman, E. KW - computerized adaptive testing AB - Considers the potential of computer adaptive testing (CAT). Discusses the use of CAT instead of traditional paper and pencil tests, identifies decisions that impact the efficacy of CAT, and concludes that CAT is beneficial when used to its full potential on certain types of tests. (LRW) VL - 33 ER - TY - CONF T1 - Confirmatory item factor analysis using Markov chain Monte Carlo estimation with applications to online calibration in CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Segall, D. O. 
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA ER - TY - ABST T1 - Constraining item exposure in computerized adaptive testing with shadow tests (Research Report No. 02-06) Y1 - 2002 A1 - van der Linden, W. J. A1 - Veldkamp, B. P. CY - University of Twente, The Netherlands ER - TY - CONF T1 - Content-stratified random item selection in computerized classification testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Guille, R. A1 - Lipner, R. S. A1 - Norcini, J. J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA N1 - #GU02-01 ER - TY - CHAP T1 - Controlling item exposure and maintaining item security Y1 - 2002 A1 - Davey, T. A1 - Nering, M. CY - C. N. Mills, M. T. Potenza, and J. J. Fremer (Eds.), Computer-Based Testing: Building the Foundation for Future Assessments (pp. 165-191). Mahwah, NJ: Lawrence Erlbaum Associates, Inc. ER - TY - JOUR T1 - Data sparseness and on-line pretest item calibration-scaling methods in CAT JF - Journal of Educational Measurement Y1 - 2002 A1 - Ban, J-C. A1 - Hanson, B. A. A1 - Yi, Q. A1 - Harris, D. J. KW - Computer Assisted Testing KW - Educational Measurement KW - Item Response Theory KW - Maximum Likelihood KW - Methodology KW - Scaling (Testing) KW - Statistical Data AB - Compared and evaluated 3 on-line pretest item calibration-scaling methods (the marginal maximum likelihood estimate with 1 expectation maximization [EM] cycle [OEM] method, the marginal maximum likelihood estimate with multiple EM cycles [MEM] method, and M. L. Stocking's Method B) in terms of item parameter recovery when the item responses to the pretest items in the pool are sparse. Simulations of computerized adaptive tests were used to evaluate the results yielded by the three methods.
The MEM method produced the smallest average total error in parameter estimation, and the OEM method yielded the largest total error (PsycINFO Database Record (c) 2005 APA) VL - 39 ER - TY - JOUR T1 - Detection of person misfit in computerized adaptive tests with polytomous items JF - Applied Psychological Measurement Y1 - 2002 A1 - van Krimpen-Stoop, E. M. L. A. A1 - Meijer, R. R. AB - Item scores that do not fit an assumed item response theory model may cause the latent trait value to be inaccurately estimated. For a computerized adaptive test (CAT) using dichotomous items, several person-fit statistics for detecting misfitting item score patterns have been proposed. Both for paper-and-pencil (P&P) tests and CATs, detection of person misfit with polytomous items has hardly been explored. In this study, the nominal and empirical null distributions of the standardized log-likelihood statistic for polytomous items are compared both for P&P tests and CATs. Results showed that the empirical distribution of this statistic differed from the assumed standard normal distribution for both P&P tests and CATs. Second, a new person-fit statistic based on the cumulative sum (CUSUM) procedure from statistical process control was proposed. By means of simulated data, critical values were determined that can be used to classify a pattern as fitting or misfitting. The effectiveness of the CUSUM in detecting simulees with item preknowledge was investigated. Detection rates using the CUSUM were high for realistic numbers of disclosed items. VL - 26 ER - TY - CONF T1 - Developing tailored instruments: Item banking and computerized adaptive assessment T2 - Paper presented at the conference “Advances in Health Outcomes Measurement” Y1 - 2002 A1 - Thissen, D.
JF - Paper presented at the conference “Advances in Health Outcomes Measurement” CY - Bethesda, Maryland, June 23-25 N1 - {PDF file, 170 KB} ER - TY - CONF T1 - The development and evaluation of a computer-adaptive testing application for English language T2 - Paper presented at the 2002 Computer-Assisted Testing Conference Y1 - 2002 A1 - Lilley, M. A1 - Barker, T. JF - Paper presented at the 2002 Computer-Assisted Testing Conference CY - United Kingdom N1 - {PDF file, 308 KB} ER - TY - JOUR T1 - Development of an index of physical functional health status in rehabilitation JF - Archives of Physical Medicine and Rehabilitation Y1 - 2002 A1 - Hart, D. L. A1 - Wright, B. D. KW - *Health Status Indicators KW - *Rehabilitation Centers KW - Adolescent KW - Adult KW - Aged KW - Aged, 80 and over KW - Female KW - Health Surveys KW - Humans KW - Male KW - Middle Aged KW - Musculoskeletal Diseases/*physiopathology/*rehabilitation KW - Nervous System Diseases/*physiopathology/*rehabilitation KW - Physical Fitness/*physiology KW - Recovery of Function/physiology KW - Reproducibility of Results KW - Retrospective Studies AB - OBJECTIVE: To describe (1) the development of an index of physical functional health status (FHS) and (2) its hierarchical structure, unidimensionality, reproducibility of item calibrations, and practical application. DESIGN: Rasch analysis of existing data sets. SETTING: A total of 715 acute, orthopedic outpatient centers and 62 long-term care facilities in 41 states participating with Focus On Therapeutic Outcomes, Inc. PATIENTS: A convenience sample of 92,343 patients (40% male; mean age +/- standard deviation [SD], 48+/-17y; range, 14-99y) seeking rehabilitation between 1993 and 1999. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Patients completed self-report health status surveys at admission and discharge. The Medical Outcomes Study 36-Item Short-Form Health Survey's physical functioning scale (PF-10) is the foundation of the physical FHS.
The Oswestry Low Back Pain Disability Questionnaire, Neck Disability Index, Lysholm Knee Questionnaire, items pertinent to patients with upper-extremity impairments, and items pertinent to patients with more involved neuromusculoskeletal impairments were cocalibrated into the PF-10. RESULTS: The final FHS item bank contained 36 items (patient separation, 2.3; root mean square measurement error, 5.9; mean square +/- SD infit, 0.9+/-0.5; outfit, 0.9+/-0.9). Analyses supported empirical item hierarchy, unidimensionality, reproducibility of item calibrations, and content and construct validity of the FHS-36. CONCLUSIONS: Results support the reliability and validity of FHS-36 measures in the present sample. Analyses show the potential for a dynamic, computer-controlled, adaptive survey for FHS assessment applicable for group analysis and clinical decision making for individual patients. VL - 83 N1 - 0003-9993 (Print). Journal Article ER - TY - CONF T1 - The Development of STAR Early Literacy T2 - Presentation to the 32nd Annual National Conference on Large-Scale Assessment. Y1 - 2002 A1 - J. R. McBride JF - Presentation to the 32nd Annual National Conference on Large-Scale Assessment. CY - Desert Springs CA ER - TY - THES T1 - Development, reliability, and validity of a computerized adaptive version of the Schedule for Nonadaptive and Adaptive Personality Y1 - 2002 A1 - Simms, L. J. CY - Unpublished Ph.D. dissertation, University of Iowa, Iowa City, Iowa ER - TY - JOUR T1 - The effect of test characteristics on aberrant response patterns in computer adaptive testing JF - Dissertation Abstracts International Section A: Humanities & Social Sciences Y1 - 2002 A1 - Rizavi, S. M. KW - computerized adaptive testing AB - The advantages that computer adaptive testing offers over linear tests have been well documented.
The computer adaptive test (CAT) design is more efficient than the linear test design because fewer items are needed to estimate an examinee's proficiency to a desired level of precision. In the ideal situation, a CAT will result in examinees answering different numbers of items according to the stopping rule employed. Unfortunately, the realities of testing conditions have necessitated the imposition of time and minimum test length limits on CATs. Such constraints might place a burden on the CAT test taker, resulting in aberrant response behaviors by some examinees. Occurrence of such response patterns results in inaccurate estimation of examinee proficiency levels. This study examined the effects of test lengths, time limits, and the interaction of these factors with examinee proficiency levels on the occurrence of aberrant response patterns. The focus of the study was on the aberrant behaviors caused by rushed guessing due to restrictive time limits. Four different testing scenarios were examined: fixed length performance tests with and without content constraints, fixed length mastery tests, and variable length mastery tests without content constraints. For each of these testing scenarios, the effects of two test lengths, five different timing conditions, and the interaction of these factors with three ability levels on ability estimation were examined. For fixed and variable length mastery tests, decision accuracy was also considered in addition to estimation accuracy. Several indices were used to evaluate the estimation and decision accuracy for the different testing conditions. The results showed that changing time limits had a significant impact on the occurrence of aberrant response patterns conditional on ability. Increasing test length had a negligible, if not negative, effect on ability estimation when rushed guessing occurred. In performance testing, high ability examinees suffered the most, while in classification testing, middle ability examinees did.
Decision accuracy was considerably affected in the case of variable length classification tests. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 62 ER - TY - JOUR T1 - An EM approach to parameter estimation for the Zinnes and Griggs paired comparison IRT model JF - Applied Psychological Measurement Y1 - 2002 A1 - Stark, S. A1 - Drasgow, F. KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Response Theory KW - Maximum Likelihood KW - Personnel Evaluation KW - Statistical Correlation KW - Statistical Estimation AB - Borman et al. recently proposed a computer adaptive performance appraisal system called CARS II that utilizes paired comparison judgments of behavioral stimuli. To implement this approach, the paired comparison ideal point model developed by Zinnes and Griggs was selected. In this article, the authors describe item response and information functions for the Zinnes and Griggs model and present procedures for estimating stimulus and person parameters. Monte Carlo simulations were conducted to assess the accuracy of the parameter estimation procedures. The results indicated that at least 400 ratees (i.e., ratings) are required to obtain reasonably accurate estimates of the stimulus parameters and their standard errors. In addition, latent trait estimation improves as test length increases. The implications of these results for test construction are also discussed. VL - 26 ER - TY - CONF T1 - An empirical comparison of achievement level estimates from adaptive tests and paper-and-pencil tests T2 - annual meeting of the American Educational Research Association Y1 - 2002 A1 - Kingsbury, G. G. KW - computerized adaptive testing JF - annual meeting of the American Educational Research Association CY - New Orleans, LA.
USA N1 - {PDF file, 134 KB} ER - TY - ABST T1 - An empirical investigation of selected multi-stage testing design variables on test assembly and decision accuracy outcomes for credentialing exams (Center for Educational Assessment Research Report No 469) Y1 - 2002 A1 - Zenisky, A. L. CY - Amherst, MA: University of Massachusetts, School of Education. ER - TY - CONF T1 - Employing new ideas in CAT to a simulated reading test T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Thompson, T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA N1 - {PDF file, 216 KB} ER - TY - JOUR T1 - Étude de la distribution d'échantillonnage de l'estimateur du niveau d'habileté en testing adaptatif en fonction de deux règles d'arrêt dans le contexte de l'application du modèle de Rasch [Study of the sampling distribution of the proficiency estimator in adaptive testing as a function of two stopping rules in the context of the application of the Rasch model] JF - Mesure et évaluation en éducation Y1 - 2002 A1 - Raîche, G. A1 - Blais, J-G. VL - 24(2-3) N1 - (In French) ER - TY - JOUR T1 - Evaluation of selection procedures for computerized adaptive testing with polytomous items JF - Applied Psychological Measurement Y1 - 2002 A1 - van Rijn, P. W. A1 - Theo Eggen A1 - Hemker, B. T. A1 - Sanders, P. F. KW - computerized adaptive testing AB - In the present study, a procedure that has been used to select dichotomous items in computerized adaptive testing was applied to polytomous items. This procedure was designed to select the item with maximum weighted information.
In a simulation study, the item information function was integrated over a fixed interval of ability values and the item with the maximum area was selected. This maximum interval information item selection procedure was compared to a maximum point information item selection procedure. Substantial differences between the two item selection procedures were not found when computerized adaptive tests were evaluated on bias and the root mean square of the ability estimate. VL - 26 N1 - Sage Publications, US ER - TY - CONF T1 - An examination of decision-theory adaptive testing procedures T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Rudner, L. M. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA N1 - {PDF file, 46 KB} ER - TY - ABST T1 - An exploration of potentially problematic adaptive tests (Research Report 02-05) Y1 - 2002 A1 - Stocking, M. A1 - Steffen, M. A1 - Golub-Smith, M. L. A1 - Eignor, D. R. CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - Fairness issues in adaptive tests with strict time limits T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Bridgeman, B. A1 - Cline, F. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA N1 - {PDF file, 1.287 MB} ER - TY - JOUR T1 - Feasibility and acceptability of computerized adaptive testing (CAT) for fatigue monitoring in clinical practice JF - Quality of Life Research Y1 - 2002 A1 - Davis, K. M. A1 - Chang, C-H. A1 - Lai, J-S. A1 - Cella, D.
VL - 11(7) ER - TY - ABST T1 - A feasibility study of on-the-fly item generation in adaptive testing (GRE Board Report No 98-12) Y1 - 2002 A1 - Bejar, I. I. A1 - Lawless, R. R. A1 - Morley, M. E. A1 - Wagner, M. E. A1 - Bennett, R. E. A1 - Revuelta, J. CY - Princeton NJ: Educational Testing Service N1 - Research Report RR02-23. {PDF file, 193 KB} ER - TY - CONF T1 - A further study on adjusting CAT item selection starting point for individual examinees T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Fan, M. A1 - Zhu. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA N1 - #FA02-01 ER - TY - CHAP T1 - Generating abstract reasoning items with cognitive theory T2 - Item generation for test development Y1 - 2002 A1 - Embretson, S. E. ED - P. Kyllonen KW - Cognitive Processes KW - Measurement KW - Reasoning KW - Test Construction KW - Test Items KW - Test Validity KW - Theories AB - (From the chapter) Developed and evaluated a generative system for abstract reasoning items based on cognitive theory. The cognitive design system approach was applied to generate matrix completion problems. Study 1 involved developing the cognitive theory with 191 college students who were administered Set I and Set II of the Advanced Progressive Matrices. Study 2 examined item generation by cognitive theory. Study 3 explored the psychometric properties and construct representation of abstract reasoning test items with 728 young adults. Five structurally equivalent forms of Abstract Reasoning Test (ART) items were prepared from the generated item bank and administered to the Ss. In Study 4, the nomothetic span of construct validity of the generated items was examined with 728 young adults who were administered ART items, and 217 young adults who were administered ART items and the Advanced Progressive Matrices.
Results indicate the matrix completion items were effectively generated by the cognitive design system approach. (PsycINFO Database Record (c) 2005 APA) JF - Item generation for test development PB - Lawrence Erlbaum Associates, Inc. CY - Mahwah, N.J. USA N1 - Item generation for test development (pp. 219-250). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers. xxxii, 412 pp ER - TY - CONF T1 - Historique et concepts propres au testing adaptatif [Adaptive testing: Historical accounts and concepts] T2 - Presented at the 69th Congress of the Acfas. Sherbrooke: Association canadienne française pour l’avancement des sciences (Acfas). [In French] Y1 - 2002 A1 - Blais, J. G. JF - Presented at the 69th Congress of the Acfas. Sherbrooke: Association canadienne française pour l’avancement des sciences (Acfas). [In French] ER - TY - JOUR T1 - Hypergeometric family and item overlap rates in computerized adaptive testing JF - Psychometrika Y1 - 2002 A1 - Chang, Hua-Hua A1 - Zhang, J. KW - Adaptive Testing KW - Algorithms KW - Computer Assisted Testing KW - Test Taking KW - Time On Task KW - computerized adaptive testing AB - A computerized adaptive test (CAT) is usually administered to small groups of examinees at frequent time intervals. It is often the case that examinees who take the test earlier share information with examinees who will take the test later, thus increasing the risk that many items may become known. Item overlap rate for a group of examinees refers to the number of overlapping items encountered by these examinees divided by the test length. For a specific item pool, different item selection algorithms may yield different item overlap rates. An important issue in designing a good CAT item selection algorithm is to keep item overlap rate below a preset level. In doing so, it is important to investigate what the lowest rate could be for all possible item selection algorithms. 
In this paper we rigorously prove that if every item had an equal probability of being selected from the pool in a fixed-length CAT, the number of overlapping items among any α randomly sampled examinees follows the hypergeometric distribution family for α ≥ 1. Thus, the expected values of the number of overlapping items among any α randomly sampled examinees can be calculated precisely. These values may serve as benchmarks in controlling item overlap rates for fixed-length adaptive tests. (PsycINFO Database Record (c) 2005 APA) VL - 67 ER - TY - CONF T1 - Identify the lower bounds for item sharing and item pooling in computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Chang, Hua-Hua A1 - Zhang, J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA ER - TY - CONF T1 - Impact of item quality and item bank size on the psychometric quality of computer-based credentialing exams T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Hambleton, R. K. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA ER - TY - CONF T1 - Impact of selected factors on the psychometric quality of credentialing examinations administered with a sequential testlet design T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Hambleton, R. K. A1 - Jodoin, M. A1 - Zenisky, A. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA ER - TY - CONF T1 - Impact of test design, item quality and item bank size on the psychometric properties of computer-based credentialing exams T2 - Paper presented at the meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Xing, D. A1 - Hambleton, R. K. 
JF - Paper presented at the meeting of the National Council on Measurement in Education CY - New Orleans, LA N1 - {PDF file, 500 KB} ER - TY - JOUR T1 - The implications of the use of non-optimal items in a Computer Adaptive Testing (CAT) environment JF - Dissertation Abstracts International: Section B: The Sciences & Engineering Y1 - 2002 A1 - Grodenchik, D. J. KW - computerized adaptive testing AB - This study describes the effects of manipulating item difficulty in a computer adaptive testing (CAT) environment. There are many potential benefits when using CATs as compared to traditional tests. These include increased security, shorter tests, and more precise measurement. According to IRT, the theory underlying CAT, as the computer continually recalculates ability, items that match the current estimate of ability are administered. Such items provide maximum information about examinees during the test. Herein, however, lies a potential problem. These optimal CAT items result in an examinee having only a 50% chance of a correct response. Some examinees may consider such items unduly challenging. Further, when test anxiety is a factor, it is possible that test scores may be negatively affected. This research was undertaken to determine the effects of administering easier CAT items on ability estimation and test length using computer simulations. Also considered was the administration of different numbers of initial items prior to the start of the adaptive portion of the test, using three different levels of measurement precision. Results indicate that regardless of the number of initial items administered, the level of precision employed, or the modifications made to item difficulty, the approximation of estimated ability to true ability is good in all cases. Additionally, the standard deviations of the ability estimates closely approximate the theoretical levels of precision used as stopping rules for the simulated CATs. 
Since optimal CAT items are not used, each item administered provides less information about examinees than optimal CAT items. This results in longer tests. Fortunately, using easier items that provide up to a 66.4% chance of a correct response results in tests that increase only modestly in length, across levels of precision. For larger standard errors, even easier items (up to a 73.5% chance of a correct response) result in only negligible to modest increases in test length. Examinees who find optimal CAT items difficult or examinees with test anxiety may find CATs that implement easier items enhance the already existing benefits of CAT. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 63 ER - TY - CONF T1 - Incorporating the Sympson-Hetter exposure control method into the a-stratified method with content blocking T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Yi, Q. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA N1 - {PDF file, 387 KB} ER - TY - JOUR T1 - Information technology and literacy assessment JF - Reading and Writing Quarterly Y1 - 2002 A1 - Balajthy, E. KW - Computer Applications KW - Computer Assisted Testing KW - Information KW - Internet KW - Literacy KW - Models KW - Systems KW - Technology AB - This column discusses information technology and literacy assessment in the past and present. The author also describes computer-based assessments today, including the following topics: computer-scored testing, computer-administered formal assessment, Internet formal assessment, computerized adaptive tests, placement tests, informal assessment, electronic portfolios, information management, and Internet information dissemination. A model of the major present-day applications of information technologies in reading and literacy assessment is also included. 
(PsycINFO Database Record (c) 2005 APA) VL - 18 ER - TY - CHAP T1 - Innovative item types for computerized testing Y1 - 2002 A1 - Parshall, C. G. A1 - Davey, T. A1 - Pashley, P. CY - In W. J. van der Linden and C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice. Norwell, MA: Kluwer (in press). ER - TY - CONF T1 - An investigation of procedures for estimating error indexes in proficiency estimation in CAT T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Shyu, C.-Y. A1 - Fan, M. A1 - Thompson, T. A1 - Hsu, Y. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA ER - TY - JOUR T1 - An item response model for characterizing test compromise JF - Journal of Educational and Behavioral Statistics Y1 - 2002 A1 - Segall, D. O. KW - computerized adaptive testing AB - This article presents an item response model for characterizing test-compromise that enables the estimation of item-preview and score-gain distributions observed in on-demand high-stakes testing programs. Model parameters and posterior distributions are estimated by Markov Chain Monte Carlo (MCMC) procedures. Results of a simulation study suggest that when at least some of the items taken by a small sample of test takers are known to be secure (uncompromised), the procedure can provide useful summaries of test-compromise and its impact on test scores. The article includes discussions of operational use of the proposed procedure, possible model violations and extensions, and application to computerized adaptive testing. VL - 27 N1 - American Educational Research Assn, US ER - TY - JOUR T1 - Item selection in computerized adaptive testing: Improving the a-stratified design with the Sympson-Hetter algorithm JF - Applied Psychological Measurement Y1 - 2002 A1 - Leung, C-K. A1 - Chang, Hua-Hua A1 - Hau, K-T. 
VL - 26 SN - 0146-6216 ER - TY - JOUR T1 - La simulation d’un test adaptatif basé sur le modèle de Rasch [Simulation of a Rasch-based adaptive test] JF - Mesure et évaluation en éducation Y1 - 2002 A1 - Raîche, G. N1 - (In French) {PDF file, 30 KB} ER - TY - CHAP T1 - Le testing adaptatif [Adaptive testing] Y1 - 2002 A1 - Raîche, G. CY - D. R. Bertrand and J. G. Blais (Eds.): Les théories modernes de la mesure [Modern theories of measurement]. Sainte-Foy: Presses de l’Université du Québec. N1 - (In French) {PDF file, 191 KB} ER - TY - CONF T1 - Mapping the Development of Pre-reading Skills with STAR Early Literacy T2 - Presentation to the Annual Meeting of the Society for the Scientific Study of Reading. Chicago. Y1 - 2002 A1 - McBride, J. R. A1 - Tardrew, S. P. JF - Presentation to the Annual Meeting of the Society for the Scientific Study of Reading. Chicago. ER - TY - RPRT T1 - Mathematical-programming approaches to test item pool design Y1 - 2002 A1 - Veldkamp, B. P. A1 - van der Linden, W. J. A1 - Ariel, A. KW - Adaptive Testing KW - Computer Assisted KW - Computer Programming KW - Educational Measurement KW - Item Response Theory KW - Mathematics KW - Psychometrics KW - Statistical Rotation KW - computerized adaptive testing KW - Test Items KW - Testing AB - (From the chapter) This paper presents an approach to item pool design that has the potential to improve on the quality of current item pools in educational and psychological testing and hence to increase both measurement precision and validity. The approach consists of the application of mathematical programming techniques to calculate optimal blueprints for item pools. These blueprints can be used to guide the item-writing process. 
Three different types of design problems are discussed, namely for item pools for linear tests, item pools for computerized adaptive testing (CAT), and systems of rotating item pools for CAT. The paper concludes with an empirical example of the problem of designing a system of rotating item pools for CAT. PB - University of Twente, Faculty of Educational Science and Technology CY - Twente, The Netherlands SN - 02-09 N1 - Advances in psychology research. Hauppauge, NY: Nova Science Publishers, Inc. [URL: http://www.Novapublishers.com]. vi, 228 pp ER - TY - JOUR T1 - Measuring quality of life in chronic illness: the functional assessment of chronic illness therapy measurement system JF - Archives of Physical Medicine and Rehabilitation Y1 - 2002 A1 - Cella, D. A1 - Nowinski, C. J. KW - *Chronic Disease KW - *Quality of Life KW - *Rehabilitation KW - Adult KW - Comparative Study KW - Health Status Indicators KW - Humans KW - Psychometrics KW - Questionnaires KW - Research Support, U.S. Gov't, P.H.S. KW - Sensitivity and Specificity AB - We focus on quality of life (QOL) measurement as applied to chronic illness. There are 2 major types of health-related quality of life (HRQOL) instruments-generic health status and targeted. Generic instruments offer the opportunity to compare results across patient and population cohorts, and some can provide normative or benchmark data from which to interpret results. Targeted instruments ask questions that focus more on the specific condition or treatment under study and, as a result, tend to be more responsive to clinically important changes than generic instruments. Each type of instrument has a place in the assessment of HRQOL in chronic illness, and consideration of the relative advantages and disadvantages of the 2 options best drives choice of instrument. The Functional Assessment of Chronic Illness Therapy (FACIT) system of HRQOL measurement is a hybrid of the 2 approaches. 
The FACIT system combines a core general measure with supplemental measures targeted toward specific diseases, conditions, or treatments. Thus, it capitalizes on the strengths of each type of measure. Recently, FACIT questionnaires were administered to a representative sample of the general population with results used to derive FACIT norms. These normative data can be used for benchmarking and to better understand changes in HRQOL that are often seen in clinical trials. Future directions in HRQOL assessment include test equating, item banking, and computerized adaptive testing. VL - 83 N1 - 0003-9993. Journal Article ER - TY - ABST T1 - MIRTCAT [computer software] Y1 - 2002 A1 - Li, Y. H. CY - Upper Marlboro, MD: Author ER - TY - ABST T1 - Modifications of the Sympson-Hetter method for item-exposure control in computerized adaptive testing Y1 - 2002 A1 - van der Linden, W. J. CY - Manuscript submitted for publication ER - TY - JOUR T1 - Multidimensional adaptive testing for mental health problems in primary care JF - Medical Care Y1 - 2002 A1 - Gardner, W. A1 - Kelleher, K. J. A1 - Pajer, K. A. KW - Adolescent KW - Child KW - Child Behavior Disorders/*diagnosis KW - Child Health Services/*organization & administration KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Linear Models KW - Male KW - Mass Screening/*methods KW - Parents KW - Primary Health Care/*organization & administration AB - OBJECTIVES: Efficient and accurate instruments for assessing child psychopathology are increasingly important in clinical practice and research. For example, screening in primary care settings can identify children and adolescents with disorders that may otherwise go undetected. However, primary care offices are notorious for the brevity of visits, and screening must not burden patients or staff with long questionnaires. One solution is to shorten assessment instruments, but dropping questions typically makes an instrument less accurate. 
An alternative is adaptive testing, in which a computer selects the items to be asked of a patient based on the patient's previous responses. This research used a simulation to test a child mental health screen based on this technology. RESEARCH DESIGN: Using half of a large sample of data, a computerized version was developed of the Pediatric Symptom Checklist (PSC), a parental-report psychosocial problem screen. With the unused data, a simulation was conducted to determine whether the Adaptive PSC can reproduce the results of the full PSC with greater efficiency. SUBJECTS: PSCs were completed by parents on 21,150 children seen in a national sample of primary care practices. RESULTS: Four latent psychosocial problem dimensions were identified through factor analysis: internalizing problems, externalizing problems, attention problems, and school problems. A simulated adaptive test measuring these traits asked an average of 11.6 questions per patient, and asked five or fewer questions for 49% of the sample. There was high agreement between the adaptive test and the full (35-item) PSC: only 1.3% of screening decisions were discordant (kappa = 0.93). This agreement was higher than that obtained using a comparable-length (12-item) short-form PSC (3.2% of decisions discordant; kappa = 0.84). CONCLUSIONS: Multidimensional adaptive testing may be an accurate and efficient technology for screening for mental health problems in primary care settings. VL - 40 SN - 0025-7079 (Print); 0025-7079 (Linking) N1 - Gardner, William; Kelleher, Kelly J.; Pajer, Kathleen A. Grants: MCJ-177022/PHS HHS; MH30915/MH/NIMH NIH HHS; MH50629/MH/NIMH NIH HHS. Med Care. 2002 Sep;40(9):812-23. ER - TY - JOUR T1 - Multidimensional adaptive testing with constraints on test content JF - Psychometrika Y1 - 2002 A1 - Veldkamp, B. P. A1 - van der Linden, W. J. AB - The case of adaptive testing under a multidimensional response model with large numbers of constraints on the content of the test is addressed. 
The items in the test are selected using a shadow test approach. The 0–1 linear programming model that assembles the shadow tests maximizes posterior expected Kullback-Leibler information in the test. The procedure is illustrated for five different cases of multidimensionality. These cases differ in (a) the numbers of ability dimensions that are intentional or should be considered as "nuisance dimensions" and (b) whether the test should or should not display a simple structure with respect to the intentional ability dimensions. VL - 67 N1 - Psychometric Society, US ER - TY - CONF T1 - Optimum number of strata in the a-stratified adaptive testing design T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Wen, J.-B. A1 - Chang, Hua-Hua A1 - Hau, K-T. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA N1 - {PDF file, 114 KB} ER - TY - JOUR T1 - Outlier detection in high-stakes certification testing JF - Journal of Educational Measurement Y1 - 2002 A1 - Meijer, R. R. KW - Adaptive Testing KW - computerized adaptive testing KW - Educational Measurement KW - Goodness of Fit KW - Item Analysis (Statistical) KW - Item Response Theory KW - Person Fit KW - Statistical Estimation KW - Statistical Power KW - Test Scores AB - Discusses recent developments of person-fit analysis in computerized adaptive testing (CAT). Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in CAT. Most person-fit research in CAT is restricted to simulated data. In this study, empirical data from a certification test were used. Alternatives are discussed to generate norms so that bounds can be determined to classify an item score pattern as fitting or misfitting. 
Using bounds determined from a sample of a high-stakes certification test, the empirical analysis showed that different types of misfit can be distinguished. Further applications using statistical process control methods to detect misfitting item score patterns are discussed. (PsycINFO Database Record (c) 2005 APA) VL - 39 ER - TY - CONF T1 - Practical considerations about expected a posteriori estimation in adaptive testing: Adaptive a priori, adaptive corrections for bias, adaptive integration interval T2 - Paper presented at the annual meeting of the International Objective Measurement Workshops-XI Y1 - 2002 A1 - Raiche, G. A1 - Blais, J. G. JF - Paper presented at the annual meeting of the International Objective Measurement Workshops-XI CY - New Orleans, LA N1 - {PDF file, 100 KB} ER - TY - CONF T1 - A “rearrangement procedure” for administering adaptive tests with review options T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Papanastasiou, E. C. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA N1 - {PDF file, 410 KB} ER - TY - CONF T1 - Redeveloping the exposure control parameters of CAT items when a pool is modified T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Chang, S-W. A1 - Harris, D. J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA N1 - {PDF file, 1.113 MB} ER - TY - CONF T1 - Relative precision of ability estimation in polytomous CAT: A comparison under the generalized partial credit model and graded response model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Wang, S. A1 - Wang, T. 
JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA N1 - #WA02-01 {PDF file, 735 KB} ER - TY - CONF T1 - Reliability and decision accuracy of linear parallel form and multi-stage tests with realistic and ideal item pools T2 - Paper presented at the International Conference on Computer-Based Testing and the Internet Y1 - 2002 A1 - Jodoin, M. G. JF - Paper presented at the International Conference on Computer-Based Testing and the Internet CY - Winchester, England ER - TY - CONF T1 - The robustness of the unidimensional 3PL IRT model when applied to two-dimensional data in computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Zhao, J. C. A1 - McMorris, R. F. A1 - Pruzek, R. M. A1 - Chen, R. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA N1 - {PDF file, 1.356 MB} ER - TY - JOUR T1 - Self-adapted testing: An overview JF - International Journal of Continuing Engineering Education and Life-Long Learning Y1 - 2002 A1 - Wise, S. L. A1 - Ponsoda, V. A1 - Olea, J. VL - 12 ER - TY - CONF T1 - Some features of the estimated sampling distribution of the ability estimate in computerized adaptive testing according to two stopping rules T2 - Paper presented at the 11th Biannual International Objective Measurement Workshop. New Orleans: International Objective Measurement Workshops. [In French] Y1 - 2002 A1 - Raîche, G. A1 - Blais, J. G. JF - Paper presented at the 11th Biannual International Objective Measurement Workshop. New Orleans: International Objective Measurement Workshops. [In French] ER - TY - CONF T1 - Some features of the sampling distribution of the ability estimate in computerized adaptive testing according to two stopping rules T2 - Paper presented at the annual meeting of the International Objective Measurement Workshops-XI Y1 - 2002 A1 - Blais, J-G. A1 - Raiche, G. 
JF - Paper presented at the annual meeting of the International Objective Measurement Workshops-XI CY - New Orleans, LA N1 - {PDF file, 38 KB} ER - TY - ABST T1 - STAR Math 2 Computer-Adaptive Math Test and Database: Technical Manual Y1 - 2002 A1 - Renaissance Learning, Inc. CY - Wisconsin Rapids, WI: Author ER - TY - CONF T1 - Statistical indexes for monitoring item behavior under computer adaptive testing environment T2 - (Original title: Detecting item misfit in computerized adaptive testing.) Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Zhu, R. A1 - Yu, F. A1 - Liu, S. M. JF - (Original title: Detecting item misfit in computerized adaptive testing.) Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA N1 - {PDF file, 2.287 MB} ER - TY - BOOK T1 - Strategies for controlling item exposure in computerized adaptive testing with polytomously scored items Y1 - 2002 A1 - Davis, L. L. CY - Unpublished doctoral dissertation, University of Texas, Austin N1 - {PDF file, 1.83 MB} ER - TY - ABST T1 - A strategy for controlling item exposure in multidimensional computerized adaptive testing Y1 - 2002 A1 - Lee, Y. H. A1 - Ip, E. H. A1 - Fuh, C. D. CY - Available from http://www3.stat.sinica.edu.tw/library/c_tec_rep/c-2002-11.pdf ER - TY - JOUR T1 - A structure-based approach to psychological measurement: Matching measurement models to latent structure JF - Assessment Y1 - 2002 A1 - Ruscio, John A1 - Ruscio, Ayelet Meron KW - Adaptive Testing KW - Assessment KW - Classification (Cognitive Process) KW - Computer Assisted KW - Item Response Theory KW - Psychological KW - Scaling (Testing) KW - Statistical Analysis KW - computerized adaptive testing KW - Taxonomies KW - Testing AB - The present article sets forth the argument that psychological assessment should be based on a construct's latent structure. 
The authors differentiate dimensional (continuous) and taxonic (categorical) structures at the latent and manifest levels and describe the advantages of matching the assessment approach to the latent structure of a construct. A proper match will decrease measurement error, increase statistical power, clarify statistical relationships, and facilitate the location of an efficient cutting score when applicable. Thus, individuals will be placed along a continuum or assigned to classes more accurately. The authors briefly review the methods by which latent structure can be determined and outline a structure-based approach to assessment that builds on dimensional scaling models, such as item response theory, while incorporating classification methods as appropriate. Finally, the authors empirically demonstrate the utility of their approach and discuss its compatibility with traditional assessment methods and with computerized adaptive testing. (PsycINFO Database Record (c) 2005 APA) (journal abstract) VL - 9 ER - TY - JOUR T1 - Technology solutions for testing JF - School Administrator Y1 - 2002 A1 - Olson, A. AB - Northwest Evaluation Association in Portland, Oregon, consults with state and local educators on assessment issues. Describes several approaches in place at school districts that are using some combination of computer-based tests to measure student growth. The computerized adaptive test adjusts items based on a student's answer in "real time." On-demand testing provides almost instant scoring. (MLF) VL - 4 ER - TY - CHAP T1 - Test models for complex computer-based testing Y1 - 2002 A1 - Luecht, R. M. A1 - Clauser, B. E. CY - C. N. Mills, M. T. Potenza, J. J. Fremer, and W. C. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 67-88). Hillsdale, NJ: Erlbaum. 
ER - TY - CONF T1 - A testlet assembly design for the uniform CPA examination T2 - Paper presented at the Annual Meeting of the National Council on Measurement in Education. Y1 - 2002 A1 - Luecht, R. M. A1 - Brumfield, T. A1 - Breithaupt, K. JF - Paper presented at the Annual Meeting of the National Council on Measurement in Education. CY - New Orleans N1 - {PDF file, 192 KB} ER - TY - CONF T1 - To weight or not to weight – balancing influence of initial and later items in CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Chang, Hua-Hua A1 - Ying, Z. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA N1 - {PDF file, 252 KB} ER - TY - CONF T1 - Updated item parameter estimates using sparse CAT data T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Smith, R. L. A1 - Rizavi, S. A1 - Paez, R. A1 - Rotou, O. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA N1 - {PDF file, 986 KB} ER - TY - CONF T1 - Using judgments of item difficulty to change answers on computerized adaptive vocabulary tests T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Vispoel, W. P. A1 - Clough, S. J. A1 - Bleiler, T. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA N1 - #VI02-01 ER - TY - CONF T1 - Using testlet response theory to evaluate the equivalence of automatically generated multiple-choice items T2 - Symposium conducted at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Williamson, D. M. A1 - Bejar, I. I. 
JF - Symposium conducted at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA ER - TY - CONF T1 - Utility of Learning Potential Computerised Adaptive Test (LPCAT) scores in predicting academic performance of bridging students: A comparison with other predictors T2 - Paper presented at the 5th Annual Society for Industrial and Organisational Psychology Congress Y1 - 2002 A1 - De Beer, M. JF - Paper presented at the 5th Annual Society for Industrial and Organisational Psychology Congress CY - Pretoria, South Africa ER - TY - CHAP T1 - The work ahead: A psychometric infrastructure for computerized adaptive tests T2 - Computer-based tests: Building the foundation for future assessment Y1 - 2002 A1 - Drasgow, F. ED - M. T. Potenza ED - J. J. Fremer ED - W. C. Ward KW - Adaptive Testing KW - Computer Assisted Testing KW - Educational Measurement KW - Psychometrics AB - (From the chapter) Considers the past and future of computerized adaptive tests and computer-based tests and looks at issues and challenges confronting a testing program as it implements and operates a computer-based test. Recommendations for testing programs from The National Council of Measurement in Education Ad Hoc Committee on Computerized Adaptive Test Disclosure are appended. (PsycINFO Database Record (c) 2005 APA) JF - Computer-based tests: Building the foundation for future assessment PB - Lawrence Erlbaum Associates, Inc. CY - Mahwah, N.J. USA N1 - Computer-based testing: Building the foundation for future assessments (pp. 1-35). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers. xi, 326 pp ER - TY - CONF T1 - Adaptation of a-stratified method in variable length computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Wen, J.-B. A1 - Chang, Hua-Hua A1 - Hau, K.-T. 
JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle, WA N1 - {PDF file, 384 KB} ER - TY - ABST T1 - Application of data mining to response data in a computerized adaptive test Y1 - 2001 A1 - Mendez, F. A. CY - Paper presented at the Annual Meeting of the National Council on Measurement in Education, Seattle, WA ER - TY - CONF T1 - Application of score information for CAT pool development and its connection with "likelihood test information" T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Krass, I. A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle, WA N1 - {PDF file, 392 KB} ER - TY - JOUR T1 - Assessment in the twenty-first century: A role of computerised adaptive testing in national curriculum subjects JF - Teacher Development Y1 - 2001 A1 - Cowan, P. A1 - Morrison, H. KW - computerized adaptive testing AB - With the investment of large sums of money in new technologies for schools and education authorities and the subsequent training of teachers to integrate Information and Communications Technology (ICT) into their teaching strategies, it is remarkable that the old, outdated models of assessment still remain. This article highlights the current problems associated with pen-and-paper testing and offers suggestions for an innovative new approach to assessment for the twenty-first century. Based on the principle of the 'wise examiner', a computerised adaptive testing system which measures pupils' ability against the levels of the United Kingdom National Curriculum has been developed for use in mathematics. Using constructed response items, pupils are administered a test tailored to their ability with a reliability index of 0.99. 
Since the software administers maximally informative questions matched to each pupil's current ability estimate, no two pupils will receive the same set of items in the same order, thereby removing opportunities for plagiarism and teaching to the test. All marking is automated, and a journal recording the outcome of the test and highlighting the areas of difficulty for each pupil is available for printing by the teacher. The current prototype of the system can be used on a school's network; however, the authors envisage a day when Examination Boards or the Qualifications and Curriculum Authority (QCA) will administer Government tests from a central server to all United Kingdom schools or testing centres. Results will be issued at the time of testing, and opportunities for resits will become more widespread. VL - 5 ER - TY - CONF T1 - a-stratified CAT design with content-blocking T2 - Paper presented at the Annual Meeting of the Psychometric Society Y1 - 2001 A1 - Yi, Q. A1 - Chang, Hua-Hua JF - Paper presented at the Annual Meeting of the Psychometric Society CY - King of Prussia, PA N1 - {PDF file, 410 KB} ER - TY - CONF T1 - a-stratified computerized adaptive testing with unequal item exposure across strata T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Deng, H. A1 - Chang, Hua-Hua JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA N1 - #DE01-01 ER - TY - JOUR T1 - a-Stratified multistage computerized adaptive testing with b blocking JF - Applied Psychological Measurement Y1 - 2001 A1 - Chang, Hua-Hua A1 - Qian, J. A1 - Ying, Z. AB - Chang & Ying’s (1999) computerized adaptive testing item-selection procedure stratifies the item bank according to a parameter values and requires b parameter values to be evenly distributed across all strata. Thus, a and b parameter values must be incorporated into how strata are formed.
A refinement is proposed, based on Weiss’ (1973) stratification of items according to b values. Simulation studies using a retired item bank of a Graduate Record Examination test indicate that the new approach improved control of item exposure rates and reduced mean squared errors. VL - 25 SN - 0146-6216 ER - TY - JOUR T1 - a-Stratified multistage computerized adaptive testing with b blocking JF - Applied Psychological Measurement Y1 - 2001 A1 - Chang, Hua-Hua A1 - Qian, J. A1 - Ying, Z. KW - computerized adaptive testing AB - Proposed a refinement, based on the stratification of items developed by D. Weiss (1973), of the computerized adaptive testing item selection procedure of H. Chang and Z. Ying (1999). Simulation studies using an item bank from the Graduate Record Examination show the benefits of the new procedure. (SLD) VL - 25 ER - TY - CONF T1 - Can examinees use judgments of item difficulty to improve proficiency estimates on computerized adaptive vocabulary tests? T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Vispoel, W. P. A1 - Clough, S. J. A1 - Bleiler, T. A1 - Hendrickson, A. B. A1 - Ihrig, D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA N1 - #VI01-01 ER - TY - RPRT T1 - CATSIB: A modified SIBTEST procedure to detect differential item functioning in computerized adaptive tests (Research report) Y1 - 2001 A1 - Nandakumar, R. A1 - Roussos, L. PB - Law School Admission Council CY - Newtown, PA ER - TY - ABST T1 - CB BULATS: Examining the reliability of a computer-based test using the test-retest method Y1 - 2001 A1 - Geranpayeh, A. CY - Cambridge ESOL Research Notes, Issue 5, July 2001, pp. 14-16 N1 - #GE01-01 {PDF file, 456 KB} ER - TY - JOUR T1 - A comparative study of on-line pretest item calibration/scaling methods in computerized adaptive testing JF - Journal of Educational Measurement Y1 - 2001 A1 - Ban, J.-C. A1 - Hanson, B. A.
A1 - Wang, T. A1 - Yi, Q. A1 - Harris, D. J. AB - The purpose of this study was to compare and evaluate five on-line pretest item-calibration/scaling methods in computerized adaptive testing (CAT): marginal maximum likelihood estimate with one EM cycle (OEM), marginal maximum likelihood estimate with multiple EM cycles (MEM), Stocking's Method A, Stocking's Method B, and BILOG/Prior. The five methods were evaluated in terms of item-parameter recovery, using three different sample sizes (300, 1000, and 3000). The MEM method appeared to be the best choice among these, because it produced the smallest parameter-estimation errors for all sample size conditions. MEM and OEM are mathematically similar, although the OEM method produced larger errors. MEM also was preferable to OEM, unless the amount of time involved in iterative computation is a concern. Stocking's Method B also worked very well, but it required anchor items that either would increase test lengths or require larger sample sizes, depending on test administration design. Until more appropriate ways of handling sparse data are devised, the BILOG/Prior method may not be a reasonable choice for small sample sizes. Stocking's Method A had the largest weighted total error, as well as a theoretical weakness (i.e., treating estimated ability as true ability); thus, there appeared to be little reason to use it. VL - 38 ER - TY - CONF T1 - Comparison of the SPRT and CMT procedures in computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Yi, Q. A1 - Hanson, B. A1 - Widiatmo, H. A1 - Harris, D. J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA ER - TY - JOUR T1 - Computerized adaptive testing with equated number-correct scoring JF - Applied Psychological Measurement Y1 - 2001 A1 - van der Linden, W. J.
AB - A constrained computerized adaptive testing (CAT) algorithm is presented that can be used to equate CAT number-correct (NC) scores to a reference test. As a result, the CAT NC scores also are equated across administrations. The constraints are derived from van der Linden & Luecht’s (1998) set of conditions on item response functions that guarantees identical observed NC score distributions on two test forms. An item bank from the Law School Admission Test was used to compare the results of the algorithm with those for equipercentile observed-score equating, as well as the prediction of NC scores on a reference test using its test response function. The effects of the constraints on the statistical properties of the θ estimator in CAT were examined. VL - 25 N1 - Sage Publications, US ER - TY - JOUR T1 - Computerized adaptive testing with the generalized graded unfolding model JF - Applied Psychological Measurement Y1 - 2001 A1 - Roberts, J. S. A1 - Lin, Y. A1 - Laughlin, J. E. KW - Attitude Measurement KW - College Students KW - computerized adaptive testing KW - Computer Assisted Testing KW - Item Response Models KW - Statistical Estimation AB - Examined the use of the generalized graded unfolding model (GGUM) in computerized adaptive testing. The objective was to minimize the number of items required to produce equiprecise estimates of person locations. Simulations based on real data about college student attitudes toward abortion and on data generated to fit the GGUM were used. It was found that as few as 7 or 8 items were needed to produce accurate and precise person estimates using an expected a posteriori procedure. The number of items in the item bank (20, 40, or 60 items) and their distribution on the continuum (uniform locations or item clusters in moderately extreme locations) had only small effects on the accuracy and precision of the estimates.
These results suggest that adaptive testing with the GGUM is a good method for achieving estimates with an approximately uniform level of precision using a small number of items. (PsycINFO Database Record (c) 2005 APA) VL - 25 ER - TY - UNPB T1 - Computerized-adaptive versus paper-and-pencil testing environments: An experimental analysis of examinee experience Y1 - 2001 A1 - Bringsjord, E. L. VL - Doctoral dissertation ER - TY - JOUR T1 - Concerns with computerized adaptive oral proficiency assessment. A commentary on "Comparing examinee attitudes toward computer-assisted and other oral proficiency assessments": Response to the Norris commentary JF - Language Learning and Technology Y1 - 2001 A1 - Norris, J. M. A1 - Kenyon, D. M. A1 - Malabonga, V. AB - Responds to an article on computerized adaptive second language (L2) testing, expressing concerns about the appropriateness of such tests for informing language educators about the language skills of L2 learners and users and fulfilling the intended purposes and achieving the desired consequences of language test use. The authors of the original article respond. (Author/VWL) VL - 5 ER - TY - JOUR T1 - CUSUM-based person-fit statistics for adaptive testing JF - Journal of Educational and Behavioral Statistics Y1 - 2001 A1 - van Krimpen-Stoop, E. M. L. A. A1 - Meijer, R. R. VL - 26 ER - TY - CONF T1 - Data sparseness and online pretest calibration/scaling methods in CAT T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Ban, J.-C. A1 - Hanson, B. A. A1 - Yi, Q. A1 - Harris, D. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle N1 - (Also ACT Research Report 2002-1) ER - TY - CONF T1 - Deriving a stopping rule for sequential adaptive tests T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Grabovsky, I. A1 - Chang, Hua-Hua A1 - Ying, Z.
JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA N1 - {PDF file, 111 KB} ER - TY - ABST T1 - Detection of misfitting item-score patterns in computerized adaptive testing Y1 - 2001 A1 - Stoop, E. M. L. A. CY - Enschede, The Netherlands: Febodruk B.V. N1 - #ST01-01 ER - TY - BOOK T1 - Development and evaluation of test assembly procedures for computerized adaptive testing Y1 - 2001 A1 - Robin, F. CY - Unpublished doctoral dissertation, University of Massachusetts, Amherst ER - TY - JOUR T1 - Development of an adaptive multimedia program to collect patient health data JF - American Journal of Preventive Medicine Y1 - 2001 A1 - Sutherland, L. A. A1 - Campbell, M. A1 - Ornstein, K. A1 - Wildemuth, B. A1 - Lobach, D. VL - 21 ER - TY - ABST T1 - The development of STAR Early Literacy: A report of the School Renaissance Institute Y1 - 2001 A1 - School-Renaissance-Institute CY - Madison, WI: Author ER - TY - JOUR T1 - Developments in measurement of persons and items by means of item response models JF - Behaviormetrika Y1 - 2001 A1 - Sijtsma, K. KW - Cognitive Processes KW - Computer Assisted Testing KW - Item Response Theory KW - Models KW - Nonparametric Statistical Tests AB - This paper starts with a general introduction into measurement of hypothetical constructs typical of the social and behavioral sciences. After the stages ranging from theory through operationalization and item domain to preliminary test or questionnaire have been treated, the general assumptions of item response theory are discussed. The family of parametric item response models for dichotomous items is introduced, and it is explained how parameters for respondents and items are estimated from the scores collected from a sample of respondents who took the test or questionnaire.
Next, the family of nonparametric item response models is explained, followed by the three classes of item response models for polytomous item scores (e.g., rating scale scores). Then, the degree to which the mean item score and the unweighted sum of item scores for persons are useful for measuring items and persons in the context of item response theory is discussed. Methods for fitting parametric and nonparametric models to data are briefly discussed. Finally, the main applications of item response models are discussed, which include equating and item banking, computerized and adaptive testing, research into differential item functioning, person-fit research, and cognitive modeling. (PsycINFO Database Record (c) 2005 APA) VL - 28 ER - TY - JOUR T1 - Differences between self-adapted and computerized adaptive tests: A meta-analysis JF - Journal of Educational Measurement Y1 - 2001 A1 - Pitkin, A. K. A1 - Vispoel, W. P. KW - Adaptive Testing KW - Computer Assisted Testing KW - Scores KW - computerized adaptive testing KW - Test Anxiety AB - Self-adapted testing has been described as a variation of computerized adaptive testing that reduces test anxiety and thereby enhances test performance. The purpose of this study was to gain a better understanding of these proposed effects of self-adapted tests (SATs). Meta-analysis procedures were used to estimate differences between SATs and computerized adaptive tests (CATs) in proficiency estimates and post-test anxiety levels across studies in which these two types of tests have been compared. After controlling for measurement error, the results showed that SATs yielded proficiency estimates that were 0.12 standard deviation units higher and post-test anxiety levels that were 0.19 standard deviation units lower than those yielded by CATs. The authors speculate about possible reasons for these differences and discuss advantages and disadvantages of using SATs in operational settings.
(PsycINFO Database Record (c) 2005 APA ) VL - 38 ER - TY - CONF T1 - The effect of test and examinee characteristics on the occurrence of aberrant response patterns in a computerized adaptive test T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Rizavi, S. A1 - Swaminathan, H. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA N1 - #RI01-01 ER - TY - CONF T1 - Effective use of simulated data in an on-line item calibration in practical situations of computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Samejima, F. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA ER - TY - CONF T1 - Effects of changes in the examinees’ ability distribution on the exposure control methods in CAT T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Chang, S-W. A1 - Twu, B.-Y. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA N1 - #CH01-02 {PDF file, 695 KB} ER - TY - CONF T1 - Efficient on-line item calibration using a nonparametric method adjusted to computerized adaptive testing T2 - Paper presented at the Annual Meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Samejima, F. JF - Paper presented at the Annual Meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - JOUR T1 - Evaluation of an MMPI-A short form: Implications for adaptive testing JF - Journal of Personality Assessment Y1 - 2001 A1 - Archer, R. P. A1 - Tirrell, C. A. A1 - Elkins, D. E. 
KW - Adaptive Testing KW - Mean KW - Minnesota Multiphasic Personality Inventory KW - Psychometrics KW - Statistical Correlation KW - Statistical Samples KW - Test Forms AB - Reports some psychometric properties of an MMPI-Adolescent version (MMPI-A; J. N. Butcher et al., 1992) short form based on administration of the first 150 items of this test instrument. The authors report results for both the MMPI-A normative sample of 1,620 adolescents (aged 14-18 yrs) and a clinical sample of 565 adolescents (mean age 15.2 yrs) in a variety of treatment settings. The authors summarize results for the MMPI-A basic scales in terms of Pearson product-moment correlations generated between full-administration and short-form administration formats and mean T-score elevations for the basic scales generated by each approach. In this investigation, the authors also examine single-scale and 2-point congruences found for the MMPI-A basic clinical scales as derived from standard and short-form administrations. The authors present the relative strengths and weaknesses of the MMPI-A short form and discuss the findings in terms of implications for attempts to shorten the item pool through the use of computerized adaptive assessment approaches. (PsycINFO Database Record (c) 2005 APA) VL - 76 ER - TY - JOUR T1 - An examination of conditioning variables used in computer adaptive testing for DIF analyses JF - Applied Measurement in Education Y1 - 2001 A1 - Walker, C. M. A1 - Beretvas, S. N. A1 - Ackerman, T. A.
VL - 14 ER - TY - CONF T1 - An examination of item review on a CAT using the specific information item selection algorithm T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Bowles, R. A1 - Pommerich, M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA N1 - {PDF file, 325 KB} ER - TY - ABST T1 - An examination of item review on computer adaptive tests Y1 - 2001 A1 - Bowles, R. CY - Manuscript in preparation, University of Virginia ER - TY - CONF T1 - An examination of item selection rules by stratified CAT designs integrated with content balancing methods T2 - Paper presented at the Annual Meeting of the American Educational Research Association Y1 - 2001 A1 - Leung, C.-K. A1 - Chang, Hua-Hua A1 - Hau, K.-T. JF - Paper presented at the Annual Meeting of the American Educational Research Association CY - Seattle WA N1 - {PDF file, 296 KB} ER - TY - ABST T1 - An examination of testlet scoring and item exposure constraints in the verbal reasoning section of the MCAT Y1 - 2001 A1 - Davis, L. L. A1 - Dodd, B. G. CY - MCAT Monograph Series: Association of American Medical Colleges ER - TY - CONF T1 - An examination of testlet scoring and item exposure constraints in the Verbal Reasoning section of the MCAT Y1 - 2001 A1 - Davis, L. L. A1 - Dodd, B. G.
N1 - {PDF file, 653 KB} ER - TY - JOUR T1 - An examination of the comparative reliability, validity, and accuracy of performance ratings made using computerized adaptive rating scales JF - Journal of Applied Psychology Y1 - 2001 A1 - Borman, W. C. A1 - Buck, D. E. A1 - Hanson, M. A. A1 - Motowidlo, S. J. A1 - Stark, S. A1 - Drasgow, F. KW - *Computer Simulation KW - *Employee Performance Appraisal KW - *Personnel Selection KW - Adult KW - Automatic Data Processing KW - Female KW - Human KW - Male KW - Reproducibility of Results KW - Sensitivity and Specificity KW - Support, U.S. Gov't, Non-P.H.S. KW - Task Performance and Analysis KW - Video Recording AB - This laboratory research compared the reliability, validity, and accuracy of a computerized adaptive rating scale (CARS) format and 2 relatively common and representative rating formats. The CARS is a paired-comparison rating task that uses adaptive testing principles to present pairs of scaled behavioral statements to the rater to iteratively estimate a ratee's effectiveness on 3 dimensions of contextual performance. Videotaped vignettes of 6 office workers were prepared, depicting prescripted levels of contextual performance, and 112 subjects rated these vignettes using the CARS format and one or the other competing format. Results showed 23%-37% lower standard errors of measurement for the CARS format. In addition, validity was significantly higher for the CARS format (d = .18), and Cronbach's accuracy coefficients showed significantly higher accuracy, with a median effect size of .08. The discussion focuses on possible reasons for the results. VL - 86 N1 - 21480345 0021-9010 Journal Article; Validation Studies ER - TY - BOOK T1 - The FastTEST Professional Testing System, Version 1.6 [Computer software] Y1 - 2001 A1 - Assessment-Systems-Corporation CY - St. Paul, MN: Author ER - TY - JOUR T1 - Final answer? JF - American School Board Journal Y1 - 2001 A1 - Coyle, J.
KW - computerized adaptive testing AB - The Northwest Evaluation Association helped an Indiana school district develop a computerized adaptive testing system that was aligned with its curriculum and geared toward measuring individual student growth. Now the district can obtain such information from semester to semester and year to year, get immediate results, and test students on demand. (MLH) VL - 188 ER - TY - CONF T1 - Impact of item location effects on ability estimation in CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Liu, M. A1 - Zhu, R. A1 - Guo, F. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA N1 - #LI01-01 ER - TY - CONF T1 - Impact of scoring options for not-reached items in CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Yi, Q. A1 - Widiatmo, H. A1 - Ban, J.-C. A1 - Harris, D. J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA N1 - {PDF file, 232 KB} ER - TY - CONF T1 - Impact of several computer-based testing variables on the psychometric properties of credentialing examinations T2 - Paper presented at the Annual Meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Xing, D. A1 - Hambleton, R. K. JF - Paper presented at the Annual Meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - ABST T1 - Impact of several computer-based testing variables on the psychometric properties of credentialing examinations (Laboratory of Psychometric and Evaluative Research Report No. 393) Y1 - 2001 A1 - Xing, D. A1 - Hambleton, R. K. CY - Amherst, MA: University of Massachusetts, School of Education ER - TY - ABST T1 - Implementing constrained CAT with shadow tests for large item pools Y1 - 2001 A1 - Veldkamp, B. P.
CY - Submitted for publication ER - TY - ABST T1 - Implementing content constraints in a-stratified adaptive testing using a shadow test approach (Research Report 01-001) Y1 - 2001 A1 - Chang, Hua-Hua A1 - van der Linden, W. J. CY - University of Twente, Department of Educational Measurement and Data Analysis ER - TY - CONF T1 - The influence of item characteristics and administration position on CAT scores T2 - Paper presented at the 33rd annual meeting of the Northeastern Educational Research Association Y1 - 2001 A1 - Wang, L. A1 - Gawlick, L. JF - Paper presented at the 33rd annual meeting of the Northeastern Educational Research Association CY - Hudson Valley, NY, October 26, 2001 ER - TY - CONF T1 - Integrating stratification and information approaches for multiple constrained CAT T2 - Paper presented at the Annual Meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Leung, C.-I. A1 - Chang, Hua-Hua A1 - Hau, K.-T. JF - Paper presented at the Annual Meeting of the National Council on Measurement in Education CY - Seattle WA N1 - {PDF file, 322 KB} ER - TY - CONF T1 - An investigation of procedures for estimating error indexes in proficiency estimation in a realistic second-order equitable CAT environment T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Shyu, C.-Y. A1 - Fan, M. A1 - Thompson, T. A1 - Hsu. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA ER - TY - CONF T1 - An investigation of the impact of items that exhibit mild DIF on ability estimation in CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Jennings, J. A. A1 - Dodd, B. G. A1 - Fitzpatrick, S.
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - ABST T1 - Item and passage selection algorithm simulations for a computerized adaptive version of the verbal section of the Medical College Admission Test (MCAT) Y1 - 2001 A1 - Smith, R. W. A1 - Plake, B. S. A1 - De Ayala, R. J. CY - MCAT Monograph Series ER - TY - CONF T1 - Item pool design for computerized adaptive tests T2 - Invited small group session at the 6th Conference of the European Association of Psychological Assessment Y1 - 2001 A1 - Reckase, M. D. JF - Invited small group session at the 6th Conference of the European Association of Psychological Assessment CY - Aachen, Germany ER - TY - CHAP T1 - Item response theory applied to combinations of multiple-choice and constructed-response items--approximation methods for scale scores T2 - Test scoring Y1 - 2001 A1 - Thissen, D. A1 - Nelson, L. A. A1 - Swygert, K. A. KW - Adaptive Testing KW - Item Response Theory KW - Multiple Choice (Testing Method) KW - Scoring (Testing) KW - Statistical Estimation KW - Statistical Weighting KW - Test Items KW - Test Scores AB - (From the chapter) The authors develop approximate methods that replace the scoring tables with weighted linear combinations of the component scores. Topics discussed include: a linear approximation for the extension to combinations of scores; the generalization to two or more scores; potential applications of linear approximations to item response theory in computerized adaptive tests; and evaluation of the pattern-of-summed-scores, and Gaussian approximation, estimates of proficiency. (PsycINFO Database Record (c) 2005 APA) JF - Test scoring PB - Lawrence Erlbaum Associates CY - Mahwah, NJ, USA N1 - Using Smart Source Parsing. Test scoring (pp. 293-341). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
xii, 422 pp ER - TY - JOUR T1 - Item selection in computerized adaptive testing: Should more discriminating items be used first? JF - Journal of Educational Measurement Y1 - 2001 A1 - Hau, Kit-Tai A1 - Chang, Hua-Hua KW - ability KW - Adaptive Testing KW - Computer Assisted Testing KW - Statistical Estimation KW - Test Items KW - computerized adaptive testing AB - During computerized adaptive testing (CAT), items are selected continuously according to the test-taker's estimated ability. Test security has become a problem because high-discrimination items are more likely to be selected and become overexposed. So, there seems to be a tradeoff between high efficiency in ability estimation and balanced usage of items. This series of four studies addressed the dilemma by focusing on the notion of whether more or less discriminating items should be used first in CAT. The first study demonstrated that the common maximum information method with J. B. Sympson and R. D. Hetter's (1985) control resulted in the use of more discriminating items first. The remaining studies showed that using items in the reverse order, as described in H. Chang and Z. Ying's (1999) stratified method, had potential advantages: (a) a more balanced item usage and (b) a relatively stable resultant item pool structure with easy and inexpensive management. This stratified method may have ability-estimation efficiency better than or close to that of other methods. It is argued that the judicious selection of items, as in the stratified method, is a more active control of item exposure. (PsycINFO Database Record (c) 2005 APA) VL - 38 ER - TY - JOUR T1 - On maximizing item information and matching difficulty with ability JF - Psychometrika Y1 - 2001 A1 - Bickel, P. A1 - Buyske, S. A1 - Chang, Hua-Hua A1 - Ying, Z.
VL - 66 ER - TY - CONF T1 - Measurement efficiency of multidimensional computerized adaptive testing T2 - Paper presented at the annual meeting of the American Psychological Association Y1 - 2001 A1 - Wang, W.-C. A1 - Chen, B.-H. JF - Paper presented at the annual meeting of the American Psychological Association CY - San Francisco CA ER - TY - CONF T1 - Measuring test compromise in high-stakes computerized adaptive testing: A Bayesian strategy for surrogate test-taker detection T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Segall, D. O. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA N1 - {PDF file, 275 KB} ER - TY - ABST T1 - A method for building a realistic model of test taker behavior for computerized adaptive testing (Research Report 01-22) Y1 - 2001 A1 - Stocking, M. L. A1 - Steffen, M. A1 - Eignor, D. R. CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - Methods to test invariant ability across subgroups of items in CAT T2 - Paper presented at the Annual Meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Meijer, R. R. JF - Paper presented at the Annual Meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - JOUR T1 - A minimax procedure in the context of sequential testing problems in psychodiagnostics JF - British Journal of Mathematical and Statistical Psychology Y1 - 2001 A1 - Vos, H. J. VL - 54 N1 - #VO01139 ER - TY - CONF T1 - Modeling variability in item parameters in CAT T2 - Paper presented at the Annual Meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Glas, C. A. W. A1 - van der Linden, W. J.
JF - Paper presented at the Annual Meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - CONF T1 - Monitoring items for changes in performance in computerized adaptive tests T2 - Paper presented at the annual conference of the National Council on Measurement in Education Y1 - 2001 A1 - Smith, R. L. A1 - Wang, M. M. A1 - Wingersky, M. A1 - Zhao, C. JF - Paper presented at the annual conference of the National Council on Measurement in Education CY - Seattle, Washington ER - TY - CONF T1 - A Monte Carlo study of the feasibility of on-the-fly assessment T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Revuelta, J. A1 - Bejar, I. I. A1 - Stocking, M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - JOUR T1 - Multidimensional adaptive testing using the weighted likelihood estimation JF - Dissertation Abstracts International Section A: Humanities & Social Sciences Y1 - 2001 A1 - Tseng, F.-L. KW - computerized adaptive testing AB - This study extended Warm's (1989) weighted likelihood estimation (WLE) to a multidimensional computerized adaptive test (MCAT) setting. WLE was compared with maximum likelihood estimation (MLE), expected a posteriori (EAP), and maximum a posteriori (MAP) estimation using a three-dimensional 3PL IRT model under a variety of computerized adaptive testing conditions. The dependent variables included bias, standard error of ability estimates (SE), square root of mean square error (RMSE), and test information. The independent variables were ability estimation method, intercorrelation level between dimensions, multidimensional structure, and ability combination. Simulation results were presented as descriptive statistics in figures and tables.
In addition, inferential procedures were used to analyze bias by conceptualizing this Monte Carlo study as a statistical sampling experiment. The results of this study indicate that WLE and the other three estimation methods yield significantly more accurate ability estimates under an approximate simple test structure with one dominant dimension and several secondary dimensions. All four estimation methods, especially WLE, yield very large SEs when a structure with three equally dominant dimensions was employed. Consistent with previous findings based on unidimensional IRT models, MLE and WLE are less biased at the extremes of the ability scale; MLE and WLE yield larger SEs than the Bayesian methods; and test information-based SEs underestimate the actual SEs of the MLE and WLE estimators in MCAT situations, especially at shorter test lengths, similar to the findings of Warm (1989) in the unidimensional case. The results from the MCAT simulations did show some advantages of WLE in reducing the bias of MLE under the approximate simple structure with a fixed test length of 50 items, which was consistent with previous research findings based on different unidimensional models. It is clear from the current results that all four methods perform very poorly when multidimensional structures with multiple dominant factors were employed. More research is urged to investigate systematically how different multidimensional structures affect the accuracy and reliability of ability estimation. Based on the simulated results in this study, no significant effect of the intercorrelation between dimensions on ability estimation was found. (PsycINFO Database Record (c) 2003 APA, all rights reserved).
VL - 61 ER - TY - CONF T1 - Multidimensional adaptive testing using weighted likelihood estimation: A comparison of estimation methods T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Tseng, F.-E. A1 - Hsu, T.-C. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA N1 - {PDF file, 988 KB} ER - TY - CONF T1 - Multidimensional IRT-based adaptive sequential mastery testing T2 - Paper presented at the Annual Meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Vos, H. J. A1 - Glas, C. A. W. JF - Paper presented at the Annual Meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - JOUR T1 - NCLEX-RN performance: predicting success on the computerized examination JF - Journal of Professional Nursing Y1 - 2001 A1 - Beeman, P. B. A1 - Waterhouse, J. K. KW - *Education, Nursing, Baccalaureate KW - *Educational Measurement KW - *Licensure KW - Adult KW - Female KW - Humans KW - Male KW - Predictive Value of Tests KW - Software AB - Since the adoption of the Computerized Adaptive Testing (CAT) format of the National Council Licensure Examination for Registered Nurses (NCLEX-RN), no studies have been reported in the literature on predictors of successful performance by baccalaureate nursing graduates on the licensure examination. In this study, a discriminant analysis was used to identify which of 21 variables could be significant predictors of success on the CAT NCLEX-RN. The convenience sample consisted of 289 individuals who graduated from a baccalaureate nursing program between 1995 and 1998. Seven significant predictor variables were identified. The total number of C+ or lower grades earned in nursing theory courses was the best predictor, followed by grades in several individual nursing courses. More than 93 per cent of graduates were correctly classified.
Ninety-four per cent of NCLEX "passes" were correctly classified, as were 92 per cent of NCLEX failures. This degree of accuracy in classifying CAT NCLEX-RN failures represents a marked improvement over results reported in previous studies of licensure examinations, and suggests the discriminant function will be helpful in identifying future students in danger of failure. J Prof Nurs 17:158-165, 2001. VL - 17 N1 - 8755-7223; Journal Article ER - TY - CONF T1 - Nearest neighbors, simple strata, and probabilistic parameters: An empirical comparison of methods for item exposure control in CATs T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Parshall, C. G. A1 - Kromrey, J. D. A1 - Harmes, J. C. A1 - Sentovich, C. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - CONF T1 - A new approach to simulation studies in computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Chen, S-Y. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA N1 - {PDF file, 251 KB} ER - TY - JOUR T1 - A new computer algorithm for simultaneous test construction of two-stage and multistage testing JF - Journal of Educational and Behavioral Statistics Y1 - 2001 A1 - Wu, I. L. VL - 26 ER - TY - JOUR T1 - Nouveaux développements dans le domaine du testing informatisé [New developments in the area of computerized testing] JF - Psychologie Française Y1 - 2001 A1 - Meijer, R. R. A1 - Grégoire, J. KW - Adaptive Testing KW - Computer Applications KW - Computer Assisted KW - Diagnosis KW - Psychological Assessment computerized adaptive testing AB - L'usage de l'évaluation assistée par ordinateur s'est fortement développé depuis la première formulation de ses principes de base dans les années soixante et soixante-dix.
Cet article offre une introduction aux derniers développements dans le domaine de l'évaluation assistée par ordinateur, en particulier celui du testing adaptatif informatisé (TAI). L'estimation de l'aptitude, la sélection des items et le développement d'une base d'items dans le cas du TAI sont discutés. De plus, des exemples d'utilisations innovantes de l'ordinateur dans des systèmes intégrés de testing et de testing via Internet sont présentés. L'article se termine par quelques illustrations de nouvelles applications du testing informatisé et des suggestions pour des recherches futures. Discusses the latest developments in computerized psychological assessment, with emphasis on computerized adaptive testing (CAT). Ability estimation, item selection, and item pool development in CAT are described. Examples of some innovative approaches to CAT are presented. (PsycINFO Database Record (c) 2005 APA) VL - 46 ER - TY - CONF T1 - On-line Calibration Using PARSCALE Item Specific Prior Method: Changing Test Population and Sample Size T2 - Paper presented at National Council on Measurement in Education Annual Meeting Y1 - 2001 A1 - Guo, F. A1 - Stone, E. A1 - Cruz, D. JF - Paper presented at National Council on Measurement in Education Annual Meeting CY - Seattle, Washington ER - TY - ABST T1 - Online item parameter recalibration: Application of missing data treatments to overcome the effects of sparse data conditions in a computerized adaptive version of the MCAT Y1 - 2001 A1 - Harmes, J. C. A1 - Kromrey, J. D. A1 - Parshall, C. G. CY - Unpublished manuscript N1 - {PDF file, 406 KB} ER - TY - JOUR T1 - Outlier measures and norming methods for computerized adaptive tests JF - Journal of Educational and Behavioral Statistics Y1 - 2001 A1 - Bradlow, E. T. A1 - Weiss, R. E.
KW - Adaptive Testing KW - Computer Assisted Testing KW - Statistical Analysis KW - Test Norms AB - Notes that the problem of identifying outliers has 2 important aspects: the choice of outlier measures and the method to assess the degree of outlyingness (norming) of those measures. Several classes of measures for identifying outliers in Computerized Adaptive Tests (CATs) are introduced. Some of these measures are constructed to take advantage of CATs' sequential choice of items; other measures are taken directly from paper and pencil (P&P) tests and are used for baseline comparisons. Methods for assessing the degree of outlyingness of CAT responses, however, cannot be applied directly from P&P tests because stopping rules associated with CATs yield examinee responses of varying lengths. Standard outlier measures are highly correlated with the varying lengths, which makes comparison across examinees impossible. Therefore, 4 methods are presented and compared which map outlier statistics to a familiar probability scale (a p value). The methods are explored in the context of CAT data from a 1995 Nationally Administered Computerized Examination (NACE). (PsycINFO Database Record (c) 2005 APA) VL - 26 ER - TY - ABST T1 - Overexposure and underexposure of items in computerized adaptive testing (Measurement and Research Department Reports 2001-1) Y1 - 2001 A1 - Theo Eggen CY - Arnhem, The Netherlands: CITO Groep N1 - {PDF file, 276 KB} ER - TY - JOUR T1 - Pasado, presente y futuro de los test adaptativos informatizados: Entrevista con Isaac I. Béjar [Past, present and future of computerized adaptive testing: Interview with Isaac I. Béjar] JF - Psicothema Y1 - 2001 A1 - Tejada, R. A1 - Antonio, J. KW - computerized adaptive testing AB - En este artículo se presenta el resultado de una entrevista con Isaac I. Bejar. El Dr.
Bejar es actualmente Investigador Científico Principal y Director del Centro para el Diseño de Evaluación y Sistemas de Puntuación perteneciente a la División de Investigación del Servicio de Medición Educativa (Educational Testing Service, Princeton, NJ, EE.UU.). El objetivo de esta entrevista fue conversar sobre el pasado, presente y futuro de los Tests Adaptativos Informatizados. En la entrevista se recogen los inicios de los Tests Adaptativos y de los Tests Adaptativos Informatizados y últimos avances que se desarrollan en el Educational Testing Service sobre este tipo de tests (modelos generativos, isomorfos, puntuación automática de ítems de ensayo…). Se finaliza con la visión de futuro de los Tests Adaptativos Informatizados y su utilización en España. Past, present and future of Computerized Adaptive Testing: Interview with Isaac I. Bejar. In this paper the results of an interview with Isaac I. Bejar are presented. Dr. Bejar is currently Principal Research Scientist and Director of the Center for Assessment Design and Scoring, in the Research Division at Educational Testing Service (Princeton, NJ, U.S.A.). The aim of this interview was to review the past, present and future of Computerized Adaptive Tests. The beginnings of Adaptive Tests and Computerized Adaptive Tests, and the latest advances developed at the Educational Testing Service (generative response models, isomorphs, automated scoring…) are reviewed. The future of Computerized Adaptive Tests is analyzed, and its utilization in Spain is commented on. VL - 13 SN - 0214-9915 ER - TY - JOUR T1 - Polytomous modeling of cognitive errors in computer adaptive testing JF - Journal of Applied Measurement Y1 - 2001 A1 - Wang, L.-S. A1 - Chun-Shan Li. AB - Used Monte Carlo simulation to compare the relative measurement efficiency of polytomous modeling and dichotomous modeling under different scoring schemes and termination criteria.
Results suggest that polytomous computerized adaptive testing (CAT) yields marginal gains over dichotomous CAT when termination criteria are more stringent. Discusses conditions under which polytomous CAT cannot prevent the nonuniform gain in test information. (SLD) VL - 2 IS - 4 ER - TY - CONF T1 - Pour une évaluation sur mesure des étudiants : défis et enjeux du testing adaptatif T2 - Communication présentée à l'intérieur de la 23e session d'études de l'Association pour le développement de la mesure et de l'évaluation en éducation Y1 - 2001 A1 - Raîche, G. JF - Communication présentée à l'intérieur de la 23e session d'études de l'Association pour le développement de la mesure et de l'évaluation en éducation CY - ADMÉÉ N1 - Québec: ADMÉÉ. ER - TY - CONF T1 - Pour une évaluation sur mesure pour chaque étudiant : défis et enjeux du testing adaptatif par ordinateur en éducation [Tailored testing for each student: Principles and stakes of computerized adaptive testing in education] T2 - Presented at the 23rd Study Session of the ADMÉÉ. Québec: Association pour le développement de la mesure et de l'évaluation en éducation (ADMÉÉ). Y1 - 2001 A1 - Raîche, G. A1 - Blais, J. G. A1 - Boiteau, N. JF - Presented at the 23rd Study Session of the ADMÉÉ. Québec: Association pour le développement de la mesure et de l'évaluation en éducation (ADMÉÉ). ER - TY - CHAP T1 - Practical issues in setting standards on computerized adaptive tests T2 - Setting performance standards: Concepts, methods, and perspectives Y1 - 2001 A1 - Sireci, S. G. A1 - Clauser, B. E. KW - Adaptive Testing KW - Computer Assisted Testing KW - Performance Tests KW - Testing Methods AB - (From the chapter) Examples of setting standards on computerized adaptive tests (CATs) are hard to find. Some examples of CATs involving performance standards include the registered nurse exam and the Novell systems engineer exam.
Although CATs do not require separate standard-setting methods, there are special issues to be addressed by test specialists who set performance standards on CATs. Setting standards on a CAT will typically require modifications of the procedures used with more traditional, fixed-form, paper-and-pencil examinations. The purpose of this chapter is to illustrate why CATs pose special challenges to the standard setter. (PsycINFO Database Record (c) 2005 APA) JF - Setting performance standards: Concepts, methods, and perspectives PB - Lawrence Erlbaum Associates, Inc. CY - Mahwah, N.J. USA N1 - Setting performance standards: Concepts, methods, and perspectives (pp. 355-369). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers. xiii, 510 pp ER - TY - JOUR T1 - Precision of Warm’s weighted likelihood estimation of ability for a polytomous model in CAT JF - Applied Psychological Measurement Y1 - 2001 A1 - Wang, S. A1 - Wang, T. VL - 25 ER - TY - CONF T1 - Principes et enjeux du testing adaptatif : de la loi des petits nombres à la loi des grands nombres T2 - Communication présentée à l’intérieur du 69e congrès de l’Association canadienne française pour l’avancement de la science Y1 - 2001 A1 - Raîche, G. JF - Communication présentée à l’intérieur du 69e congrès de l’Association canadienne française pour l’avancement de la science CY - Acfas N1 - Sherbrooke: Acfas. ER - TY - BOOK T1 - A rearrangement procedure for administering adaptive tests when review options are permitted Y1 - 2001 A1 - Papanastasiou, E. C. CY - Unpublished doctoral dissertation, Michigan State University ER - TY - ABST T1 - Refining a system for computerized adaptive testing pool creation (Research Report 01-18) Y1 - 2001 A1 - Way, W. D. A1 - Swanson, L. A1 - Steffen, M. A1 - Stocking, M. L.
CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - Refining a system for computerized adaptive testing pool creation T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Way, W. D. A1 - Swanson, L. A1 - Stocking, M. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA ER - TY - JOUR T1 - Requerimientos, aplicaciones e investigación en tests adaptativos informatizados [Requirements, applications, and research in computerized adaptive testing] JF - Apuntes de Psicología Y1 - 2001 A1 - Olea Díaz, J. A1 - Ponsoda Gil, V. A1 - Revuelta Menéndez, J. A1 - Hontangas Beltrán, P. A1 - Abad, F. J. KW - Computer Assisted Testing KW - English as Second Language KW - Psychometrics computerized adaptive testing AB - Summarizes the main requirements and applications of computerized adaptive testing (CAT) with emphasis on the differences between CAT and conventional computerized tests. Psychometric properties of estimations based on CAT, item selection strategies, and implementation software are described. Results of CAT studies in Spanish-speaking samples are described. Implications for developing a CAT measuring the English vocabulary of Spanish-speaking students are discussed. (PsycINFO Database Record (c) 2005 APA) VL - 19 ER - TY - ABST T1 - Scoring alternatives for incomplete computerized adaptive tests (Research Report 01-20) Y1 - 2001 A1 - Way, W. D. A1 - Gawlick, L. A. A1 - Eignor, D. R. CY - Princeton NJ: Educational Testing Service ER - TY - ABST T1 - STAR Early Literacy Computer-Adaptive Diagnostic Assessment: Technical Manual Y1 - 2001 A1 - Renaissance Learning, Inc. CY - Wisconsin Rapids, WI: Author ER - TY - CONF T1 - A system for on-the-fly adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Wagner, M. E.
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - JOUR T1 - Test anxiety and test performance: Comparing paper-based and computer-adaptive versions of the Graduate Record Examinations (GRE) General test JF - Journal of Educational Computing Research Y1 - 2001 A1 - Powers, D. E. VL - 24 IS - 3 ER - TY - CONF T1 - Testing a computerized adaptive personality inventory using simulated response data T2 - Paper presented at the annual meeting of the American Psychological Association Y1 - 2001 A1 - Simms, L. JF - Paper presented at the annual meeting of the American Psychological Association CY - San Francisco CA ER - TY - ABST T1 - Testing via the Internet: A literature review and analysis of issues for Department of Defense Internet testing of the Armed Services Vocational Aptitude Battery (ASVAB) in high schools (FR-01-12) Y1 - 2001 A1 - McBride, J. R. A1 - Paddock, A. F. A1 - Wise, L. L. A1 - Strickland, W. J. A1 - Waters, B. K. CY - Alexandria VA: Human Resources Research Organization N1 - {PDF file, 894 KB} ER - TY - JOUR T1 - Toepassing van een computergestuurde adaptieve testprocedure op persoonlijkheidsdata [Application of a computerised adaptive test procedure to personality data] JF - Nederlands Tijdschrift voor de Psychologie en haar Grensgebieden Y1 - 2001 A1 - Hol, A. M. A1 - Vorst, H. C. M. A1 - Mellenbergh, G. J. KW - Adaptive Testing KW - Computer Applications KW - Computer Assisted Testing KW - Personality Measures KW - Test Reliability computerized adaptive testing AB - Studied the applicability of a computerized adaptive testing procedure to an existing personality questionnaire within the framework of item response theory. The procedure was applied to the scores of 1,143 male and female university students (mean age 21.8 yrs) in the Netherlands on the Neuroticism scale of the Amsterdam Biographical Questionnaire (G. J. Wilde, 1963). The graded response model (F.
Samejima, 1969) was used. The quality of the adaptive test scores was measured based on their correlation with test scores for the entire item bank and on their correlation with scores on other scales from the personality test. The results indicate that computerized adaptive testing can be applied to personality scales. (PsycINFO Database Record (c) 2005 APA) VL - 56 ER - TY - ABST T1 - User's guide for SCORIGHT (version 1): A computer program for scoring tests built of testlets (Research Report 01-06) Y1 - 2001 A1 - Wang, X. A1 - Bradlow, E. T. A1 - Wainer, H. CY - Princeton NJ: Educational Testing Service. N1 - #WA01-06 {PDF file, 2.349 MB} ER - TY - ABST T1 - User's guide for SCORIGHT (version 2): A computer program for scoring tests built of testlets (Research Report 01-06) Y1 - 2001 A1 - Wang, X. A1 - Bradlow, E. T. A1 - Wainer, H. CY - Princeton NJ: Educational Testing Service. ER - TY - CONF T1 - Using response times to detect aberrant behavior in computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - van der Linden, W. J. A1 - van Krimpen-Stoop, E. M. L. A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - JOUR T1 - Validity issues in computer-based testing JF - Educational Measurement: Issues and Practice Y1 - 2001 A1 - Huff, K. L. A1 - Sireci, S. G. VL - 20 IS - 3 ER - TY - ABST T1 - Adaptive mastery testing using a multidimensional IRT model and Bayesian sequential decision theory (Research Report 00-06) Y1 - 2000 A1 - Glas, C. A. W. A1 - Vos, H. J. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - JOUR T1 - Algoritmo mixto mínima entropía-máxima información para la selección de ítems en un test adaptativo informatizado JF - Psicothema Y1 - 2000 A1 - Dorronsoro, J. R. A1 - Santa-Cruz, C.
A1 - Rubio Franco, V. J. A1 - Aguado García, D. KW - computerized adaptive testing AB - El objetivo del estudio que presentamos es comparar la eficacia como estrategia de selección de ítems de tres algoritmos diferentes: a) basado en máxima información; b) basado en mínima entropía; y c) mixto mínima entropía en los ítems iniciales y máxima información en el resto; bajo la hipótesis de que el algoritmo mixto puede dotar al TAI de mayor eficacia. Las simulaciones de procesos TAI se realizaron sobre un banco de 28 ítems de respuesta graduada calibrado según el modelo de Samejima, tomando como respuesta al TAI la respuesta original de los sujetos que fueron utilizados para la calibración. Los resultados iniciales muestran cómo el criterio mixto es más eficaz que cualquiera de los otros dos tomados independientemente. Dicha eficacia se maximiza cuando el algoritmo de mínima entropía se restringe a la selección de los primeros ítems del TAI, ya que con las respuestas a estos primeros ítems la estimación de θ comienza a ser relevante y el algoritmo de máxima información se optimiza. Item selection algorithms in computerized adaptive testing. The aim of this paper is to compare the efficacy of three different item selection algorithms in computerized adaptive testing (CAT). These algorithms are based as follows: the first one is based on Item Information, the second one on Entropy, and the last algorithm is a mixture of the two previous ones. The CAT process was simulated using an emotional adjustment item bank. This item bank contains 28 graded items in six categories, calibrated using Samejima's (1969) Graded Response Model. The initial results show that the mixed criterion algorithm performs better than the other ones.
VL - 12 ER - TY - CONF T1 - Applying specific information item selection to a passage-based test T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Thompson, T. D. A1 - Davey, T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA, April ER - TY - CONF T1 - Assembling parallel item pools for computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2000 A1 - Wang, T. A1 - Fan, M. A1 - Yi, Q. A1 - Ban, J. C. A1 - Zhu, D. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans N1 - #WA00-02 ER - TY - JOUR T1 - Capitalization on item calibration error in adaptive testing JF - Applied Measurement in Education Y1 - 2000 A1 - van der Linden, W. J. A1 - Glas, C. A. W. KW - computerized adaptive testing AB - (from the journal abstract) In adaptive testing, item selection is sequentially optimized during the test. Because the optimization takes place over a pool of items calibrated with estimation error, capitalization on chance is likely to occur. How serious the consequences of this phenomenon are depends not only on the distribution of the estimation errors in the pool or the conditional ratio of the test length to the pool size given ability, but may also depend on the structure of the item selection criterion used. A simulation study demonstrated a dramatic impact of capitalization on estimation errors on ability estimation. Four different strategies to minimize the likelihood of capitalization on error in computerized adaptive testing are discussed. VL - 13 N1 - References. Lawrence Erlbaum, US ER - TY - JOUR T1 - CAT administration of language placement examinations JF - Journal of Applied Measurement Y1 - 2000 A1 - Stahl, J. A1 - Bergstrom, B. A1 - Gershon, R. C.
KW - *Language KW - *Software KW - Aptitude Tests/*statistics & numerical data KW - Educational Measurement/*statistics & numerical data KW - Humans KW - Psychometrics KW - Reproducibility of Results KW - Research Support, Non-U.S. Gov't AB - This article describes the development of a computerized adaptive test for Cegep de Jonquiere, a community college located in Quebec, Canada. Computerized language proficiency testing allows the simultaneous presentation of sound stimuli as the question is being presented to the test-taker. With a properly calibrated bank of items, the language proficiency test can be offered in an adaptive framework. By adapting the test to the test-taker's level of ability, an assessment can be made with significantly fewer items. We also describe our initial attempt to detect instances in which "cheating low" is occurring. In the "cheating low" situation, test-takers deliberately answer questions incorrectly, questions that they are fully capable of answering correctly had they been taking the test honestly. VL - 1 N1 - 1529-7713Journal Article ER - TY - CHAP T1 - Caveats, pitfalls, and unexpected consequences of implementing large-scale computerized testing Y1 - 2000 A1 - Wainer, H., A1 - Eignor, D. R. CY - Wainer, H. (Ed). Computerized adaptive testing: A primer (2nd ed.). pp. 271-299. Mahwah, NJ: Lawrence Erlbaum Associates. ER - TY - ABST T1 - CBTS: Computer-based testing simulation and analysis [computer software] Y1 - 2000 A1 - Robin, F. CY - Amherst, MA: University of Massachusetts, School of Education ER - TY - CONF T1 - Change in distribution of latent ability with item position in CAT sequence T2 - Paper presented at the annual meeting of the National Council on Measurement in Education in New Orleans Y1 - 2000 A1 - Krass, I. A. 
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA N1 - {PDF file, 103 KB} ER - TY - JOUR T1 - The choice of item difficulty in self-adapted testing JF - European Journal of Psychological Assessment Y1 - 2000 A1 - Hontangas, P. A1 - Ponsoda, V. A1 - Olea, J. A1 - Wise, S. L. VL - 16 IS - 1 ER - TY - CONF T1 - Classification accuracy and test security for a computerized adaptive mastery test calibrated with different IRT models T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Robin, F. A1 - Xing, D. A1 - Scrams, D. A1 - Potenza, M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA ER - TY - JOUR T1 - A comparison of computerized adaptive testing and multistage testing JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 2000 A1 - Patsula, L. N. KW - computerized adaptive testing AB - There is considerable evidence to show that computerized adaptive testing (CAT) and multi-stage testing (MST) are viable frameworks for testing. With many testing organizations looking to move towards CAT or MST, it is important to know which framework is superior in different situations and at what cost in terms of measurement. What was needed was a comparison of the different testing procedures under various realistic testing conditions. This dissertation addressed the important problem of the increase or decrease in accuracy of ability estimation in using MST rather than CAT. The purpose of this study was to compare the accuracy of ability estimates produced by MST and CAT while keeping some variables fixed and varying others. A simulation study was conducted to investigate the effects of several factors on the accuracy of ability estimation using different CAT and MST designs.
The factors that were manipulated were the number of stages, the number of subtests per stage, and the number of items per subtest. Kept constant were test length, distribution of subtest information, method of determining cut-points on subtests, amount of overlap between subtests, and method of scoring the total test. The primary question of interest was, given a fixed test length, how many stages and how many subtests per stage should there be to maximize measurement precision? Furthermore, how many items should there be in each subtest? Should there be more in the routing test or should there be more in the higher-stage tests? Results showed that, in general, increasing the number of stages from two to three decreased the amount of error in ability estimation. Increasing the number of subtests from three to five increased the accuracy of ability estimates as well as the efficiency of the MST designs relative to the P&P and CAT designs at most ability levels (-.75 to 2.25). Finally, at most ability levels (-.75 to 2.25), varying the number of items per stage had little effect on either the resulting accuracy of ability estimates or the relative efficiency of the MST designs to the P&P and CAT designs. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 60 ER - TY - JOUR T1 - A comparison of item selection rules at the early stages of computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2000 A1 - Chen, S-Y. A1 - Ankenmann, R. D.
A1 - Chang, Hua-Hua KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Analysis (Test) KW - Statistical Estimation computerized adaptive testing AB - The effects of 5 item selection rules--Fisher information (FI), Fisher interval information (FII), Fisher information with a posterior distribution (FIP), Kullback-Leibler information (KL), and Kullback-Leibler information with a posterior distribution (KLP)--were compared with respect to the efficiency and precision of trait (θ) estimation at the early stages of computerized adaptive testing (CAT). FII, FIP, KL, and KLP performed marginally better than FI at the early stages of CAT for θ=-3 and -2. For tests longer than 10 items, there appeared to be no precision advantage for any of the selection rules. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 24 ER - TY - CHAP T1 - Computer-adaptive sequential testing Y1 - 2000 A1 - Luecht, R. M. A1 - Nungester, R. J. CY - W. J. van der Linden (Ed.), Computerized Adaptive Testing: Theory and Practice (pp. 289-209). Dordrecht, The Netherlands: Kluwer. ER - TY - CHAP T1 - Computer-adaptive testing: A methodology whose time has come T2 - Development of Computerised Middle School Achievement Tests Y1 - 2000 A1 - Linacre, J. M. ED - Kang, U. ED - Jean, E. ED - Linacre, J. M. KW - computerized adaptive testing JF - Development of Computerised Middle School Achievement Tests PB - MESA CY - Chicago, IL. USA VL - 69 ER - TY - ABST T1 - Computer-adaptive testing: A methodology whose time has come. MESA Memorandum No 9 Y1 - 2000 A1 - Linacre, J. M. CY - Chicago: MESA psychometric laboratory, University of Chicago. ER - TY - JOUR T1 - Computerization and adaptive administration of the NEO PI-R JF - Assessment Y1 - 2000 A1 - Reise, S. P.
A1 - Henson, J. M. KW - *Personality Inventory KW - Algorithms KW - California KW - Diagnosis, Computer-Assisted/*methods KW - Humans KW - Models, Psychological KW - Psychometrics/methods KW - Reproducibility of Results AB - This study asks, how well does an item response theory (IRT) based computerized adaptive NEO PI-R work? To explore this question, real-data simulations (N = 1,059) were used to evaluate a maximum information item selection computerized adaptive test (CAT) algorithm. Findings indicated satisfactory recovery of full-scale facet scores with the administration of around four items per facet scale. Thus, the NEO PI-R could be reduced by half with little loss in precision by CAT administration. However, results also indicated that the CAT algorithm was not necessary. We found that for many scales, administering the "best" four items per facet scale would have produced similar results. In the conclusion, we discuss the future of computerized personality assessment and describe the role IRT methods might play in such assessments. VL - 7 N1 - 1073-1911 (Print); Journal Article ER - TY - JOUR T1 - Computerized adaptive administration of the self-evaluation examination JF - AANA Journal Y1 - 2000 A1 - LaVelle, T. A1 - Zaglaniczny, K. A1 - Spitzer, L. E. VL - 68 ER - TY - ABST T1 - Computerized adaptive rating scales (CARS): Development and evaluation of the concept Y1 - 2000 A1 - Borman, W. C. A1 - Hanson, M. A. A1 - Kubisiak, U. C. A1 - Buck, D. E. CY - (Institute Rep No. 350). Tampa FL: Personnel Decisions Research Institute. ER - TY - BOOK T1 - Computerized adaptive testing: A primer (2nd edition) Y1 - 2000 A1 - Wainer, H. A1 - Dorans, N. A1 - Eignor, D. R. A1 - Flaugher, R. A1 - Green, B. F. A1 - Mislevy, R. A1 - Steinberg, L. A1 - Thissen, D. CY - Hillsdale, N. J.
: Lawrence Erlbaum Associates ER - TY - JOUR T1 - Computerized adaptive testing for classifying examinees into three categories JF - Educational and Psychological Measurement Y1 - 2000 A1 - Theo Eggen A1 - Straetmans, G. J. J. M. KW - computerized adaptive testing KW - Computerized classification testing AB - The objective of this study was to explore the possibilities for using computerized adaptive testing in situations in which examinees are to be classified into one of three categories.Testing algorithms with two different statistical computation procedures are described and evaluated. The first computation procedure is based on statistical testing and the other on statistical estimation. Item selection methods based on maximum information (MI) considering content and exposure control are considered. The measurement quality of the proposed testing algorithms is reported. The results of the study are that a reduction of at least 22% in the mean number of items can be expected in a computerized adaptive test (CAT) compared to an existing paper-and-pencil placement test. Furthermore, statistical testing is a promising alternative to statistical estimation. Finally, it is concluded that imposing constraints on the MI selection strategy does not negatively affect the quality of the testing algorithms VL - 60 ER - TY - BOOK T1 - Computerized adaptive testing: Theory and practice Y1 - 2000 A1 - van der Linden, W. J. A1 - Glas, C. A. W. PB - Kluwer Academic Publishers CY - Dordrecht, The Netherlands ER - TY - CONF T1 - Computerized testing – the adolescent years: Juvenile delinquent or positive role model T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Reckase, M. D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA ER - TY - CHAP T1 - Constrained adaptive testing with shadow tests Y1 - 2000 A1 - van der Linden, W. J. CY - W. J. 
van der Linden and C. A. W. Glas (eds.), Computerized adaptive testing: Theory and practice (pp. 27-52). Norwell MA: Kluwer. ER - TY - BOOK T1 - The construction and evaluation of a dynamic computerised adaptive test for the measurement of learning potential Y1 - 2000 A1 - De Beer, M. CY - Unpublished D. Litt et Phil dissertation. University of South Africa, Pretoria. ER - TY - CONF T1 - Content balancing in stratified computerized adaptive testing designs T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2000 A1 - Leung, C.-K. A1 - Chang, Hua-Hua A1 - Hau, K.-T. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA N1 - {PDF file, 427 KB} ER - TY - CHAP T1 - Cross-validating item parameter estimation in adaptive testing Y1 - 2000 A1 - van der Linden, W. J. A1 - Glas, C. A. W. CY - A. Boorsma, M. A. J. van Duijn, and T. A. B. Snijders (Eds.) (pp. 205-219), Essays on item response theory. New York: Springer. ER - TY - CHAP T1 - Designing item pools for computerized adaptive testing T2 - Computerized adaptive testing: Theory and practice Y1 - 2000 A1 - Veldkamp, B. P. A1 - van der Linden, W. J. JF - Computerized adaptive testing: Theory and practice PB - Kluwer Academic Publishers CY - Dordrecht, The Netherlands ER - TY - CHAP T1 - Detecting person misfit in adaptive testing using statistical process control techniques T2 - Computer adaptive testing: Theory and practice Y1 - 2000 A1 - van Krimpen-Stoop, E. M. L. A. A1 - Meijer, R. R. KW - person Fit JF - Computer adaptive testing: Theory and practice PB - Kluwer Academic. CY - Dordrecht, The Netherlands
ER - TY - CONF T1 - Detecting test-takers who have memorized items in computerized-adaptive testing and multi-stage testing: A comparison T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Patsula, L. N. A1 - McLeod, L. D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA ER - TY - JOUR T1 - Detection of known items in adaptive testing with a statistical quality control method JF - Journal of Educational and Behavioral Statistics Y1 - 2000 A1 - Veerkamp, W. J. J. A1 - Glas, C. A. W. VL - 25 ER - TY - ABST T1 - Detection of person misfit in computerized adaptive testing with polytomous items (Research Report 00-01) Y1 - 2000 A1 - van Krimpen-Stoop, E. M. L. A. A1 - Meijer, R. R. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - ABST T1 - Development and evaluation of test assembly procedures for computerized adaptive testing (Laboratory of Psychometric and Evaluative Methods Research Report No 391) Y1 - 2000 A1 - Robin, F. CY - Amherst MA: University of Massachusetts, School of Education. ER - TY - JOUR T1 - The development of a computerized version of Vandenberg's mental rotation test and the effect of visuo-spatial working memory loading JF - Dissertation Abstracts International Section A: Humanities and Social Sciences Y1 - 2000 A1 - Strong, S. D. KW - Computer Assisted Testing KW - Mental Rotation KW - Short Term Memory computerized adaptive testing KW - Test Construction KW - Test Validity KW - Visuospatial Memory AB - This dissertation focused on the generation and evaluation of web-based versions of Vandenberg's Mental Rotation Test. Memory and spatial visualization theory were explored in relation to the addition of a visuo-spatial working memory component.
Analysis of the data determined that there was a significant difference between scores on the MRT Computer and MRT Memory test. The addition of a visuo-spatial working memory component did significantly affect results at the .05 alpha level. Reliability and discrimination estimates were higher on the MRT Memory version. The computerization of the paper and pencil version of the MRT did not significantly affect scores but did affect the time required to complete the test. The population utilized in the quasi-experiment consisted of 107 university students from eight institutions in engineering graphics related courses. The subjects completed two researcher-developed, Web-based versions of Vandenberg's Mental Rotation Test and the original paper and pencil version of the Mental Rotation Test. One version of the test included a visuo-spatial working memory loading. Significant contributions of this study included developing and evaluating computerized versions of Vandenberg's Mental Rotation Test. Previous versions of Vandenberg's Mental Rotation Test did not take advantage of the ability of the computer to incorporate an interaction factor, such as a visuo-spatial working memory loading, into the test. The addition of an interaction factor results in a more discriminating test, which will lend itself well to computerized adaptive testing practices. Educators in engineering graphics related disciplines should strongly consider the use of spatial visualization tests to aid in establishing the effects of modern computer systems on fundamental design/drafting skills. Regular testing of spatial visualization skills will assist in the creation of a more relevant curriculum. Computerized tests which are valid and reliable will assist in making this task feasible.
(PsycINFO Database Record (c) 2005 APA ) VL - 60 ER - TY - JOUR T1 - Diagnostische programme in der Demenzfrüherkennung: Der Adaptive Figurenfolgen-Lerntest (ADAFI) [Diagnostic programs in the early detection of dementia: The Adaptive Figure Series Learning Test (ADAFI)] JF - Zeitschrift für Gerontopsychologie & -Psychiatrie Y1 - 2000 A1 - Schreiber, M. D. A1 - Schneider, R. J. A1 - Schweizer, A. A1 - Beckmann, J. F. A1 - Baltissen, R. KW - Adaptive Testing KW - At Risk Populations KW - Computer Assisted Diagnosis KW - Dementia AB - Zusammenfassung: Untersucht wurde die Eignung des computergestützten Adaptiven Figurenfolgen-Lerntests (ADAFI), zwischen gesunden älteren Menschen und älteren Menschen mit erhöhtem Demenzrisiko zu differenzieren. Der im ADAFI vorgelegte Aufgabentyp der fluiden Intelligenzdimension (logisches Auffüllen von Figurenfolgen) hat sich in mehreren Studien zur Erfassung des intellektuellen Leistungspotentials (kognitive Plastizität) älterer Menschen als günstig für die genannte Differenzierung erwiesen. Aufgrund seiner Konzeption als Diagnostisches Programm fängt der ADAFI allerdings einige Kritikpunkte an Vorgehensweisen in diesen bisherigen Arbeiten auf. Es konnte gezeigt werden, a) daß mit dem ADAFI deutliche Lokationsunterschiede zwischen den beiden Gruppen darstellbar sind, b) daß mit diesem Verfahren eine gute Vorhersage des mentalen Gesundheitsstatus der Probanden auf Einzelfallebene gelingt (Sensitivität: 80 %, Spezifität: 90 %), und c) daß die Vorhersageleistung statusdiagnostischer Tests zur Informationsverarbeitungsgeschwindigkeit und zum Arbeitsgedächtnis geringer ist. Die Ergebnisse weisen darauf hin, daß die plastizitätsorientierte Leistungserfassung mit dem ADAFI vielversprechend für die Frühdiagnostik dementieller Prozesse sein könnte.The aim of this study was to examine the ability of the computerized Adaptive Figure Series Learning Test (ADAFI) to differentiate among old subjects at risk for dementia and old healthy controls. 
Several studies on the subject of measuring the intellectual potential (cognitive plasticity) of old subjects have shown the usefulness of the fluid intelligence type of task used in the ADAFI (completion of figure series) for this differentiation. Because the ADAFI has been developed as a Diagnostic Program, it is able to counter some critical issues in those studies. It was shown a) that distinct differences between both groups are revealed by the ADAFI, b) that the prediction of the cognitive health status of individual subjects is quite good (sensitivity: 80 %, specificity: 90 %), and c) that the prediction of the cognitive health status with tests of processing speed and working memory is worse than with the ADAFI. The results indicate that the ADAFI might be a promising plasticity-oriented tool for the measurement of cognitive decline in the elderly, and thus might be useful for the early detection of dementia. VL - 13 ER - TY - JOUR T1 - Does adaptive testing violate local independence? JF - Psychometrika Y1 - 2000 A1 - Mislevy, R. J. A1 - Chang, Hua-Hua VL - 65 ER - TY - ABST T1 - Effects of item-selection criteria on classification testing with the sequential probability ratio test (Research Report 2000-8) Y1 - 2000 A1 - Lin, C.-J. A1 - Spray, J. A. CY - Iowa City, IA: American College Testing N1 - #LI00-8 ER - TY - CONF T1 - Effects of nonequivalence of item pools on ability estimates in CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Ban, J. C. A1 - Wang, T. A1 - Yi, Q. A1 - Harris, D. J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA N1 - PDF file, 657 K ER - TY - JOUR T1 - Emergence of item response modeling in instrument development and data analysis JF - Medical Care Y1 - 2000 A1 - Hambleton, R. K.
KW - Computer Assisted Testing KW - Health KW - Item Response Theory KW - Measurement KW - Statistical Validity computerized adaptive testing KW - Test Construction KW - Treatment Outcomes VL - 38 ER - TY - ABST T1 - Estimating item parameters from classical indices for item pool development with a computerized classification test (Research Report 2000-4) Y1 - 2000 A1 - Huang, C.-Y. A1 - Kalohn, J. C. A1 - Lin, C.-J. A1 - Spray, J. A. CY - Iowa City, IA: ACT, Inc. ER - TY - JOUR T1 - Estimation of trait level in computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2000 A1 - Cheng, P. E. A1 - Liou, M. KW - (Statistical) KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Analysis KW - Statistical Estimation computerized adaptive testing AB - Notes that in computerized adaptive testing (CAT), an examinee's trait level (θ) must be estimated with reasonable accuracy based on a small number of item responses. A successful implementation of CAT depends on (1) the accuracy of statistical methods used for estimating θ and (2) the efficiency of the item-selection criterion. Methods of estimating θ suitable for CAT are reviewed, and the differences between Fisher and Kullback-Leibler information criteria for selecting items are discussed. The accuracy of different CAT algorithms was examined in an empirical study.
The results show that correcting θ estimates for bias was necessary at earlier stages of CAT, but most CAT algorithms performed equally well for tests of 10 or more items. (PsycINFO Database Record (c) 2005 APA ) VL - 24 ER - TY - JOUR T1 - ETS finds flaws in the way online GRE rates some students JF - Chronicle of Higher Education Y1 - 2000 A1 - Carlson, S. VL - 47 ER - TY - CONF T1 - An examination of exposure control and content balancing restrictions on item selection in CATs using the partial credit model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2000 A1 - Davis, L. L. A1 - Pastor, D. A. A1 - Dodd, B. G. A1 - Chiang, C. A1 - Fitzpatrick, S. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA ER - TY - JOUR T1 - An examination of the reliability and validity of performance ratings made using computerized adaptive rating scales JF - Dissertation Abstracts International: Section B: The Sciences and Engineering Y1 - 2000 A1 - Buck, D. E. KW - Adaptive Testing KW - Computer Assisted Testing KW - Performance Tests KW - Rating Scales KW - Reliability KW - Test KW - Test Validity AB - This study compared the psychometric properties of performance ratings made using recently-developed computerized adaptive rating scales (CARS) to the psychometric properties of ratings made using more traditional paper-and-pencil rating formats, i.e., behaviorally-anchored and graphic rating scales. Specifically, the reliability, validity and accuracy of the performance ratings from each format were examined. One hundred twelve participants viewed six 5-minute videotapes of office situations and rated the performance of a target person in each videotape on three contextual performance dimensions-Personal Support, Organizational Support, and Conscientious Initiative-using CARS and either behaviorally-anchored or graphic rating scales.
Performance rating properties were measured using Shrout and Fleiss's intraclass correlation (2, 1), Borman's differential accuracy measure, and Cronbach's accuracy components as indexes of rating reliability, validity, and accuracy, respectively. Results found that performance ratings made using the CARS were significantly more reliable and valid than performance ratings made using either of the other formats. Additionally, CARS yielded more accurate performance ratings than the paper-and-pencil formats. The nature of the CARS system (i.e., its adaptive nature and scaling methodology) and its paired comparison judgment task are offered as possible reasons for the differences found in the psychometric properties of the performance ratings made using the various rating formats. (PsycINFO Database Record (c) 2005 APA ) VL - 61 ER - TY - JOUR T1 - An exploratory analysis of item parameters and characteristics that influence item level response time JF - Dissertation Abstracts International Section A: Humanities and Social Sciences Y1 - 2000 A1 - Smith, Russell Winsor KW - Item Analysis (Statistical) KW - Item Response Theory KW - Problem Solving KW - Reaction Time KW - Reading Comprehension KW - Reasoning AB - This research examines the relationship between item level response time and (1) item discrimination, (2) item difficulty, (3) word count, (4) item type, and (5) whether a figure is included in an item. Data are from the Graduate Management Admission Test, which is currently offered only as a computerized adaptive test. Analyses revealed significant differences in response time between the five item types: problem solving, data sufficiency, sentence correction, critical reasoning, and reading comprehension. For this reason, the planned pairwise and complex analyses were run within each item type. Pairwise curvilinear regression analyses explored the relationship between response time and item discrimination, item difficulty, and word count.
Item difficulty significantly contributed to the prediction of response time for each item type; two of the relationships were significantly quadratic. Item discrimination significantly contributed to the prediction of response time for only two of the item types; one revealed a quadratic relationship and the other a cubic relationship. Word count had a significant linear relationship with response time for all the item types except reading comprehension, for which there was no significant relationship. Multiple regression analyses using word count, item difficulty, and item discrimination predicted between 35.4% and 71.4% of the variability in item response time across item types. The results suggest that response time research should consider the type of item that is being administered and continue to explore curvilinear relationships between response time and its predictor variables. (PsycINFO Database Record (c) 2005 APA ) VL - 61 ER - TY - ABST T1 - A framework for comparing adaptive test designs Y1 - 2000 A1 - Stocking, M. L. CY - Unpublished manuscript ER - TY - CONF T1 - From simulation to application: Examinees react to computerized testing T2 - Paper presented at the annual meeting of the National Council of Measurement in Education Y1 - 2000 A1 - Pommerich, M. A1 - Burden, T. JF - Paper presented at the annual meeting of the National Council of Measurement in Education CY - New Orleans, April 2000 ER - TY - CHAP T1 - The GRE computer adaptive test: Operational issues Y1 - 2000 A1 - Mills, C. N. A1 - Steffen, M. CY - W. J. van der Linden and C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 75-99). Dordrecht, Netherlands: Kluwer. ER - TY - JOUR T1 - The impact of receiving the same items on consecutive computer adaptive test administrations JF - Journal of Applied Measurement Y1 - 2000 A1 - O'Neill, T. A1 - Lunz, M. E. A1 - Thiede, K.
AB - Addresses item exposure in a Computerized Adaptive Test (CAT) when the item selection algorithm is permitted to present examinees with questions that they have already been asked in a previous test administration. The data were from a national certification exam in medical technology. The responses of 178 repeat examinees were compared. The results indicate that the combined use of an adaptive algorithm to select items and latent trait theory to estimate person ability provides substantial protection from score contamination. The implications for constraints that prohibit examinees from seeing an item twice are discussed. (PsycINFO Database Record (c) 2002 APA, all rights reserved). VL - 1 N1 - Richard M Smith, US ER - TY - CONF T1 - Implementing the computer-adaptive sequential testing (CAST) framework to mass produce high quality computer-adaptive and mastery tests T2 - Symposium paper presented at the Annual Meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Luecht, R. M. JF - Symposium paper presented at the Annual Meeting of the National Council on Measurement in Education CY - New Orleans, LA ER - TY - JOUR T1 - An integer programming approach to item bank design JF - Applied Psychological Measurement Y1 - 2000 A1 - van der Linden, W. J. A1 - Veldkamp, B. P. A1 - Reese, L. M. KW - Aptitude Measures KW - Item Analysis (Test) KW - Item Response Theory KW - Test Construction KW - Test Items AB - An integer programming approach to item bank design is presented that can be used to calculate an optimal blueprint for an item bank, in order to support an existing testing program. The results are optimal in that they minimize the effort involved in producing the items as revealed by current item writing patterns. Also presented is an adaptation of the models, which can be used as a set of monitoring tools in item bank management. The approach is demonstrated empirically for an item bank that was designed for the Law School Admission Test.
VL - 24 ER - TY - ABST T1 - An investigation of approaches to computerizing the GRE subject tests (GRE Board Professional Report No 93-08P; Educational Testing Service Research Report 00-4) Y1 - 2000 A1 - Stocking, M. L. A1 - Smith, R. A1 - Swanson, L. CY - Princeton NJ: Educational Testing Service. N1 - #ST00-01 ER - TY - CHAP T1 - Item calibration and parameter drift Y1 - 2000 A1 - Glas, C. A. W. CY - W. J. van der Linden and C. A. W. Glas (Eds.). Computerized adaptive testing: Theory and practice (pp. 183-199). Norwell MA: Kluwer Academic. ER - TY - JOUR T1 - Item exposure control in computer-adaptive testing: The use of freezing to augment stratification JF - Florida Journal of Educational Research Y1 - 2000 A1 - Parshall, C. A1 - Harmes, J. C. A1 - Kromrey, J. D. VL - 40 ER - TY - CHAP T1 - Item pools Y1 - 2000 A1 - Flaugher, R. CY - Wainer, H. (2000). Computerized adaptive testing: a primer. Mahwah, NJ: Erlbaum. ER - TY - JOUR T1 - Item response theory and health outcomes measurement in the 21st century JF - Medical Care Y1 - 2000 A1 - Hays, R. D. A1 - Morales, L. S. A1 - Reise, S. P. KW - *Models, Statistical KW - Activities of Daily Living KW - Data Interpretation, Statistical KW - Health Services Research/*methods KW - Health Surveys KW - Human KW - Mathematical Computing KW - Outcome Assessment (Health Care)/*methods KW - Research Design KW - Support, Non-U.S. Gov't KW - Support, U.S. Gov't, P.H.S. KW - United States AB - Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content.
IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods. VL - 38 N1 - 204349670025-7079Journal Article ER - TY - JOUR T1 - Item selection algorithms in computerized adaptive testing JF - Psicothema Y1 - 2000 A1 - Garcia, David A. A1 - Santa Cruz, C. A1 - Dorronsoro, J. R. A1 - Rubio Franco, V. J. AB - Studied the efficacy of 3 different item selection algorithms in computerized adaptive testing. Ss were 395 university students (aged 20-25 yrs) in Spain. Ss were asked to submit answers via computer to 28 items of a personality questionnaire using item selection algorithms based on maximum item information, entropy, or mixed item-entropy algorithms. The results were evaluated according to ability of Ss to use item selection algorithms and number of questions. Initial results indicate that mixed criteria algorithms were more efficient than information or entropy algorithms for up to 15 questionnaire items, but that differences in efficiency decreased with increasing item number. Implications for developing computer adaptive testing methods are discussed. (PsycINFO Database Record (c) 2002 APA, all rights reserved). VL - 12 N1 - Spanish .Algoritmo mixto minima entropia-maxima informacion para la seleccion de items en un test adaptativo informatizado..Universidad de Oviedo, Spain ER - TY - CHAP T1 - Item selection and ability estimation in adaptive testing T2 - Computerized adaptive testing: Theory and practice Y1 - 2000 A1 - van der Linden, W. J. A1 - Pashley, P. J. 
JF - Computerized adaptive testing: Theory and practice PB - Kluwer Academic Publishers CY - Dordrecht, The Netherlands ER - TY - BOOK T1 - La distribution d'échantillonnage en testing adaptatif en fonction de deux règles d'arrêt : selon l'erreur type et selon le nombre d'items administrés [Sampling distribution of the proficiency estimate in computerized adaptive testing according to two stopping... Y1 - 2000 A1 - Raîche, G. CY - Doctoral thesis, Montreal: University of Montreal ER - TY - JOUR T1 - Lagrangian relaxation for constrained curve-fitting with binary variables: Applications in educational testing JF - Dissertation Abstracts International Section A: Humanities and Social Sciences Y1 - 2000 A1 - Koppel, N. B. KW - Analysis KW - Educational Measurement KW - Mathematical Modeling KW - Statistical AB - This dissertation offers a mathematical programming approach to curve fitting with binary variables. Various Lagrangian Relaxation (LR) techniques are applied to constrained curve fitting. Applications in educational testing with respect to test assembly are utilized. In particular, techniques are applied to both static exams (i.e. conventional paper-and-pencil (P&P)) and adaptive exams (i.e. a hybrid computerized adaptive test (CAT) called a multiple-forms structure (MFS)). This dissertation focuses on the development of mathematical models to represent these test assembly problems as constrained curve-fitting problems with binary variables and solution techniques for the test development. Mathematical programming techniques are used to generate parallel test forms with item characteristics based on item response theory. A binary variable is used to represent whether or not an item is present on a form. The problem of creating a test form is modeled as a network flow problem with additional constraints. In order to meet the target information and the test characteristic curves, a Lagrangian relaxation heuristic is applied to the problem.
The Lagrangian approach works by multiplying the constraint by a "Lagrange multiplier" and adding it to the objective. By systematically varying the multiplier, the test form curves approach the targets. This dissertation explores modifications to Lagrangian Relaxation as it is applied to the classical paper-and-pencil exams. For the P&P exams, LR techniques are also utilized to include additional practical constraints to the network problem, which limit the item selection. An MFS is a type of a computerized adaptive test. It is a hybrid of a standard CAT and a P&P exam. The concept of an MFS will be introduced in this dissertation, as well as, the application of LR as it is applied to constructing parallel MFSs. The approach is applied to the Law School Admission Test for the assembly of the conventional P&P test as well as an experimental computerized test using MFSs. (PsycINFO Database Record (c) 2005 APA ) VL - 61 ER - TY - BOOK T1 - Learning Potential Computerised Adaptive Test (LPCAT): Technical Manual Y1 - 2000 A1 - De Beer, M. CY - Pretoria: UNISA N1 - #deBE00-01 ER - TY - BOOK T1 - Learning Potential Computerised Adaptive Test (LPCAT): User's Manual Y1 - 2000 A1 - De Beer, M. CY - Pretoria: UNISA N1 - #deBE00-02 ER - TY - JOUR T1 - Limiting answer review and change on computerized adaptive vocabulary tests: Psychometric and attitudinal results JF - Journal of Educational Measurement Y1 - 2000 A1 - Vispoel, W. P. A1 - Hendrickson, A. B. A1 - Bleiler, T. VL - 37 ER - TY - JOUR T1 - Los tests adaptativos informatizados en la frontera del siglo XXI: Una revisión [Computerized adaptive tests at the turn of the 21st century: A review] JF - Metodología de las Ciencias del Comportamiento Y1 - 2000 A1 - Hontangas, P. A1 - Ponsoda, V. A1 - Olea, J. A1 - Abad, F. J. KW - computerized adaptive testing VL - 2 SN - 1575-9105 ER - TY - CHAP T1 - Methods of controlling the exposure of items in CAT Y1 - 2000 A1 - Stocking, M. L. A1 - Lewis, C. CY - W. J. 
van der Linden and C. A. W. Glas (eds.), Computerized adaptive testing: Theory and practice (pp. 163-182). Norwell MA: Kluwer. ER - TY - CHAP T1 - A minimax solution for sequential classification problems Y1 - 2000 A1 - Vos, H. J. CY - H. A. L. Kiers, J.-P.Rasson, P. J. F. Groenen, and M. Schader (Eds.), Data analysis, classification, and related methods (pp. 121-126). Berlin: Springer. N1 - #VO00101 ER - TY - CHAP T1 - MML and EAP estimation in testlet-based adaptive testing T2 - Computerized adaptive testing: Theory and practice Y1 - 2000 A1 - Glas, C. A. W. A1 - Wainer, H., A1 - Bradlow, E. T. JF - Computerized adaptive testing: Theory and practice PB - Kluwer Academic Publishers CY - Dordrecht, The Netherlands ER - TY - ABST T1 - Modifications of the branch-and-bound algorithm for application in constrained adaptive testing (Research Report 00-05) Y1 - 2000 A1 - Veldkamp, B. P. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - ABST T1 - Multidimensional adaptive testing with constraints on test content (Research Report 00-11) Y1 - 2000 A1 - Veldkamp, B. P. A1 - van der Linden, W. J. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - ABST T1 - Multiple stratification CAT designs with content control Y1 - 2000 A1 - Yi, Q. A1 - Chang, Hua-Hua CY - Unpublished manuscript ER - TY - CONF T1 - A new item selection procedure for mixed item type in computerized classification testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2000 A1 - Lau, C. A1 - Wang, T. 
JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans N1 - {PDF file, 452 KB} ER - TY - JOUR T1 - The null distribution of person-fit statistics for conventional and adaptive tests JF - Applied Psychological Measurement Y1 - 2000 A1 - van Krimpen-Stoop, E. M. L. A. A1 - Meijer, R. R. VL - 23 ER - TY - ABST T1 - Optimal stratification of item pools in a-stratified computerized adaptive testing (Research Report 00-07) Y1 - 2000 A1 - van der Linden, W. J. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - JOUR T1 - Overview of the computerized adaptive testing special section JF - Psicológica Y1 - 2000 A1 - Ponsoda, V. KW - Adaptive Testing KW - Computers computerized adaptive testing AB - This paper provides an overview of the five papers included in the Psicológica special section on computerized adaptive testing. A short introduction to this topic is presented as well. The main results, the links between the five papers and the general research topic to which they are more related are also shown. (PsycINFO Database Record (c) 2005 APA ) VL - 21 ER - TY - CONF T1 - Performance of item exposure control methods in computerized adaptive testing: Further explorations T2 - Paper presented at the Annual Meeting of the American Educational Research Association Y1 - 2000 A1 - Chang, Hua-Hua A1 - Chang, S. A1 - Ansley JF - Paper presented at the Annual Meeting of the American Educational Research Association CY - New Orleans, LA ER - TY - JOUR T1 - Practical issues in developing and maintaining a computerized adaptive testing program JF - Psicológica Y1 - 2000 A1 - Wise, S. L. A1 - Kingsbury, G. G. VL - 21 ER - TY - CHAP T1 - Principles of multidimensional adaptive testing Y1 - 2000 A1 - Segall, D. O. CY - W. J. van der Linden and C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 53-73).
Norwell MA: Kluwer. ER - TY - JOUR T1 - Psychological reactions to adaptive testing JF - International Journal of Selection and Assessment Y1 - 2000 A1 - Tonidandel, S. A1 - Quiñones, M. A. VL - 8 ER - TY - JOUR T1 - Psychometric and psychological effects of review on computerized fixed and adaptive tests JF - Psicológica Y1 - 2000 A1 - Olea, J. A1 - Revuelta, J. A1 - Ximenez, M. C. A1 - Abad, F. J. VL - 21 ER - TY - JOUR T1 - A real data simulation of computerized adaptive administration of the MMPI-A JF - Computers in Human Behavior Y1 - 2000 A1 - Forbey, J. D. A1 - Handel, R. W. A1 - Ben-Porath, Y. S. AB - A real data simulation of computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A) was conducted using item responses from three groups of participants. The first group included 196 adolescents (age range 14-18) tested at a midwestern residential treatment facility for adolescents. The second group was the normative sample used in the standardization of the MMPI-A (Butcher, Williams, Graham, Archer, Tellegen, Ben-Porath, & Kaemmer, 1992. Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A): manual for administration, scoring, and interpretation. Minneapolis: University of Minnesota Press.). The third group was the clinical sample used in the validation of the MMPI-A (Williams & Butcher, 1989. An MMPI study of adolescents: I. Empirical validation of the study's scales. Personality assessment, 1, 251-259.). The MMPI-A data for each group of participants were run through a modified version of the MMPI-2 adaptive testing computer program (Roper, Ben-Porath & Butcher, 1995. Comparability and validity of computerized adaptive testing with the MMPI-2. Journal of Personality Assessment, 65, 358-371.).
To determine the optimal amount of item savings, each group's MMPI-A item responses were used to simulate three different orderings of the items: (1) from least to most frequently endorsed in the keyed direction; (2) from least to most frequently endorsed in the keyed direction with the first 120 items rearranged into their booklet order; and (3) all items in booklet order. The mean number of items administered for each group was computed for both classification and full-scale elevations for T-score cut-off values of 60 and 65. Substantial item administration savings were achieved for all three groups, and the mean number of items saved ranged from 50 items (10.7% of the administered items) to 123 items (26.4% of the administered items), depending upon the T-score cut-off, classification method (i.e. classification only or full-scale elevation), and group. (C) 2000 Elsevier Science Ltd. All rights reserved. VL - 16 ER - TY - JOUR T1 - Rescuing computerized testing by breaking Zipf’s law JF - Journal of Educational and Behavioral Statistics Y1 - 2000 A1 - Wainer, H. VL - 25 ER - TY - JOUR T1 - Response to Hays et al. and McHorney and Cohen: Practical implications of item response theory and computerized adaptive testing: A brief summary of ongoing studies of widely used headache impact scales JF - Medical Care Y1 - 2000 A1 - Ware, J. E., Jr. A1 - Bjorner, J. B. A1 - Kosinski, M. VL - 38 ER - TY - JOUR T1 - A review of CAT review JF - Popular Measurement Y1 - 2000 A1 - Sekula-Wacura, R. AB - Studied the effects of answer review on results of a computerized adaptive test, the laboratory professional examination of the American Society of Clinical Pathologists. Results from 29,293 candidates show that candidates who changed answers were more likely to improve their scores. (SLD) VL - 3 ER - TY - ABST T1 - A selection procedure for polytomous items in computerized adaptive testing (Measurement and Research Department Reports 2000-5) Y1 - 2000 A1 - Rijn, P. W.
van A1 - Eggen, Theo A1 - Hemker, B. T. A1 - Sanders, P. F. CY - Arnhem, The Netherlands: Cito ER - TY - CONF T1 - Solving complex constraints in a-stratified computerized adaptive testing designs T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Leung, C-K. A1 - Chang, Hua-Hua A1 - Hau, K-T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, USA N1 - PDF file, 384 K ER - TY - CONF T1 - Some considerations for improving accuracy of estimation of item characteristic curves in online calibration of computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Samejima, F. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA ER - TY - CONF T1 - Specific information item selection for adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Davey, T. A1 - Fan, M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans ER - TY - ABST T1 - STAR Reading 2 Computer-Adaptive Reading Test and Database: Technical Manual Y1 - 2000 A1 - Renaissance Learning, Inc. CY - Wisconsin Rapids, WI: Author ER - TY - CONF T1 - Sufficient simplicity or comprehensive complexity? A comparison of probabilistic and stratification methods of exposure control T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Parshall, C. G. A1 - Kromrey, J. D. A1 - Hogarty, K. Y. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA ER - TY - JOUR T1 - Taylor approximations to logistic IRT models and their use in adaptive testing JF - Journal of Educational and Behavioral Statistics Y1 - 2000 A1 - Veerkamp, W. J.
J. KW - computerized adaptive testing AB - Taylor approximation can be used to generate a linear approximation to a logistic ICC and a linear ability estimator. For a specific situation it will be shown to result in a special case of a Robbins-Monro item selection procedure for adaptive testing. The linear estimator can be used for the situation of zero and perfect scores when maximum likelihood estimation fails to come up with a finite estimate. It is also possible to use this estimator to generate starting values for maximum likelihood and weighted likelihood estimation. Approximations to the expectation and variance of the linear estimator for a sequence of Robbins-Monro item selections can be determined analytically. VL - 25 ER - TY - CONF T1 - Test security and item exposure control for computer-based T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Kalohn, J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago ER - TY - CONF T1 - Test security and the development of computerized tests T2 - Paper presented at the National Council on Measurement in Education invited symposium: Maintaining test security in computerized programs–Implications for practice Y1 - 2000 A1 - Guo, F. A1 - Way, W. D. A1 - Reshetar, R. JF - Paper presented at the National Council on Measurement in Education invited symposium: Maintaining test security in computerized programs–Implications for practice CY - New Orleans ER - TY - CHAP T1 - Testlet response theory: An analog for the 3PL model useful in testlet-based adaptive testing Y1 - 2000 A1 - Wainer, H. A1 - Bradlow, E. T. A1 - Du, Z. CY - W. J. van der Linden and C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245-270). Norwell MA: Kluwer. ER - TY - CHAP T1 - Testlet-based adaptive mastery testing Y1 - 2000 A1 - Vos, H. J. A1 - Glas, C. A. W. CY - W. J.
van der Linden (Ed.), Computerized adaptive testing: Theory and practice (pp. 289-309). Norwell MA: Kluwer. ER - TY - ABST T1 - Testlet-based Designs for Computer-Based Testing in a Certification and Licensure Setting Y1 - 2000 A1 - Pitoniak, M. J. CY - Jersey City, NJ: AICPA Technical Report ER - TY - CHAP T1 - Using Bayesian Networks in Computerized Adaptive Tests Y1 - 2000 A1 - Millan, E. A1 - Trella, M. A1 - Perez-de-la-Cruz, J.-L. A1 - Conejo, R. CY - M. Ortega and J. Bravo (Eds.), Computers and Education in the 21st Century. Kluwer, pp. 217-228. ER - TY - CONF T1 - Using constraints to develop and deliver adaptive tests T2 - Paper presented at the Computer-Assisted Testing Conference. Y1 - 2000 A1 - Abdullah, S. C. A1 - Cooley, R. E. JF - Paper presented at the Computer-Assisted Testing Conference. N1 - {PDF file, 46 KB} ER - TY - ABST T1 - Using response times to detect aberrant behavior in computerized adaptive testing (Research Report 00-09) Y1 - 2000 A1 - van der Linden, W. J. A1 - van Krimpen-Stoop, E. M. L. A. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - ABST T1 - Variations in mean response times for questions on the computer-adaptive GRE general test: Implications for fair assessment (GRE Board Professional Report No. 96-20P; Educational Testing Service Research Report 00-7) Y1 - 2000 A1 - Bridgeman, B. A1 - Cline, F. CY - Princeton NJ: Educational Testing Service N1 - #BR00-01 ER - TY - ABST T1 - Adaptive testing with equated number-correct scoring (Research Report 99-02) Y1 - 1999 A1 - van der Linden, W. J.
CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - CONF T1 - Adjusting computer adaptive test starting points to conserve item pool T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1999 A1 - Zhu, D. A1 - Fan, M. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Montreal, Canada ER - TY - CONF T1 - Adjusting "scores" from a CAT following successful item challenges T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1999 A1 - Wang, T. A1 - Yi, Q. A1 - Ban, J. C. A1 - Harris, D. J. A1 - Hanson, B. A. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Montreal, Canada N1 - #WA99-01 {PDF file, 150 KB} ER - TY - CONF T1 - Alternative item selection strategies for improving test security and pool usage in computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Robin, F. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - JOUR T1 - Alternative methods for the detection of item preknowledge in computerized adaptive testing JF - Dissertation Abstracts International: Section B: The Sciences and Engineering Y1 - 1999 A1 - McLeod, Lori Davis KW - computerized adaptive testing VL - 59 ER - TY - JOUR T1 - a-stratified multistage computerized adaptive testing JF - Applied Psychological Measurement Y1 - 1999 A1 - Chang, Hua-Hua A1 - Ying, Z.
KW - computerized adaptive testing AB - For computerized adaptive tests (CAT) based on the three-parameter logistic model, it was found that administering items with low discrimination parameter (a) values early in the test and administering those with high a values later was advantageous; the skewness of item exposure distributions was reduced while efficiency was maintained in trait level estimation. Thus, a new multistage adaptive testing approach is proposed that factors a into the item selection process. In this approach, the items in the item bank are stratified into a number of levels based on their a values. The early stages of a test use items with lower a values and later stages use items with higher a values. At each stage, items are selected according to an optimization criterion from the corresponding level. Simulation studies were performed to compare a-stratified CATs with CATs based on the Sympson-Hetter method for controlling item exposure. Results indicated that this new strategy led to tests that were well-balanced, with respect to item exposure, and efficient. The a-stratified CATs achieved a lower average exposure rate than CATs based on Bayesian or information-based item selection and the Sympson-Hetter method. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 23 N1 - Sage Publications, US ER - TY - CONF T1 - Automated flawed item detection and graphical item used in on-line calibration of CAT-ASVAB T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Krass, I. A. A1 - Thomasson, G. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - JOUR T1 - A Bayesian random effects model for testlets JF - Psychometrika Y1 - 1999 A1 - Bradlow, E. T.
A1 - Wainer, H. A1 - Wang, X. VL - 64 ER - TY - JOUR T1 - Benefits from computerized adaptive testing as seen in simulation studies JF - European Journal of Psychological Assessment Y1 - 1999 A1 - Hornke, L. F. VL - 15 ER - TY - JOUR T1 - Can examinees use a review option to obtain positively biased ability estimates on a computerized adaptive test? JF - Journal of Educational Measurement Y1 - 1999 A1 - Vispoel, W. P. A1 - Rocklin, T. R. A1 - Wang, T. A1 - Bleiler, T. VL - 36 ER - TY - CONF T1 - CAT administration of language placement exams T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Stahl, J. A1 - Gershon, R. C. A1 - Bergstrom, B. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CHAP T1 - CAT for certification and licensure T2 - Innovations in computerized assessment Y1 - 1999 A1 - Bergstrom, Betty A. A1 - Lunz, M. E. KW - computerized adaptive testing AB - (from the chapter) This chapter discusses implementing computerized adaptive testing (CAT) for high-stakes examinations that determine whether or not a particular candidate will be certified or licensed. The experience of several boards who have chosen to administer their licensure or certification examinations using the principles of CAT illustrates the process of moving into this mode of administration. Examples of the variety of options that can be utilized within a CAT administration are presented, the decisions that boards must make to implement CAT are discussed, and a timetable for completing the tasks that need to be accomplished is provided. In addition to the theoretical aspects of CAT, practical issues and problems are reviewed. (PsycINFO Database Record (c) 2002 APA, all rights reserved). JF - Innovations in computerized assessment PB - Lawrence Erlbaum Associates CY - Mahwah, N.J. N1 - Innovations in computerized assessment (pp.
67-91). xiv, 266 pp. ER - TY - CONF T1 - A comparative study of ability estimates from computer-adaptive testing and multi-stage testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Patsula, L. N. A1 - Hambleton, R. K. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - BOOK T1 - A comparison of computerized-adaptive testing and multi-stage testing Y1 - 1999 A1 - Patsula, L. N. CY - Unpublished doctoral dissertation, University of Massachusetts at Amherst ER - TY - CONF T1 - A comparison of conventional and adaptive testing procedures for making single-point decisions T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Kingsbury, G. G. A1 - Zara, A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada N1 - #KI99-1 ER - TY - CONF T1 - Comparison of stratum scored and maximum likelihood scoring T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Wise, S. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - ABST T1 - A comparison of testlet-based test designs for computerized adaptive testing (LSAC Computerized Testing Report 97-01) Y1 - 1999 A1 - Schnipke, D. L. A1 - Reese, L. M. CY - Newtown, PA: LSAC. ER - TY - CONF T1 - Comparison of the a-stratified method, the Sympson-Hetter method, and the matched difficulty method in CAT administration T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1999 A1 - Ban, J. A1 - Wang, T. A1 - Yi, Q.
JF - Paper presented at the annual meeting of the Psychometric Society CY - Lawrence, KS N1 - #BA99-01 ER - TY - JOUR T1 - Competency gradient for child-parent centers JF - Journal of Outcomes Measurement Y1 - 1999 A1 - Bezruczko, N. KW - *Models, Statistical KW - Activities of Daily Living/classification/psychology KW - Adolescent KW - Chicago KW - Child KW - Child, Preschool KW - Early Intervention (Education)/*statistics & numerical data KW - Female KW - Follow-Up Studies KW - Humans KW - Male KW - Outcome and Process Assessment (Health Care)/*statistics & numerical data AB - This report describes an implementation of the Rasch model during the longitudinal evaluation of a federally-funded early childhood preschool intervention program. An item bank is described for operationally defining a psychosocial construct called community life-skills competency, an expected teenage outcome of the preschool intervention. This analysis examined the position of teenage students on this scale structure, and investigated a pattern of cognitive operations necessary for students to pass community life-skills test items. Then this scale structure was correlated with nationally standardized reading and math achievement scores, teacher ratings, and school records to assess its validity as a measure of the community-related outcome goal for this intervention. The results show a functional relationship between years of early intervention and magnitude of effect on the life-skills competency variable. VL - 3 N1 - 1090-655X (Print). Journal Article. Research Support, U.S. Gov't, P.H.S. ER - TY - JOUR T1 - Computerized adaptive assessment with the MMPI-2 in a clinical setting JF - Psychological Assessment Y1 - 1999 A1 - Handel, R. W. A1 - Ben-Porath, Y. S. A1 - Watt, M. E. VL - 11 ER - TY - ABST T1 - Computerized adaptive testing in the Bundeswehr Y1 - 1999 A1 - Storm, E. G.
CY - Unpublished manuscript N1 - #ST99-01 {PDF file, 427 KB} ER - TY - JOUR T1 - Computerized adaptive testing: Overview and introduction JF - Applied Psychological Measurement Y1 - 1999 A1 - Meijer, R. R. A1 - Nering, M. L. KW - computerized adaptive testing AB - Use of computerized adaptive testing (CAT) has increased substantially since it was first formulated in the 1970s. This paper provides an overview of CAT and introduces the contributions to this Special Issue. The elements of CAT discussed here include item selection procedures, estimation of the latent trait, item exposure, measurement precision, and item bank development. Some topics for future research are also presented. VL - 23 ER - TY - CONF T1 - Computerized classification testing under practical constraints with a polytomous model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1999 A1 - Lau, C. A. A1 - Wang, T. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Montreal, Canada N1 - PDF file, 579 K ER - TY - CONF T1 - Computerized testing – Issues and applications (Mini-course manual) T2 - Annual Meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Parshall, C. A1 - Davey, T. A1 - Spray, J. A1 - Kalohn, J.
JF - Annual Meeting of the National Council on Measurement in Education CY - Montreal ER - TY - CONF T1 - Constructing adaptive tests to parallel conventional programs T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Fan, M. A1 - Thompson, T. A1 - Davey, T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal N1 - #FA99-01 ER - TY - CHAP T1 - Creating computerized adaptive tests of music aptitude: Problems, solutions, and future directions Y1 - 1999 A1 - Vispoel, W. P. CY - F. Drasgow and J. B. Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 151-176). Mahwah NJ: Erlbaum. ER - TY - ABST T1 - Current and future research in multi-stage testing (Research Report No 370) Y1 - 1999 A1 - Zenisky, A. L. CY - Amherst MA: University of Massachusetts, Laboratory of Psychometric and Evaluative Research. N1 - {PDF file, 131 KB} ER - TY - JOUR T1 - CUSUM-based person-fit statistics for adaptive testing JF - Journal of Educational and Behavioral Statistics Y1 - 1999 A1 - van Krimpen-Stoop, E. M. L. A. A1 - Meijer, R. R. VL - 26 ER - TY - ABST T1 - CUSUM-based person-fit statistics for adaptive testing (Research Report 99-05) Y1 - 1999 A1 - van Krimpen-Stoop, E. M. L. A. A1 - Meijer, R. R. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - ABST T1 - Designing item pools for computerized adaptive testing (Research Report 99-03) Y1 - 1999 A1 - Veldkamp, B. P. A1 - van der Linden, W. J. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - JOUR T1 - Detecting item memorization in the CAT environment JF - Applied Psychological Measurement Y1 - 1999 A1 - McLeod, L. D. A1 - Lewis, C.
VL - 23 ER - TY - CONF T1 - Detecting items that have been memorized in the CAT environment T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - McLeod, L. D. A1 - Schnipke, D. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CHAP T1 - Developing computerized adaptive tests for school children Y1 - 1999 A1 - Kingsbury, G. G. A1 - Houser, R. L. CY - F. Drasgow and J. B. Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 93-115). Mahwah NJ: Erlbaum. ER - TY - CONF T1 - The development and cognitive evaluation of an audio-assisted computer-adaptive test for eighth-grade mathematics T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Williams, V. S. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CHAP T1 - Development and introduction of a computer adaptive Graduate Record Examination General Test Y1 - 1999 A1 - Mills, C. N. CY - F. Drasgow and J. B. Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 117-135). Mahwah NJ: Erlbaum. ER - TY - CHAP T1 - The development of a computerized adaptive selection system for computer programmers in a financial services company Y1 - 1999 A1 - Zickar, M. J. A1 - Overton, R. C. A1 - Taylor, L. R. A1 - Harms, H. J. CY - F. Drasgow and J. B. Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 7-33). Mahwah NJ: Erlbaum. ER - TY - JOUR T1 - The development of an adaptive test for placement in French JF - Studies in Language Testing Y1 - 1999 A1 - Laurier, M. VL - 10 ER - TY - CHAP T1 - Development of the computerized adaptive testing version of the Armed Services Vocational Aptitude Battery Y1 - 1999 A1 - Segall, D. O. A1 - Moreno, K. E. CY - F. Drasgow and J. B. Olson-Buchanan (Eds.), Innovations in computerized assessment.
Mahwah NJ: Erlbaum. ER - TY - JOUR T1 - Dynamic health assessments: The search for more practical and more precise outcomes measures JF - Quality of Life Newsletter Y1 - 1999 A1 - Ware, J. E., Jr. A1 - Bjorner, J. B. A1 - Kosinski, M. N1 - {PDF file, 75 KB} ER - TY - JOUR T1 - The effect of model misspecification on classification decisions made using a computerized test JF - Journal of Educational Measurement Y1 - 1999 A1 - Kalohn, J.C. A1 - Spray, J. A. KW - computerized adaptive testing AB - Many computerized testing algorithms require the fitting of some item response theory (IRT) model to examinees' responses to facilitate item selection, the determination of test stopping rules, and classification decisions. Some IRT models are thought to be particularly useful for small volume certification programs that wish to make the transition to computerized adaptive testing (CAT). The 1-parameter logistic model (1-PLM) is usually assumed to require a smaller sample size than the 3-parameter logistic model (3-PLM) for item parameter calibrations. This study examined the effects of model misspecification on the precision of the decisions made using the sequential probability ratio test. For this comparison, the 1-PLM was used to estimate item parameters, even though the items' characteristics were represented by a 3-PLM. Results demonstrate that the 1-PLM produced considerably more decision errors under simulation conditions similar to a real testing environment, compared to the true model and to a fixed-form standard reference set of items. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 36 N1 - National Council on Measurement in Education, US ER - TY - JOUR T1 - The effects of test difficulty manipulation in computerized adaptive testing and self-adapted testing JF - Applied Measurement in Education Y1 - 1999 A1 - Ponsoda, V. A1 - Olea, J. A1 - Rodriguez, M. S. A1 - Revuelta, J. 
VL - 12 ER - TY - JOUR T1 - Empirical initialization of the trait estimator in adaptive testing JF - Applied Psychological Measurement Y1 - 1999 A1 - van der Linden, W. J. VL - 23 N1 - [Error correction in 23, 248] ER - TY - CONF T1 - An enhanced stratified computerized adaptive testing design T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1999 A1 - Leung, C-K. A1 - Chang, Hua-Hua A1 - Hau, K-T. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Montreal, Canada N1 - {PDF file, 478 KB} ER - TY - JOUR T1 - Evaluating the usefulness of computerized adaptive testing for medical in-course assessment JF - Academic Medicine Y1 - 1999 A1 - Kreiter, C. D. A1 - Ferguson, K. A1 - Gruppen, L. D. KW - *Automation KW - *Education, Medical, Undergraduate KW - Educational Measurement/*methods KW - Humans KW - Internal Medicine/*education KW - Likelihood Functions KW - Psychometrics/*methods KW - Reproducibility of Results AB - PURPOSE: This study investigated the feasibility of converting an existing computer-administered, in-course internal medicine test to an adaptive format. METHOD: A 200-item internal medicine extended matching test was used for this research. Parameters were estimated with commercially available software using responses from 621 examinees. A specially developed simulation program was used to retrospectively estimate the efficiency of the computer-adaptive exam format. RESULTS: It was found that the average test length could be shortened by almost half with measurement precision approximately equal to that of the full 200-item paper-and-pencil test. However, computer-adaptive testing with this item bank provided little advantage for examinees at the upper end of the ability continuum. An examination of classical item statistics and IRT item statistics suggested that adding more difficult items might extend the advantage to this group of examinees.
CONCLUSIONS: Medical item banks presently used for in-course assessment might be advantageously employed in adaptive testing. However, it is important to evaluate the match between the items and the measurement objective of the test before implementing this format. VL - 74 SN - 1040-2446 (Print) N1 - Acad Med. 1999 Oct;74(10):1125-8. JO - Acad Med ER - TY - CONF T1 - An examination of conditioning variables in DIF analysis in a computer adaptive testing environment T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Walker, C. M. A1 - Ackerman, T. A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - JOUR T1 - Examinee judgments of changes in item difficulty: Implications for item review in computerized adaptive testing JF - Applied Measurement in Education Y1 - 1999 A1 - Wise, S. L. A1 - Finney, S. J. A1 - Enders, C. K. A1 - Freeman, S. A. A1 - Severance, D. D. VL - 12 ER - TY - ABST T1 - Exploring the relationship between item exposure rate and test overlap rate in computerized adaptive testing Y1 - 1999 A1 - Chen, S. A1 - Ankenmann, R. D. A1 - Spray, J. A. CY - Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Canada N1 - (Also ACT Research Report 99-5). (Also presented at American Educational Research Association, 1999)
ER - TY - CONF T1 - Fairness in computer-based testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Gallagher, A. A1 - Bridgeman, B. A1 - Calahan, C. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CONF T1 - Formula score and direct optimization algorithms in CAT ASVAB on-line calibration T2 - Paper presented at the annual meeting of the *?*. Y1 - 1999 A1 - Levine, M. V. A1 - Krass, I. A. JF - Paper presented at the annual meeting of the *?*. ER - TY - JOUR T1 - Generating items during testing: Psychometric issues and models JF - Psychometrika Y1 - 1999 A1 - Embretson, S. E. VL - 64 ER - TY - JOUR T1 - Graphical models and computerized adaptive testing JF - Applied Psychological Measurement Y1 - 1999 A1 - Almond, R. G. A1 - Mislevy, R. J. KW - computerized adaptive testing AB - Considers computerized adaptive testing from the perspective of graphical modeling (GM). GM provides methods for making inferences about multifaceted skills and knowledge and for extracting data from complex performances. Provides examples from language-proficiency assessment. (SLD) VL - 23 ER - TY - CHAP T1 - Het ontwerpen van adaptieve examens [Designing adaptive tests] Y1 - 1999 A1 - van der Linden, W. J. CY - J. M. Pieters, Tj. Plomp, and L. E. Odenthal (Eds.), Twintig jaar Toegepaste Onderwijskunde: Een kaleidoscopisch overzicht van Twents onderwijskundig onderzoek (pp. 249-267). Enschede: Twente University Press. N1 - [In Dutch] ER - TY - CONF T1 - Impact of flawed items on ability estimation in CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Liu, M. A1 - Steffen, M.
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada N1 - #LI99-01 ER - TY - CONF T1 - Implications from information functions and standard errors for determining preferred normed scales for CAT and P and P ASVAB T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Nicewander, W. A. A1 - Thomasson, G. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - RPRT T1 - Incorporating content constraints into a multi-stage adaptive testlet design Y1 - 1999 A1 - Reese, L. M. A1 - Schnipke, D. L. A1 - Luebke, S. W. AB - Most large-scale testing programs moving to computerized adaptive testing (CAT) face the challenge of maintaining extensive content requirements, but content constraints in CAT can compromise the precision and efficiency that could be achieved by a pure maximum information adaptive testing algorithm. This simulation study first evaluated whether realistic content constraints could be met by carefully assembling testlets and appropriately selecting testlets for each test taker that, when combined, would meet the content requirements of the test and would be adapted to the test taker's ability level. The second focus of the study was to compare the precision of the content-balanced testlet design with that achieved by the current paper-and-pencil version of the test through data simulation. The results reveal that constraints to control for item exposure, testlet overlap, and efficient pool utilization need to be incorporated into the testlet assembly algorithm. More refinement of the statistical constraints for testlet assembly is also necessary. However, even for this preliminary attempt at assembling content-balanced testlets, the two-stage computerized test simulated with these testlets performed quite well.
(Contains 5 figures, 5 tables, and 12 references.) (Author/SLD) JF - LSAC Computerized Testing Report PB - Law School Admission Council CY - Princeton, NJ. USA SN - Series ER - TY - BOOK T1 - Innovations in computerized assessment Y1 - 1999 A1 - F Drasgow A1 - Olson-Buchanan, J. B. KW - computerized adaptive testing AB - Chapters in this book present the challenges and dilemmas faced by researchers as they created new computerized assessments, focusing on issues addressed in developing, scoring, and administering the assessments. Chapters are: (1) "Beyond Bells and Whistles; An Introduction to Computerized Assessment" (Julie B. Olson-Buchanan and Fritz Drasgow); (2) "The Development of a Computerized Selection System for Computer Programmers in a Financial Services Company" (Michael J. Zickar, Randall C. Overton, L. Rogers Taylor, and Harvey J. Harms); (3) "Development of the Computerized Adaptive Testing Version of the Armed Services Vocational Aptitude Battery" (Daniel O. Segall and Kathleen E. Moreno); (4) "CAT for Certification and Licensure" (Betty A. Bergstrom and Mary E. Lunz); (5) "Developing Computerized Adaptive Tests for School Children" (G. Gage Kingsbury and Ronald L. Houser); (6) "Development and Introduction of a Computer Adaptive Graduate Record Examinations General Test" (Craig N. Mills); (7) "Computer Assessment Using Visual Stimuli: A Test of Dermatological Skin Disorders" (Terry A. Ackerman, John Evans, Kwang-Seon Park, Claudia Tamassia, and Ronna Turner); (8) "Creating Computerized Adaptive Tests of Music Aptitude: Problems, Solutions, and Future Directions" (Walter P. Vispoel); (9) "Development of an Interactive Video Assessment: Trials and Tribulations" (Fritz Drasgow, Julie B. Olson-Buchanan, and Philip J. Moberg); (10) "Computerized Assessment of Skill for a Highly Technical Job" (Mary Ann Hanson, Walter C. Borman, Henry J. Mogilka, Carol Manning, and Jerry W. 
Hedge); (11) "Easing the Implementation of Behavioral Testing through Computerization" (Wayne A. Burroughs, Janet Murray, S. Scott Wesley, Debra R. Medina, Stacy L. Penn, Steven R. Gordon, and Michael Catello); and (12) "Blood, Sweat, and Tears: Some Final Comments on Computerized Assessment." (Fritz Drasgow and Julie B. Olson-Buchanan). Each chapter contains references. (Contains 17 tables and 21 figures.) (SLD) PB - Lawrence Erlbaum Associates, Inc. CY - Mahwah, N.J. N1 - EDRS Availability: None. Lawrence Erlbaum Associates, Inc., Publishers, 10 Industrial Avenue, Mahwah, New Jersey 07430-2262 (paperback: ISBN-0-8058-2877-X, $29.95; clothbound: ISBN-0-8058-2876-1, $59.95). Tel: 800-926-6579 (Toll Free). ER - TY - CHAP T1 - Item calibration and parameter drift Y1 - 1999 A1 - Glas, C. A. W. A1 - Veerkamp, W. J. J. CY - W. J. van der Linden and C. A. W. Glas (Eds.), Computer adaptive testing: Theory and practice. Norwell MA: Kluwer. ER - TY - CONF T1 - Item exposure in adaptive tests: An empirical investigation of control strategies T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1999 A1 - Parshall, C. A1 - Hogarty, K. A1 - Kromrey, J. JF - Paper presented at the annual meeting of the Psychometric Society CY - Lawrence KS ER - TY - ABST T1 - Item nonresponse: Occurrence, causes and imputation of missing answers to test items Y1 - 1999 A1 - Huisman, J. M. E. CY - (M and T Series No 32). Leiden: DSWO Press ER - TY - JOUR T1 - Item selection in adaptive testing with the sequential probability ratio test JF - Applied Psychological Measurement Y1 - 1999 A1 - Theo Eggen VL - 23 N1 - [Reprinted as Chapter 6 in #EG04-01] ER - TY - CONF T1 - Item selection in computerized adaptive testing: improving the a-stratified design with the Sympson-Hetter algorithm T2 - Paper presented at the Annual Meeting of the American Educational Research Association Y1 - 1999 A1 - Leung, C-K.. A1 - Chang, Hua-Hua A1 - Hau, K-T. 
JF - Paper presented at the Annual Meeting of the American Educational Research Association CY - Montreal, CA ER - TY - CONF T1 - Limiting answer review and change on computerized adaptive vocabulary tests: Psychometric and attitudinal results T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Vispoel, W. P. A1 - Hendrickson, A. A1 - Bleiler, T. A1 - Widiatmo, H. A1 - Shrairi, S. A1 - Ihrig, D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada N1 - #VI99-01 ER - TY - CONF T1 - Managing CAT item development in the face of uncertainty T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Guo, F. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - ABST T1 - A minimax procedure in the context of sequential mastery testing (Research Report 99-04) Y1 - 1999 A1 - Vos, H. J. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - CONF T1 - More efficient use of item inventories T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Smith, R. A1 - Zhu, R. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - JOUR T1 - Multidimensional adaptive testing with a minimum error-variance criterion JF - Journal of Educational and Behavioral Statistics Y1 - 1999 A1 - van der Linden, W. J. KW - computerized adaptive testing AB - Adaptive testing under a multidimensional logistic response model is addressed. An algorithm is proposed that minimizes the (asymptotic) variance of the maximum-likelihood estimator of a linear combination of abilities of interest. 
The criterion results in a closed-form expression that is easy to evaluate. In addition, it is shown how the algorithm can be modified if the interest is in a test with a "simple ability structure". The statistical properties of the adaptive ML estimator are demonstrated for a two-dimensional item pool with several linear combinations of the abilities. VL - 24 ER - TY - JOUR T1 - The null distribution of person-fit statistics for conventional and adaptive tests JF - Applied Psychological Measurement Y1 - 1999 A1 - van Krimpen-Stoop, E. M. L. A. A1 - Meijer, R. R. VL - 23 ER - TY - CONF T1 - On-the-fly adaptive tests: An application of generative modeling to quantitative reasoning T2 - Symposium presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Bejar, I. I. JF - Symposium presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - JOUR T1 - Optimal design for item calibration in computerized adaptive testing JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 1999 A1 - Buyske, S. G. KW - computerized adaptive testing AB - Item Response Theory is the psychometric model used for standardized tests such as the Graduate Record Examination. A test-taker's response to an item is modelled as a binary response with success probability depending on parameters for both the test-taker and the item. Two popular models are the two-parameter logistic (2PL) model and the three-parameter logistic (3PL) model. For the 2PL model, the logit of the probability of a correct response equals a_i(θ_j − b_i), where a_i and b_i are item parameters and θ_j is the test-taker's parameter, known as "proficiency." The 3PL model adds a nonzero left asymptote to model random response behavior by low-θ test-takers.
Assigning scores to students requires accurate estimation of θ, while accurate estimation of θ requires accurate estimation of the item parameters. The operational implementation of Item Response Theory, particularly following the advent of computerized adaptive testing, generally involves handling these two estimation problems separately. This dissertation addresses the optimal design for item parameter estimation. Most current designs calibrate items with a sample drawn from the overall test-taking population. For 2PL models a sequential design based on the D-optimality criterion has been proposed, while no 3PL design is in the literature. In this dissertation, we design the calibration with the ultimate use of the items in mind, namely to estimate test-takers' proficiency parameters. For both the 2PL and 3PL models, this criterion leads to a locally L-optimal design criterion, named the Minimal Information Loss criterion. In turn, this criterion and the General Equivalence Theorem give a two-point design for the 2PL model and a three-point design for the 3PL model. A sequential implementation of this optimal design is presented. For the 2PL model, this design is almost 55% more efficient than the simple random sample approach, and 12% more efficient than the locally D-optimal design. For the 3PL model, the proposed design is 34% more efficient than the simple random sample approach. VL - 59 ER - TY - CONF T1 - Performance of the Sympson-Hetter exposure control algorithm with a polytomous item bank T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1999 A1 - Pastor, D. A. A1 - Chiang, C. A1 - Dodd, B. G. A1 - Yockey, R.
JF - Paper presented at the annual meeting of the American Educational Research Association CY - Montreal, Canada ER - TY - BOOK T1 - The precision of ability estimation methods for computerized adaptive testing using the generalized partial credit model Y1 - 1999 A1 - Wang, S. CY - Unpublished doctoral dissertation, University of Pittsburgh ER - TY - CONF T1 - Precision of Warm's weighted likelihood estimation of ability for a polytomous model in CAT T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1999 A1 - Wang, S. A1 - Wang, T. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Montreal N1 - #WA99-02 {PDF file, 604 KB} ER - TY - CONF T1 - Pretesting alongside an operational CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Davey, T. A1 - Pommerich, M. A1 - Thompson, D. T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CONF T1 - Principles for administering adaptive tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Miller, T. A1 - Davey, T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CONF T1 - A procedure to compare conventional and adaptive testing procedures for making single-point decisions T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Kingsbury, G. G. A1 - Zara, A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CONF T1 - The rationale and principles of stratum scoring T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Wise, S. L.
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - JOUR T1 - Reducing bias in CAT trait estimation: A comparison of approaches JF - Applied Psychological Measurement Y1 - 1999 A1 - Wang, T. A1 - Hanson, B. H. A1 - Lau, C.-M. H. VL - 23 ER - TY - CONF T1 - Reducing item exposure without reducing precision (much) in computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Holmes, R. M. A1 - Segall, D. O. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CHAP T1 - Research and development of a computer-adaptive test of listening comprehension in the less-commonly taught language Hausa Y1 - 1999 A1 - Dunkel, P. CY - M. Chalhoub-Deville (Ed.). Issues in computer-adaptive testing of reading proficiency. Cambridge, UK: Cambridge University Press. ER - TY - CONF T1 - Response time feedback on computer-administered tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Scrams, D. J. A1 - Schnipke, D. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - BOOK T1 - The robustness of the unidimensional 3PL IRT model when applied to two-dimensional data in computerized adaptive testing Y1 - 1999 A1 - Zhao, J. C. CY - Unpublished Ph.D. dissertation, State University of New York at Albany N1 - #ZH99-1 ER - TY - ABST T1 - Some relationship among issues in CAT item pool management Y1 - 1999 A1 - Wang, T. N1 - #WA99-03 ER - TY - JOUR T1 - Some reliability estimates for computerized adaptive tests JF - Applied Psychological Measurement Y1 - 1999 A1 - Nicewander, W. A. A1 - Thomasson, G. L.
AB - Three reliability estimates are derived for the Bayes modal estimate (BME) and the maximum likelihood estimate (MLE) of θ in computerized adaptive tests (CAT). Each reliability estimate is a function of test information. Two of the estimates are shown to be upper bounds to true reliability. The three reliability estimates and the true reliabilities of both MLE and BME were computed for seven simulated CATs. Results showed that the true reliabilities for MLE and BME were nearly identical in all seven tests. The three reliability estimates never differed from the true reliabilities by more than .02 (.01 in most cases). A simple implementation of one reliability estimate was found to accurately estimate reliability in CATs. VL - 23 ER - TY - CONF T1 - Standard errors of proficiency estimates in stratum scored CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Kingsbury, G. G. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CONF T1 - Study of methods to detect aberrant response patterns in computerized testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Iwamoto, C. K. A1 - Nungester, R. J. A1 - Luecht, R. M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - ABST T1 - Test anxiety and test performance: Comparing paper-based and computer-adaptive versions of the GRE General Test (Research Report 99-15) Y1 - 1999 A1 - Powers, D. E. CY - Princeton, NJ: Educational Testing Service ER - TY - CHAP T1 - Testing adaptatif et évaluation des processus cognitifs [Adaptive testing and the assessment of cognitive processes] Y1 - 1999 A1 - Laurier, M. CY - C. Depover and B. Noël (Éds): L’évaluation des compétences et des processus cognitifs - Modèles, pratiques et contextes. Bruxelles: De Boeck Université. N1 - [In French]
ER - TY - ABST T1 - Tests informatizados: Fundamentos y aplicaciones [Computerized testing: Fundamentals and applications] Y1 - 1999 A1 - Olea, J. A1 - Ponsoda, V. A1 - Prieto, G., Eds. CY - Madrid: Pirámide. N1 - [In Spanish] ER - TY - CONF T1 - Test-taking strategies T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Steffen, M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CONF T1 - Test-taking strategies in computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Steffen, M. A1 - Way, W. D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - JOUR T1 - Threats to score comparability with applications to performance assessments and computerized adaptive tests JF - Educational Assessment Y1 - 1999 A1 - Kolen, M. J. AB - Develops a conceptual framework that addresses score comparability for performance assessments, adaptive tests, paper-and-pencil tests, and alternate item pools for computerized tests. Outlines testing situation aspects that might threaten score comparability and describes procedures for evaluating the degree of score comparability. Suggests ways to minimize threats to comparability. (SLD) VL - 6 ER - TY - CONF T1 - Use of conditional item exposure methodology for an operational CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Anderson, D.
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CONF T1 - The use of linear-on-the-fly testing for TOEFL Reading T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Carey, P. A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - JOUR T1 - The use of Rasch analysis to produce scale-free measurement of functional ability JF - American Journal of Occupational Therapy Y1 - 1999 A1 - Velozo, C. A. A1 - Kielhofner, G. A1 - Lai, J-S. KW - *Activities of Daily Living KW - Disabled Persons/*classification KW - Human KW - Occupational Therapy/*methods KW - Predictive Value of Tests KW - Questionnaires/standards KW - Sensitivity and Specificity AB - Innovative applications of Rasch analysis can lead to solutions for traditional measurement problems and can produce new assessment applications in occupational therapy and health care practice. First, Rasch analysis is a mechanism that translates scores across similar functional ability assessments, thus enabling the comparison of functional ability outcomes measured by different instruments. This will allow for the meaningful tracking of functional ability outcomes across the continuum of care. Second, once the item-difficulty order of an instrument or item bank is established by Rasch analysis, computerized adaptive testing can be used to target items to the patient's ability level, reducing assessment length by as much as one half. More importantly, Rasch analysis can provide the foundation for "equiprecise" measurement or the potential to have precise measurement across all levels of functional ability. The use of Rasch analysis to create scale-free measurement of functional ability demonstrates how this methodology can be used in practical applications of clinical and outcome assessment.
VL - 53 N1 - ISSN 0272-9490. Journal Article ER - TY - JOUR T1 - Using Bayesian decision theory to design a computerized mastery test JF - Journal of Educational and Behavioral Statistics Y1 - 1999 A1 - Vos, H. J. VL - 24 IS - 3 ER - TY - JOUR T1 - Using response-time constraints to control for differential speededness in computerized adaptive testing JF - Applied Psychological Measurement Y1 - 1999 A1 - van der Linden, W. J. A1 - Scrams, D. J. A1 - Schnipke, D. L. KW - computerized adaptive testing AB - An item-selection algorithm is proposed for neutralizing the differential effects of time limits on computerized adaptive test scores. The method is based on a statistical model for distributions of examinees’ response times on items in a bank that is updated each time an item is administered. Predictions from the model are used as constraints in a 0-1 linear programming model for constrained adaptive testing that maximizes the accuracy of the trait estimator. The method is demonstrated empirically using an item bank from the Armed Services Vocational Aptitude Battery. VL - 23 N1 - Sage Publications, US ER - TY - ABST T1 - WISCAT: Een computergestuurd toetspakket voor rekenen en wiskunde [A computerized test package for arithmetic and mathematics] Y1 - 1999 A1 - Cito. CY - Cito: Arnhem, The Netherlands ER - TY - ABST T1 - Adaptive mastery testing using the Rasch model and Bayesian sequential decision theory (Research Report 98-15) Y1 - 1998 A1 - Glas, C. A. W. A1 - Vos, H. J. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - CONF T1 - Adaptive testing without IRT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Yan, D. A1 - Lewis, C. A1 - Stocking, M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego, CA N1 - (ERIC No.
ED422359) ER - TY - CHAP T1 - Alternatives for scoring computerized adaptive tests T2 - Computer-based testing Y1 - 1998 A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. ED - J. J. Fremer ED - W. C. Ward JF - Computer-based testing PB - Lawrence Erlbaum Associates, Inc. CY - Mahwah, N.J., USA ER - TY - CONF T1 - Alternatives for scoring computerized adaptive tests T2 - Paper presented at an Educational Testing Service-sponsored colloquium entitled Computer-based testing: Building the foundations for future assessments Y1 - 1998 A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. JF - Paper presented at an Educational Testing Service-sponsored colloquium entitled Computer-based testing: Building the foundations for future assessments CY - Philadelphia PA ER - TY - CONF T1 - Application of an IRT ideal point model to computer adaptive assessment of job performance T2 - Paper presented at the annual meeting of the Society for Industrial and Organization Psychology Y1 - 1998 A1 - Stark, S. A1 - F Drasgow JF - Paper presented at the annual meeting of the Society for Industrial and Organization Psychology CY - Dallas TX ER - TY - CONF T1 - Application of direct optimization for on-line calibration in computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Krass, I. A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA N1 - {PDF file, 146 KB} ER - TY - JOUR T1 - Applications of network flows to computerized adaptive testing JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 1998 A1 - Claudio, M. J. C. KW - computerized adaptive testing AB - Recently, the concept of Computerized Adaptive Testing (CAT) has been receiving ever growing attention from the academic community. This is so because of both practical and theoretical considerations. 
Its practical importance lies in the advantages of CAT over the traditional (perhaps outdated) paper-and-pencil test in terms of time, accuracy, and money. The theoretical interest is sparked by its natural relationship to Item Response Theory (IRT). This dissertation offers a mathematical programming approach which creates a model that generates a CAT that takes care of many questions concerning the test, such as feasibility, accuracy and time of testing, as well as item pool security. The CAT generated is designed to obtain the most information about a single test taker. Several methods for estimating the examinee's ability, based on the (dichotomous) responses to the items in the test, are also offered here. VL - 59 ER - TY - BOOK T1 - Applications of network flows to computerized adaptive testing Y1 - 1998 A1 - Cordova, M. J. CY - Dissertation, Rutgers Center for Operations Research (RUTCOR), Rutgers University, New Brunswick, NJ ER - TY - CONF T1 - A Bayesian approach to detection of item preknowledge in a CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - McLeod, L. D. A1 - Lewis, C. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego, CA ER - TY - JOUR T1 - Bayesian identification of outliers in computerized adaptive testing JF - Journal of the American Statistical Association Y1 - 1998 A1 - Bradlow, E. T. A1 - Weiss, R. E. A1 - Cho, M. AB - We consider the problem of identifying examinees with aberrant response patterns in a computerized adaptive test (CAT). The vector of responses y_i of person i from the CAT comprises a multivariate response vector. Multivariate observations may be outlying in many different directions, and we characterize specific directions as corresponding to outliers with different interpretations.
We develop a class of outlier statistics to identify different types of outliers based on a control chart type methodology. The outlier methodology is adaptable to general longitudinal discrete data structures. We consider several procedures to judge how extreme a particular outlier is. Data from the National Council Licensure Examination (NCLEX) motivates our development and is used to illustrate the results. VL - 93 ER - TY - JOUR T1 - Bayesian item selection criteria for adaptive testing JF - Psychometrika Y1 - 1998 A1 - van der Linden, W. J. VL - 63 ER - TY - ABST T1 - Capitalization on item calibration error in adaptive testing (Research Report 98-07) Y1 - 1998 A1 - van der Linden, W. J. A1 - Glas, C. A. W. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - ABST T1 - CASTISEL [Computer software] Y1 - 1998 A1 - Luecht, R. M. CY - Philadelphia, PA: National Board of Medical Examiners ER - TY - CONF T1 - CAT item calibration T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Hsu, Y. A1 - Thompson, T. D. A1 - Chen, W-H. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego, CA ER - TY - CONF T1 - CAT item exposure control: New evaluation tools, alternate methods and integration into a total CAT program T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Thomasson, G. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego, CA ER - TY - RPRT T1 - Comparability of paper-and-pencil and computer adaptive test scores on the GRE General Test Y1 - 1998 A1 - Schaeffer, G. A. A1 - Bridgeman, B. A1 - Golub-Smith, M. L. A1 - Lewis, C. A1 - Potenza, M. T. A1 - Steffen, M. PB - Educational Testing Service CY - Princeton, N.J.
SN - ETS Research Report 98-38 ER - TY - ABST T1 - Comparability of paper-and-pencil and computer adaptive test scores on the GRE General Test (GRE Board Professional Report No. 95-08P; Educational Testing Service Research Report 98-38) Y1 - 1998 A1 - Schaeffer, G. A1 - Bridgeman, B. A1 - Golub-Smith, M. L. A1 - Lewis, C. A1 - Potenza, M. T. A1 - Steffen, M. CY - Princeton, NJ: Educational Testing Service ER - TY - ABST T1 - A comparative study of item exposure control methods in computerized adaptive testing Y1 - 1998 A1 - Chang, S-W. A1 - Twu, B.-Y. CY - Research Report Series 98-3, Iowa City: American College Testing. N1 - #CH98-03 ER - TY - BOOK T1 - A comparative study of item exposure control methods in computerized adaptive testing Y1 - 1998 A1 - Chang, S-W. CY - Unpublished doctoral dissertation, University of Iowa, Iowa City, IA ER - TY - CONF T1 - Comparing and combining dichotomous and polytomous items with SPRT procedure in computerized classification testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1998 A1 - Lau, C. A. A1 - Wang, T. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Diego N1 - {PDF file, 375 KB} ER - TY - JOUR T1 - A comparison of item exposure control methods in computerized adaptive testing JF - Journal of Educational Measurement Y1 - 1998 A1 - Revuelta, J. A1 - Ponsoda, V. VL - 35 ER - TY - JOUR T1 - A comparison of maximum likelihood estimation and expected a posteriori estimation in CAT using the partial credit model JF - Educational and Psychological Measurement Y1 - 1998 A1 - Chen, S. A1 - Hou, L. A1 - Dodd, B. G. VL - 58 ER - TY - CONF T1 - A comparison of two methods of controlling item exposure in computerized adaptive testing T2 - Paper presented at the meeting of the American Educational Research Association, San Diego, CA Y1 - 1998 A1 - Tang, L. A1 - Jiang, H.
A1 - Chang, Hua-Hua JF - Paper presented at the meeting of the American Educational Research Association, San Diego, CA ER - TY - ABST T1 - Computer adaptive testing – Approaches for item selection and measurement Y1 - 1998 A1 - Armstrong, R. D. A1 - Jones, D. H. CY - Rutgers Center for Operations Research, New Brunswick, NJ ER - TY - JOUR T1 - Computer-assisted test assembly using optimization heuristics JF - Applied Psychological Measurement Y1 - 1998 A1 - Luecht, R. M. VL - 22 ER - TY - CONF T1 - Computerized adaptive rating scales that measure contextual performance T2 - Paper presented at the 13th annual conference of the Society for Industrial and Organizational Psychology Y1 - 1998 A1 - Borman, W. C. A1 - Hanson, M. A. A1 - Motowidlo, S. J. A1 - F Drasgow A1 - Foster, L. A1 - Kubisiak, U. C. JF - Paper presented at the 13th annual conference of the Society for Industrial and Organizational Psychology CY - Dallas, TX ER - TY - JOUR T1 - Computerized adaptive testing: What it is and how it works JF - Educational Technology Y1 - 1998 A1 - Straetmans, G. J. J. M. A1 - Theo Eggen AB - Describes the workings of computerized adaptive testing (CAT). Focuses on the key concept of information and then discusses two important components of a CAT system: the calibrated item bank and the testing algorithm. Describes a CAT that was designed for making placement decisions on the basis of two typical test administrations and notes the most significant differences between traditional paper-based testing and CAT. (AEF) VL - 38 ER - TY - CONF T1 - Computerized adaptive testing with multiple form structures T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Armstrong, R. D. A1 - Jones, D. H. A1 - Berliner, N.
JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - CONF T1 - Constructing adaptive tests to parallel conventional programs T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Thompson, T. A1 - Davey, T. A1 - Nering, M. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego, CA ER - TY - CONF T1 - Constructing passage-based tests that parallel conventional programs T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Thompson, T. A1 - Davey, T. A1 - Nering, M. L. JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - CONF T1 - Controlling item exposure and maintaining item security T2 - Paper presented at an Educational Testing Service-sponsored colloquium entitled “Computer-based testing: Building the foundations for future assessments” Y1 - 1998 A1 - Davey, T. A1 - Nering, M. L. JF - Paper presented at an Educational Testing Service-sponsored colloquium entitled “Computer-based testing: Building the foundations for future assessments” CY - Philadelphia, PA ER - TY - JOUR T1 - Controlling item exposure conditional on ability in computerized adaptive testing JF - Journal of Educational & Behavioral Statistics Y1 - 1998 A1 - Stocking, M. L. A1 - Lewis, C. AB - The interest in the application of large-scale adaptive testing for secure tests has served to focus attention on issues that arise when theoretical advances are made operational. One such issue is that of ensuring item and pool security in the continuous testing environment made possible by the computerized administration of a test, as opposed to the more periodic testing environment typically used for linear paper-and-pencil tests. This article presents a new method of controlling the exposure rate of items conditional on ability level in this continuous testing environment.
The properties of such conditional control on the exposure rates of items, when used in conjunction with a particular adaptive testing algorithm, are explored through studies with simulated data. VL - 23 N1 - American Educational Research Assn, US ER - TY - CONF T1 - Developing, maintaining, and renewing the item inventory to support computer-based testing T2 - Paper presented at the colloquium Y1 - 1998 A1 - Way, W. D. A1 - Steffen, M. A1 - Anderson, G. S. JF - Paper presented at the colloquium CY - Computer-Based Testing: Building the Foundation for Future Assessments, Philadelphia PA ER - TY - ABST T1 - Development and evaluation of online calibration procedures (TCN 96-216) Y1 - 1998 A1 - Levine, M. L. A1 - Williams. CY - Champaign IL: Algorithm Design and Measurement Services, Inc ER - TY - ABST T1 - Does adaptive testing violate local independence? (Research Report 98-33) Y1 - 1998 A1 - Mislevy, R. J. A1 - Chang, Hua-Hua CY - Princeton NJ: Educational Testing Service ER - TY - JOUR T1 - The effect of item pool restriction on the precision of ability measurement for a Rasch-based CAT: comparisons to traditional fixed length examinations JF - Journal of Outcome Measurement Y1 - 1998 A1 - Halkitis, P. N. KW - *Decision Making, Computer-Assisted KW - Comparative Study KW - Computer Simulation KW - Education, Nursing KW - Educational Measurement/*methods KW - Human KW - Models, Statistical KW - Psychometrics/*methods AB - This paper describes a method for examining the precision of a computerized adaptive test with a limited item pool. Standard errors of measurement ascertained in the testing of simulees with a CAT using a restricted pool were compared to the results obtained in a live paper-and-pencil achievement testing of 4494 nursing students on four versions of an examination of calculations of drug administration. CAT measures of precision were considered when the simulated examinee pools were uniform and normal.
Precision indices were also considered in terms of the number of CAT items required to reach the precision of the traditional tests. Results suggest that regardless of the size of the item pool, CAT provides greater precision in measurement with a smaller number of items administered even when the choice of items is limited but fails to achieve equiprecision along the entire ability continuum. VL - 2 N1 - Journal Article ER - TY - CONF T1 - Effect of item selection on item exposure rates within a computerized classification test T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Kalohn, J. C. A1 - Spray, J. A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA ER - TY - CONF T1 - An empirical Bayes approach to Mantel-Haenszel DIF analysis: Theoretical development and application to CAT data T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Zwick, R. JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - CONF T1 - Essentially unbiased Bayesian estimates in computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1998 A1 - Wang, T. A1 - Lau, C. A1 - Hanson, B. A. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Diego ER - TY - CONF T1 - Evaluating and insuring measurement precision in adaptive testing T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Davey, T. A1 - Nering, M. L. JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - CONF T1 - Evaluation of methods for the use of underutilized items in a CAT environment T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Steffen, M. A1 - Liu, M.
JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - CONF T1 - An examination of item-level response times from an operational CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Swygert, K. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Urbana IL ER - TY - CONF T1 - Expected losses for individuals in Computerized Mastery Testing T2 - Paper presented at the annual meeting of National Council on Measurement in Education Y1 - 1998 A1 - Smith, R. A1 - Lewis, C. JF - Paper presented at the annual meeting of National Council on Measurement in Education CY - San Diego ER - TY - ABST T1 - Feasibility studies of two-stage testing in large-scale educational assessment: Implications for NAEP Y1 - 1998 A1 - Bock, R. D. A1 - Zimowski, M. F. CY - American Institutes for Research, CA ER - TY - ABST T1 - A framework for comparing adaptive test designs Y1 - 1998 A1 - Stocking, M. L. CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - A framework for exploring and controlling risks associated with test item exposure over time T2 - Paper presented at the Annual Meeting of the National Council for Measurement in Education Y1 - 1998 A1 - Luecht, RM JF - Paper presented at the Annual Meeting of the National Council for Measurement in Education CY - San Diego, CA ER - TY - CONF T1 - A hybrid method for controlling item exposure in computerized adaptive testing T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Nering, M. L. A1 - Davey, T. A1 - Thompson, T. JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - 
TY - CONF T1 - The impact of nonmodel-fitting responses in a realistic CAT environment T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Yi, Q. A1 - Nering, M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA ER - TY - CONF T1 - The impact of scoring flawed items on ability estimation in CAT T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Liu, M. A1 - Steffen, M. JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - CHAP T1 - Innovations in computer-based ability testing: Promise, problems and perils Y1 - 1998 A1 - J. R. McBride CY - In Hakel, M.D. (Ed.) Beyond multiple choice: Alternatives to traditional testing for selection. Hillsdale, NJ: Lawrence Erlbaum Associates. ER - TY - ABST T1 - Item banking Y1 - 1998 A1 - Rudner, L. M. AB - Discusses the advantages and disadvantages of using item banks while providing useful information to those who are considering implementing an item banking project in their school district. The primary advantage of item banking is in test development. Also describes start-up activities in implementing item banking. (SLD) JF - Practical Assessment, Research and Evaluation VL - 6 ER - TY - CONF T1 - Item development and pretesting in a computer-based testing environment T2 - Paper presented at the Educational Testing Service Sponsored Colloquium on Computer-Based Testing: Building the Foundation for Future Assessments Y1 - 1998 A1 - Parshall, C. G.
JF - Paper presented at the Educational Testing Service Sponsored Colloquium on Computer-Based Testing: Building the Foundation for Future Assessments CY - Philadelphia ER - TY - ABST T1 - Item selection in adaptive testing with the sequential probability ratio test (Measurement and Research Department Report, 98-1) Y1 - 1998 A1 - Theo Eggen CY - Arnhem, The Netherlands: Cito. N1 - [see APM paper, 1999; also reprinted as Chapter 6 in #EG04-01.] ER - TY - CONF T1 - Item selection in computerized adaptive testing: Should more discriminating items be used first? T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1998 A1 - Hau, K. T. A1 - Chang, Hua-Hua JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Diego, CA ER - TY - JOUR T1 - Item selection in computerized adaptive testing: Should more discriminating items be used first? JF - Journal of Educational Measurement Y1 - 2001 A1 - Hau, K. T. A1 - Chang, Hua-Hua VL - 38 ER - TY - JOUR T1 - Maintaining content validity in computerized adaptive testing JF - Advances in Health Sciences Education Y1 - 1998 A1 - Luecht, RM A1 - de Champlain, A. A1 - Nungester, R. J. KW - computerized adaptive testing AB - The authors empirically demonstrate some of the trade-offs which can occur when content balancing is imposed in computerized adaptive testing (CAT) forms or, conversely, when it is ignored. The authors contend that the content validity of a CAT form can actually change across a score scale when content balancing is ignored. However, they caution that efficiency and score precision can be severely reduced by overspecifying content restrictions in a CAT form. The results from 2 simulation studies are presented as a means of highlighting some of the trade-offs that could occur between content and statistical considerations in CAT form assembly. (PsycINFO Database Record (c) 2003 APA, all rights reserved).
VL - 3 N1 - Kluwer Academic Publishers, Netherlands ER - TY - JOUR T1 - Measuring change conventionally and adaptively JF - Educational and Psychological Measurement Y1 - 1998 A1 - May, K. A1 - Nicewander, W. A. VL - 58 ER - TY - JOUR T1 - A model for optimal constrained adaptive testing JF - Applied Psychological Measurement Y1 - 1998 A1 - van der Linden, W. J. A1 - Reese, L. M. KW - computerized adaptive testing AB - A model for constrained computerized adaptive testing is proposed in which the information in the test at the trait level (θ) estimate is maximized subject to a number of possible constraints on the content of the test. At each item-selection step, a full test is assembled to have maximum information at the current θ estimate, fixing the items already administered. Then the item with maximum information is selected. All test assembly is optimal because a linear programming (LP) model is used that automatically updates to allow for the attributes of the items already administered and the new value of the θ estimator. The LP model also guarantees that each adaptive test always meets the entire set of constraints. A simulation study using a bank of 753 items from the Law School Admission Test showed that the θ estimator for adaptive tests of realistic lengths did not suffer any loss of efficiency from the presence of 433 constraints on the item selection process. VL - 22 N1 - Sage Publications, US ER - TY - CONF T1 - A new approach for the detection of item preknowledge in computerized adaptive testing T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - McLeod, L. D. A1 - Lewis, C. JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - CONF T1 - Nonmodel-fitting responses and robust ability estimation in a realistic CAT environment T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Yi, Q. A1 - Nering, M.
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA ER - TY - JOUR T1 - Optimal design of item pools for computerized adaptive testing JF - Applied Psychological Measurement Y1 - 1998 A1 - Stocking, M. L. A1 - Swanson, L. VL - 22 ER - TY - JOUR T1 - Optimal sequential rules for computer-based instruction JF - Journal of Educational Computing Research Y1 - 1998 A1 - Vos, H. J. VL - 19(2) ER - TY - JOUR T1 - Optimal test assembly of psychological and educational tests JF - Applied Psychological Measurement Y1 - 1998 A1 - van der Linden, W. J. VL - 22 ER - TY - CONF T1 - Patterns of item exposure using a randomized CAT algorithm T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Lunz, M. E. A1 - Stahl, J. A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego, CA ER - TY - ABST T1 - Person fit based on statistical process control in an adaptive testing environment (Research Report 98-13) Y1 - 1998 A1 - van Krimpen-Stoop, E. M. L. A. A1 - Meijer, R. R. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - JOUR T1 - Practical issues in computerized test assembly JF - Applied Psychological Measurement Y1 - 1998 A1 - Wightman, L. F. VL - 22 ER - TY - JOUR T1 - Properties of ability estimation methods in computerized adaptive testing JF - Journal of Educational Measurement Y1 - 1998 A1 - Wang, T. A1 - Vispoel, W. P. VL - 35 ER - TY - JOUR T1 - Protecting the integrity of computerized testing item pools JF - Educational Measurement: Issues and Practice Y1 - 1998 A1 - Way, W. D. 
VL - 17(4) ER - TY - JOUR T1 - Psychometric characteristics of computer-adaptive and self-adaptive vocabulary tests: The role of answer feedback and test anxiety JF - Journal of Educational Measurement Y1 - 1998 A1 - Vispoel, W. P. VL - 35 ER - TY - ABST T1 - Quality control of on-line calibration in computerized adaptive testing (Research Report 98-03) Y1 - 1998 A1 - Glas, C. A. W. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - ABST T1 - The relationship between computer familiarity and performance on computer-based TOEFL test tasks (Research Report 98-08) Y1 - 1998 A1 - Taylor, C. A1 - Jamieson, J. A1 - Eignor, D. R. A1 - Kirsch, I. CY - Princeton NJ: Educational Testing Service ER - TY - JOUR T1 - Reviewing and changing answers on computer-adaptive and self-adaptive vocabulary tests JF - Journal of Educational Measurement Y1 - 1998 A1 - Vispoel, W. P. VL - 35 N1 - (Also presented at National Council on Measurement in Education, 1996) ER - TY - ABST T1 - Simulating nonmodel-fitting responses in a CAT Environment (Research Report 98-10) Y1 - 1998 A1 - Yi, Q. A1 - Nering, M. L. CY - Iowa City IA: ACT Inc. (Also presented at National Council on Measurement in Education, 1999: ERIC No. ED 427 042) N1 - #YI-98-10 ER - TY - ABST T1 - Simulating the null distribution of person-fit statistics for conventional and adaptive tests (Research Report 98-02) Y1 - 1998 A1 - Meijer, R. R. A1 - van Krimpen-Stoop, E. M. L. A. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - JOUR T1 - Simulating the use of disclosed items in computerized adaptive testing JF - Journal of Educational Measurement Y1 - 1998 A1 - Stocking, M. L. A1 - W. C. Ward A1 - Potenza, M. T.
KW - computerized adaptive testing AB - Regular use of questions previously made available to the public (i.e., disclosed items) may provide one way to meet the requirement for large numbers of questions in a continuous testing environment, that is, an environment in which testing is offered at test taker convenience throughout the year rather than on a few prespecified test dates. First it must be shown that such use has effects on test scores small enough to be acceptable. In this study simulations are used to explore the use of disclosed items under a worst-case scenario which assumes that disclosed items are always answered correctly. Some item pool and test designs were identified in which the use of disclosed items produces effects on test scores that may be viewed as negligible. VL - 35 N1 - National Council on Measurement in Education, US ER - TY - CONF T1 - Some considerations for eliminating biases in ability estimation in computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1998 A1 - Samejima, F. JF - Paper presented at the annual meeting of the American Educational Research Association ER - TY - CONF T1 - Some item response theory to provide scale scores based on linear combinations of testlet scores, for computerized adaptive tests T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Thissen, D. JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - JOUR T1 - Some practical examples of computerized adaptive sequential testing JF - Journal of Educational Measurement Y1 - 1998 A1 - Luecht, RM A1 - Nungester, R. J. VL - 35 ER - TY - CONF T1 - Some reliability estimators for computerized adaptive tests T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Nicewander, W. A. A1 - Thomasson, G. L. 
JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - ABST T1 - Statistical tests for person misfit in computerized adaptive testing (Research Report 98-01) Y1 - 1998 A1 - Glas, C. A. W. A1 - Meijer, R. R. A1 - van Krimpen-Stoop, E. M. L. A. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - JOUR T1 - Stochastic order in dichotomous item response models for fixed, adaptive, and multidimensional tests JF - Psychometrika Y1 - 1998 A1 - van der Linden, W. J. VL - 63 ER - TY - JOUR T1 - Swedish Enlistment Battery: Construct validity and latent variable estimation of cognitive abilities by the CAT-SEB JF - International Journal of Selection and Assessment Y1 - 1998 A1 - Mardberg, B. A1 - Carlstedt, B. VL - 6 ER - TY - CONF T1 - Test development exposure control for adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Parshall, C. G. A1 - Davey, T. A1 - Nering, M. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego, CA ER - TY - JOUR T1 - Testing word knowledge by telephone to estimate general cognitive aptitude using an adaptive test JF - Intelligence Y1 - 1998 A1 - Legree, P. J. A1 - Fischl, M. A. A1 - Gade, P. A. A1 - Wilson, M. VL - 26 ER - TY - ABST T1 - Three response types for broadening the conception of mathematical problem solving in computerized-adaptive tests (Research Report 98-45) Y1 - 1998 A1 - Bennett, R. E. A1 - Morley, M. A1 - Quardt, D.
CY - Princeton NJ: Educational Testing Service N1 - #BE98-45 (Also presented at National Council on Measurement in Education, 1998) ER - TY - ABST T1 - Using response-time constraints to control for differential speededness in adaptive testing (Research Report 98-06) Y1 - 1998 A1 - van der Linden, W. J. A1 - Scrams, D. J. A1 - Schnipke, D. L. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - CONF T1 - The accuracy of examinee judgments of relative item difficulty: Implication for computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Wise, S. L. A1 - Freeman, S. A. A1 - Finney, S. J. A1 - Enders, C. K. A1 - Severance, D. D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago ER - TY - JOUR T1 - Adapting to adaptive testing JF - Personnel Psychology Y1 - 1997 A1 - Overton, R. C. A1 - Harms, H. J. A1 - Taylor, L. R. A1 - Zickar, M. J. VL - 50 ER - TY - CONF T1 - Administering and scoring the computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - A Zara JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CONF T1 - Alternate methods of scoring computer-based adaptive tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Green, B. F. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - JOUR T1 - An alternative method for scoring adaptive tests JF - Journal of Educational and Behavioral Statistics Y1 - 1997 A1 - Stocking, M. L.
VL - 21 N1 - (Also Educational Testing Service RR 94-48) ER - TY - ABST T1 - Applications of Bayesian decision theory to sequential mastery testing (Research Report 97-06) Y1 - 1997 A1 - Vos, H. J. CY - Twente, The Netherlands: Department of Educational Measurement and Data Analysis ER - TY - CONF T1 - Applications of multidimensional adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Segall, D. O. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CONF T1 - Assessing speededness in variable-length computer-adaptive tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Bontempo, B. A1 - Julian, E. R. A1 - Gorham, J. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CONF T1 - A Bayesian enhancement of Mantel-Haenszel DIF analysis for computer adaptive tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Zwick, R. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CONF T1 - Calibration of CAT items administered online for classification: Assumption of local independence T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1997 A1 - Spray, J. A. A1 - Parshall, C. G. A1 - Huang, C.-H. JF - Paper presented at the annual meeting of the Psychometric Society CY - Gatlinburg TN ER - TY - ABST T1 - CAST 5 for Windows users' guide Y1 - 1997 A1 - J. R. McBride A1 - Cooper, R. R. CY - Contract No. MDA903-93-D-0032, DO 0054. Alexandria, VA: Human Resources Research Organization ER - TY - CHAP T1 - CAT-ASVAB cost and benefit analyses Y1 - 1997 A1 - Wise, L. L. A1 - Curran, L. T. A1 - J. R. McBride CY - W. A. Sands, B. K. Waters, and J. R.
McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 227-236). Washington, DC: American Psychological Association. ER - TY - CHAP T1 - CAT-ASVAB operational test and evaluation Y1 - 1997 A1 - Moreno, K. E. CY - W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 199-205). Washington DC: American Psychological Association. ER - TY - ABST T1 - CATSIB: A modified SIBTEST procedure to detect differential item functioning in computerized adaptive tests (Research report) Y1 - 1997 A1 - Nandakumar, R. A1 - Roussos, L. CY - Newtown, PA: Law School Admission Council ER - TY - CONF T1 - Comparability and validity of computerized adaptive testing with the MMPI-2 using a clinical sample T2 - Paper presented at the 32nd Annual Symposium on Recent Developments in the Use of the MMPI-2 and MMPI-A. Minneapolis MN. Y1 - 1997 A1 - Handel, R. W. A1 - Ben-Porath, Y. S. A1 - Watt, M. JF - Paper presented at the 32nd Annual Symposium on Recent Developments in the Use of the MMPI-2 and MMPI-A. Minneapolis MN. ER - TY - JOUR T1 - A comparison of maximum likelihood estimation and expected a posteriori estimation in computerized adaptive testing using the generalized partial credit model JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 1997 A1 - Chen, S-K. KW - computerized adaptive testing AB - A simulation study was conducted to investigate the application of expected a posteriori (EAP) trait estimation in computerized adaptive tests (CAT) based on the generalized partial credit model (Muraki, 1992), and to compare the performance of EAP with maximum likelihood trait estimation (MLE). The performance of EAP was evaluated under different conditions: the number of quadrature points (10, 20, and 30), and the type of prior distribution (normal, uniform, negatively skewed, and positively skewed).
The relative performance of the MLE and EAP estimation methods were assessed under two distributional forms of the latent trait, one normal and the other negatively skewed. Also, both the known item parameters and estimated item parameters were employed in the simulation study. Descriptive statistics, correlations, scattergrams, accuracy indices, and audit trails were used to compare the different methods of trait estimation in CAT. The results showed that, regardless of the latent trait distribution, MLE and EAP with a normal prior, a uniform prior, or the prior that matches the latent trait distribution using either 20 or 30 quadrature points provided relatively accurate estimation in CAT based on the generalized partial credit model. However, EAP using only 10 quadrature points did not work well in the generalized partial credit CAT. Also, the study found that increasing the number of quadrature points from 20 to 30 did not increase the accuracy of EAP estimation. Therefore, it appears 20 or more quadrature points are sufficient for accurate EAP estimation. The results also showed that EAP with a negatively skewed prior and positively skewed prior performed poorly for the normal data set, and EAP with positively skewed prior did not provide accurate estimates for the negatively skewed data set. Furthermore, trait estimation in CAT using estimated item parameters produced results similar to those obtained using known item parameters. In general, when at least 20 quadrature points are used, EAP estimation with a normal prior, a uniform prior or the prior that matches the latent trait distribution appears to be a good alternative to MLE in the application of polytomous CAT based on the generalized partial credit model. (PsycINFO Database Record (c) 2003 APA, all rights reserved). 
VL - 58 ER - TY - CONF T1 - A comparison of testlet-based test designs for computerized adaptive testing T2 - Paper presented at the meeting of the American Educational Research Association Y1 - 1997 A1 - Schnipke, D. L. A1 - Reese, L. M. JF - Paper presented at the meeting of the American Educational Research Association CY - Chicago, IL ER - TY - CONF T1 - Computer assembly of tests so that content reigns supreme T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Case, S. M. A1 - Luecht, RM JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - JOUR T1 - Computer-adaptive testing of listening comprehension: A blueprint of CAT Development JF - The Language Teacher Online Y1 - 1997 A1 - Dunkel, P. A. VL - 21(10) ER - TY - JOUR T1 - Computerized adaptive and fixed-item testing of music listening skill: A comparison of efficiency, precision, and concurrent validity JF - Journal of Educational Measurement Y1 - 1997 A1 - Vispoel, W. P. A1 - Wang, T. A1 - Bleiler, T. VL - 34 ER - TY - BOOK T1 - Computerized adaptive testing: From inquiry to operation Y1 - 1997 A1 - Sands, W. A. A1 - B. K. Waters A1 - J. R. McBride KW - computerized adaptive testing AB - (from the cover) This book traces the development of computerized adaptive testing (CAT) from its origins in the 1960s to its integration with the Armed Services Vocational Aptitude Battery (ASVAB) in the 1990s. A paper-and-pencil version of the battery (P&P-ASVAB) has been used by the Defense Department since the 1970s to measure the abilities of applicants for military service.
The test scores are used both for initial qualification and for classification into entry-level training opportunities. /// This volume provides the developmental history of the CAT-ASVAB through its various stages in the Joint-Service arena. Although the majority of the book concerns the myriad technical issues that were identified and resolved, information is provided on various political and funding support challenges that were successfully overcome in developing, testing, and implementing the battery into one of the nation's largest testing programs. The book provides useful information to professionals in the testing community and everyone interested in personnel assessment and evaluation. (PsycINFO Database Record (c) 2004 APA, all rights reserved). PB - American Psychological Association CY - Washington, D.C., USA ER - TY - JOUR T1 - A computerized adaptive testing system for speech discrimination measurement: The Speech Sound Pattern Discrimination Test JF - Journal of the Acoustical Society of America Y1 - 1997 A1 - Bochner, J. A1 - Garrison, W. A1 - Palmer, L. A1 - MacKenzie, D. A1 - Braveman, A. KW - *Diagnosis, Computer-Assisted KW - *Speech Discrimination Tests KW - *Speech Perception KW - Adolescent KW - Adult KW - Audiometry, Pure-Tone KW - Human KW - Middle Age KW - Psychometrics KW - Reproducibility of Results AB - A computerized, adaptive test-delivery system for the measurement of speech discrimination, the Speech Sound Pattern Discrimination Test, is described and evaluated. Using a modified discrimination task, the testing system draws on a pool of 130 items spanning a broad range of difficulty to estimate an examinee's location along an underlying continuum of speech processing ability, yet does not require the examinee to possess a high level of English language proficiency.
The system is driven by a mathematical measurement model which selects only test items which are appropriate in difficulty level for a given examinee, thereby individualizing the testing experience. Test items were administered to a sample of young deaf adults, and the adaptive testing system evaluated in terms of respondents' sensory and perceptual capabilities, acoustic and phonetic dimensions of speech, and theories of speech perception. Data obtained in this study support the validity, reliability, and efficiency of this test as a measure of speech processing ability. VL - 101 N1 - Journal Article ER - TY - ABST T1 - Computerized adaptive testing through the World Wide Web Y1 - 1997 A1 - Shermis, M. D. A1 - Mzumara, H. A1 - Brown, M. A1 - Lillig, C. CY - (ERIC No. ED414536) ER - TY - CHAP T1 - Computerized adaptive testing using the partial credit model for attitude measurement Y1 - 1997 A1 - Baek, S. G. CY - M. Wilson, G. Engelhard Jr and K. Draney (Eds.), Objective measurement: Theory into practice, volume 4. Norwood NJ: Ablex. ER - TY - CONF T1 - Controlling test and computer anxiety: Test performance under CAT and SAT conditions T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Shermis, M. D. A1 - Mzumara, H. A1 - Bublitz, S. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CHAP T1 - Current and future challenges Y1 - 1997 A1 - Segall, D. O. A1 - Moreno, K. E. CY - W. A. Sands, B. K. Waters, and J. R. McBride (Eds.). Computerized adaptive testing: From inquiry to operation (pp. 257-269). Washington DC: American Psychological Association.
ER - TY - CONF T1 - Detecting misbehaving items in a CAT environment T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Swygert, K. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago, IL ER - TY - CONF T1 - Detection of aberrant response patterns in CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - van der Linden, W. J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - JOUR T1 - Developing and scoring an innovative computerized writing assessment JF - Journal of Educational Measurement Y1 - 1997 A1 - Davey, T. A1 - Godwin, J., A1 - Mittelholz, D. VL - 34 ER - TY - JOUR T1 - Diagnostic adaptive testing: Effects of remedial instruction as empirical validation JF - Journal of Educational Measurement Y1 - 1997 A1 - Tatsuoka, K. K. A1 - Tatsuoka, M. M. VL - 34 ER - TY - JOUR T1 - The distribution of indexes of person fit within the computerized adaptive testing environment JF - Applied Psychological Measurement Y1 - 1997 A1 - Nering, M. L. KW - Adaptive Testing KW - Computer Assisted Testing KW - Fit KW - Person Environment AB - The extent to which a trait estimate represents the underlying latent trait of interest can be estimated by using indexes of person fit. Several statistical methods for indexing person fit have been proposed to identify nonmodel-fitting response vectors. These person-fit indexes have generally been found to follow a standard normal distribution for conventionally administered tests. The present investigation found that within the context of computerized adaptive testing (CAT) these indexes tended not to follow a standard normal distribution. 
As the item pool became less discriminating, as the CAT termination criterion became less stringent, and as the number of items in the pool decreased, the distributions of the indexes approached a standard normal distribution. It was determined that under these conditions the indexes' distributions approached standard normal distributions because more items were being administered. However, even when over 50 items were administered in a CAT the indexes were distributed in a fashion that was different from what was expected. (PsycINFO Database Record (c) 2006 APA ) VL - 21 N1 - Journal; Peer Reviewed Journal ER - TY - JOUR T1 - The effect of adaptive administration on the variability of the Mantel-Haenszel measure of differential item functioning JF - Educational and Psychological Measurement Y1 - 1997 A1 - Zwick, R. VL - 57 ER - TY - JOUR T1 - The effect of population distribution and method of theta estimation on computerized adaptive testing (CAT) using the rating scale model JF - Educational & Psychological Measurement Y1 - 1997 A1 - Chen, S-K. A1 - Hou, L. Y. A1 - Fitzpatrick, S. J. A1 - Dodd, B. G. KW - computerized adaptive testing AB - Investigated the effect of population distribution on maximum likelihood estimation (MLE) and expected a posteriori estimation (EAP) in a simulation study of computerized adaptive testing (CAT) based on D. Andrich's (1978) rating scale model. Comparisons were made among MLE and EAP with a normal prior distribution and EAP with a uniform prior distribution within 2 data sets: one generated using a normal trait distribution and the other using a negatively skewed trait distribution. Descriptive statistics, correlations, scattergrams, and accuracy indices were used to compare the different methods of trait estimation. The EAP estimation with a normal prior or uniform prior yielded results similar to those obtained with MLE, even though the prior did not match the underlying trait distribution. 
An additional simulation study based on real data suggested that more work is needed to determine the optimal number of quadrature points for EAP in CAT based on the rating scale model. The choice between MLE and EAP for particular measurement situations is discussed. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 57 N1 - Sage Publications, US ER - TY - JOUR T1 - The effect of population distribution and methods of theta estimation on computerized adaptive testing (CAT) using the rating scale model JF - Educational and Psychological Measurement Y1 - 1997 A1 - Chen, S. A1 - Hou, L. A1 - Fitzpatrick, S. J. A1 - Dodd, B. VL - 57 ER - TY - CONF T1 - The effects of motivation on equating adaptive and conventional tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Segall, D. O. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CHAP T1 - Equating the CAT-ASVAB Y1 - 1997 A1 - Segall, D. O. CY - W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 181-198). Washington DC: American Psychological Association. ER - TY - CONF T1 - Essentially unbiased EAP estimates in computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1997 A1 - Wang, T. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago N1 - #WA97-01 PDF file, 225 K ER - TY - JOUR T1 - Evaluating an automatically scorable, open-ended response type for measuring mathematical reasoning in computer-adaptive tests Y1 - 1997 A1 - Bennett, R. E. A1 - Steffen, M. A1 - Singley, M.K. A1 - Morley, M. A1 - Jacquemin, D. 
ER - TY - CONF T1 - Evaluating comparability in computerized adaptive testing: A theoretical framework with an example T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1997 A1 - Wang, T. A1 - Kolen, M. J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago ER - TY - CHAP T1 - Evaluating item calibration medium in computerized adaptive testing Y1 - 1997 A1 - Hetter, R. D. A1 - Segall, D. O. A1 - Bloxom, B. M. CY - W.A. Sands, B.K. Waters and J.R. McBride, Computerized adaptive testing: From inquiry to operation (pp. 161-168). Washington, DC: American Psychological Association. ER - TY - CONF T1 - Examinee issues in CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Wise, S. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - [ERIC ED 408 329] ER - TY - JOUR T1 - Flawed items in computerized adaptive testing JF - Journal of Educational Measurement Y1 - 1997 A1 - Potenza, M. T. A1 - Stocking, M. L. VL - 4 N1 - (Also Educational Testing Service RR-94-06) ER - TY - CONF T1 - Getting more precision on computer adaptive testing T2 - Paper presented at the 62nd Annual meeting of Psychometric Society Y1 - 1997 A1 - Krass, I. A. JF - Paper presented at the 62nd Annual meeting of Psychometric Society CY - University of Tennessee, Knoxville, TN ER - TY - CONF T1 - The goal of equity within and between computerized adaptive tests and paper and pencil forms. T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Thomasson, G. L. 
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - JOUR T1 - Health status assessment for the twenty-first century: item response theory, item banking and computer adaptive testing JF - Quality of Life Research Y1 - 1997 A1 - Revicki, D. A. A1 - Cella, D. F. KW - *Health Status KW - *HIV Infections/diagnosis KW - *Quality of Life KW - Diagnosis, Computer-Assisted KW - Disease Progression KW - Humans KW - Psychometrics/*methods AB - Health status assessment is frequently used to evaluate the combined impact of human immunodeficiency virus (HIV) disease and its treatment on functioning and well-being from the patient's perspective. No single health status measure can efficiently cover the range of problems in functioning and well-being experienced across HIV disease stages. Item response theory (IRT), item banking and computer adaptive testing (CAT) provide a solution to measuring health-related quality of life (HRQoL) across different stages of HIV disease. IRT allows us to examine the response characteristics of individual items and the relationship between responses to individual items and the responses to each other item in a domain. With information on the response characteristics of a large number of items covering a HRQoL domain (e.g. physical function, and psychological well-being), and information on the interrelationships between all pairs of these items and the total scale, we can construct more efficient scales. Item banks consist of large sets of questions representing various levels of a HRQoL domain that can be used to develop brief, efficient scales for measuring the domain. CAT is the application of IRT and item banks to the tailored assessment of HRQoL domains specific to individual patients. 
Given the results of IRT analyses and computer-assisted test administration, more efficient and brief scales can be used to measure multiple domains of HRQoL for clinical trials and longitudinal observational studies. VL - 6 SN - 0962-9343 (Print) N1 - Revicki, D ACella, D FEnglandQuality of life research : an international journal of quality of life aspects of treatment, care and rehabilitationQual Life Res. 1997 Aug;6(6):595-600. ER - TY - CONF T1 - Identifying similar item content clusters on multiple test forms T2 - Paper presented at the Psychometric Society meeting Y1 - 1997 A1 - Reckase, M. D. A1 - Thompson, T.D. A1 - Nering, M. JF - Paper presented at the Psychometric Society meeting CY - Gatlinburg, TN, June ER - TY - CONF T1 - Improving the quality of music aptitude tests through adaptive administration of items T2 - Paper presented at Multidisciplinary Perspectives on Musicality: The Seashore Symposium Y1 - 1997 A1 - Vispoel, W. P. JF - Paper presented at Multidisciplinary Perspectives on Musicality: The Seashore Symposium CY - University of Iowa, Iowa City IA ER - TY - ABST T1 - Incorporating content constraints into a multi-stage adaptive testlet design: LSAC report Y1 - 1997 A1 - Reese, L. M. A1 - Schnipke, D. L. A1 - Luebke, S. W. CY - Newtown, PA: Law School Admission Council ER - TY - CONF T1 - Incorporating decision consistency into Bayesian sequential testing T2 - Paper presented at the annual meeting of National Council on Measurement in Education Y1 - 1997 A1 - Smith, R. A1 - Lewis, C. JF - Paper presented at the annual meeting of National Council on Measurement in Education CY - Chicago ER - TY - JOUR T1 - An investigation of self-adapted testing in a Spanish high school population JF - Educational and Psychological Measurement Y1 - 1997 A1 - Ponsoda, V. A1 - Wise, S. L. A1 - Olea, J. A1 - Revuelta, J. 
VL - 57 ER - TY - CHAP T1 - Item exposure control in CAT-ASVAB T2 - Computerized adaptive testing: From inquiry to operation Y1 - 1997 A1 - Hetter, R. D. A1 - Sympson, J. B. ED - J. R. McBride AB - Describes the method used to control item exposure in computerized adaptive testing-Armed Services Vocational Aptitude Battery (CAT-ASVAB). The method described was developed specifically to ensure that CAT-ASVAB items were exposed no more often than the items in the printed ASVAB's alternate forms, ensuring that CAT-ASVAB is no more vulnerable than printed ASVAB forms to compromise from item exposure. (PsycINFO Database Record (c) 2010 APA, all rights reserved) JF - Computerized adaptive testing: From inquiry to operation PB - American Psychological Association CY - Washington D.C., USA ER - TY - CHAP T1 - Item pool development and evaluation Y1 - 1997 A1 - Segall, D. O. A1 - Moreno, K. E. A1 - Hetter, R. D. CY - W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 117-130). Washington DC: American Psychological Association. ER - TY - CONF T1 - Item pool development and maintenance T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Kingsbury, G. G. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - ABST T1 - Linking scores for computer-adaptive and paper-and-pencil administrations of the SAT (Research Report No 97-12) Y1 - 1997 A1 - Lawrence, I. A1 - Feigenbaum, M. CY - Princeton NJ: Educational Testing Service N1 - #LA97-12 ER - TY - CONF T1 - Maintaining a CAT item pool with operational data T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Levine, M. L. A1 - Segall, D. O. A1 - Williams, B. A.
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CONF T1 - Maintaining item and test security in a CAT environment: A simulation study T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Patsula, L. N. A1 - Steffen, M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CONF T1 - Mathematical programming approaches to computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Jones, D. H. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - ABST T1 - A minimax sequential procedure in the context of computerized adaptive mastery testing (Research Report 97-07) Y1 - 1997 A1 - Vos, H. J. CY - Twente, The Netherlands: Department of Educational Measurement and Data Analysis ER - TY - ABST T1 - Modification of the Computerized Adaptive Screening Test (CAST) for use by recruiters in all military services Y1 - 1997 A1 - J. R. McBride A1 - Cooper, R. R. CY - Final Technical Report FR-WATSD-97-24, Contract No. MDA903-93-D-0032, DO 0054. Alexandria VA: Human Resources Research Organization. ER - TY - CONF T1 - Multidimensional adaptive testing with a minimum error-variance criterion T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1997 A1 - van der Linden, W. J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago ER - TY - ABST T1 - Multidimensional adaptive testing with a minimum error-variance criterion (Research Report 97-03) Y1 - 1997 A1 - van der Linden, W. J.
CY - Enschede, The Netherlands: University of Twente, Department of Educational Measurement and Data Analysis ER - TY - CONF T1 - Multi-stage CAT with stratified design T2 - Paper presented at the annual meeting of the Psychometric Society. Gatlinburg TN. Y1 - 1997 A1 - Chang, Hua-Hua A1 - Ying, Z. JF - Paper presented at the annual meeting of the Psychometric Society. Gatlinburg TN. ER - TY - JOUR T1 - Nonlinear sequential designs for logistic item response theory models with applications to computerized adaptive tests JF - The Annals of Statistics. Y1 - 1997 A1 - Chang, Hua-Hua A1 - Ying, Z. ER - TY - JOUR T1 - On-line performance assessment using rating scales JF - Journal of Outcomes Measurement Y1 - 1997 A1 - Stahl, J. A1 - Shumway, R. A1 - Bergstrom, B. A1 - Fisher, A. KW - *Outcome Assessment (Health Care) KW - *Rehabilitation KW - *Software KW - *Task Performance and Analysis KW - Activities of Daily Living KW - Humans KW - Microcomputers KW - Psychometrics KW - Psychomotor Performance AB - The purpose of this paper is to report on the development of the on-line performance assessment instrument--the Assessment of Motor and Process Skills (AMPS). Issues that will be addressed in the paper include: (a) the establishment of the scoring rubric and its implementation in an extended Rasch model, (b) training of raters, (c) validation of the scoring rubric and procedures for monitoring the internal consistency of raters, and (d) technological implementation of the assessment instrument in a computerized program. VL - 1 N1 - 1090-655X (Print)Journal Article ER - TY - BOOK T1 - Optimization methods in computerized adaptive testing Y1 - 1997 A1 - Cordova, M. J. CY - Unpublished doctoral dissertation, Rutgers University, New Brunswick NJ ER - TY - CONF T1 - Overview of practical issues in a CAT program T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Wise, S. L. 
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - [ERIC ED 408 330] ER - TY - CONF T1 - An overview of the LSAC CAT research agenda T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Pashley, P. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CONF T1 - Overview of the USMLE Step 2 computerized field test T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Luecht, RM A1 - Nungester, R. J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CHAP T1 - Policy and program management perspective Y1 - 1997 A1 - Martin, C.J. A1 - Hoshaw, C.R. CY - W.A. Sands, B.K. Waters, and J.R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation. Washington, DC: American Psychological Association. ER - TY - CHAP T1 - Preliminary psychometric research for CAT-ASVAB: Selecting an adaptive testing strategy Y1 - 1997 A1 - J. R. McBride A1 - Wetzel, C. D. A1 - Hetter, R. D. CY - W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 83-95). Washington DC: American Psychological Association. ER - TY - CONF T1 - Protecting the integrity of the CAT item pool T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Way, W. D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CONF T1 - Psychometric mode effects and fit issues with respect to item difficulty estimates T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Hadidi, A. 
A1 - Luecht, RM JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CHAP T1 - Psychometric procedures for administering CAT-ASVAB Y1 - 1997 A1 - Segall, D. O. A1 - Moreno, K. E. A1 - Bloxom, B. M. A1 - Hetter, R. D. CY - W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 131-140). Washington D.C.: American Psychological Association. ER - TY - CONF T1 - Realistic simulation procedures for item response data T2 - In T. Miller (Chair), High-dimensional simulation of item response data for CAT research. Psychometric Society Y1 - 1997 A1 - Davey, T. A1 - Nering, M. A1 - Thompson, T. JF - In T. Miller (Chair), High-dimensional simulation of item response data for CAT research. Psychometric Society CY - Gatlinburg TN N1 - Symposium presented at the annual meeting of the Psychometric Society, Gatlinburg TN. ER - TY - CONF T1 - Relationship of response latency to test design, examinee ability, and item difficulty in computer-based test administration T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Swanson, D. B. A1 - Featherman, C. M. A1 - Case, A. M. A1 - Luecht, RM A1 - Nungester, R. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CHAP T1 - Reliability and construct validity of CAT-ASVAB Y1 - 1997 A1 - Moreno, K. E. A1 - Segall, D. O. CY - W. A. Sands, B. K. Waters, and J. R. McBride (Eds.). Computerized adaptive testing: From inquiry to operation (pp. 169-179). Washington DC: American Psychological Association. ER - TY - CHAP T1 - Research antecedents of applied adaptive testing T2 - Computerized adaptive testing: From inquiry to operation Y1 - 1997 A1 - J. R. McBride ED - B. K. Waters ED - J. R.
McBride KW - computerized adaptive testing AB - (from the chapter) This chapter sets the stage for the entire computerized adaptive testing Armed Services Vocational Aptitude Battery (CAT-ASVAB) development program by describing the state of the art immediately preceding its inception. By the mid-1970s, a great deal of research had been conducted that provided the technical underpinnings needed to develop adaptive tests, but little research had been done to corroborate empirically the promising results of theoretical analyses and computer simulation studies. In this chapter, the author summarizes much of the important theoretical and simulation research prior to 1977. In doing so, he describes a variety of approaches to adaptive testing, and shows that while many methods for adaptive testing had been proposed, few practical attempts had been made to implement it. Furthermore, the few instances of adaptive testing were based primarily on traditional test theory, and were developed in laboratory settings for purposes of basic research. The most promising approaches, those based on item response theory and evaluated analytically or by means of computer simulations, remained to be proven in the crucible of live testing. (PsycINFO Database Record (c) 2004 APA, all rights reserved). JF - Computerized adaptive testing: From inquiry to operation PB - American Psychological Association CY - Washington D.C. USA ER - TY - JOUR T1 - Revising item responses in computerized adaptive tests: A comparison of three models JF - Applied Psychological Measurement Y1 - 1997 A1 - Stocking, M. L. KW - computerized adaptive testing AB - Interest in the application of large-scale computerized adaptive testing has focused attention on issues that arise when theoretical advances are made operational. One such issue is that of the order in which examinees address questions within a test or separately timed test section.
In linear testing, this order is entirely under the control of the examinee, who can look ahead at questions and return and revise answers to questions. Using simulation, this study investigated three models that permit restricted examinee control over revising previous answers in the context of adaptive testing. Even under a worst-case model of examinee revision behavior, two of the models permitting item revisions worked well in preserving test fairness and accuracy. One model studied may also preserve some cognitive processing styles developed by examinees for a linear testing environment. VL - 21 N1 - Sage Publications, US ER - TY - JOUR T1 - The role of item feedback in self-adapted testing JF - Educational and Psychological Measurement Y1 - 1997 A1 - Roos, L. L. A1 - Wise, S. L. A1 - Plake, B. S. VL - 57 ER - TY - JOUR T1 - Self-adapted testing: Improving performance by modifying tests instead of examinees JF - Stress & Coping: An International Journal Y1 - 1997 A1 - Rocklin, T. AB - This paper describes self-adapted testing and some of the evidence concerning its effects, presents possible theoretical explanations for those effects, and discusses some of the practical concerns regarding self-adapted testing. Self-adapted testing is a variant of computerized adaptive testing in which the examinee makes dynamic choices about the difficulty of the items he or she attempts. Self-adapted testing generates scores that are, in contrast to computerized adaptive tests and fixed-item tests, uncorrelated with a measure of trait test anxiety. This lack of correlation with an irrelevant attribute of the examinee is evidence of an improvement in the construct validity of the scores. This improvement comes at the cost of a decrease in testing efficiency. The interaction between test anxiety and test administration mode is more consistent with an interference theory of test anxiety than a deficit theory.
Some of the practical concerns regarding self-adapted testing can be ruled out logically, but others await empirical investigation. VL - 10(1) ER - TY - ABST T1 - Simulating the use of disclosed items in computerized adaptive testing (Research Report 97-10) Y1 - 1997 A1 - Stocking, M. L. A1 - W. C. Ward A1 - Potenza, M. T. CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - Simulation of realistic ability vectors T2 - Paper presented at the Psychometric Society meeting Y1 - 1997 A1 - Nering, M. A1 - Thompson, T.D. A1 - Davey, T. JF - Paper presented at the Psychometric Society meeting CY - Gatlinburg TN ER - TY - CONF T1 - A simulation study of the use of the Mantel-Haenszel and logistic regression procedures for assessing DIF in a CAT environment T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Ross, L. P. A1 - Nandakumar, R. A1 - Clauser, B. E. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - JOUR T1 - Some new item selection criteria for adaptive testing JF - Journal of Educational and Behavioral Statistics Y1 - 1997 A1 - Berger, M. P. F. A1 - Veerkamp, W. J. J. VL - 22 ER - TY - JOUR T1 - Some new item selection criteria for adaptive testing JF - Journal of Educational and Behavioral Statistics Y1 - 1997 A1 - Veerkamp, W. J. J. A1 - Berger, M. P. F. VL - 22 ER - TY - CONF T1 - Some questions that must be addressed to develop and maintain an item pool for use in an adaptive test T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Kingsbury, G. G. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - BOOK T1 - Statistical methods for computerized adaptive testing Y1 - 1997 A1 - Veerkamp, W. J. J.
CY - Unpublished doctoral dissertation, University of Twente, Enschede, The Netherlands ER - TY - CHAP T1 - Technical perspective Y1 - 1997 A1 - J. R. McBride CY - W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 29-44). Washington, DC: American Psychological Association. ER - TY - JOUR T1 - Una solución a la estimación inicial en los tests adaptivos informatizados [A solution to initial estimation in CATs.] JF - Revista Electrónica de Metodología Aplicada Y1 - 1997 A1 - Revuelta, J. A1 - Ponsoda, V. VL - 2 ER - TY - ABST T1 - Unidimensional approximations for a computerized adaptive test when the item pool and latent space are multidimensional (Research Report 97-5) Y1 - 1997 A1 - Spray, J. A. A1 - Abdel-Fattah, A. A. A1 - Huang, C.-Y. A1 - Lau, CA CY - Iowa City IA: ACT Inc ER - TY - CONF T1 - Validation of CATSIB to investigate DIF of CAT data T2 - annual meeting of the American Educational Research Association Y1 - 1997 A1 - Nandakumar, R. A1 - Roussos, L. A. KW - computerized adaptive testing AB - This paper investigates the performance of CATSIB (a modified version of the SIBTEST computer program) to assess differential item functioning (DIF) in the context of computerized adaptive testing (CAT). One of the distinguishing features of CATSIB is its theoretically built-in regression correction to control for the Type I error rates when the distributions of the reference and focal groups differ on the intended ability. This phenomenon is also called impact. The Type I error rate of CATSIB with the regression correction (WRC) was compared with that of CATSIB without the regression correction (WORC) to see if the regression correction was indeed effective. Also of interest was the power level of CATSIB after the regression correction. The subtest size was set at 25 items, and sample size, the impact level, and the amount of DIF were varied.
Results show that the regression correction was very useful in controlling for the Type I error; CATSIB WORC had inflated observed Type I errors, especially when impact levels were high. The CATSIB WRC had observed Type I error rates very close to the nominal level of 0.05. The power rates of CATSIB WRC were impressive. As expected, the power increased as the sample size increased and as the amount of DIF increased. Even for small samples with high impact rates, power rates were 64% or higher for high DIF levels. For large samples, power rates were over 90% for high DIF levels. (Contains 12 tables and 7 references.) (Author/SLD) JF - annual meeting of the American Educational Research Association CY - Chicago, IL. USA ER - TY - CHAP T1 - Validation of the experimental CAT-ASVAB system Y1 - 1997 A1 - Segall, D. O. A1 - Moreno, K. E. A1 - Kieckhaefer, W. F. A1 - Vicino, F. L. A1 - J. R. McBride CY - W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation. Washington, DC: American Psychological Association. ER - TY - CHAP T1 - Adaptive assessment and training using the neighbourhood of knowledge states Y1 - 1996 A1 - Dowling, C. E. A1 - Hockemeyer, C. A1 - Ludwig, A. H. CY - Frasson, C. and Gauthier, G. and Lesgold, A. (eds.) Intelligent Tutoring Systems, Third International Conference, ITS'96, Montréal, Canada, June 1996 Proceedings. Lecture Notes in Computer Science 1086. Berlin Heidelberg: Springer-Verlag 578-587. ER - TY - CHAP T1 - Adaptive assessment using granularity hierarchies and Bayesian nets Y1 - 1996 A1 - Collins, J. A. A1 - Greer, J. E. A1 - Huang, S. X. CY - Frasson, C. and Gauthier, G. and Lesgold, A. (Eds.) Intelligent Tutoring Systems, Third International Conference, ITS'96, Montréal, Canada, June 1996 Proceedings. Lecture Notes in Computer Science 1086. Berlin Heidelberg: Springer-Verlag 569-577. ER - TY - BOOK T1 - Adaptive testing with granularity Y1 - 1996 A1 - Collins, J. A.
CY - Master's thesis, University of Saskatchewan, Department of Computer Science ER - TY - JOUR T1 - An alternative method for scoring adaptive tests JF - Journal of Educational and Behavioral Statistics Y1 - 1996 A1 - Stocking, M. L. VL - 21 N1 - (Also Educational Testing Service RR 94-48.) ER - TY - JOUR T1 - Bayesian item selection criteria for adaptive testing JF - Psychometrika Y1 - 1996 A1 - van der Linden, W. J. VL - 63 ER - TY - ABST T1 - Bayesian item selection criteria for adaptive testing (Research Report 96-01) Y1 - 1996 A1 - van der Linden, W. J. CY - Twente, The Netherlands: Department of Educational Measurement and Data Analysis ER - TY - CONF T1 - Building a statistical foundation for computerized adaptive testing T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1996 A1 - Chang, Hua-Hua A1 - Ying, Z. JF - Paper presented at the annual meeting of the Psychometric Society CY - Banff, Alberta, Canada ER - TY - CONF T1 - Can examinees use a review option to positively bias their scores on a computerized adaptive test? T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1996 A1 - Rocklin, T. R. A1 - Vispoel, W. P. A1 - Wang, T. A1 - Bleiler, T. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New York NY ER - TY - BOOK T1 - A comparison of adaptive self-referenced testing and classical approaches to the measurement of individual change Y1 - 1996 A1 - VanLoy, W. J. CY - Unpublished doctoral dissertation, University of Minnesota ER - TY - JOUR T1 - Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using a computerized test JF - Journal of Educational and Behavioral Statistics Y1 - 1996 A1 - Spray, J. A. A1 - Reckase, M. D.
VL - 21 ER - TY - CONF T1 - A comparison of the traditional maximum information method and the global information method in CAT item selection T2 - annual meeting of the National Council on Measurement in Education Y1 - 1996 A1 - Tang, K. L. KW - computerized adaptive testing KW - item selection JF - annual meeting of the National Council on Measurement in Education CY - New York, NY USA ER - TY - JOUR T1 - Computerized adaptive skill assessment in a statewide testing program JF - Journal of Research on Computing in Education Y1 - 1996 A1 - Shermis, M. D. A1 - Stemmer, P. M. A1 - Webb, P. M. VL - 29(1) ER - TY - ABST T1 - Computerized adaptive testing for classifying examinees into three categories (Measurement and Research Department Rep 96-3) Y1 - 1996 A1 - Theo Eggen A1 - Straetmans, G. J. J. M. CY - Arnhem, The Netherlands: Cito N1 - #EG96-3. [Reprinted in Chapter 5 in #EG04-01] ER - TY - JOUR T1 - Computerized adaptive testing for reading assessment and diagnostic assessment JF - Journal of Developmental Education Y1 - 1996 A1 - Shermis, M. D. A1 - et al. ER - TY - JOUR T1 - Computerized adaptive testing for the national certification examination JF - AANA Journal Y1 - 1996 A1 - Bergstrom, Betty A. VL - 64 ER - TY - CONF T1 - Computing scores for incomplete GRE General computer adaptive tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1996 A1 - Slater, S. C. A1 - Schaffer, G.A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New York NY ER - TY - JOUR T1 - Conducting self-adapted testing using MicroCAT JF - Educational and Psychological Measurement Y1 - 1996 A1 - Roos, L. L. A1 - Wise, S. L. A1 - Yoes, M. E. A1 - Rocklin, T. R.
VL - 56 ER - TY - CONF T1 - Constructing adaptive tests to parallel conventional programs T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1996 A1 - Davey, T. A1 - Thomas, L. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York ER - TY - CHAP T1 - A content-balanced adaptive testing algorithm for computer-based training systems Y1 - 1996 A1 - Huang, S. X. CY - Frasson, C. and Gauthier, G. and Lesgold, A. (Eds.), Intelligent Tutoring Systems, Third International Conference, ITS'96, Montréal, Canada, June 1996 Proceedings. Lecture Notes in Computer Science 1086. Berlin Heidelberg: Springer-Verlag 306-314. ER - TY - CONF T1 - A critical analysis of the argument for and against item review in computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1996 A1 - Wise, S. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New York ER - TY - CONF T1 - Current research in computer-based testing for personnel selection and classification in the United States T2 - Invited address to the Centre for Recruitment and Selection, Belgian Armed Forces Y1 - 1996 A1 - J. R. McBride JF - Invited address to the Centre for Recruitment and Selection, Belgian Armed Forces ER - TY - JOUR T1 - Dispelling myths about the new NCLEX exam JF - Recruitment, Retention, and Restructuring Report Y1 - 1996 A1 - Johnson, S. H. KW - *Educational Measurement KW - *Licensure KW - Humans KW - Nursing Staff KW - Personnel Selection KW - United States AB - The new computerized NCLEX system is working well. Most new candidates, employers, and board of nursing representatives like the computerized adaptive testing system and the fast report of results. But among the candidates themselves, some myths have grown that cause them needless anxiety.
VL - 9 N1 - Journal Article ER - TY - JOUR T1 - Dynamic scaling: An ipsative procedure using techniques from computer adaptive testing JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 1996 A1 - Berg, S. R. KW - computerized adaptive testing AB - The purpose of this study was to create a prototype method for scaling items using computer adaptive testing techniques and to demonstrate the method with a working model program. The method can be used to scale items, rank individuals with respect to the scaled items, and to re-scale the items with respect to the individuals' responses. When using this prototype method, the items to be scaled are part of a database that contains not only the items, but measures of how individuals respond to each item. After completion of all presented items, the individual is assigned an overall scale value which is then compared with each item responded to, and an individual "error" term is stored with each item. After several individuals have responded to the items, the item error terms are used to revise the placement of the scaled items. This revision feature allows the natural adaptation of one general list to reflect subgroup differences, for example, differences among geographic areas or ethnic groups. It also provides easy revision and limited authoring of the scale items by the computer program administrator. This study addressed the methodology, the instrumentation needed to handle the scale-item administration, data recording, item error analysis, and scale-item database editing required by the method, and the behavior of a prototype vocabulary test in use. Analyses were made of item ordering, response profiles, item stability, reliability and validity. Although slow, the movement of unordered words used as items in the prototype program was accurate as determined by comparison with an expert word ranking. 
Person scores obtained by multiple administrations of the prototype test were reliable and correlated at .94 with a commercial paper-and-pencil vocabulary test, while holding a three-to-one speed advantage in administration. Although based upon self-report data, dynamic scaling instruments like the model vocabulary test could be very useful for self-assessment, for pre (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 56 ER - TY - CONF T1 - Effect of altering passing score in CAT when unidimensionality is violated T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1996 A1 - Abdel-Fattah, A. A. A1 - Lau, CA A1 - Spray, J. A. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York NY ER - TY - JOUR T1 - The effect of individual differences variables on the assessment of ability for Computerized Adaptive Testing JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 1996 A1 - Gershon, R. C. KW - computerized adaptive testing AB - Computerized Adaptive Testing (CAT) continues to gain momentum as the accepted testing modality for a growing number of certification, licensure, education, government and human resource applications. However, the developers of these tests have for the most part failed to adequately explore the impact of individual differences such as test anxiety on the adaptive testing process. It is widely accepted that non-cognitive individual differences variables interact with the assessment of ability when using written examinations. Logic would dictate that individual differences variables would equally affect CAT. Two studies were used to explore this premise. In the first study, 507 examinees were given a test anxiety survey prior to taking a high stakes certification exam using CAT or using a written format.
All examinees had already completed their course of study, and the examination would be their last hurdle prior to being awarded certification. High test anxious examinees performed worse than their low anxious counterparts on both testing formats. The second study replicated the finding that anxiety depresses performance in CAT. It also addressed the differential effect of anxiety on within test performance. Examinees were candidates taking their final certification examination following a four year college program. Ability measures were calculated for each successive part of the test for 923 subjects. Within subject performance varied depending upon test position. High anxious examinees performed poorly at all points in the test, while low and medium anxious examinee performance peaked in the middle of the test. If test anxiety and performance measures were actually the same trait, then low anxious individuals should have performed equally well throughout the test. The observed interaction of test anxiety and time on task serves as strong evidence that test anxiety has motivationally mediated as well as cognitively mediated effects. The results of the studies are di (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 57 ER - TY - CONF T1 - Effects of answer feedback and test anxiety on the psychometric and motivational characteristics of computer-adaptive and self-adaptive vocabulary tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education. Y1 - 1996 A1 - Vispoel, W. P. A1 - Brunsman, B. A1 - Forte, E. A1 - Bleiler, T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education. 
N1 - #VI96-01 ER - TY - CONF T1 - Effects of answer review and test anxiety on the psychometric and motivational characteristics of computer-adaptive and self-adaptive vocabulary tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1996 A1 - Vispoel, W. A1 - Forte, E. A1 - Boo, J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New York ER - TY - CONF T1 - The effects of methods of theta estimation, prior distribution, and number of quadrature points on CAT using the graded response model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1996 A1 - Hou, L. A1 - Chen, S. A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York NY ER - TY - BOOK T1 - The effects of person misfit in computerized adaptive testing Y1 - 1996 A1 - Nering, M. L. CY - Unpublished doctoral dissertation, University of Minnesota, Minneapolis ER - TY - CONF T1 - Effects of randomesque item selection on CAT item exposure rates and proficiency estimation under 1- and 2-PL models T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1996 A1 - Featherman, C. M. A1 - Subhiyah, R. G. A1 - Hadadi, A. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York ER - TY - CONF T1 - An evaluation of a two-stage testlet design for computerized adaptive testing T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1996 A1 - Reese, L. M. A1 - Schnipke, D. L. JF - Paper presented at the annual meeting of the Psychometric Society CY - Banff, Alberta, Canada ER - TY - JOUR T1 - A global information approach to computerized adaptive testing JF - Applied Psychological Measurement Y1 - 1996 A1 - Chang, Hua-Hua A1 - Ying, Z. AB - based on Fisher information (or item information). At each stage, an item is selected to maximize the Fisher information at the currently estimated trait level (θ). However, this application of Fisher information could be much less efficient than assumed if the estimators are not close to the true θ, especially at early stages of an adaptive test when the test length (number of items) is too short to provide an accurate estimate for true θ. It is argued here that selection procedures based on global information should be used, at least at early stages of a test when θ estimates are not likely to be close to the true θ. For this purpose, an item selection procedure based on average global information is proposed. Results from pilot simulation studies comparing the usual maximum item information item selection with the proposed global information approach are reported, indicating that the new method leads to improvement in terms of bias and mean squared error reduction under many circumstances. Index terms: computerized adaptive testing, Fisher information, global information, information surface, item information, item response theory, Kullback-Leibler information, local information, test information. VL - 20 IS - 3 SN - 0146-6216 ER - TY - CONF T1 - Heuristic-based CAT: Balancing item information, content, and exposure T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1996 A1 - Luecht, RM A1 - Hadadi, A. A1 - Nungester, R. J.
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New York NY ER - TY - CONF T1 - Item review and adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1996 A1 - Kingsbury, G. G. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New York ER - TY - JOUR T1 - Methodologic trends in the healthcare professions: computer adaptive and computer simulation testing JF - Nurse Educator Y1 - 1996 A1 - Forker, J. E. A1 - McDonald, M. E. KW - *Clinical Competence KW - *Computer Simulation KW - Computer-Assisted Instruction/*methods KW - Educational Measurement/*methods KW - Humans AB - Assessing knowledge and performance on computer is rapidly becoming a common phenomenon in testing and measurement. Computer adaptive testing presents an individualized test format in accordance with the examinee's ability level. The efficiency of the testing process enables a more precise estimate of performance, often with fewer items than traditional paper-and-pencil testing methodologies. Computer simulation testing involves performance-based, or authentic, assessment of the examinee's clinical decision-making abilities.
The authors discuss the trends in assessing performance through computerized means and the application of these methodologies to community-based nursing practice. VL - 21 SN - 0363-3624 (Print); 0363-3624 (Linking) N1 - Forker, J. E.; McDonald, M. E. Nurse Educ. 1996 Jul-Aug;21(4):13-4. ER - TY - JOUR T1 - Métodos sencillos para el control de las tasas de exposición en tests adaptativos informatizados [Simple methods for item exposure control in CATs] JF - Psicológica Y1 - 1996 A1 - Revuelta, J. A1 - Ponsoda, V. VL - 17 ER - TY - ABST T1 - Missing responses and IRT ability estimation: Omits, choice, time limits, and adaptive testing (Research Report RR-96-30-ONR) Y1 - 1996 A1 - Mislevy, R. J. A1 - Wu, P.-K. CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - A model for score maximization within a computerized adaptive testing environment T2 - Paper presented at the annual meeting of the NMCE Y1 - 1996 A1 - Chang, Hua-Hua JF - Paper presented at the annual meeting of the NMCE CY - New York NY ER - TY - CONF T1 - Modifying the NCLEX CAT item selection algorithm to improve item exposure T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1996 A1 - Way, W. D. A1 - Zara, A. A1 - Leahy, J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York ER - TY - JOUR T1 - Multidimensional adaptive testing JF - Psychometrika Y1 - 1996 A1 - Segall, D. O. AB - Maximum likelihood and Bayesian procedures for item selection and scoring of multidimensional adaptive tests are presented. A demonstration using simulated response data illustrates that multidimensional adaptive testing (MAT) can provide equal or higher reliabilities with about one-third fewer items than are required by one-dimensional adaptive testing (OAT).
Furthermore, holding test-length constant across the MAT and OAT approaches, substantial improvements in reliability can be obtained from multidimensional assessment. A number of issues relating to the operational use of multidimensional adaptive testing are discussed. VL - 61 N1 - Peer Reviewed Journal. http://www.psychometrika.org/ ER - TY - CONF T1 - Multidimensional computer adaptive testing T2 - Paper presented at the Annual Meeting of the American Educational Research Association Y1 - 1996 A1 - Fan, M. A1 - Hsu, Y. JF - Paper presented at the Annual Meeting of the American Educational Research Association CY - New York NY N1 - #FA96-02 ER - TY - JOUR T1 - Multidimensional computerized adaptive testing in a certification or licensure context JF - Applied Psychological Measurement Y1 - 1996 A1 - Luecht, RM KW - computerized adaptive testing AB - (from the journal abstract) Multidimensional item response theory (MIRT) computerized adaptive testing, building on a recent work by D. O. Segall (1996), is applied in a licensing/certification context. An example of a medical licensure test is used to demonstrate situations in which complex, integrated content must be balanced at the total test level for validity reasons, but items assigned to reportable subscore categories may be used under a MIRT adaptive paradigm to improve the reliability of the subscores. A heuristic optimization framework is outlined that generalizes to both univariate and multivariate statistical objective functions, with additional systems of constraints included to manage the content balancing or other test specifications on adaptively constructed test forms. Simulation results suggested that a multivariate treatment of the problem, although complicating somewhat the objective function used and the estimation of traits, nonetheless produces advantages from a psychometric perspective. (PsycINFO Database Record (c) 2003 APA, all rights reserved).
VL - 20 N1 - Sage Publications, US ER - TY - CONF T1 - New algorithms for item selection and exposure and proficiency estimation under 1- and 2-PL models T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1996 A1 - Featherman, C. M. A1 - Subhiyah, R. G. A1 - Hadadi, A. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York ER - TY - ABST T1 - Optimal design of item pools for computerized adaptive testing (Research Report 96-34) Y1 - 1996 A1 - Stocking, M. L. A1 - Swanson, L. CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - Person-fit indices and their role in the CAT environment T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1996 A1 - McLeod, L. D. A1 - Lewis, C. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New York ER - TY - JOUR T1 - Practical issues in large-scale computerized adaptive testing JF - Applied Measurement in Education Y1 - 1996 A1 - Mills, C. N. A1 - Stocking, M. L. VL - 9 ER - TY - ABST T1 - Preliminary cost-effectiveness analysis of alternative ASVAB testing concepts at MET sites Y1 - 1996 A1 - Hogan, P.F. A1 - Dall, T. A1 - J. R. McBride CY - Interim report to Defense Manpower Data Center. Fairfax, VA: Lewin-VHI, Inc.
ER - TY - JOUR T1 - Propiedades psicométricas de un test adaptativo informatizado de vocabulario inglés [Psychometric properties of a computerized adaptive test for the measurement of English vocabulary] JF - Estudios de Psicología Y1 - 1996 A1 - Olea, J. A1 - Ponsoda, V. A1 - Revuelta, J. A1 - Belchi, J. VL - 55 N1 - [In Spanish] ER - TY - ABST T1 - Recursive maximum likelihood estimation, sequential design, and computerized adaptive testing Y1 - 1996 A1 - Chang, Hua-Hua A1 - Ying, Z. CY - Princeton NJ: Educational Testing Service ER - TY - ABST T1 - Revising item responses in computerized adaptive testing: A comparison of three models (RR-96-12) Y1 - 1996 A1 - Stocking, M. L. CY - Princeton NJ: Educational Testing Service N1 - #ST96-12 [See APM paper, 1997] ER - TY - BOOK T1 - Robustness of a unidimensional computerized testing mastery procedure with multidimensional testing data Y1 - 1996 A1 - Lau, CA CY - Unpublished doctoral dissertation, University of Iowa, Iowa City IA N1 - {PDF file, 1.838 MB} ER - TY - CONF T1 - A search procedure to determine sets of decision points when using testlet-based Bayesian sequential testing procedures T2 - Paper presented at the annual meeting of National Council on Measurement in Education Y1 - 1996 A1 - Smith, R. A1 - Lewis, C. JF - Paper presented at the annual meeting of National Council on Measurement in Education CY - New York ER - TY - ABST T1 - Some practical examples of computerized adaptive sequential testing (Internal Report) Y1 - 1996 A1 - Luecht, RM A1 - Nungester, R. J. CY - Philadelphia: National Board of Medical Examiners ER - TY - CONF T1 - Strategies for managing item pools to maximize item security T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1996 A1 - Way, W. D. A1 - Zara, A. A1 - Leahy, J.
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego ER - TY - CHAP T1 - Tests adaptativos informatizados [Computerized adaptive testing] T2 - Psicometría Y1 - 1996 A1 - Olea, J. A1 - Ponsoda, V. JF - Psicometría PB - Universitas CY - Madrid, UNED ER - TY - CONF T1 - A Type I error rate study of a modified SIBTEST DIF procedure with potential application to computerized adaptive tests T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1996 A1 - Roussos, L. JF - Paper presented at the annual meeting of the Psychometric Society CY - Alberta, Canada ER - TY - BOOK T1 - User's manual for the MicroCAT testing system, Version 3.5 Y1 - 1996 A1 - Assessment Systems Corporation CY - St Paul MN: Assessment Systems Corporation ER - TY - CONF T1 - Using unidimensional IRT models for dichotomous classification via CAT with multidimensional data T2 - Poster session presented at the annual meeting of the American Educational Research Association Y1 - 1996 A1 - Lau, CA A1 - Abdel-Fattah, A. A. A1 - Spray, J. A. JF - Poster session presented at the annual meeting of the American Educational Research Association CY - Boston MA ER - TY - CONF T1 - Utility of Fisher information, global information and different starting abilities in mini CAT T2 - Paper presented at the Annual Meeting of the National Council on Measurement in Education Y1 - 1996 A1 - Fan, M. A1 - Hsu, Y. JF - Paper presented at the Annual Meeting of the National Council on Measurement in Education CY - New York NY N1 - #FA96-01 ER - TY - JOUR T1 - Validity of item selection: A comparison of automated computerized adaptive and manual paper and pencil examinations JF - Teaching and Learning in Medicine Y1 - 1996 A1 - Lunz, M. E. A1 - Deville, C. W.
VL - 8 ER - TY - JOUR T1 - Assessment of scaled score consistency in adaptive testing from a multidimensional item response theory perspective JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 1995 A1 - Fan, Miechu KW - computerized adaptive testing AB - The purpose of this study was twofold: (a) to examine whether the unidimensional adaptive testing estimates are comparable for different ability levels of examinees when the true examinee-item interaction is correctly modeled using a compensatory multidimensional item response theory (MIRT) model; and (b) to investigate the effects of adaptive testing estimation when the procedure of item selection of computerized adaptive testing (CAT) is controlled by either content-balancing or selecting the most informative item in a user specified direction at the current estimate of unidimensional ability. A series of Monte Carlo simulations were conducted in this study. Deviation from the reference composite angle was used as an index of the (θ1, θ2)-composite consistency across the different levels of unidimensional CAT estimates. In addition, the effect of the content-balancing item selection procedure and the fixed-direction item selection procedure were compared across the different ability levels. The characteristics of item selection, test information and the relationship between unidimensional and multidimensional models were also investigated. In addition to employing statistical analysis to examine the robustness of the CAT procedure to violations of unidimensionality, this research also included graphical analyses to present the results.
The results were summarized as follows: (a) the reference angles for the no-control-item-selection method were disparate across the unidimensional ability groups; (b) the unidimensional CAT estimates from the content-balancing item selection method did not offer much improvement; (c) the fixed-direction-item selection method did provide greater consistency for the unidimensional CAT estimates across the different levels of ability; and (d) increasing the CAT test length did not provide greater score scale consistency. Based on the results of this study, the following conclusions were drawn: (a) without any controlling (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 55 ER - TY - CONF T1 - A Bayesian computerized mastery model with multiple cut scores T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1995 A1 - Smith, R. L. A1 - Lewis, C. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco CA ER - TY - CONF T1 - Bayesian item selection in adaptive testing T2 - Paper presented at the Annual Meeting of the Psychometric Society Y1 - 1995 A1 - van der Linden, W. J. JF - Paper presented at the Annual Meeting of the Psychometric Society CY - Minneapolis MN ER - TY - JOUR T1 - Comparability and validity of computerized adaptive testing with the MMPI-2 JF - Journal of Personality Assessment Y1 - 1995 A1 - Roper, B. L. A1 - Ben-Porath, Y. S. A1 - Butcher, J. N. AB - The comparability and validity of a computerized adaptive (CA) Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 571 undergraduate college students. The CA MMPI-2 administered adaptively Scales L, F, the 10 clinical scales, and the 15 content scales, utilizing the countdown method (Butcher, Keller, & Bacon, 1985).
All subjects completed the MMPI-2 twice, with three experimental conditions: booklet test-retest, booklet-CA, and conventional computerized (CC)-CA. Profiles across administration modalities show a high degree of similarity, providing evidence for the comparability of the three forms. Correlations between MMPI-2 scales and other psychometric measures (Beck Depression Inventory; Symptom Checklist-Revised; State-Trait Anxiety and Anger Scales; and the Anger Expression Scale) support the validity of the CA MMPI-2. Substantial item savings may be realized with the implementation of the countdown procedure. VL - 65 SN - 0022-3891 (Print) N1 - Roper, B LBen-Porath, Y SButcher, J NUnited StatesJournal of personality assessmentJ Pers Assess. 1995 Oct;65(2):358-71. ER - TY - CONF T1 - Comparability studies for the GRE CAT General Test and the NCLEX using CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1995 A1 - Eignor, D. R. A1 - Schaffer, G.A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco ER - TY - CONF T1 - A comparison of classification agreement between adaptive and full-length test under the 1-PL and 2-PL models T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1995 A1 - Lewis, M. J. A1 - Subhiyah, R. G. A1 - Morrison, C. A. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA N1 - (cited in #RE98311) ER - TY - CONF T1 - A comparison of gender differences on paper-and-pencil and computer-adaptive versions of the Graduate Record Examination T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1995 A1 - Bridgeman, B. A1 - Schaeffer, G. A. 
JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA ER - TY - JOUR T1 - A comparison of item selection routines in linear and adaptive tests JF - Journal of Educational Measurement Y1 - 1995 A1 - Schnipke, D. L. A1 - Green, B. F. VL - 32 ER - TY - CONF T1 - A comparison of two IRT-based models for computerized mastery testing when item parameter estimates are uncertain T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1995 A1 - Way, W. D. A1 - Lewis, C. A1 - Smith, R. L. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco ER - TY - CONF T1 - Computer adaptive testing in a medical licensure setting: A comparison of outcomes under the one- and two-parameter logistic models T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1995 A1 - Morrison, C. A. A1 - Nungester, R. J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco ER - TY - JOUR T1 - Computer-adaptive testing: A new breed of assessment JF - Journal of the American Dietetic Association Y1 - 1995 A1 - Ruiz, B. A1 - Fitz, P. A. A1 - Lewis, C. A1 - Reidy, C. VL - 95 ER - TY - JOUR T1 - Computer-adaptive testing: CAT: A Bayesian maximum-falsification approach JF - Rasch Measurement Transactions Y1 - 1995 A1 - Linacre, J. M. VL - 9 ER - TY - BOOK T1 - Computerized adaptive attitude testing using the partial credit model Y1 - 1995 A1 - Baek, S. G. CY - Dissertation Abstracts International-A, 55(7-A), 1922 (UMI No.
AAM9430378) ER - TY - JOUR T1 - Computerized adaptive testing: Tracking candidate response patterns JF - Journal of Educational Computing Research Y1 - 1995 A1 - Lunz, M. E. A1 - Bergstrom, Betty A. AB - Tracked the effect of candidate response patterns on a computerized adaptive test. Data were from a certification examination in laboratory science administered in 1992 to 155 candidates, using a computerized adaptive algorithm. The 90-item certification examination was divided into 9 units of 10 items each to track the pattern of initial responses and response alterations on ability estimates and test precision across the 9 test units. The precision of the test was affected most by response alterations during early segments of the test. While candidates generally benefited from altering responses, individual candidates showed different patterns of response alterations across test segments. Test precision was minimally affected, suggesting that the tailoring of computerized adaptive testing is minimally affected by response alterations. (PsycINFO Database Record (c) 2002 APA, all rights reserved). VL - 13 N1 - Baywood Publishing, US ER - TY - JOUR T1 - Computerized adaptive testing with polytomous items JF - Applied Psychological Measurement Y1 - 1995 A1 - Dodd, B. G. A1 - De Ayala, R. J., A1 - Koch, W. R. AB - Discusses polytomous item response theory models and the research that has been conducted to investigate a variety of possible operational procedures (item bank, item selection, trait estimation, stopping rule) for polytomous model-based computerized adaptive testing (PCAT). Studies are reviewed that compared PCAT systems based on competing item response theory models that are appropriate for the same measurement objective, as well as applications of PCAT in marketing and educational psychology. Directions for future research using PCAT are suggested. 
VL - 19 IS - 1 ER - TY - CHAP T1 - Computerized testing for licensure Y1 - 1995 A1 - Vale, C. D. CY - J. Impara (Ed.), Licensure testing: Purposes, procedures, and practices (pp. 291-320). Lincoln NE: Buros Institute of Mental Measurements. ER - TY - ABST T1 - Controlling item exposure conditional on ability in computerized adaptive testing (Research Report 95-25) Y1 - 1995 A1 - Stocking, M. L. A1 - Lewis, C. CY - Princeton NJ: Educational Testing Service. N1 - #ST95-25; also see #ST98057 ER - TY - CONF T1 - Does cheating on CAT pay: Not T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1995 A1 - Gershon, R. C. A1 - Bergstrom, B. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco N1 - ERIC ED 392 844 ER - TY - CONF T1 - The effect of ability estimation for polytomous CAT in different item selection procedures T2 - Paper presented at the Annual meeting of the Psychometric Society Y1 - 1995 A1 - Fan, M. A1 - Hsu, Y. JF - Paper presented at the Annual meeting of the Psychometric Society CY - Minneapolis MN ER - TY - CONF T1 - The effect of model misspecification on classification decisions made using a computerized test: 3-PLM vs. 1PLM (and UIRT versus MIRT) T2 - Paper presented at the Annual Meeting of the Psychometric Society Y1 - 1995 A1 - Spray, J. A. A1 - Kalohn, J. C. A1 - Schulz, M. A1 - Fleer, P. Jr. JF - Paper presented at the Annual Meeting of the Psychometric Society CY - Minneapolis, MN N1 - #SP95-01 ER - TY - CONF T1 - The effect of model misspecification on classification decisions made using a computerized test: UIRT versus MIRT T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1995 A1 - Abdel-Fattah, A. A. A1 - Lau, C.-M. A. 
JF - Paper presented at the annual meeting of the Psychometric Society CY - Minneapolis MN N1 - #AB95-01 ER - TY - CONF T1 - The effect of population distribution and methods of theta estimation on CAT using the rating scale model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1995 A1 - Chen, S. A1 - Hou, L. A1 - Fitzpatrick, S. J. A1 - Dodd, B. G. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco ER - TY - JOUR T1 - Effect of Rasch calibration on ability and DIF estimation in computer-adaptive tests JF - Journal of Educational Measurement Y1 - 1995 A1 - Zwick, R. A1 - Thayer, D. T. A1 - Wingersky, M. VL - 32 ER - TY - JOUR T1 - Effects and underlying mechanisms of self-adapted testing JF - Journal of Educational Psychology Y1 - 1995 A1 - Rocklin, T. R. A1 - O’Donnell, A. M. A1 - Holst, P. M. VL - 87 ER - TY - CONF T1 - The effects of item compromise on computerized adaptive test scores T2 - Paper presented at the meeting of the Society for Industrial and Organizational Psychology Y1 - 1995 A1 - Segall, D. O. JF - Paper presented at the meeting of the Society for Industrial and Organizational Psychology CY - Orlando, FL ER - TY - BOOK T1 - El control de la exposición de los items en tests adaptativos informatizados [Item exposure control in computerized adaptive tests] Y1 - 1995 A1 - Revuelta, J. CY - Unpublished master’s dissertation, Universidad Autónoma de Madrid, Spain ER - TY - CONF T1 - Equating computerized adaptive certification examinations: The Board of Registry series of studies T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1995 A1 - Lunz, M. E. A1 - Bergstrom, Betty A. 
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco ER - TY - CONF T1 - Equating the CAT-ASVAB: Experiences and lessons learned T2 - Paper presented at the meeting of the National Council on Measurement in Education Y1 - 1995 A1 - Segall, D. O. JF - Paper presented at the meeting of the National Council on Measurement in Education CY - San Francisco ER - TY - CONF T1 - Equating the CAT-ASVAB: Issues and approach T2 - Paper presented at the meeting of the National Council on Measurement in Education Y1 - 1995 A1 - Segall, D. O. A1 - Carter, G. JF - Paper presented at the meeting of the National Council on Measurement in Education CY - San Francisco ER - TY - CONF T1 - Equating the computerized adaptive edition of the Differential Aptitude Tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1995 A1 - J. R. McBride JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco CA ER - TY - CONF T1 - Estimation of item difficulty from restricted CAT calibration samples T2 - Paper presented at the annual conference of the National Council on Measurement in Education Y1 - 1995 A1 - Sykes, R. A1 - Ito, K. JF - Paper presented at the annual conference of the National Council on Measurement in Education CY - San Francisco ER - TY - ABST T1 - An evaluation of alternative concepts for administering the Armed Services Vocational Aptitude Battery to applicants for enlistment Y1 - 1995 A1 - Hogan, P. F. A1 - J. R. McBride A1 - Curran, L. T. CY - DMDC Technical Report 95-013. Monterey, CA: Personnel Testing Division, Defense Manpower Data Center ER - TY - CHAP T1 - From adaptive testing to automated scoring of architectural simulations Y1 - 1995 A1 - Bejar, I. I. CY - L. E. Mancall and P. G. 
Bashook (Eds.), Assessing clinical reasoning: The oral examination and alternative methods (pp. 115-130). Evanston IL: The American Board of Medical Specialties. ER - TY - CONF T1 - A global information approach to computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1995 A1 - Chang, Hua-Hua JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco CA ER - TY - BOOK T1 - Guidelines for computer-adaptive test development and use in education Y1 - 1995 A1 - American Council on Education CY - Washington DC: Author ER - TY - CHAP T1 - Improving individual differences measurement with item response theory and computerized adaptive testing Y1 - 1995 A1 - Weiss, D. J. CY - D. Lubinski and R. V. Dawis (Eds.), Assessing individual differences in human behavior: New concepts, methods, and findings (pp. 49-79). Palo Alto CA: Davies-Black Publishing. ER - TY - CHAP T1 - Individualized testing in the classroom Y1 - 1995 A1 - Linacre, J. M. CY - Anderson, L. W. (Ed.), International Encyclopedia of Teaching and Teacher Education. Oxford, New York, Tokyo: Elsevier Science, 295-299. ER - TY - CONF T1 - The influence of examinee test-taking behavior motivation in computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1995 A1 - Kim, J. A1 - McLean, J. E. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco CA N1 - (ERIC No. ED392839) ER - TY - ABST T1 - The introduction and comparability of the computer-adaptive GRE General Test (GRE Board Professional Report 88-08ap; Educational Testing Service Research Report 95-20) Y1 - 1995 A1 - Schaeffer, G. A. A1 - Steffen, M. A1 - Golub-Smith, M. L. A1 - Mills, C. N. A1 - Durso, R. 
CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - An investigation of item calibration procedures for a computerized licensure examination T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1995 A1 - Haynie, K. A. A1 - Way, W. D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco, CA ER - TY - JOUR T1 - An investigation of procedures for computerized adaptive testing using the successive intervals Rasch model JF - Educational and Psychological Measurement Y1 - 1995 A1 - Koch, W. R. A1 - Dodd, B. G. VL - 55 ER - TY - BOOK T1 - Item equivalence from paper-and-pencil to computer adaptive testing Y1 - 1995 A1 - Chae, S. CY - Unpublished doctoral dissertation, University of Chicago N1 - #CH95-01 ER - TY - CONF T1 - Item exposure rates for unconstrained and content-balanced computerized adaptive tests T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1995 A1 - Morrison, C. A1 - Subhiyah, R. A1 - Nungester, R. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA ER - TY - CONF T1 - Item review and answer changing in computerized adaptive tests T2 - Paper presented at the Third European Conference on Psychological Assessment Y1 - 1995 A1 - Wise, S. L. JF - Paper presented at the Third European Conference on Psychological Assessment CY - Trier, Germany ER - TY - JOUR T1 - Item times in computerized testing: A new differential information JF - European Journal of Psychological Assessment Y1 - 1995 A1 - Hornke, L. F. VL - 11 (Suppl. 1) ER - TY - CONF T1 - New algorithms for item selection and exposure control with computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1995 A1 - Davey, T. A1 - Parshall, C. G. 
JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA ER - TY - CONF T1 - New item exposure control algorithms for computerized adaptive testing T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1995 A1 - Thomasson, G. L. JF - Paper presented at the annual meeting of the Psychometric Society CY - Minneapolis MN ER - TY - ABST T1 - A new method of controlling item exposure in computerized adaptive testing (Research Report 95-25) Y1 - 1995 A1 - Stocking, M. L. A1 - Lewis, C. CY - Princeton NJ: Educational Testing Service ER - TY - ABST T1 - Practical issues in large-scale high-stakes computerized adaptive testing (Research Report 95-23) Y1 - 1995 A1 - Mills, C. N. A1 - Stocking, M. L. CY - Princeton, NJ: Educational Testing Service. N1 - #MI95-23 ER - TY - JOUR T1 - Precision and differential item functioning on a testlet-based test: The 1991 Law School Admissions Test as an example JF - Applied Measurement in Education Y1 - 1995 A1 - Wainer, H. VL - 8 ER - TY - CONF T1 - Precision of ability estimation methods in computerized adaptive testing T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1995 A1 - Wang, T. A1 - Vispoel, W. P. JF - Paper presented at the annual meeting of the Psychometric Society CY - Minneapolis N1 - See APM article below. ER - TY - BOOK T1 - The precision of ability estimation methods in computerized adaptive testing Y1 - 1995 A1 - Wang, T. CY - Unpublished doctoral dissertation, University of Iowa, Iowa City (UM No. 9945102) ER - TY - CHAP T1 - Prerequisite relationships for the adaptive assessment of knowledge Y1 - 1995 A1 - Dowling, C. E. A1 - Kaluscha, R. CY - Greer, J. (Ed.) Proceedings of AIED'95, 7th World Conference on Artificial Intelligence in Education, Washington, DC, AACE 43-50. 
ER - TY - CONF T1 - Recursive maximum likelihood estimation, sequential designs, and computerized adaptive testing T2 - Paper presented at the Eleventh Workshop on Item Response Theory Y1 - 1995 A1 - Ying, Z. A1 - Chang, Hua-Hua JF - Paper presented at the Eleventh Workshop on Item Response Theory CY - University of Twente, the Netherlands ER - TY - JOUR T1 - Review of the book Computerized Adaptive Testing: A Primer JF - Psychometrika Y1 - 1995 A1 - Andrich, D. VL - 4? ER - TY - JOUR T1 - Shortfall of questions curbs use of computerized graduate exam JF - The Chronicle of Higher Education Y1 - 1995 A1 - Jacobson, R. L. ER - TY - ABST T1 - Some alternative CAT item selection heuristics (Internal report) Y1 - 1995 A1 - Luecht, R. M. CY - Philadelphia PA: National Board of Medical Examiners ER - TY - CONF T1 - Some new methods for content balancing adaptive tests T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1995 A1 - Segall, D. O. A1 - Davey, T. C. JF - Paper presented at the annual meeting of the Psychometric Society CY - Minneapolis MN ER - TY - JOUR T1 - A study of psychologically optimal level of item difficulty JF - Shinrigaku Kenkyu Y1 - 1995 A1 - Fujimori, S. KW - *Adaptation, Psychological KW - *Psychological Tests KW - Adult KW - Female KW - Humans KW - Male AB - For the purpose of selecting items in a test, this study presented a viewpoint of psychologically optimal difficulty level, as well as measurement efficiency, of items. A paper-and-pencil test (P & P) composed of hard, moderate and easy subtests was administered to 298 students at a university. A computerized adaptive test (CAT) was also administered to 79 students. The items of both tests were selected from Shiba's Word Meaning Comprehension Test, for which the estimates of parameters of two-parameter item response model were available. 
The results of P & P research showed that the psychologically optimal success level would be such that the proportion of right answers is somewhere between .75 and .85. A similar result was obtained from CAT research, where the proportion of about .8 might be desirable. Traditionally a success rate of .5 has been recommended in adaptive testing. In this study, however, it was suggested that the items of such level would be too hard psychologically for many examinees. VL - 65 SN - 0021-5236 (Print) 0021-5236 (Linking) N1 - Shinrigaku Kenkyu. 1995 Feb;65(6):446-53. ER - TY - CONF T1 - Tests adaptativos y autoadaptados informatizados: Efectos en la ansiedad y en la precisión de las estimaciones [SATs and CATs: Effects on anxiety and estimate precision] T2 - Paper presented at the Fourth Symposium de Metodología de las Ciencias del Comportamiento Y1 - 1995 A1 - Olea, J. A1 - Ponsoda, V. A1 - Wise, S. L. JF - Paper presented at the Fourth Symposium de Metodología de las Ciencias del Comportamiento CY - Murcia, Spain ER - TY - JOUR T1 - Theoretical results and item selection from multidimensional item bank in the Mokken IRT model for polytomous items JF - Applied Psychological Measurement Y1 - 1995 A1 - Hemker, B. T. A1 - Sijtsma, K. A1 - Molenaar, I. W. VL - 19 ER - TY - ABST T1 - Using simulation to select an adaptive testing strategy: An item bank evaluation program Y1 - 1995 A1 - Hsu, T. C. A1 - Tseng, F. L. CY - Unpublished manuscript, University of Pittsburgh ER - TY - JOUR T1 - ADTEST: A computer-adaptive test based on the maximum information principle JF - Educational and Psychological Measurement Y1 - 1994 A1 - Ponsoda, V. A1 - Olea, J. A1 - Revuelta, J. VL - 54 ER - TY - BOOK T1 - CAT software system [computer program] Y1 - 1994 A1 - Gershon, R. C. 
CY - Chicago IL: Computer Adaptive Technologies ER - TY - ABST T1 - CAT-GATB simulation studies Y1 - 1994 A1 - Segall, D. O. CY - San Diego CA: Navy Personnel Research and Development Center ER - TY - CONF T1 - Comparing computerized adaptive and self-adapted tests: The influence of examinee achievement locus of control T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1994 A1 - Wise, S. L. A1 - Roos, L. L. A1 - Plake, B. S. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA ER - TY - JOUR T1 - A Comparison of Item Calibration Media in Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 1994 A1 - Hetter, R. D. A1 - Segall, D. O. A1 - Bloxom, B. M. VL - 18 IS - 3 ER - TY - JOUR T1 - Computer adaptive testing JF - International Journal of Educational Research Y1 - 1994 A1 - Lunz, M. E. A1 - Bergstrom, Betty A. A1 - Gershon, R. C. VL - 6 ER - TY - JOUR T1 - Computer adaptive testing: A shift in the evaluation paradigm JF - Educational Technology Systems Y1 - 1994 A1 - Carlson, R. VL - 22 (3) ER - TY - JOUR T1 - Computer adaptive testing: Assessment of the future JF - Curriculum/Technology Quarterly Y1 - 1994 A1 - Diones, R. A1 - Everson, H. VL - 4 (2) ER - TY - CONF T1 - Computerized adaptive testing exploring examinee response time using hierarchical linear modeling T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1994 A1 - Bergstrom, B. A1 - Gershon, R. C. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA N1 - (ERIC No. ED 400 286) 
ER - TY - JOUR T1 - Computerized adaptive testing for licensure and certification JF - CLEAR Exam Review Y1 - 1994 A1 - Bergstrom, Betty A. A1 - Gershon, R. C. VL - Winter 1994 ER - TY - JOUR T1 - Computerized adaptive testing: Revolutionizing academic assessment JF - Community College Journal Y1 - 1994 A1 - Smittle, P. VL - 65 (1) ER - TY - ABST T1 - Computerized mastery testing using fuzzy set decision theory (Research Report 94-37) Y1 - 1994 A1 - Du, Y. A1 - Lewis, C. A1 - Pashley, P. J. CY - Princeton NJ: Educational Testing Service ER - TY - ABST T1 - Computerized Testing (Research Report 94-22). Y1 - 1994 A1 - Oltman, P. K. CY - Princeton NJ: Educational Testing Service ER - TY - JOUR T1 - Computerized-adaptive and self-adapted music-listening tests: Features and motivational benefits JF - Applied Measurement in Education Y1 - 1994 A1 - Vispoel, W. P., A1 - Coffman, D. D. VL - 7 ER - TY - ABST T1 - DIF analysis for pretest items in computer-adaptive testing (Educational Testing Service Research Rep No RR 94-33) Y1 - 1994 A1 - Zwick, R. A1 - Thayer, D. T. A1 - Wingersky, M. CY - Princeton NJ: Educational Testing Service. N1 - #ZW94-33 ER - TY - CONF T1 - Early psychometric research in the CAT-ASVAB Project T2 - Paper presented at the 102nd Annual Convention of the American Psychological Association. Los Angeles Y1 - 1994 A1 - J. R. McBride JF - Paper presented at the 102nd Annual Convention of the American Psychological Association. Los Angeles CY - CA ER - TY - CONF T1 - The effect of restricting ability distributions in the estimation of item difficulties: Implications for a CAT implementation T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1994 A1 - Ito, K. A1 - Sykes, R.C. 
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans ER - TY - JOUR T1 - The effect of review on the psychometric characteristics of computerized adaptive tests JF - Applied Measurement in Education Y1 - 1994 A1 - Stone, G. E. A1 - Lunz, M. E. AB - Explored the effect of reviewing items and altering responses on examinee ability estimates, test precision, test information, decision confidence, and pass/fail status for computerized adaptive tests. Two different populations of examinees took different computerized certification examinations. For purposes of analysis, each population was divided into 3 ability groups (high, medium, and low). Ability measures before and after review were highly correlated, but slightly lower decision confidence was found after review. Pass/fail status was most affected for examinees with estimates close to the pass point. Decisions remained the same for 94% of the examinees. Test precision is only slightly affected by review, and the average information loss can be recovered by the addition of one item. (PsycINFO Database Record (c) 2002 APA, all rights reserved). VL - 7 IS - 3 N1 - Lawrence Erlbaum, US ER - TY - BOOK T1 - Effects of computerized adaptive test anxiety on nursing licensure examinations Y1 - 1994 A1 - Arrowwood, V. E. 
CY - Dissertation Abstracts International, A (Humanities and Social Sciences), 54 (9-A), 3410 ER - TY - CONF T1 - The effects of item pool depth on the accuracy of pass/fail decisions for NCLEX using CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1994 A1 - Haynie, K.A. A1 - Way, W. D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans ER - TY - JOUR T1 - An empirical study of computerized adaptive test administration conditions JF - Journal of Educational Measurement Y1 - 1994 A1 - Lunz, M. E. A1 - Bergstrom, Betty A. VL - 31 ER - TY - CHAP T1 - The equivalence of Rasch item calibrations and ability estimates across modes of administration T2 - Objective measurement: Theory into practice Y1 - 1994 A1 - Bergstrom, Betty A. A1 - Lunz, M. E. KW - computerized adaptive testing JF - Objective measurement: Theory into practice PB - Ablex Publishing Co. CY - Norwood, N.J. USA VL - 2 ER - TY - CONF T1 - Establishing the comparability of the NCLEX using CAT with traditional NCLEX examinations T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1994 A1 - Eignor, D. R. A1 - Way, W. D. A1 - Amoss, K.E. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA ER - TY - CONF T1 - Evaluation and implementation of CAT-ASVAB T2 - Paper presented at the annual meeting of the American Psychological Association Y1 - 1994 A1 - Curran, L. T. A1 - Wise, L. L. JF - Paper presented at the annual meeting of the American Psychological Association CY - Los Angeles ER - TY - BOOK T1 - The exploration of an alternative method for scoring computer adaptive tests Y1 - 1994 A1 - Potenza, M. 
CY - Unpublished doctoral dissertation, Lincoln NE: University of Nebraska ER - TY - CONF T1 - A few more issues to consider in multidimensional computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1994 A1 - Luecht, R. M. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco ER - TY - JOUR T1 - A general approach to algorithmic design of fixed-form tests, adaptive tests, and testlets JF - Applied Psychological Measurement Y1 - 1994 A1 - Berger, M. P. F. VL - 18 IS - 2 ER - TY - CONF T1 - The historical developments of fit and its assessment in the computerized adaptive testing environment T2 - Midwestern Education Research Association annual meeting Y1 - 1994 A1 - Stone, G. E. JF - Midwestern Education Research Association annual meeting CY - Chicago, IL USA ER - TY - JOUR T1 - The incomplete equivalence of the paper-and-pencil and computerized versions of the General Aptitude Test Battery JF - Journal of Applied Psychology Y1 - 1994 A1 - Van de Vijver, F. J. R. A1 - Harsveld, M. VL - 79 ER - TY - JOUR T1 - Individual differences and test administration procedures: A comparison of fixed-item, computerized adaptive, and self-adapted testing JF - Applied Measurement in Education Y1 - 1994 A1 - Vispoel, W. P. A1 - Rocklin, T. R. A1 - Wang, T. VL - 7 ER - TY - CONF T1 - Item calibration considerations: A comparison of item calibrations on written and computerized adaptive examinations T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1994 A1 - Stone, G. E. A1 - Lunz, M. E. 
JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA ER - TY - ABST T1 - La simulation de modèle sur ordinateur en tant que méthode de recherche : le cas concret de l’étude de la distribution d’échantillonnage de l’estimateur du niveau d’habileté en testing adaptatif en fonction de deux règles d’arrêt [Computer model simulation as a research method: the case of the sampling distribution of the ability estimator in adaptive testing under two stopping rules] Y1 - 1994 A1 - Raîche, G. CY - Actes du 6e colloque de l‘Association pour la recherche au collégial. Montréal : Association pour la recherche au collégial, ARC ER - TY - CONF T1 - L'évaluation nationale individualisée et assistée par ordinateur [Large scale assessment: Tailored and computerized] T2 - Québec: Proceedings of the 14th Congress of the Association québécoise de pédagogie collégiale. Montréal: Association québécoise de pédagogie collégiale (AQPC). Y1 - 1994 A1 - Raîche, G. A1 - Béland, A. JF - Québec: Proceedings of the 14th Congress of the Association québécoise de pédagogie collégiale. Montréal: Association québécoise de pédagogie collégiale (AQPC). ER - TY - JOUR T1 - Monte Carlo simulation comparison of two-stage testing and computerized adaptive testing JF - Dissertation Abstracts International Section A: Humanities & Social Sciences Y1 - 1994 A1 - Kim, H-O. KW - computerized adaptive testing VL - 54 ER - TY - CONF T1 - Pinpointing PRAXIS I CAT characteristics through simulation procedures T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1994 A1 - Eignor, D. R. A1 - Folk, V. G. A1 - Li, M.-Y. A1 - Stocking, M. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA ER - TY - JOUR T1 - The psychological impacts of computerized adaptive testing methods JF - Educational Technology Y1 - 1994 A1 - Powell, Z. E. VL - 34 ER - TY - JOUR T1 - The relationship between examinee anxiety and preference for self-adapted testing JF - Applied Measurement in Education Y1 - 1994 A1 - Wise, S. L. 
A1 - Roos, L. L. A1 - Plake, B. S. A1 - Nebelsick-Gullett, L. J. VL - 7 ER - TY - CHAP T1 - Reliability of alternate computer adaptive tests T2 - Objective measurement, theory into practice Y1 - 1994 A1 - Lunz, M. E. A1 - Bergstrom, Betty A. A1 - Wright, B. D. JF - Objective measurement, theory into practice PB - Ablex CY - New Jersey VL - II ER - TY - CONF T1 - The selection of test items for decision making with a computer adaptive test T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1994 A1 - Spray, J. A. A1 - Reckase, M. D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA N1 - #SP94-01 ER - TY - JOUR T1 - Self-adapted testing JF - Applied Measurement in Education Y1 - 1994 A1 - Rocklin, T. R. VL - 7 ER - TY - ABST T1 - A simple and fast item selection procedure for adaptive testing Y1 - 1994 A1 - Veerkamp, W. J. J. CY - Research Report 94-13. University of Twente. ER - TY - JOUR T1 - A Simulation Study of Methods for Assessing Differential Item Functioning in Computerized Adaptive Tests JF - Applied Psychological Measurement Y1 - 1994 A1 - Zwick, R. A1 - Thayer, D. T. A1 - Wingersky, M. VL - 18 ER - TY - ABST T1 - A simulation study of the Mantel-Haenszel procedure for detecting DIF with the NCLEX using CAT (Technical Report xx-xx) Y1 - 1994 A1 - Way, W. D. CY - Princeton NJ: Educational Testing Service ER - TY - ABST T1 - Some new item selection criteria for adaptive testing (Research Rep 94-6) Y1 - 1994 A1 - Veerkamp, W. J. A1 - Berger, M. P. F. 
CY - Enschede, The Netherlands: University of Twente, Department of Educational Measurement and Data Analysis. ER - TY - ABST T1 - Three practical issues for modern adaptive testing item pools (Research Report 94-5) Y1 - 1994 A1 - Stocking, M. L. CY - Princeton NJ: Educational Testing Service ER - TY - JOUR T1 - Understanding self-adapted testing: The perceived control hypothesis JF - Applied Measurement in Education Y1 - 1994 A1 - Wise, S. L. VL - 7 ER - TY - CHAP T1 - Utilisation de la simulation en tant que méthodologie de recherche [Simulation methodology in research] Y1 - 1994 A1 - Raîche, G. CY - Association pour la recherche au collégial (Ed.) : L'enquête de la créativité [In quest of creativity]. Proceedings of the 6th Congress of the ARC. Montréal: Association pour la recherche au collégial (ARC). ER - TY - JOUR T1 - The application of an automated item selection method to real data JF - Applied Psychological Measurement Y1 - 1993 A1 - Stocking, M. L. A1 - Swanson, L. A1 - Pearlman, M. VL - 17 ER - TY - JOUR T1 - An application of Computerized Adaptive Testing to the Test of English as a Foreign Language JF - Dissertation Abstracts International Y1 - 1993 A1 - Moon, O. KW - computerized adaptive testing VL - 53 ER - TY - JOUR T1 - Assessing the utility of item response models: computerized adaptive testing JF - Educational Measurement: Issues and Practice Y1 - 1993 A1 - Kingsbury, G. G. A1 - Houser, R. L. KW - computerized adaptive testing VL - 12 ER - TY - ABST T1 - Case studies in computer adaptive test design through simulation (Research Report RR-93-56) Y1 - 1993 A1 - Eignor, D. R. A1 - Stocking, M. L. A1 - Way, W. D. A1 - Steffen, M. CY - Princeton NJ: Educational Testing Service N1 - #EI93-56 (also presented at the 1993 National Council on Measurement in Education meeting in Atlanta GA) ER - TY - JOUR T1 - Comparability and validity of computerized adaptive testing with the MMPI-2 JF - Dissertation Abstracts International Y1 - 1993 A1 - Roper, B. L. KW - computerized adaptive testing VL - 53 ER - TY - BOOK T1 - A comparison of computer adaptive test administration methods Y1 - 1993 A1 - Dolan, S. CY - Unpublished doctoral dissertation, University of Chicago ER - TY - CONF T1 - Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using an adaptive test T2 - Unpublished manuscript Y1 - 1993 A1 - Spray, J. A. A1 - Reckase, M. D. JF - Unpublished manuscript ER - TY - JOUR T1 - Computer adaptive testing: A comparison of four item selection strategies when used with the golden section search strategy for estimating ability JF - Dissertation Abstracts International Y1 - 1993 A1 - Carlson, R. D. KW - computerized adaptive testing VL - 54 ER - TY - JOUR T1 - Computer adaptive testing: A new era JF - Journal of Developmental Education Y1 - 1993 A1 - Smittle, P. VL - 17 (1) ER - TY - JOUR T1 - Computerized adaptive and fixed-item versions of the ITED Vocabulary test JF - Educational and Psychological Measurement Y1 - 1993 A1 - Vispoel, W. P. VL - 53 ER - TY - CONF T1 - Computerized adaptive testing in computer science: assessing student programming abilities T2 - Proceedings of the twenty-fourth SIGCSE Technical Symposium on Computer Science Education Y1 - 1993 A1 - Syang, A. A1 - Dale, N. B. JF - Proceedings of the twenty-fourth SIGCSE Technical Symposium on Computer Science Education CY - Indianapolis IN ER - TY - JOUR T1 - Computerized adaptive testing in instructional settings JF - Educational Technology Research and Development Y1 - 1993 A1 - Welch, R. E. A1 - Frick, T. 
VL - 41(3) ER - TY - BOOK T1 - Computerized adaptive testing strategies: Golden section search, dichotomous search, and Z-score strategies (Doctoral dissertation, Iowa State University, 1990) Y1 - 1993 A1 - Xiao, B. CY - Dissertation Abstracts International, 54-03B, 1720 ER - TY - JOUR T1 - Computerized adaptive testing: the future is upon us JF - Nurs Health Care Y1 - 1993 A1 - Halkitis, P. N. A1 - Leahy, J. M. KW - *Computer-Assisted Instruction KW - *Education, Nursing KW - *Educational Measurement KW - *Reaction Time KW - Humans KW - Pharmacology/education KW - Psychometrics VL - 14 SN - 0276-5284 (Print) N1 - Nursing & Health Care. 1993 Sep;14(7):378-85. ER - TY - JOUR T1 - Computerized adaptive testing using the partial credit model: Effects of item pool characteristics and different stopping rules JF - Educational and Psychological Measurement Y1 - 1993 A1 - Dodd, B. G. A1 - Koch, W. R. A1 - De Ayala, R. J. AB - Simulated datasets were used to research the effects of the systematic variation of three major variables on the performance of computerized adaptive testing (CAT) procedures for the partial credit model. The three variables studied were the stopping rule for terminating the CATs, item pool size, and the distribution of the difficulty of the items in the pool. Results indicated that the standard error stopping rule performed better across the variety of CAT conditions than the minimum information stopping rule. In addition it was found that item pools that consisted of as few as 30 items were adequate for CAT provided that the item pool was of medium difficulty. The implications of these findings for implementing CAT systems based on the partial credit model are discussed. VL - 53 ER - TY - JOUR T1 - Computerized mastery testing using fuzzy set decision theory JF - Applied Measurement in Education Y1 - 1993 A1 - Du, Y. A1 - Lewis, C. 
A1 - Pashley, P. J. VL - 6 N1 - (Also Educational Testing Service Research Report 94-37) ER - TY - ABST T1 - Controlling item exposure rates in a realistic adaptive testing paradigm (Research Report 93-2) Y1 - 1993 A1 - Stocking, M. L. CY - Princeton NJ: Educational Testing Service ER - TY - ABST T1 - Deriving comparable scores for computer adaptive and conventional tests: An example using the SAT Y1 - 1993 A1 - Eignor, D. R. CY - Princeton NJ: Educational Testing Service N1 - #EI93-55 (Also presented at the 1993 National Council on Measurement in Education meeting in Atlanta GA.) ER - TY - JOUR T1 - The development and evaluation of a computerized adaptive test of tonal memory JF - Journal of Research in Music Education Y1 - 1993 A1 - Vispoel, W. P. VL - 41 ER - TY - CONF T1 - The efficiency, reliability, and concurrent validity of adaptive and fixed-item tests of music listening skills T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1993 A1 - Vispoel, W. P. A1 - Wang, T. A1 - Bleiler, T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Atlanta GA ER - TY - CONF T1 - Establishing time limits for the GRE computer adaptive tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1993 A1 - Reese, C. M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Atlanta GA ER - TY - ABST T1 - Field test of a computer-based GRE general test (GRE Board Technical Report 88-8; Educational Testing Service Research Rep No RR 93-07) Y1 - 1993 A1 - Schaeffer, G. A. A1 - Reese, C. M. A1 - Steffen, M. A1 - McKinley, R. L. A1 - Mills, C. N. CY - Princeton NJ: Educational Testing Service. 
ER - TY - CONF T1 - Individual differences and test administration procedures: A comparison of fixed-item, adaptive, and self-adapted testing T2 - Paper presented at the annual meeting of the AEARA Y1 - 1993 A1 - Vispoel, W. P. A1 - Rocklin, T. R. JF - Paper presented at the annual meeting of the AEARA CY - Atlanta GA ER - TY - CONF T1 - Individual differences in computerized adaptive testing T2 - Paper presented at the annual meeting of the Mid-South Educational Research Association Y1 - 1993 A1 - Kim, J. JF - Paper presented at the annual meeting of the Mid-South Educational Research Association CY - New Orleans LA ER - TY - ABST T1 - Introduction of a computer adaptive GRE General test (Research Report 93-57) Y1 - 1993 A1 - Schaeffer, G. A. A1 - Steffen, M. A1 - Golub-Smith, M. L. CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - An investigation of restricted self-adapted testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1993 A1 - Wise, S. L. A1 - Kingsbury, G. G. A1 - Houser, R.L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Atlanta GA ER - TY - ABST T1 - Item Calibration: Medium-of-administration effect on computerized adaptive scores (TR-93-9) Y1 - 1993 A1 - Hetter, R. D. A1 - Bloxom, B. M. A1 - Segall, D. O. CY - Navy Personnel Research and Development Center ER - TY - ABST T1 - Les tests adaptatifs en langue seconde Y1 - 1993 A1 - Laurier, M. CY - Communication lors de la 16e session d’étude de l’ADMÉÉ à Laval. Montréal: Association pour le développement de la mesure et de l’évaluation en éducation. ER - TY - JOUR T1 - Linking the standard and advanced forms of the Ravens Progressive Matrices in both the paper-and-pencil and computer-adaptive-testing formats JF - Educational and Psychological Measurement Y1 - 1993 A1 - Styles, I. A1 - Andrich, D. 
VL - 53 ER - TY - JOUR T1 - A Method for Severely Constrained Item Selection in Adaptive Testing JF - Applied Psychological Measurement Y1 - 1993 A1 - Stocking, M. L. A1 - Swanson, L. VL - 17 IS - 3 ER - TY - JOUR T1 - A model and heuristic for solving very large item selection problems JF - Applied Psychological Measurement Y1 - 1993 A1 - Stocking, M., A1 - Swanson, L. VL - 17 ER - TY - CONF T1 - Modern computerized adaptive testing T2 - Paper presented at the Joint Statistics and Psychometric Seminar Y1 - 1993 A1 - Stocking, M. L. JF - Paper presented at the Joint Statistics and Psychometric Seminar CY - Princeton NJ ER - TY - CONF T1 - Monte Carlo simulation comparison of two-stage testing and computerized adaptive testing T2 - Paper presented at the meeting of the National Council on Measurement in Education Y1 - 1993 A1 - Kim, H. A1 - Plake, B. S. JF - Paper presented at the meeting of the National Council on Measurement in Education CY - Atlanta, GA ER - TY - JOUR T1 - Moving in a new direction: Computerized adaptive testing (CAT) JF - Nursing Management Y1 - 1993 A1 - Jones-Dickson, C. A1 - Dorsey, D. A1 - Campbell-Warnock, J. A1 - Fields, F. KW - *Computers KW - Accreditation/methods KW - Educational Measurement/*methods KW - Licensure, Nursing KW - United States VL - 24 SN - 0744-6314 (Print) N1 - Jones-Dickson, CDorsey, DCampbell-Warnock, JFields, FUnited statesNursing managementNurs Manage. 1993 Jan;24(1):80, 82. ER - TY - ABST T1 - Multiple-category classification using a sequential probability ratio test (Research report 93-7) Y1 - 1993 A1 - Spray, J. A. CY - Iowa City: American College Testing. N1 - #SP93-7 ER - TY - NEWS T1 - New computer technique seen producing a revolution in testing T2 - The Chronicle of Higher Education Y1 - 1993 A1 - Jacobson, R. L. 
JF - The Chronicle of Higher Education VL - 40 ER - TY - CONF T1 - A practical examination of the use of free-response questions in computerized adaptive testing T2 - Paper presented to the annual meeting of the American Educational Research Association: Atlanta GA. Y1 - 1993 A1 - Kingsbury, G. G. A1 - Houser, R.L. JF - Paper presented to the annual meeting of the American Educational Research Association: Atlanta GA. N1 - {PDF file, 30 KB} ER - TY - CONF T1 - A simulated comparison of testlets and a content balancing procedure for an adaptive certification examination T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1993 A1 - Reshetar, R. A. A1 - Norcini, J. J. A1 - Shea, J. A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Atlanta ER - TY - CONF T1 - A simulated comparison of two content balancing and maximum information item selection procedures for an adaptive certification examination T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1993 A1 - Reshetar, R. A. A1 - Norcini, J. J. A1 - Shea, J. A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Atlanta ER - TY - ABST T1 - A simulation study of methods for assessing differential item functioning in computer-adaptive tests (Educational Testing Service Research Rep No RR 93-11) Y1 - 1993 A1 - Zwick, R. A1 - Thayer, D. A1 - Wingersky, M. CY - Princeton NJ: Educational Testing Service. ER - TY - CONF T1 - Some initial experiments with adaptive survey designs for structured questionnaires T2 - Paper presented at the New Methods and Applications in Consumer Research Conference Y1 - 1993 A1 - Singh, J. 
JF - Paper presented at the New Methods and Applications in Consumer Research Conference CY - Cambridge MA ER - TY - JOUR T1 - Some practical considerations when converting a linearly administered test to an adaptive format JF - Educational Measurement: Issues and Practice Y1 - 1993 A1 - Wainer, H., VL - 12 (1) ER - TY - CONF T1 - Test targeting and precision before and after review on computer-adaptive tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1993 A1 - Lunz, M. E. A1 - Stahl, J. A. A1 - Bergstrom, Betty A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Atlanta GA ER - TY - CHAP T1 - Un test adaptatif en langue seconde : la perception des apprenants Y1 - 1993 A1 - Laurier, M. CY - R.Hivon (Éd.),L’évaluation des apprentissages. Sherbrooke : Éditions du CRP. ER - TY - CONF T1 - Ability measure equivalence of computer adaptive and paper and pencil tests: A research synthesis T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1992 A1 - Bergstrom, B. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco ER - TY - JOUR T1 - Altering the level of difficulty in computer adaptive testing JF - Applied Measurement in Education Y1 - 1992 A1 - Bergstrom, Betty A. A1 - Lunz, M. E. A1 - Gershon, R. C. KW - computerized adaptive testing AB - Examines the effect of altering test difficulty on examinee ability measures and test length in a computer adaptive test. The 225 Ss were randomly assigned to 3 test difficulty conditions and given a variable length computer adaptive test. Examinees in the hard, medium, and easy test condition took a test targeted at the 50%, 60%, or 70% probability of correct response. 
The results show that altering the probability of a correct response does not affect estimation of examinee ability and that taking an easier computer adaptive test only slightly increases the number of items necessary to reach specified levels of precision. (PsycINFO Database Record (c) 2002 APA, all rights reserved). VL - 5 N1 - Lawrence Erlbaum, US ER - TY - JOUR T1 - The application of latent class models in adaptive testing JF - Psychometrika Y1 - 1992 A1 - Macready, G. B. A1 - Dayton, C. M. VL - 57 ER - TY - CONF T1 - Assessing existing item bank depth for computer adaptive testing T2 - ERIC Document No. TM022404 Y1 - 1992 A1 - Bergstrom, Betty A. A1 - Stahl, J. A. JF - ERIC Document No. TM022404 ER - TY - JOUR T1 - CAT-ASVAB precision JF - Proceedings of the 34th Annual Conference of the Military Testing Association Y1 - 1992 A1 - Moreno, K. E., A1 - Segall, D. O. VL - 1 ER - TY - CONF T1 - A comparison of computerized adaptive and paper-and-pencil versions of the national registered nurse licensure examination T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1992 A1 - A Zara JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA ER - TY - CONF T1 - Comparison of item targeting strategies for pass/fail adaptive tests T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1992 A1 - Bergstrom, B. A1 - Gershon, R. C. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA N1 - (ERIC NO. ED 400 287). ER - TY - BOOK T1 - A comparison of methods for adaptive estimation of a multidimensional trait Y1 - 1992 A1 - Tam, S. S. CY - Unpublished doctoral dissertation, Columbia University ER - TY - JOUR T1 - A comparison of self-adapted and computerized adaptive achievement tests JF - Journal of Educational Measurement Y1 - 1992 A1 - Wise, S. L. A1 - Plake, B. S. A1 - Johnson, P. L. A1 - Roos, S. L. VL - 29 ER - TY - JOUR T1 - A comparison of the partial credit and graded response models in computerized adaptive testing JF - Applied Measurement in Education Y1 - 1992 A1 - De Ayala, R. J. A1 - Dodd, B. G. A1 - Koch, W. R. VL - 5 ER - TY - JOUR T1 - A comparison of the performance of simulated hierarchical and linear testlets JF - Journal of Educational Measurement Y1 - 1992 A1 - Wainer, H., A1 - Kaplan, B. A1 - Lewis, C. VL - 29 ER - TY - BOOK T1 - Computer adaptive versus paper-and-pencil tests Y1 - 1992 A1 - Bergstrom, B. CY - Unpublished doctoral dissertation, University of Chicago ER - TY - JOUR T1 - Computer-based adaptive testing in music research and instruction JF - Psychomusicology Y1 - 1992 A1 - Bowers, D. R. VL - 10 ER - TY - ABST T1 - Computerized adaptive assessment of cognitive abilities among disabled adults Y1 - 1992 A1 - Engdahl, B. CY - ERIC Document No ED301274 ER - TY - JOUR T1 - Computerized adaptive mastery tests as expert systems JF - Journal of Educational Computing Research Y1 - 1992 A1 - Frick, T. W. VL - 8(2) ER - TY - JOUR T1 - Computerized adaptive testing for NCLEX-PN JF - Journal of Practical Nursing Y1 - 1992 A1 - Fields, F. A. KW - *Licensure KW - *Programmed Instruction KW - Educational Measurement/*methods KW - Humans KW - Nursing, Practical/*education VL - 42 SN - 0022-3867 (Print) N1 - Fields, F AUnited statesThe Journal of practical nursingJ Pract Nurs. 1992 Jun;42(2):8-10. ER - TY - JOUR T1 - Computerized adaptive testing: Its potential substantive contribution to psychological research and assessment JF - Current Directions in Psychological Science Y1 - 1992 A1 - Embretson, S. E. 
VL - 1 ER - TY - JOUR T1 - Computerized adaptive testing of music-related skills JF - Bulletin of the Council for Research in Music Education Y1 - 1992 A1 - Vispoel, W. P., A1 - Coffman, D. D. VL - 112 ER - TY - JOUR T1 - Computerized adaptive testing with different groups JF - Educational Measurement: Issues and Practice Y1 - 1992 A1 - Legg, S. M., A1 - Buhr, D. C. VL - 11 (2) ER - TY - CONF T1 - Computerized adaptive testing with the MMPI-2: Reliability, validity, and comparability to paper and pencil administration T2 - Paper presented at the 27th Annual Symposium on Recent Developments in the MMPI/MMPI-2 Y1 - 1992 A1 - Ben-Porath, Y. S. A1 - Roper, B. L. JF - Paper presented at the 27th Annual Symposium on Recent Developments in the MMPI/MMPI-2 CY - Minneapolis MN ER - TY - JOUR T1 - Computerized Mastery Testing With Nonequivalent Testlets JF - Applied Psychological Measurement Y1 - 1992 A1 - Sheehan, K. A1 - Lewis, C. VL - 16 IS - 1 ER - TY - JOUR T1 - Confidence in pass/fail decisions for computer adaptive and paper and pencil examinations JF - Evaluation and the Health Professions Y1 - 1992 A1 - Bergstrom, Betty A. A1 - Lunz, M. E. AB - Compared the level of confidence in pass/fail decisions obtained with computer adaptive tests (CADTs) and pencil-and-paper tests (PPTs). 600 medical technology students took a variable-length CADT and 2 fixed-length PPTs. The CADT was stopped when the examinee ability estimate was either 1.3 times the standard error of measurement above or below the pass/fail point or when a maximum test length was reached. Results show that greater confidence in the accuracy of the pass/fail decisions was obtained for more examinees when the CADT implemented a 90% confidence stopping rule than with PPTs of comparable test length. 
(PsycINFO Database Record (c) 2002 APA, all rights reserved). VL - 15 N1 - Sage Publications, US ER - TY - JOUR T1 - The development and evaluation of a system for computerized adaptive testing JF - Dissertation Abstracts International Y1 - 1992 A1 - de la Torre Sanchez, R. KW - computerized adaptive testing VL - 52 ER - TY - CHAP T1 - The development of alternative operational concepts Y1 - 1992 A1 - J. R. McBride A1 - Curran, L. T. CY - Proceedings of the 34th Annual Conference of the Military Testing Association. San Diego, CA: Navy Personnel Research and Development Center. ER - TY - CONF T1 - Differential item functioning analysis for computer-adaptive tests and other IRT-scored measures T2 - Paper presented at the annual meeting of the Military Testing Association Y1 - 1992 A1 - Zwick, R. JF - Paper presented at the annual meeting of the Military Testing Association CY - San Diego CA ER - TY - JOUR T1 - The effect of review on student ability and test efficiency for computerized adaptive tests JF - Applied Psychological Measurement Y1 - 1992 A1 - Lunz, M. E. A1 - Bergstrom, Betty A. A1 - Wright, Benjamin D. 
AB - 220 students were randomly assigned to a review condition for a medical technology test; their test instructions indicated that each item must be answered when presented, but that the responses could be reviewed and altered at the end of the test. A sample of 492 students did not have the opportunity to review and alter responses. Within the review condition, examinee ability estimates before and after review were correlated .98. The average efficiency of the test was decreased by 1% after review. Approximately 32% of the examinees improved their ability estimates after review but did not change their pass/fail status. Disallowing review on adaptive tests administered under these rules is not supported by these data. (PsycINFO Database Record (c) 2002 APA, all rights reserved). VL - 16 N1 - Sage Publications, US ER - TY - CONF T1 - Effects of feedback during self-adapted testing on estimates of ability T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1992 A1 - Holst, P. M. A1 - O’Donnell, A. M. A1 - Rocklin, T. R. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco ER - TY - CONF T1 - The effects of feedback in computerized adaptive and self-adapted tests T2 - Paper presented at the annual meeting of the NMCE Y1 - 1992 A1 - Roos, L. L. A1 - Plake, B. S. A1 - Wise, S. L. JF - Paper presented at the annual meeting of the NMCE CY - San Francisco ER - TY - CONF T1 - Estimation of ability level by using only observable quantities in adaptive testing T2 - Paper presented at the annual meeting if the American Educational Research Association Y1 - 1992 A1 - Kirisci, L. JF - Paper presented at the annual meeting if the American Educational Research Association CY - Chicago ER - TY - CHAP T1 - Evaluation of alternative operational concepts Y1 - 1992 A1 - J. R. McBride A1 - Hogan, P.F. 
CY - Proceedings of the 34th Annual Conference of the Military Testing Association. San Diego, CA: Navy Personnel Research and Development Center. ER - TY - ABST T1 - A general Bayesian model for testlets: theory and applications (Research Report 92-21; GRE Board Professional Report No 99-01P) Y1 - 1992 A1 - Wang, X A1 - Bradlow, E. T. A1 - Wainer, H., CY - Princeton NJ: Educational Testing Service. ER - TY - CONF T1 - How review options and administration mode influence scores on computerized vocabulary tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1992 A1 - Vispoel, W. P. A1 - Wang, T. A1 - De la Torre, R. A1 - Bleiler, T. A1 - Dings, J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco CA N1 - #VI92-01 ER - TY - JOUR T1 - Improving the measurement of tonal memory with computerized adaptive tests JF - Psychomusicology Y1 - 1992 A1 - Vispoel, W. P. VL - 11 ER - TY - CONF T1 - Incorporating post-administration item response revision into a CAT T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1992 A1 - Wang, M. A1 - Wingersky, M. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco ER - TY - JOUR T1 - The influence of dimensionality on CAT ability estimation JF - Educational and Psychological Measurement Y1 - 1992 A1 - De Ayala, R. J., VL - 52 ER - TY - JOUR T1 - Item selection using an average growth approximation of target information functions JF - Applied Psychological Measurement Y1 - 1992 A1 - Luecht, RM A1 - Hirsch, T. M. VL - 16 ER - TY - ABST T1 - The Language Training Division's computer adaptive reading proficiency test Y1 - 1992 A1 - Janczewski, D. A1 - Lowe, P. 
CY - Provo, UT: Language Training Division, Office of Training and Education ER - TY - JOUR T1 - Le testing adaptatif avec interprétation critérielle, une expérience de praticabilité du TAM pour l’évaluation sommative des apprentissages au Québec. JF - Mesure et évaluation en éducation Y1 - 1992 A1 - Auger, R. ED - Seguin, S. P. VL - 15-1 et 2 ER - TY - BOOK T1 - Manual for the General Scholastic Aptitude Test (Senior) Computerized adaptive test Y1 - 1992 A1 - Von Tonder, M. A1 - Claassen, N. C. W. CY - Pretoria: Human Sciences Research Council ER - TY - ABST T1 - A method for severely constrained item selection in adaptive testing Y1 - 1992 A1 - Stocking, M. L. A1 - Swanson, L. CY - Educational Testing Service Research Report (RR-92-37): Princeton NJ ER - TY - CONF T1 - Multidimensional CAT simulation study Y1 - 1992 A1 - Luecht, RM ER - TY - JOUR T1 - The nominal response model in computerized adaptive testing JF - Applied Psychological Measurement Y1 - 1992 A1 - De Ayala, R. J., VL - 16 ER - TY - CONF T1 - Practical considerations for conducting studies of differential item functioning (DIF) in a CAT environment T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1992 A1 - Miller, T. R. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA ER - TY - CONF T1 - Scaling of two-stage adaptive test configurations for achievement testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1992 A1 - Hendrickson, A. B. A1 - Kolen, M. J. 
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA ER - TY - ABST T1 - Some practical considerations when converting a linearly administered test to an adaptive format (Research Report 92-21 or 13?) Y1 - 1992 A1 - Wainer, H., CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - Student attitudes toward computer-adaptive test administration T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1992 A1 - Baghi, H A1 - Ferrara, S. F A1 - Gabrys, R. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA ER - TY - CONF T1 - Test anxiety and test performance under computerized adaptive testing methods T2 - Richmond IN: Indiana University. (ERIC Document Reproduction Service No. ED 334910 and/or TM018223). Paper presented at the annual meeting of the American Educational Research Association Y1 - 1992 A1 - Powell, Z. E. JF - Richmond IN: Indiana University. (ERIC Document Reproduction Service No. ED 334910 and/or TM018223). Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA ER - TY - JOUR T1 - Test anxiety and test performance under computerized adaptive testing methods JF - Dissertation Abstracts International Y1 - 1992 A1 - Powell, Zen-Hsiu E. KW - computerized adaptive testing VL - 52 ER - TY - ABST T1 - An analysis of CAT-ASVAB scores in the Marine Corps JPM data (CRM- 91-161) Y1 - 1991 A1 - Divgi, D. R. CY - Alexandria VA: Center for Naval Analysis ER - TY - CONF T1 - Applications of computer-adaptive testing in Maryland T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1991 A1 - Baghi, H A1 - Gabrys, R. A1 - Ferrara, S. 
JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL ER - TY - ABST T1 - Automatic item selection (AIS) methods in the ETS testing environment (Research Memorandum 91-5) Y1 - 1991 A1 - Stocking, M. L. A1 - Swanson, L. A1 - Pearlman, M. CY - Princeton NJ: Educational Testing Service ER - TY - JOUR T1 - Building algebra testlets: A comparison of hierarchical and linear structures JF - Journal of Educational Measurement Y1 - 1991 A1 - Wainer, H., A1 - Lewis, C. A1 - Kaplan, B. A1 - Braswell, J. VL - 8 ER - TY - ABST T1 - Collected works on the legal aspects of computerized adaptive testing Y1 - 1991 A1 - Stenson, H. A1 - Graves, P. A1 - Gardiner, J. A1 - Dally, L. CY - Chicago, IL: National Council of State Boards of Nursing, Inc ER - TY - JOUR T1 - Comparability of computerized adaptive and conventional testing with the MMPI-2 JF - Journal of Personality Assessment Y1 - 1991 A1 - Roper, B. L. A1 - Ben-Porath, Y. S. A1 - Butcher, J. N. AB - A computerized adaptive version and the standard version of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were administered 1 week apart to a sample of 155 college students to assess the comparability of the two versions. The countdown method was used to adaptively administer Scales L, F, the I0 clinical scales, and the 15 new content scales. Profiles across administration modalities show a high degree of similarity, providing evidence for the comparability of computerized adaptive and conventional testing with the MMPI-2. Substantial item savings were found with the adaptive version. Future directions in the study of adaptive testing with the MMPI-2 are discussed. VL - 57 SN - 0022-3891 (Print) N1 - Roper, B LBen-Porath, Y SButcher, J NUnited StatesJournal of personality assessmentJ Pers Assess. 1991 Oct;57(2):278-90. ER - TY - JOUR T1 - Comparability of decisions for computer adaptive and written examinations JF - Journal of Allied Health Y1 - 1991 A1 - Lunz, M. 
E. A1 - Bergstrom, Betty A. VL - 20 ER - TY - JOUR T1 - A comparison of paper-and-pencil, computer-administered, computerized feedback, and computerized adaptive testing methods for classroom achievement testing JF - Dissertation Abstracts International Y1 - 1991 A1 - Kuan, Tsung Hao KW - computerized adaptive testing VL - 52 ER - TY - JOUR T1 - A comparison of procedures for content-sensitive item selection in computerized adaptive tests JF - Applied Measurement in Education Y1 - 1991 A1 - Kingsbury, G. G. A1 - A Zara VL - 4 ER - TY - ABST T1 - Comparisons of computer adaptive and pencil and paper tests Y1 - 1991 A1 - Bergstrom, Betty A. A1 - Lunz, M. E. CY - Chicago IL: American Society of Clinical Pathologists N1 - Unpublished manuscript. ER - TY - CHAP T1 - Computerized adaptive testing: Theory, applications, and standards Y1 - 1991 A1 - Hambleton, R. K. A1 - Zaal, J. N. A1 - Pieters, J. P. M. CY - R. K. Hambleton and J. N. Zaal (Eds.), Advances in educational and psychological testing: Theory and Applications (pp. 341-366). Boston: Kluwer. ER - TY - CONF T1 - Confidence in pass/fail decisions for computer adaptive and paper and pencil examinations T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1991 A1 - Bergstrom, B. B A1 - Lunz, M. E. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL ER - TY - ABST T1 - Construction and validation of the SON-R 5-17, the Snijders-Oomen non-verbal intelligence test Y1 - 1991 A1 - Laros, J. A. A1 - Tellegen, P. J. CY - Groningen: Wolters-Noordhoff ER - TY - JOUR T1 - Correlates of examinee item choice behavior in self-adapted testing JF - Mid-Western Educational Researcher Y1 - 1991 A1 - Johnson, J. L. A1 - Roos, L. L. A1 - Wise, S. L. A1 - Plake, B. S. 
VL - 4 ER - TY - CONF T1 - The development and evaluation of a computerized adaptive testing system T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1991 A1 - De la Torre, R. A1 - Vispoel, W. P. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL N1 - ERIC No. ED 338 711) ER - TY - CONF T1 - Development and evaluation of hierarchical testlets in two-stage tests using integer linear programming T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1991 A1 - Lam, T. L. A1 - Goong, Y. Y. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL ER - TY - CONF T1 - An empirical comparison of self-adapted and maximum information item selection T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1991 A1 - Rocklin, T. R. A1 - O’Donnell, A. M. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago ER - TY - CONF T1 - Individual differences in computer adaptive testing: Anxiety, computer literacy, and satisfaction T2 - Paper presented at the annual meeting of the National Council on Measurement in Education. Y1 - 1991 A1 - Gershon, R. C. A1 - Bergstrom, B. JF - Paper presented at the annual meeting of the National Council on Measurement in Education. ER - TY - JOUR T1 - Inter-subtest branching in computerized adaptive testing JF - Dissertation Abstracts International Y1 - 1991 A1 - Chang, S-H. KW - computerized adaptive testing VL - 52 ER - TY - ABST T1 - Patterns of alcohol and drug use among federal offenders as assessed by the Computerized Lifestyle Screening Instrument Y1 - 1991 A1 - Robinson, D. A1 - Porporino, F. J. A1 - Millson, W. A. 
KW - computerized adaptive testing KW - drug abuse KW - substance use PB - Research and Statistics Branch, Correctional Service of Canada CY - Ottawa, ON. Canada SN - R-11 ER - TY - ABST T1 - A psychometric comparison of computerized and paper-and-pencil versions of the national RN licensure examination Y1 - 1991 A1 - National-Council-of-State-Boards-of-Nursing CY - Chicago IL: Author, Unpublished report ER - TY - JOUR T1 - On the reliability of testlet-based tests JF - Journal of Educational Measurement Y1 - 1991 A1 - Sireci, S. G. A1 - Wainer, H., A1 - Thissen, D. VL - 28 ER - TY - ABST T1 - A simulation study of some simple approaches to the study of DIF for CATs Y1 - 1991 A1 - Holland, P. W. A1 - Zwick, R. CY - Internal memorandum, Educational Testing Service ER - TY - ABST T1 - Some empirical guidelines for building testlets (Technical Report 91-56) Y1 - 1991 A1 - Wainer, H., A1 - Kaplan, B. A1 - Lewis, C. CY - Princeton NJ: Educational Testing Service, Program Statistics Research ER - TY - CONF T1 - The use of the graded response model in computerized adaptive testing of the attitudes to science scale T2 - annual meeting of the American Education Research Association Y1 - 1991 A1 - Foong, Y-Y. A1 - Lam, T-L. AB - The graded response model for two-stage testing was applied to an attitudes toward science scale using real-data simulation. The 48-item scale was administered to 920 students at a grade-8 equivalent in Singapore. A two-stage 16-item computerized adaptive test was developed. In two-stage testing an initial, or routing, test is followed by a second-stage testlet of greater or lesser difficulty based on performance. A conventional test of the same length as the adaptive two-stage test was selected from the 48-item pool. Responses to the conventional test, the routing test, and a testlet were simulated. The algorithm of E. Balas (1965) and the multidimensional knapsack problem of optimization theory were used in test development. 
The simulation showed the efficiency and accuracy of the two-stage test with the graded response model in estimating attitude trait levels, as evidenced by better results from the two-stage test than its conventional counterpart and the reduction to one-third of the length of the original measure. Six tables and three graphs are included. (SLD) JF - annual meeting of the American Educational Research Association CY - Chicago, IL USA ER - TY - JOUR T1 - The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing JF - Applied Psychological Measurement Y1 - 1991 A1 - Ackerman, T. A. VL - 15 IS - 1 ER - TY - CONF T1 - What lies ahead? Computer technology and its implications for personnel testing T2 - Keynote address Y1 - 1991 A1 - J. R. McBride JF - Keynote address CY - NATO Workshop on Computer-based Assessment of Military Personnel, Brussels, Belgium ER - TY - JOUR T1 - Adapting adaptive testing: Using the MicroCAT Testing System in a local school district JF - Educational Measurement: Issues and Practice Y1 - 1990 A1 - Kingsbury, G. G. VL - 29 (2) ER - TY - ABST T1 - An adaptive algebra test: A testlet-based, hierarchically structured test with validity-based scoring Y1 - 1990 A1 - Wainer, H. A1 - Lewis, C. A1 - Kaplan, B. A1 - Braswell, J. CY - ETS Technical Report 90-92 N1 - Princeton NJ: Educational Testing Service. ER - TY - JOUR T1 - Adaptive designs for Likert-type data: An approach for implementing marketing research JF - Journal of Marketing Research Y1 - 1990 A1 - Singh, J. A1 - Howell, R. D. A1 - Rhoads, G. K. VL - 27 ER - TY - JOUR T1 - Applying computerized adaptive testing in schools JF - Measurement and Evaluation in Counseling and Development Y1 - 1990 A1 - Olson, J.
B VL - 23 ER - TY - CONF T1 - Assessing the utility of item response models: Computerized adaptive testing T2 - A paper presented to the annual meeting of the National Council on Measurement in Education Y1 - 1990 A1 - Kingsbury, G. G. A1 - Houser, R.L. JF - A paper presented to the annual meeting of the National Council on Measurement in Education CY - Boston MA ER - TY - ABST T1 - A comparison of Rasch and three-parameter logistic models in computerized adaptive testing Y1 - 1990 A1 - Parker, S.B. A1 - J. R. McBride CY - Unpublished manuscript ER - TY - JOUR T1 - A comparison of three decision models for adapting the length of computer-based mastery tests JF - Journal of Educational Computing Research Y1 - 1990 A1 - Frick, T. W. VL - 6 IS - 4 ER - TY - JOUR T1 - Computer testing: Pragmatic issues and research needs JF - Educational Measurement: Issues and Practice Y1 - 1990 A1 - Rudner, L. M. VL - 9 (2) N1 - Sum 1990. ER - TY - JOUR T1 - Computerized adaptive measurement of attitudes JF - Measurement and Evaluation in Counseling and Development Y1 - 1990 A1 - Koch, W. R. A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. VL - 23 ER - TY - CONF T1 - Computerized adaptive music tests: A new solution to three old problems T2 - Paper presented at the biannual meeting of the Music Educators National Conference Y1 - 1990 A1 - Vispoel, W. P. JF - Paper presented at the biannual meeting of the Music Educators National Conference CY - Washington DC ER - TY - BOOK T1 - Computerized adaptive testing: A primer (Eds.) Y1 - 1990 A1 - Wainer, H. A1 - Dorans, N. J. A1 - Flaugher, R. A1 - Green, B. F. A1 - Mislevy, R. J. A1 - Steinberg, L. A1 - Thissen, D. CY - Hillsdale NJ: Erlbaum ER - TY - JOUR T1 - The construction of customized two-stage tests JF - Journal of Educational Measurement Y1 - 1990 A1 - Adema, J. J. VL - 27 ER - TY - CHAP T1 - Creating adaptive tests of musical ability with limited-size item pools Y1 - 1990 A1 - Vispoel, W. P. A1 - Twing, J. S. CY - D.
Dalton (Ed.), ADCIS 32nd International Conference Proceedings (pp. 105-112). Columbus OH: Association for the Development of Computer-Based Instructional Systems. ER - TY - CONF T1 - Dichotomous search strategies for computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association. Y1 - 1990 A1 - Xiao, B. JF - Paper presented at the annual meeting of the American Educational Research Association. ER - TY - JOUR T1 - The effect of item selection procedure and stepsize on computerized adaptive attitude measurement using the rating scale model JF - Applied Psychological Measurement Y1 - 1990 A1 - Dodd, B. G. AB - Real and simulated datasets were used to investigate the effects of the systematic variation of two major variables on the operating characteristics of computerized adaptive testing (CAT) applied to instruments consisting of polychotomously scored rating scale items. The two variables studied were the item selection procedure and the stepsize method used until maximum likelihood trait estimates could be calculated. The findings suggested that (1) item pools that consist of as few as 25 items may be adequate for CAT; (2) the variable stepsize method of preliminary trait estimation produced fewer cases of nonconvergence than the use of a fixed stepsize procedure; and (3) the scale value item selection procedure used in conjunction with a minimum standard error stopping rule outperformed the information item selection technique used in conjunction with a minimum information stopping rule in terms of the frequencies of nonconvergent cases, the number of items administered, and the correlations of CAT θ estimates with full scale estimates and known θ values.
The implications of these findings for implementing CAT with rating scale items are discussed. VL - 14 IS - 4 ER - TY - JOUR T1 - The effects of variable entry on bias and information of the Bayesian adaptive testing procedure JF - Educational and Psychological Measurement Y1 - 1990 A1 - Hankins, J. A. VL - 50 ER - TY - CONF T1 - An empirical study of the computer adaptive MMPI-2 T2 - Paper presented at the 25th Annual Symposium on recent developments in the MMPI/MMPI-2 Y1 - 1990 A1 - Ben-Porath, Y. S. A1 - Roper, B. L. A1 - Butcher, J. N. JF - Paper presented at the 25th Annual Symposium on recent developments in the MMPI/MMPI-2 CY - Minneapolis MN ER - TY - CHAP T1 - Future challenges Y1 - 1990 A1 - Wainer, H. A1 - Dorans, N. J. A1 - Green, B. F. A1 - Mislevy, R. J. A1 - Steinberg, L. A1 - Thissen, D. CY - H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 233-272). Hillsdale NJ: Erlbaum. ER - TY - JOUR T1 - Future directions for the National Council: the Computerized Adaptive Testing Project JF - Issues Y1 - 1990 A1 - Bouchard, J. KW - *Computers KW - *Licensure KW - Educational Measurement/*methods KW - Societies, Nursing KW - United States VL - 11 SP - 1 EP - 5 ER - TY - ABST T1 - Generative adaptive testing with digit span items Y1 - 1990 A1 - Wolfe, J. H. A1 - Larson, G. E. CY - San Diego, CA: Testing Systems Department, Navy Personnel Research and Development Center ER - TY - CONF T1 - Illustration of computerized adaptive testing with the MMPI-2 T2 - Paper presented at the 98th Annual Meeting of the American Psychological Association Y1 - 1990 A1 - Roper, B. L. A1 - Ben-Porath, Y. S. A1 - Butcher, J. N.
JF - Paper presented at the 98th Annual Meeting of the American Psychological Association CY - Boston MA ER - TY - CHAP T1 - Important issues in CAT Y1 - 1990 A1 - Wainer, H. CY - H. Wainer et al., Computerized adaptive testing: A primer. Hillsdale NJ: Erlbaum. ER - TY - CHAP T1 - Introduction and history Y1 - 1990 A1 - Wainer, H. CY - In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 1-21). Hillsdale NJ: Erlbaum. ER - TY - CHAP T1 - Item response theory, item calibration, and proficiency estimation Y1 - 1990 A1 - Wainer, H. A1 - Mislevy, R. J. CY - H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 65-102). Hillsdale NJ: Erlbaum. ER - TY - CONF T1 - MusicCAT: An adaptive testing program to assess musical ability T2 - Paper presented at the ADCIS 32nd International Conference Y1 - 1990 A1 - Vispoel, W. P. A1 - Coffman, D. A1 - Scriven, D. JF - Paper presented at the ADCIS 32nd International Conference CY - San Diego CA ER - TY - JOUR T1 - National Council Computerized Adaptive Testing Project Review--committee perspective JF - Issues Y1 - 1990 A1 - Haynes, B. KW - *Computers KW - *Licensure KW - Educational Measurement/*methods KW - Feasibility Studies KW - Societies, Nursing KW - United States VL - 11 ER - TY - CHAP T1 - Reliability and measurement precision Y1 - 1990 A1 - Thissen, D. CY - H. Wainer, N. J. Dorans, R. Flaugher, B. F. Green, R. J. Mislevy, L. Steinberg, and D. Thissen (Eds.), Computerized adaptive testing: A primer (pp. 161-186). Hillsdale NJ: Erlbaum. ER - TY - CHAP T1 - A research proposal for field testing CAT for nursing licensure examinations Y1 - 1990 A1 - A Zara CY - Delegate Assembly Book of Reports 1989. Chicago: National Council of State Boards of Nursing. ER - TY - JOUR T1 - Sequential item response models with an ordered response JF - British Journal of Mathematical and Statistical Psychology Y1 - 1990 A1 - Tutz, G.
VL - 43 ER - TY - JOUR T1 - A simulation and comparison of flexilevel and Bayesian computerized adaptive testing JF - Journal of Educational Measurement Y1 - 1990 A1 - De Ayala, R. J. A1 - Dodd, B. G. A1 - Koch, W. R. KW - computerized adaptive testing AB - Computerized adaptive testing (CAT) is a testing procedure that adapts an examination to an examinee's ability by administering only items of appropriate difficulty for the examinee. In this study, the authors compared Lord's flexilevel testing procedure (flexilevel CAT) with an item response theory-based CAT using Bayesian estimation of ability (Bayesian CAT). Three flexilevel CATs, which differed in test length (36, 18, and 11 items), and three Bayesian CATs were simulated; the Bayesian CATs differed from one another in the standard error of estimate (SEE) used for terminating the test (0.25, 0.10, and 0.05). Results showed that the flexilevel 36- and 18-item CATs produced ability estimates that may be considered as accurate as those of the Bayesian CAT with SEE = 0.10 and comparable to the Bayesian CAT with SEE = 0.05. The authors discuss the implications for classroom testing and for item response theory-based CAT. VL - 27 ER - TY - JOUR T1 - Software review: MicroCAT Testing System Version 3 JF - Journal of Educational Measurement Y1 - 1990 A1 - Patience, W. M. VL - 7 ER - TY - CONF T1 - The stability of Rasch pencil and paper item calibrations on computer adaptive tests T2 - Paper presented at the Midwest Objective Measurement Seminar Y1 - 1990 A1 - Bergstrom, Betty A. A1 - Lunz, M. E. JF - Paper presented at the Midwest Objective Measurement Seminar CY - Chicago IL ER - TY - CHAP T1 - Testing algorithms Y1 - 1990 A1 - Thissen, D. A1 - Mislevy, R. J. CY - H. Wainer (Ed.), Computerized adaptive testing: A primer (pp.
103-135). Hillsdale NJ: Erlbaum. ER - TY - Generic T1 - Test-retest consistency of computer adaptive tests. T2 - annual meeting of the National Council on Measurement in Education Y1 - 1990 A1 - Lunz, M. E. A1 - Bergstrom, Betty A. A1 - Gershon, R. C. JF - annual meeting of the National Council on Measurement in Education CY - Boston, MA USA ER - TY - JOUR T1 - Toward a psychometrics for testlets JF - Journal of Educational Measurement Y1 - 1990 A1 - Wainer, H. A1 - Lewis, C. VL - 27 ER - TY - JOUR T1 - Using Bayesian Decision Theory to Design a Computerized Mastery Test JF - Applied Psychological Measurement Y1 - 1990 A1 - Lewis, C. A1 - Sheehan, K. VL - 14 IS - 4 ER - TY - ABST T1 - Utility of predicting starting abilities in sequential computer-based adaptive tests (Research Report 90-1) Y1 - 1990 A1 - Green, B. F. A1 - Thomas, T. J. CY - Baltimore MD: Johns Hopkins University, Department of Psychology ER - TY - CHAP T1 - Validity Y1 - 1990 A1 - Steinberg, L. A1 - Thissen, D. A1 - Wainer, H. CY - H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 187-231). Hillsdale NJ: Erlbaum. ER - TY - ABST T1 - Validity study in multidimensional latent space and efficient computerized adaptive testing (Final Report R01-1069-11-004-91) Y1 - 1990 A1 - Samejima, F. CY - Knoxville TN: University of Tennessee, Department of Psychology ER - TY - CONF T1 - What can we do with computerized adaptive testing and what we cannot do? T2 - Paper presented at the annual meeting of the Regional Language Center Seminar Y1 - 1990 A1 - Laurier, M. JF - Paper presented at the annual meeting of the Regional Language Center Seminar N1 - ERIC No.
ED 322 7829 ER - TY - JOUR T1 - Adaptive and Conventional Versions of the DAT: The First Complete Test Battery Comparison JF - Applied Psychological Measurement Y1 - 1989 A1 - Henly, S. J. A1 - Klebe, K. J. A1 - J. R. McBride A1 - Cudeck, R. VL - 13 IS - 4 ER - TY - JOUR T1 - Adaptive Estimation When the Unidimensionality Assumption of IRT is Violated JF - Applied Psychological Measurement Y1 - 1989 A1 - Folk, V.G. A1 - Green, B. F. VL - 13 IS - 4 ER - TY - JOUR T1 - Adaptive testing: The evolution of a good idea JF - Educational Measurement: Issues and Practice Y1 - 1989 A1 - Reckase, M. D. KW - computerized adaptive testing VL - 8 SN - 1745-3992 ER - TY - JOUR T1 - Application of computerized adaptive testing to the University Entrance Exam of Taiwan, R.O.C JF - Dissertation Abstracts International Y1 - 1989 A1 - Hung, P-H. KW - computerized adaptive testing VL - 49 ER - TY - BOOK T1 - An applied study on computerized adaptive testing Y1 - 1989 A1 - Schoonman, W. CY - Amsterdam, The Netherlands: Swets and Zeitlinger ER - TY - THES T1 - An applied study on computerized adaptive testing T2 - Faculty of Behavioural and Social Sciences Y1 - 1989 A1 - Schoonman, W. KW - computerized adaptive testing AB - (from the cover) The rapid development and falling prices of powerful personal computers, in combination with new test theories, will have a large impact on psychological testing. One of the new possibilities is computerized adaptive testing. During the test administration each item is chosen to be appropriate for the person being tested.
The test becomes tailor-made, resolving some of the problems with classical paper-and-pencil tests. In this way individual differences can be measured with higher efficiency and reliability. Scores on other meaningful variables, such as response time, can be obtained easily using computers. /// In this book a study on computerized adaptive testing is described. The study took place at Dutch Railways in an applied setting and served practical goals. Topics discussed include the construction of computerized tests, the use of response time, the choice of algorithms and the implications of using a latent trait model. After running a number of simulations and calibrating the item banks, an experiment was carried out. In the experiment a pretest was administered to a sample of over 300 applicants, followed by an adaptive test. In addition, a survey concerning the attitudes of testees towards computerized testing formed part of the design. JF - Faculty of Behavioural and Social Sciences PB - University of Groningen CY - Groningen, The Netherlands ER - TY - CONF T1 - Assessing the impact of using item parameter estimates obtained from paper-and-pencil testing for computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1989 A1 - Kingsbury, G. G. A1 - Houser, R.L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco N1 - #KI89-01 ER - TY - JOUR T1 - Bayesian adaptation during computer-based tests and computer-guided practice exercises JF - Journal of Educational Computing Research Y1 - 1989 A1 - Frick, T. W. VL - 5(1) ER - TY - BOOK T1 - CAT administrator [Computer program] Y1 - 1989 A1 - Gershon, R. C. CY - Chicago: Micro Connections ER - TY - CONF T1 - Commercial applications of computerized adaptive testing T2 - C.E.
Davis Chair, Computerized Adaptive Testing–Military and Commercial Developments Ten Years Later: Symposium conducted at the Annual Conference of the Military Testing Association (524-529) Y1 - 1989 A1 - J. R. McBride JF - C.E. Davis Chair, Computerized Adaptive Testing–Military and Commercial Developments Ten Years Later: Symposium conducted at the Annual Conference of the Military Testing Association (524-529) CY - San Antonio, TX ER - TY - ABST T1 - A comparison of an expert systems approach to computerized adaptive testing and an IRT model Y1 - 1989 A1 - Frick, T. W. CY - Unpublished manuscript (submitted to American Educational Research Journal) ER - TY - JOUR T1 - A comparison of the nominal response model and the three-parameter logistic model in computerized adaptive testing JF - Educational and Psychological Measurement Y1 - 1989 A1 - De Ayala, R. J. VL - 49 ER - TY - CONF T1 - A comparison of three adaptive testing strategies using MicroCAT T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1989 A1 - Ho, R. A1 - Hsu, T. C. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco N1 - #HO89-01 (Tables and figures only.) ER - TY - JOUR T1 - Comparisons of paper-administered, computer-administered and computerized adaptive achievement tests JF - Journal of Educational Computing Research Y1 - 1989 A1 - Olson, J. B. A1 - Maynes, D. D. A1 - Slawson, D. A1 - Ho, K. AB - This research study was designed to compare student achievement scores from three different testing methods: paper-administered testing, computer-administered testing, and computerized adaptive testing. The three testing formats were developed from the California Assessment Program (CAP) item banks for grades three and six. The paper-administered and the computer-administered tests were identical in item content, format, and sequence.
The computerized adaptive test was a tailored or adaptive sequence of the items in the computer-administered test. VL - 5 ER - TY - CONF T1 - A computerized adaptive mathematics screening test T2 - Paper presented at the Annual Meeting of the California Educational Research Association Y1 - 1989 A1 - J. R. McBride JF - Paper presented at the Annual Meeting of the California Educational Research Association CY - Burlingame, CA N1 - ERIC Document Reproduction Service No. ED 316 554 ER - TY - BOOK T1 - Computerized adaptive personality assessment Y1 - 1989 A1 - Waller, N. G. CY - Unpublished master’s thesis, Harvard University, Cambridge MA ER - TY - ABST T1 - Computerized adaptive tests Y1 - 1989 A1 - Grist, S. A1 - Rudner, L. M. A1 - Wise CY - ERIC Clearinghouse on Tests, Measurement, and Evaluation, no. 107 ER - TY - ABST T1 - A consideration for variable length adaptive tests (Research Report 89-40) Y1 - 1989 A1 - Wingersky, M. S. CY - Princeton NJ: Educational Testing Service ER - TY - CHAP T1 - Die Optimierung der Meßgenauigkeit beim branched adaptiven Testen [Optimization of measurement precision for branched adaptive testing] Y1 - 1989 A1 - Kubinger, K. D. CY - K. D. Kubinger (Ed.), Moderne Testtheorie: Ein Abriß samt neuesten Beiträgen [Modern test theory: Overview and new issues] (pp. 187-218). Weinheim, Germany: Beltz. ER - TY - JOUR T1 - Estimating Reliabilities of Computerized Adaptive Tests JF - Applied Psychological Measurement Y1 - 1989 A1 - Divgi, D. R. VL - 13 IS - 2 ER - TY - BOOK T1 - Étude de praticabilité du testing adaptatif de maîtrise des apprentissages scolaires au Québec : une expérimentation en éducation économique secondaire 5 [A feasibility study of adaptive mastery testing of school learning in Québec: An experiment in Secondary 5 economics education] Y1 - 1989 A1 - Auger, R. CY - Thèse de doctorat non publiée. Montréal : Université du Québec à Montréal.
N1 - [In French] ER - TY - CONF T1 - EXSPRT: An expert systems approach to computer-based adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1989 A1 - Frick, T. W. A1 - Plew, G.T. A1 - Luk, H.-K. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco ER - TY - CONF T1 - Golden section search strategies for computerized adaptive testing T2 - Paper presented at the Fifth International Objective Measurement Workshop Y1 - 1989 A1 - Xiao, B. JF - Paper presented at the Fifth International Objective Measurement Workshop CY - Berkeley CA N1 - #XI89-01 ER - TY - CONF T1 - Individual differences in item selection in self adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1989 A1 - Rocklin, T. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA ER - TY - ABST T1 - The interpretation and application of multidimensional item response theory models; and computerized testing in the instructional environment: Final Report (Research Report ONR 89-2) Y1 - 1989 A1 - Reckase, M. D. CY - Iowa City IA: The American College Testing Program N1 - #RE89-02 ER - TY - CONF T1 - Investigating the validity of a computerized adaptive test for different examinee groups T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1989 A1 - Buhr, D. C. A1 - Legg, S. M. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA ER - TY - JOUR T1 - An investigation of procedures for computerized adaptive testing using partial credit scoring JF - Applied Measurement in Education Y1 - 1989 A1 - Koch, W. R. A1 - Dodd, B. G.
VL - 2 ER - TY - ABST T1 - Item-presentation controls for computerized adaptive testing: Content-balancing versus min-CAT (Research Report 89-1) Y1 - 1989 A1 - Thomas, T. J. A1 - Green, B. F. CY - Baltimore MD: Johns Hopkins University, Department of Psychology, Psychometric Laboratory ER - TY - JOUR T1 - Operational Characteristics of Adaptive Testing Procedures Using the Graded Response Model JF - Applied Psychological Measurement Y1 - 1989 A1 - Dodd, B. G. A1 - Koch, W. R. A1 - De Ayala, R. J. VL - 13 IS - 2 ER - TY - JOUR T1 - Procedures for selecting items for computerized adaptive tests JF - Applied Measurement in Education Y1 - 1989 A1 - Kingsbury, G. G. A1 - A Zara VL - 2 ER - TY - JOUR T1 - Providing item feedback in computer-based tests: Effects of initial success and failure JF - Educational and Psychological Measurement Y1 - 1989 A1 - Wise, S. L. A1 - Plake, B. S. A1 - et al. VL - 49 ER - TY - JOUR T1 - A real-data simulation of computerized adaptive administration of the MMPI JF - Psychological Assessment Y1 - 1989 A1 - Ben-Porath, Y. S. A1 - Slutske, W. S. A1 - Butcher, J. N. KW - computerized adaptive testing AB - A real-data simulation of computerized adaptive administration of the MMPI was conducted with data obtained from two personnel-selection samples and two clinical samples. A modification of the countdown method was tested to determine the usefulness, in terms of item administration savings, of several different test administration procedures. Substantial item administration savings were achieved for all four samples, though the clinical samples required administration of more items to achieve accurate classification and/or full-scale scores than did the personnel-selection samples.
The use of normative item endorsement frequencies was found to be as effective as sample-specific frequencies for the determination of item administration order. The role of computerized adaptive testing in the future of personality assessment is discussed. (C) 1989 by the American Psychological Association VL - 1 N1 - Article ER - TY - CHAP T1 - A research proposal for field testing CAT for nursing licensure examinations Y1 - 1989 A1 - A Zara CY - Delegate Assembly Book of Reports 1989. Chicago: National Council of State Boards of Nursing, Inc. ER - TY - JOUR T1 - Some procedures for computerized ability testing JF - International Journal of Educational Research Y1 - 1989 A1 - van der Linden, W. J. A1 - Zwarts, M. A. VL - 13(2) ER - TY - JOUR T1 - Tailored interviewing: An application of item response theory for personality measurement JF - Journal of Personality Assessment Y1 - 1989 A1 - Kamakura, W. A. A1 - Balasubramanian, S. K. VL - 53 ER - TY - JOUR T1 - Testing software review: MicroCAT Version 3 JF - Educational Measurement: Issues and Practice Y1 - 1989 A1 - Stone, C. A. VL - 8 (3) ER - TY - JOUR T1 - Trace lines for testlets: A use of multiple-categorical-response models JF - Journal of Educational Measurement Y1 - 1989 A1 - Thissen, D. A1 - Steinberg, L. A1 - Mooney, J.A. VL - 26 ER - TY - BOOK T1 - Application of appropriateness measurement to a problem in computerized adaptive testing Y1 - 1988 A1 - Candell, G. L. CY - Unpublished doctoral dissertation, University of Illinois ER - TY - JOUR T1 - Assessment of academic skills of learning disabled students with classroom microcomputers JF - School Psychology Review Y1 - 1988 A1 - Watkins, M. W. A1 - Kush, J. C. VL - 17 ER - TY - JOUR T1 - The College Board computerized placement tests: An application of computerized adaptive testing JF - Machine-Mediated Learning Y1 - 1988 A1 - W. C.
Ward VL - 2 ER - TY - CONF T1 - A comparison of achievement level estimates from computerized adaptive testing and paper-and-pencil testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1988 A1 - Kingsbury, G. G. A1 - Houser, R.L. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA ER - TY - CONF T1 - A comparison of two methods for the adaptive administration of the MMPI-2 content scales T2 - Paper presented at the 96th Annual Convention of the American Psychological Association Y1 - 1988 A1 - Ben-Porath, Y. S. A1 - Waller, N. G. A1 - Slutske, W. S. A1 - Butcher, J. N. JF - Paper presented at the 96th Annual Convention of the American Psychological Association CY - Atlanta GA ER - TY - CONF T1 - Computerized adaptive attitude measurement: A comparison of the graded response and rating scale models T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1988 A1 - Dodd, B. G. A1 - Koch, W. R. A1 - De Ayala, R. J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans ER - TY - JOUR T1 - Computerized adaptive testing: A comparison of the nominal response model and the three parameter model JF - Dissertation Abstracts International Y1 - 1988 A1 - De Ayala, R. J. KW - computerized adaptive testing VL - 48 ER - TY - JOUR T1 - Computerized adaptive testing: A four-year-old pilot study shows that CAT can work JF - Technological Horizons in Education Y1 - 1988 A1 - Kingsbury, G. G. A1 - et al. VL - 16 (4) ER - TY - CONF T1 - Computerized adaptive testing: a good idea waiting for the right technology T2 - Paper presented at the meeting of the American Educational Research Association Y1 - 1988 A1 - Reckase, M. D.
JF - Paper presented at the meeting of the American Educational Research Association CY - New Orleans, April 1988 ER - TY - CONF T1 - Computerized adaptive testing program at Miami-Dade Community College, South Campus T2 - Laguna Hills CA: League for Innovation in the Community College. Y1 - 1988 A1 - Schinoff, R. B. A1 - Stead, L. JF - Laguna Hills CA: League for Innovation in the Community College. ER - TY - ABST T1 - Computerized adaptive testing: The state of the art in assessment at three community colleges Y1 - 1988 A1 - League-for-Innovation-in-the-Community-College CY - Laguna Hills CA: Author N1 - (25431 Cabot Road, Suite 203, Laguna Hills CA 92653) ER - TY - BOOK T1 - Computerized adaptive testing: The state of the art in assessment at three community colleges Y1 - 1988 A1 - Doucette, D. CY - Laguna Hills CA: League for Innovation in the Community College ER - TY - CONF T1 - A computerized adaptive version of the Differential Aptitude Tests T2 - Paper presented at the meeting of the American Psychological Association Y1 - 1988 A1 - J. R. McBride JF - Paper presented at the meeting of the American Psychological Association CY - Atlanta GA ER - TY - JOUR T1 - Computerized mastery testing JF - Machine-Mediated Learning Y1 - 1988 A1 - Lewis, C. A1 - Sheehan, K. VL - 2 ER - TY - CHAP T1 - Construct validity of computer-based tests Y1 - 1988 A1 - Green, B. F. CY - H. Wainer and H. Braun (Eds.), Test validity (pp. 77-103). Hillsdale NJ: Erlbaum. ER - TY - JOUR T1 - Critical problems in computer-based psychological measurement JF - Applied Measurement in Education Y1 - 1988 A1 - Green, B. F. VL - 1 ER - TY - CONF T1 - The development and evaluation of a microcomputerized adaptive placement testing system for college mathematics T2 - Paper(s) presented at the annual meeting(s) of the American Educational Research Association Y1 - 1988 A1 - Hsu, T.-C. A1 - Shermis, M. D.
JF - Paper(s) presented at the annual meeting(s) of the American Educational Research Association CY - 1986 (San Francisco CA) and 1987 (Washington DC) ER - TY - ABST T1 - The equivalence of scores from automated and conventional educational and psychological tests (College Board Report No. 88-8) Y1 - 1988 A1 - Mazzeo, J. A1 - Harvey, A. L. CY - New York: The College Entrance Examination Board. ER - TY - CONF T1 - Fitting the two-parameter model to personality data: The parameterization of the Multidimensional Personality Questionnaire T2 - Unpublished manuscript. Y1 - 1988 A1 - Reise, S. P. A1 - Waller, N. G. JF - Unpublished manuscript. ER - TY - ABST T1 - The four generations of computerized educational measurement (Research Report 98-35) Y1 - 1988 A1 - Bunderson, C. V A1 - Inouye, D. K A1 - Olsen, J. B. CY - Princeton NJ: Educational Testing Service. ER - TY - JOUR T1 - Introduction to item response theory and computerized adaptive testing as applied in licensure and certification testing JF - National Clearinghouse of Examination Information Newsletter Y1 - 1988 A1 - A Zara VL - 6 ER - TY - JOUR T1 - Item pool maintenance in the presence of item parameter drift JF - Journal of Educational Measurement Y1 - 1988 A1 - Bock, R. D. A1 - Muraki, E. A1 - Pfeiffenberger, W. VL - 25 ER - TY - CONF T1 - A predictive analysis approach to adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1988 A1 - Kirisci, L. A1 - Hsu, T.-C. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA N1 - ERIC No. ED295982 ER - TY - ABST T1 - A procedure for scoring incomplete adaptive tests in high stakes testing Y1 - 1988 A1 - Segall, D. O. CY - Unpublished manuscript.
San Diego, CA: Navy Personnel Research and Development Center ER - TY - CONF T1 - The Rasch model and missing data, with an emphasis on tailoring test items T2 - annual meeting of the American Educational Research Association Y1 - 1988 A1 - de Gruijter, D. N. M. AB - Many applications of educational testing have a missing data aspect (MDA). This MDA is perhaps most pronounced in item banking, where each examinee responds to a different subtest of items from a large item pool and where both person and item parameter estimates are needed. The Rasch model is emphasized, and its non-parametric counterpart (the Mokken scale) is considered. The possibility of tailoring test items in combination with their estimation is discussed; however, most methods for the estimation of item parameters are inadequate under tailoring. Without special measures, only marginal maximum likelihood produces adequate item parameter estimates under item tailoring. Fischer's approximate minimum-chi-square method for estimation of item parameters for the Rasch model is discussed, which efficiently produces item parameters. (TJH) JF - annual meeting of the American Educational Research Association CY - New Orleans, LA. USA ER - TY - JOUR T1 - The Rasch model and multi-stage testing JF - Journal of Educational and Behavioral Statistics Y1 - 1988 A1 - Glas, C. A. W. VL - 13 ER - TY - CHAP T1 - On a Rasch-model-based test for non-computerized adaptive testing Y1 - 1988 A1 - Kubinger, K. D. CY - Langeheine, R. and Rost, J. (Eds.), Latent trait and latent class models. New York: Plenum Press. ER - TY - CONF T1 - A real-data simulation of adaptive MMPI administration T2 - Paper presented at the 23rd Annual Symposium on recent developments in the use of the MMPI Y1 - 1988 A1 - Slutske, W. S. A1 - Ben-Porath, Y. S. A1 - Butcher, J. N. JF - Paper presented at the 23rd Annual Symposium on recent developments in the use of the MMPI CY - St.
Petersburg FL ER - TY - ABST T1 - Refinement of the Computerized Adaptive Screening Test (CAST) (Final Report, Contract No MDA203 06-C-0373) Y1 - 1988 A1 - Wise, L. L. A1 - McHenry, J. J. A1 - Chia, W. J. A1 - Szenas, P. L. A1 - J. R. McBride CY - Washington, DC: American Institutes for Research. ER - TY - ABST T1 - Scale drift in on-line calibration (Research Report RR-88-28-ONR) Y1 - 1988 A1 - Stocking, M. L. CY - Princeton NJ: Educational Testing Service N1 - (ERIC No. ED389710) ER - TY - CONF T1 - Simple and effective algorithms [for] computer-adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1988 A1 - Linacre, J. M. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA N1 - #LI88-01 ER - TY - ABST T1 - Some considerations in maintaining adaptive test item pools (Research Report 88-33-ONR) Y1 - 1988 A1 - Stocking, M. L. CY - Princeton NJ: Educational Testing Service N1 - (ERIC No. ED391814) ER - TY - BOOK T1 - Users manual for the MicroCAT Testing System, Version 3 Y1 - 1988 A1 - Assessment-Systems-Corporation CY - St. Paul MN: Author. ER - TY - BOOK T1 - An adaptive test of musical memory: An application of item response theory to the assessment of musical ability Y1 - 1987 A1 - Vispoel, W. P. CY - Doctoral dissertation, University of Illinois. Dissertation Abstracts International, 49, 79A. ER - TY - JOUR T1 - Adaptive testing JF - Applied Psychology: An International Review Y1 - 1987 A1 - Weiss, D. J. A1 - Vale, C. D. VL - 36 ER - TY - ABST T1 - Adaptive testing, information, and the partial credit model Y1 - 1987 A1 - Adams, R. J.
CY - Melbourne, Australia: University of Melbourne, Center for the Study of Higher Education ER - TY - JOUR T1 - CATS, testlets, and test construction: A rationale for putting test developers back into CAT JF - Journal of Educational Measurement Y1 - 1987 A1 - Wainer, H. A1 - Kiely, G. L. VL - 32 N1 - (volume number appears to be incorrect) ER - TY - ABST T1 - A computer program for adaptive testing by microcomputer (MESA Memorandum No 40) Y1 - 1987 A1 - Linacre, J. M. CY - Chicago: University of Chicago. (ERIC ED 280 895.) ER - TY - ABST T1 - Computerized adaptive language testing: A Spanish placement exam Y1 - 1987 A1 - Larson, J. W. CY - In Language Testing Research Selected Papers from the Colloquium, Monterey CA N1 - (ERIC No. FL016939) ER - TY - CONF T1 - Computerized adaptive testing: A comparison of the nominal response model and the three-parameter logistic model T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1987 A1 - De Ayala, R. J. A1 - Koch, W. R. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Washington DC ER - TY - CHAP T1 - Computerized adaptive testing for measuring abilities and other psychological variables Y1 - 1987 A1 - Weiss, D. J. A1 - Vale, C. D. CY - J. N. Butcher (Ed.), Computerized personality measurement: A practitioner's guide (pp. 325-343). New York: Basic Books. ER - TY - CONF T1 - Computerized adaptive testing made practical: The Computerized Adaptive Edition of the Differential Aptitude Tests T2 - Presented at the U.S. Department of Labor National Test Development Conference Y1 - 1987 A1 - J. R. McBride JF - Presented at the U.S. Department of Labor National Test Development Conference CY - San Francisco, CA ER - TY - CONF T1 - Computerized adaptive testing with the rating scale model T2 - Paper presented at the Fourth International Objective Measurement Workshop Y1 - 1987 A1 - Dodd, B. G.
JF - Paper presented at the Fourth International Objective Measurement Workshop CY - Chicago ER - TY - JOUR T1 - Computerized psychological testing: Overview and critique JF - Professional Psychology: Research and Practice Y1 - 1987 A1 - Burke, M. J. A1 - Normand, J. A1 - Raju, N. M. VL - 1 ER - TY - RPRT T1 - The effect of item parameter estimation error on decisions made using the sequential probability ratio test Y1 - 1987 A1 - Spray, J. A. A1 - Reckase, M. D. KW - computerized adaptive testing KW - Sequential probability ratio test JF - ACT Research Report Series PB - DTIC Document CY - Iowa City, IA. USA ER - TY - ABST T1 - The effect of item parameter estimation error on the decisions made using the sequential probability ratio test (ACT Research Report Series 87-17) Y1 - 1987 A1 - Spray, J. A. A1 - Reckase, M. D. CY - Iowa City IA: American College Testing ER - TY - BOOK T1 - The effects of variable entry on bias and information of the Bayesian adaptive testing procedure Y1 - 1987 A1 - Hankins, J. A. CY - Dissertation Abstracts International, 47 (8A), 3013 ER - TY - CONF T1 - Equating the computerized adaptive edition of the Differential Aptitude Tests T2 - Paper presented at the meeting of the American Psychological Association Y1 - 1987 A1 - J. R. McBride A1 - Corpe, V. A. A1 - Wing, H. JF - Paper presented at the meeting of the American Psychological Association CY - New York ER - TY - ABST T1 - Equivalent-groups versus single-group equating designs for the Accelerated CAT-ASVAB Project (Research Memorandum 87-6) Y1 - 1987 A1 - Stoloff, P. H. CY - Alexandria VA: Center for Naval Analyses ER - TY - ABST T1 - Final report: Feasibility study of a computerized test administration of the CLAST Y1 - 1987 A1 - Legg, S. M. A1 - Buhr, D. C.
CY - University of Florida: Institute for Student Assessment and Evaluation ER - TY - ABST T1 - Full-information item factor analysis from the ASVAB CAT item pool (Methodology Research Center Report 87-1) Y1 - 1987 A1 - Zimowski, M. F. A1 - Bock, R. D. CY - Chicago IL: University of Chicago ER - TY - ABST T1 - Functional and design specifications for the National Council of State Boards of Nursing adaptive testing system Y1 - 1987 A1 - A Zara A1 - Bosma, J. A1 - Kaplan, R. CY - Unpublished manuscript ER - TY - CHAP T1 - Improving the measurement of musical ability through adaptive testing Y1 - 1987 A1 - Vispoel, W. P. CY - G. Hayes (Ed.), Proceedings of the 29th International ADCIS Conference (pp. 221-228). Bellingham WA: ADCIS. ER - TY - JOUR T1 - Item clusters and computerized adaptive testing: A case for testlets JF - Journal of Educational Measurement Y1 - 1987 A1 - Wainer, H. A1 - Kiely, G. L. VL - 24 IS - 3 ER - TY - CONF T1 - Multidimensional adaptive testing: A procedure for sequential estimation of the posterior centroid and dispersion of theta T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1987 A1 - Bloxom, B. M. A1 - Vale, C. D. JF - Paper presented at the annual meeting of the Psychometric Society CY - Montreal, Canada ER - TY - ABST T1 - Properties of some Bayesian scoring procedures for computerized adaptive tests (Research Memorandum CRM 87-161) Y1 - 1987 A1 - Divgi, D. R. CY - Alexandria VA: Center for Naval Analyses ER - TY - JOUR T1 - Self-adapted testing: A performance improving variation of computerized adaptive testing JF - Journal of Educational Psychology Y1 - 1987 A1 - Rocklin, T. R. A1 - O’Donnell, A. M. VL - 79 ER - TY - JOUR T1 - Two simulated feasibility studies in computerized adaptive testing JF - Applied Psychology: An International Review Y1 - 1987 A1 - Stocking, M. L.
VL - 36 ER - TY - RPRT T1 - The use of unidimensional item parameter estimates of multidimensional items in adaptive testing Y1 - 1987 A1 - Ackerman, T. A. AB - Investigated the effect of using multidimensional (MDN) items in a computer adaptive test setting that assumes a unidimensional item response theory model in 2 experiments, using generated and real data in which difficulty was known to be confounded with dimensionality. Results from simulations suggest that univariate calibration of MDN data filtered out multidimensionality. The closer an item's MDN composite aligned itself with the calibrated univariate ability scale's orientation, the larger was the estimated discrimination parameter. (PsycINFO Database Record (c) 2003 APA, all rights reserved). JF - ACT Research Reports PB - ACT CY - Iowa City, IA SN - 87-13 ER - TY - JOUR T1 - Wilcox' closed sequential testing procedure in stratified item domains JF - Methodika Y1 - 1987 A1 - de Gruijter, D. N. VL - 1(1) ER - TY - JOUR T1 - An application of computer adaptive testing with communication handicapped examinees JF - Educational and Psychological Measurement Y1 - 1986 A1 - Garrison, W. M. A1 - Baumgarten, B. S. KW - computerized adaptive testing AB - This study was conducted to evaluate a computerized adaptive testing procedure for the measurement of mathematical skills of entry level deaf college students. The theoretical basis of the study was the Rasch model for person measurement. Sixty persons were tested using an Apple II Plus microcomputer. Ability estimates provided by the computerized procedure were compared for stability with those obtained six to eight weeks earlier from conventional (written) testing of the same subject matter. Students' attitudes toward their testing experiences also were measured. Substantial increases in measurement efficiency (by reducing test length) were realized through the adaptive testing procedure. 
Because the item pool used was not specifically designed for adaptive testing purposes, the psychometric quality of measurements resulting from the different testing methods was approximately equal. Attitudes toward computerized testing were favorable. VL - 46 SN - 0013-1644 N1 - doi:10.1177/0013164486461003 ER - TY - ABST T1 - CATs, testlets, and test construction: A rationale for putting test developers back into CAT (Technical Report 86-71) Y1 - 1986 A1 - Wainer, H. A1 - Kiely, G. L. CY - Princeton NJ: Educational Testing Service, Program Statistics Research N1 - #WA86-71 ER - TY - CHAP T1 - A cognitive error diagnostic adaptive testing system Y1 - 1986 A1 - Tatsuoka, K. K. CY - The 28th ADCIS International Conference Proceedings. Washington DC: ADCIS. ER - TY - ABST T1 - College Board computerized placement tests: Validation of an adaptive test of basic skills (Research Report 86-29) Y1 - 1986 A1 - W. C. Ward A1 - Kline, R. G. A1 - Flaugher, J. CY - Princeton NJ: Educational Testing Service. ER - TY - CONF T1 - Comparison and equating of paper-administered, computer-administered, and computerized adaptive tests of achievement T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1986 A1 - Olsen, J. B. A1 - Maynes, D. D. A1 - Slawson, D. A1 - Ho, K. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA ER - TY - CONF T1 - A computer-adaptive placement test for college mathematics T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1986 A1 - Shermis, M. D. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA N1 - #SH86-01 ER - TY - CONF T1 - Computerized adaptive achievement testing: A prototype T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1986 A1 - J. R.
McBride A1 - Moe, K. C. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco CA ER - TY - CONF T1 - A computerized adaptive edition of the Differential Aptitude Tests T2 - Presented at the National Assessment Conference of the Education Commission of the States Y1 - 1986 A1 - J. R. McBride JF - Presented at the National Assessment Conference of the Education Commission of the States CY - Boulder, CO ER - TY - CONF T1 - A computerized adaptive edition of the Differential Aptitude Tests T2 - Paper presented at the meeting of the American Psychological Association Y1 - 1986 A1 - J. R. McBride JF - Paper presented at the meeting of the American Psychological Association CY - Washington DC N1 - (ERIC No. ED 285 918) ER - TY - CHAP T1 - Computerized adaptive testing: A pilot project Y1 - 1986 A1 - Kingsbury, G. G. CY - W. C. Ryan (Ed.), Proceedings: NECC 86, National Educational Computing Conference (pp. 172-176). Eugene OR: University of Oregon, International Council on Computers in Education. ER - TY - JOUR T1 - Computerized testing technology JF - Advances in Reading/Language Research Y1 - 1986 A1 - Wolfe, J. H. VL - 4 ER - TY - ABST T1 - Determining the sensitivity of CAT-ASVAB scores to changes in item response curves with the medium of administration (Report No. 86-189) Y1 - 1986 A1 - Divgi, D. R. CY - Alexandria VA: Center for Naval Analyses N1 - #DI86-189 ER - TY - JOUR T1 - The effects of computer experience on computerized adaptive test performance JF - Educational and Psychological Measurement Y1 - 1986 A1 - Lee, J. A. VL - 46 ER - TY - JOUR T1 - Equivalence of conventional and computer presentation of speed tests JF - Applied Psychological Measurement Y1 - 1986 A1 - Greaud, V. A. A1 - Green, B. F. VL - 10 ER - TY - ABST T1 - Final report: Adaptive testing of spatial abilities (ONR 150 531) Y1 - 1986 A1 - Bejar, I. I.
CY - Princeton, NJ: Educational Testing Service ER - TY - ABST T1 - Final report: The use of tailored testing with instructional programs (Research Report ONR 86-1) Y1 - 1986 A1 - Reckase, M. D. CY - Iowa City IA: The American College Testing Program, Assessment Programs Area, Test Development Division. ER - TY - CHAP T1 - The four generations of computerized educational measurement Y1 - 1986 A1 - Bunderson, C. V. A1 - Inouye, D. K. A1 - Olsen, J. B. CY - In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 367-407). New York: Macmillan. ER - TY - CONF T1 - Measuring up in an individualized way with CAT-ASVAB: Considerations in the development of adaptive testing pools T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1986 A1 - Schartz, M. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA N1 - (ERIC No. ED 269 463) ER - TY - CONF T1 - Operational characteristics of adaptive testing procedures using partial credit scoring T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1986 A1 - Koch, W. R. A1 - Dodd, B. G. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA N1 - #KO86-01 ER - TY - JOUR T1 - Some applications of optimization algorithms in test design and adaptive testing JF - Applied Psychological Measurement Y1 - 1986 A1 - Theunissen, T. J. J. M. VL - 10 IS - 4 ER - TY - JOUR T1 - Using microcomputer-based assessment in career counseling JF - Journal of Employment Counseling Y1 - 1986 A1 - Thompson, D. L.
VL - 23 ER - TY - JOUR T1 - Adaptive self-referenced testing as a procedure for the measurement of individual change due to instruction: A comparison of the reliabilities of change estimates obtained from conventional and adaptive testing procedures JF - Dissertation Abstracts International Y1 - 1985 A1 - Kingsbury, G. G. KW - computerized adaptive testing VL - 45 ER - TY - JOUR T1 - Adaptive testing by computer JF - Journal of Consulting and Clinical Psychology Y1 - 1985 A1 - Weiss, D. J. VL - 53 ER - TY - JOUR T1 - ALPHATAB: A lookup table for Bayesian computerized adaptive testing JF - Applied Psychological Measurement Y1 - 1985 A1 - De Ayala, R. J. A1 - Koch, W. R. VL - 9 ER - TY - ABST T1 - Armed Services Vocational Aptitude Battery: Development of an adaptive item pool (AFHRL-TR-85-19; Technical Rep No 85-19) Y1 - 1985 A1 - Prestwood, J. S. A1 - Vale, C. D. A1 - Massey, R. H. A1 - Welsh, J. R. CY - Brooks Air Force Base TX: Air Force Human Resources Laboratory ER - TY - CONF T1 - Computerized adaptive attitude measurement T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1985 A1 - Koch, W. R. A1 - Dodd, B. G. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago ER - TY - JOUR T1 - Computerized adaptive testing JF - Educational Leadership Y1 - 1985 A1 - J. R. McBride VL - 43 ER - TY - CONF T1 - Computerized adaptive testing: An overview and an example T2 - Presented at the Assessment Conference of the Education Commission of the States Y1 - 1985 A1 - J. R. McBride JF - Presented at the Assessment Conference of the Education Commission of the States CY - Boulder, CO ER - TY - JOUR T1 - Controlling item exposure conditional on ability in computerized adaptive testing JF - Journal of Educational and Behavioral Statistics Y1 - 1985 A1 - Sympson, J. B. A1 - Hetter, R. D.
VL - 23 ER - TY - CHAP T1 - Controlling item-exposure rates in computerized adaptive testing Y1 - 1985 A1 - Sympson, J. B. A1 - Hetter, R. D. CY - Proceedings of the 27th annual meeting of the Military Testing Association (pp. 973-977). San Diego CA: Navy Personnel Research and Development Center. ER - TY - JOUR T1 - Current developments and future directions in computerized personality assessment JF - Journal of Consulting and Clinical Psychology Y1 - 1985 A1 - Butcher, J. N. A1 - Keller, L. S. A1 - Bacon, S. F. AB - Although computer applications in personality assessment have burgeoned rapidly in recent years, the majority of these uses capitalize on the computer's speed, accuracy, and memory capacity rather than its potential for the development of new, flexible assessment strategies. A review of current examples of computer usage in personality assessment reveals wide acceptance of automated clerical tasks such as test scoring and even test administration. The computer is also assuming tasks previously reserved for expert clinicians, such as writing narrative interpretive reports from test results. All of these functions represent automation of established assessment devices and interpretive strategies. The possibility also exists of harnessing some of the computer's unique adaptive capabilities to alter standard devices and even develop new ones. Three proposed strategies for developing computerized adaptive personality tests are described, with the conclusion that the computer's potential in this area justifies a call for further research efforts. (C) 1985 by the American Psychological Association VL - 53 N1 - Miscellaneous Article ER - TY - ABST T1 - Development of a microcomputer-based adaptive testing system: Phase II Implementation (Research Report ONR 85-5) Y1 - 1985 A1 - Vale, C. D. CY - St.
Paul MN: Assessment Systems Corporation ER - TY - RPRT T1 - Equivalence of scores from computerized adaptive and paper-and-pencil ASVAB tests Y1 - 1985 A1 - Stoloff, P. H. PB - Center for Naval Analyses CY - Alexandria, VA. USA ER - TY - ABST T1 - Final report: Computerized adaptive measurement of achievement and ability Y1 - 1985 A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - JOUR T1 - Implications of altering the context in which test items appear: A historical perspective on an immediate concern JF - Review of Educational Research Y1 - 1985 A1 - Leary, L. F. A1 - Dorans, N. J. VL - 55 ER - TY - CHAP T1 - Introduction Y1 - 1985 A1 - Weiss, D. J. CY - In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 1-8). New York: Academic Press. ER - TY - JOUR T1 - Latent structure and item sampling models for testing JF - Annual Review of Psychology Y1 - 1985 A1 - Traub, R. E. A1 - Lam, Y. R. VL - 36 ER - TY - CONF T1 - Methods of selecting successive items in adaptive testing T2 - Unpublished manuscript Y1 - 1985 A1 - Yu, L. JF - Unpublished manuscript CY - University of Pittsburgh ER - TY - JOUR T1 - Monitoring item calibrations from data yielded by an adaptive testing procedure JF - Educational Research Quarterly Y1 - 1985 A1 - Garrison, W. M. VL - 10 ER - TY - BOOK T1 - Proceedings of the 1982 Computerized Adaptive Testing Conference Y1 - 1985 A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory ER - TY - CHAP T1 - Reducing the predictability of adaptive item sequences Y1 - 1985 A1 - Wetzel, C. D. A1 - J. R. McBride CY - Proceedings of the 27th Annual Conference of the Military Testing Association, San Diego, 43-48. ER - TY - BOOK T1 - Sequential analysis: Tests and confidence intervals Y1 - 1985 A1 - Siegmund, D.
CY - New York: Springer-Verlag ER - TY - JOUR T1 - A structural comparison of conventional and adaptive versions of the ASVAB JF - Multivariate Behavioral Research Y1 - 1985 A1 - Cudeck, R. AB - Examined several structural models of similarity between the Armed Services Vocational Aptitude Battery (ASVAB) and a battery of computerized adaptive tests designed to measure the same aptitudes. 12 plausible models were fitted to sample data in a double cross-validation design. 1,411 US Navy recruits completed 10 ASVAB subtests. A computerized adaptive test version of the ASVAB subtests was developed on item pools of approximately 200 items each. The items were pretested using applicants from military entrance processing stations across the US, resulting in a total calibration sample size of approximately 60,000 for the computerized adaptive tests. Three of the 12 models provided reasonable summaries of the data. One model with a multiplicative structure (M. W. Browne; see record 1984-24964-001) performed quite well. This model provides an estimate of the disattenuated method correlation between conventional testing and adaptive testing. In the present data, this correlation was estimated to be 0.97 and 0.98 in the 2 halves of the data. Results support computerized adaptive tests as replacements for conventional tests. (33 ref) (PsycINFO Database Record (c) 2004 APA, all rights reserved). VL - 20 N1 - Lawrence Erlbaum, US ER - TY - Generic T1 - Unidimensional and multidimensional models for item response theory T2 - Proceedings of the 1982 Computerized Adaptive Testing Conference Y1 - 1985 A1 - McDonald, R. P. JF - Proceedings of the 1982 Computerized Adaptive Testing Conference PB - University of Minnesota, Department of Psychology, Psychometrics Methods Program CY - Minneapolis, MN. USA ER - TY - CONF T1 - Validity of adaptive testing: A summary of research results T2 - Paper presented at the annual meeting of the American Psychological Association. 
Y1 - 1985 A1 - Sympson, J. B. A1 - Moreno, K. E. JF - Paper presented at the annual meeting of the American Psychological Association. N1 - #SY85-01 ER - TY - CONF T1 - A validity study of the computerized adaptive testing version of the Armed Services Vocational Aptitude Battery T2 - Proceedings of the 27th Annual Conference of the Military Testing Association Y1 - 1985 A1 - Moreno, K. E. A1 - Segall, D. O. A1 - Kieckhaefer, W. F. JF - Proceedings of the 27th Annual Conference of the Military Testing Association ER - TY - BOOK T1 - Adaptive self-referenced testing as a procedure for the measurement of individual change in instruction: A comparison of the reliabilities of change estimates obtained from conventional and adaptive testing procedures Y1 - 1984 A1 - Kingsbury, G. G. CY - Unpublished doctoral dissertation, University of Minnesota, Minneapolis ER - TY - ABST T1 - Adaptive testing (Final Report Contract OPM-29-80) Y1 - 1984 A1 - Trollip, S. R. CY - Urbana-Champaign IL: University of Illinois, Aviation Research Laboratory ER - TY - ABST T1 - Analysis of experimental CAT ASVAB data Y1 - 1984 A1 - Allred, L. A. A1 - Green, B. F. CY - Baltimore MD: Johns Hopkins University, Department of Psychology ER - TY - ABST T1 - Analysis of speeded test data from experimental CAT system Y1 - 1984 A1 - Greaud, V. A. A1 - Green, B. F. CY - Baltimore MD: Johns Hopkins University, Department of Psychology ER - TY - ABST T1 - Application of adaptive testing to a fraction test (Research Report 84-3-NIE) Y1 - 1984 A1 - Tatsuoka, K. K. A1 - Tatsuoka, M. M. A1 - Baillie, R. CY - Urbana IL: University of Illinois, Computer-Based Education Research Laboratory ER - TY - JOUR T1 - Bias and Information of Bayesian Adaptive Testing JF - Applied Psychological Measurement Y1 - 1984 A1 - Weiss, D. J. A1 - J. R. McBride VL - 8 IS - 3
ER - TY - BOOK T1 - A comparison of the maximum likelihood strategy and stradaptive test on a micro-computer Y1 - 1984 A1 - Bill, B. C. CY - Unpublished M.S. thesis, University of Wisconsin, Madison. N1 - #BI84-01 ER - TY - JOUR T1 - Computerized adaptive testing in the Maryland Public Schools JF - MicroCAT News Y1 - 1984 A1 - Stevenson, J. VL - 1 ER - TY - JOUR T1 - Computerized diagnostic testing JF - Journal of Educational Measurement Y1 - 1984 A1 - McArthur, D. L. A1 - Choppin, B. H. VL - 21 ER - TY - CONF T1 - The design of a computerized adaptive testing system for administering the ASVAB T2 - Presentation at the Annual Meeting of the American Educational Research Association Y1 - 1984 A1 - J. R. McBride JF - Presentation at the Annual Meeting of the American Educational Research Association CY - New Orleans, LA ER - TY - ABST T1 - Efficiency and precision in two-stage adaptive testing Y1 - 1984 A1 - Loyd, B. H. CY - West Palm Beach Florida: Eastern ERA ER - TY - ABST T1 - Evaluation of computerized adaptive testing of the ASVAB Y1 - 1984 A1 - Hardwicke, S. A1 - Vicino, F. A1 - J. R. McBride A1 - Nemeth, C. CY - San Diego, CA: Navy Personnel Research and Development Center, unpublished manuscript ER - TY - CONF T1 - An evaluation of the utility of large scale computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1984 A1 - Vicino, F. L. A1 - Hardwicke, S. B. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago ER - TY - CONF T1 - An evaluation of the utility of large scale computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1984 A1 - Vicino, F. L. A1 - Hardwicke, S. B.
JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA ER - TY - ABST T1 - Evaluation plan for the computerized adaptive vocational aptitude battery (Research Report 82-1) Y1 - 1984 A1 - Green, B. F. A1 - Bock, R. D. A1 - Humphreys, L. G. A1 - Linn, R. L. A1 - Reckase, M. D. N1 - Baltimore MD: The Johns Hopkins University, Department of Psychology. ER - TY - JOUR T1 - Issues in item banking JF - Journal of Educational Measurement Y1 - 1984 A1 - Millman, J. A1 - Arter, J. A. VL - 21 ER - TY - JOUR T1 - Item Location Effects and Their Implications for IRT Equating and Adaptive Testing JF - Applied Psychological Measurement Y1 - 1984 A1 - Kingston, N. M. A1 - Dorans, N. J. VL - 8 IS - 2 ER - TY - ABST T1 - Microcomputer network for computerized adaptive testing (CAT) (TR-84-33) Y1 - 1984 A1 - Quan, B. A1 - Park, T. A. A1 - Sandahl, G. A1 - Wolfe, J. H. CY - San Diego CA: Navy Personnel Research and Development Center ER - TY - JOUR T1 - A plan for scaling the computerized adaptive Armed Services Vocational Aptitude Battery JF - Journal of Educational Measurement Y1 - 1984 A1 - Green, B. F. A1 - Bock, R. D. A1 - Linn, R. L. A1 - Lord, F. M. A1 - Reckase, M. D. VL - 21 ER - TY - CONF T1 - Predictive validity of computerized adaptive testing in a military training environment T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1984 A1 - Sympson, J. B. A1 - Weiss, D. J. A1 - Ree, M. J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA ER - TY - JOUR T1 - Relationship between corresponding Armed Services Vocational Aptitude Battery (ASVAB) and computerized adaptive testing (CAT) subtests JF - Applied Psychological Measurement Y1 - 1984 A1 - Moreno, K. E. A1 - Wetzel, C. D. A1 - J. R. McBride A1 - Weiss, D. J.
KW - computerized adaptive testing AB - Investigated the relationships between selected subtests from the Armed Services Vocational Aptitude Battery (ASVAB) and corresponding subtests administered as computerized adaptive tests (CATs), using 270 17-26 yr old Marine recruits as Ss. Ss were administered the ASVAB before enlisting and approximately 2 wks after entering active duty, and the CAT tests were administered to Ss approximately 24 hrs after arriving at the recruit depot. Results indicate that 3 adaptive subtests correlated as well with ASVAB as did the 2nd administration of the ASVAB, although CAT subtests contained only half the number of items. Factor analysis showed CAT subtests to load on the same factors as the corresponding ASVAB subtests, indicating that the same abilities were being measured. It is concluded that CAT can achieve the same measurement precision as a conventional test, with half the number of items. (16 ref) VL - 8 IS - 2 N1 - Sage Publications, US ER - TY - CONF T1 - The selection of items for decision making with a computer adaptive test T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1984 A1 - Spray, J. A. A1 - Reckase, M. D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA ER - TY - JOUR T1 - Technical guidelines for assessing computerized adaptive tests JF - Journal of Educational Measurement Y1 - 1984 A1 - Green, B. F. A1 - Bock, R. D. A1 - Humphreys, L. G. A1 - Linn, R. L. A1 - Reckase, M. D.
KW - computerized adaptive testing KW - Mode effects KW - paper-and-pencil VL - 21 SN - 1745-3984 ER - TY - ABST T1 - Two simulated feasibility studies in computerized adaptive testing (RR-84-15) Y1 - 1984 A1 - Stocking, M. L. CY - Princeton NJ: Educational Testing Service ER - TY - BOOK T1 - User's manual for the MicroCAT Testing System Y1 - 1984 A1 - Assessment Systems Corporation CY - St. Paul MN: Author ER - TY - JOUR T1 - Using microcomputers to administer tests JF - Educational Measurement: Issues and Practice Y1 - 1984 A1 - W. C. Ward VL - 3(2) ER - TY - JOUR T1 - Using microcomputers to administer tests: An alternate point of view JF - Educational Measurement: Issues and Practice Y1 - 1984 A1 - Millman, J. VL - 3(2) ER - TY - CHAP T1 - Adaptive testing by computer Y1 - 1983 A1 - Green, B. F. CY - R. B. Ekstrom (Ed.), Measurement, technology, and individuality in education. New directions for testing and measurement, Number 17. San Francisco: Jossey-Bass. ER - TY - ABST T1 - Alternate forms reliability and concurrent validity of adaptive and conventional tests with military recruits Y1 - 1983 A1 - Kiely, G. L. A1 - A. Zara A1 - Weiss, D. J. CY - Minneapolis MN: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory ER - TY - JOUR T1 - An application of computerized adaptive testing in U. S. Army recruiting JF - Journal of Computer-Based Instruction Y1 - 1983 A1 - Sands, W. A. A1 - Gade, P. A. VL - 10 ER - TY - ABST T1 - Bias and information of Bayesian adaptive testing (Research Report 83-2) Y1 - 1983 A1 - Weiss, D. J. A1 - J. R. McBride CY - Minneapolis: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory N1 - {PDF file, 1.066 MB} ER - TY - CHAP T1 - A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure Y1 - 1983 A1 - Kingsbury, G. G. A1 - Weiss, D. J. CY - D. J.
Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 257-283). New York: Academic Press. ER - TY - BOOK T1 - Effects of item parameter error and other factors on trait estimation in latent trait based adaptive testing Y1 - 1983 A1 - Mattson, J. D. CY - Unpublished doctoral dissertation, University of Minnesota N1 - Dissertation Abstracts International, 44(3-B), 944. ER - TY - ABST T1 - An evaluation of one- and three-parameter logistic tailored testing procedures for use with small item pools (Research Report ONR83-1) Y1 - 1983 A1 - McKinley, R. L. A1 - Reckase, M. D. CY - Iowa City IA: American College Testing Program ER - TY - ABST T1 - Final report: Computer-based measurement of intellectual capabilities Y1 - 1983 A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - ABST T1 - Influence of fallible item parameters on test information during adaptive testing (Tech Rep 83-15) Y1 - 1983 A1 - Wetzel, C. D. A1 - J. R. McBride CY - San Diego CA: Navy Personnel Research and Development Center.
N1 - #WE83-15 ER - TY - JOUR T1 - On item response theory and computerized adaptive testing: The coming technical revolution in testing JF - Journal of College Admissions Y1 - 1983 A1 - Wainer, H. VL - 28 ER - TY - BOOK T1 - New horizons in testing: Latent trait test theory and computerized adaptive testing Y1 - 1983 A1 - Weiss, D. J. CY - New York: Academic Press ER - TY - CHAP T1 - The person response curve: Fit of individuals to item response theory models Y1 - 1983 A1 - Trabin, T. E. A1 - Weiss, D. J. CY - D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 83-108). New York: Academic Press. ER - TY - ABST T1 - Predictive utility evaluation of adaptive testing: Results of the Navy research Y1 - 1983 A1 - Hardwicke, S. A1 - White, K. E. CY - Falls Church VA: The Rehab Group Inc ER - TY - CHAP T1 - A procedure for decision making using tailored testing T2 - New horizons in testing: Latent trait theory and computerized adaptive testing Y1 - 1983 A1 - Reckase, M. D. KW - CCAT KW - CLASSIFICATION Computerized Adaptive Testing KW - sequential probability ratio testing KW - SPRT JF - New horizons in testing: Latent trait theory and computerized adaptive testing PB - Academic Press CY - New York, NY. USA ER - TY - CHAP T1 - The promise of tailored tests Y1 - 1983 A1 - Green, B. F. CY - H. Wainer and S. Messick (Eds.), Principles of modern psychological measurement (pp. 69-80). Hillsdale NJ: Erlbaum. ER - TY - ABST T1 - Relationship between corresponding Armed Services Vocational Aptitude Battery (ASVAB) and computerized adaptive testing (CAT) subtests (TR 83-27) Y1 - 1983 A1 - Moreno, K. E. A1 - Wetzel, C. D. A1 - J. R. McBride A1 - Weiss, D. J. CY - San Diego CA: Navy Personnel Research and Development Center ER - TY - CHAP T1 - Reliability and validity of adaptive ability tests in a military setting Y1 - 1983 A1 - J. R. McBride A1 - Martin, J. T. CY - D. J.
Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 224-236). New York: Academic Press. ER - TY - ABST T1 - Reliability and validity of adaptive ability tests in a military recruit population (Research Report 83-1) Y1 - 1983 A1 - J. R. McBride A1 - Martin, J. T. A1 - Weiss, D. J. CY - Minneapolis: Department of Psychology, Psychometric Methods Program, Computerized Testing Laboratory ER - TY - ABST T1 - Reliability and validity of adaptive vs. conventional tests in a military recruit population (Research Rep. No. 83-1). Y1 - 1983 A1 - Martin, J. T. A1 - J. R. McBride A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. N1 - {PDF file, 2.787 MB} ER - TY - CHAP T1 - Small N justifies Rasch model T2 - New horizons in testing: Latent trait test theory and computerized adaptive testing Y1 - 1983 A1 - Lord, F. M. ED - Bock, R. D. JF - New horizons in testing: Latent trait test theory and computerized adaptive testing PB - Academic Press CY - New York, NY. USA ER - TY - BOOK T1 - The stochastic modeling of elementary psychological processes Y1 - 1983 A1 - Townsend, J. T. A1 - Ashby, F. G. CY - Cambridge: Cambridge University Press ER - TY - ABST T1 - The stratified adaptive computerized ability test (Research Report 73-3) Y1 - 1973 A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory ER - TY - ABST T1 - Tailored testing, its theory and practice. Part I: The basic model, the normal ogive submodels, and the tailored testing algorithm (NPRDC TR-83-00) Y1 - 1983 A1 - Urry, V. W. A1 - Dorans, N. J.
CY - San Diego CA: Navy Personnel Research and Development Center ER - TY - JOUR T1 - Ability measurement, test bias reduction, and psychological reactions to testing as a function of computer adaptive testing versus conventional testing JF - Dissertation Abstracts International Y1 - 1982 A1 - Orban, J. A. KW - computerized adaptive testing VL - 42 ER - TY - JOUR T1 - Adaptive EAP estimation of ability in a microcomputer environment JF - Applied Psychological Measurement Y1 - 1982 A1 - Bock, R. D. A1 - Mislevy, R. J. VL - 6 ER - TY - ABST T1 - An adaptive Private Pilot Certification Exam Y1 - 1982 A1 - Trollip, S. R. A1 - Anderson, R. I. CY - Aviation, Space, and Environmental Medicine ER - TY - CONF T1 - Assessing mathematics achievement with a tailored testing program T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1982 A1 - Garrison, W. M. A1 - Baumgarten, B. S. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York N1 - #GA82-01 ER - TY - JOUR T1 - Automated tailored testing using Raven’s Matrices and the Mill Hill vocabulary tests JF - International Journal of Man-Machine Studies Y1 - 1982 A1 - Watts, K. A1 - Baddeley, A. D. A1 - Williams, M. VL - 17 ER - TY - RPRT T1 - Comparison of live and simulated adaptive tests Y1 - 1982 A1 - Hunter, D. R. JF - Air Force Human Resources Laboratory PB - Air Force Systems Command CY - Brooks Air Force Base, Texas ER - TY - ABST T1 - Computerized adaptive testing project: Objectives and requirements (Tech Note 82-22) Y1 - 1982 A1 - J. R. McBride CY - San Diego CA: Navy Personnel Research and Development Center. (AD A118 447) N1 - #McB82-22 ER - TY - ABST T1 - Computerized adaptive testing system design: Preliminary design considerations (Tech. Report 82-52) Y1 - 1982 A1 - Croll, P. R. CY - San Diego CA: Navy Personnel Research and Development Center.
(AD A118 495) ER - TY - ABST T1 - Computerized Adaptive Testing system development and project management. Y1 - 1982 A1 - J. R. McBride CY - Minutes of the ASVAB (Armed Services Vocational Aptitude Battery) Steering Committee. Washington, DC: Office of the Assistant Secretary of Defense (Manpower, Reserve Affairs and Logistics), Accession Policy Directorate. ER - TY - CHAP T1 - The computerized adaptive testing system development project Y1 - 1982 A1 - J. R. McBride A1 - Sympson, J. B. CY - D. J. Weiss (Ed.), Proceedings of the 1982 Item Response Theory and Computerized Adaptive Testing Conference (pp. 342-349). Minneapolis: University of Minnesota, Department of Psychology. N1 - {PDF file, 296 KB} ER - TY - CHAP T1 - Computerized testing in the German Federal Armed Forces (FAF): Empirical approaches Y1 - 1982 A1 - Wildgrube, W. CY - D. J. Weiss (Ed.), Proceedings of the 1982 Item Response Theory and Computerized Adaptive Testing Conference (pp. 353-359). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 384 KB} ER - TY - CHAP T1 - Design of a Microcomputer-Based Adaptive Testing System Y1 - 1982 A1 - Vale, C. D. CY - D. J. Weiss (Ed.), Proceedings of the 1979 Item Response Theory and Computerized Adaptive Testing Conference (pp. 360-371). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. N1 - {PDF file, 697 KB} ER - TY - CONF T1 - Development of a computerized adaptive testing system for enlisted personnel selection T2 - Presented at the Annual Convention of the American Psychological Association Y1 - 1982 A1 - J. R. McBride JF - Presented at the Annual Convention of the American Psychological Association CY - Washington, DC ER - TY - CHAP T1 - Discussion: Adaptive and sequential testing Y1 - 1982 A1 - Reckase, M. D. CY - D. J. Weiss (Ed.). Proceedings of the 1982 Computerized Adaptive Testing Conference (pp. 290-294).
Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 288 KB} ER - TY - JOUR T1 - Improving Measurement Quality and Efficiency with Adaptive Testing JF - Applied Psychological Measurement Y1 - 1982 A1 - Weiss, D. J. VL - 6 IS - 4 ER - TY - CHAP T1 - Item Calibrations for Computerized Adaptive Testing (CAT) Experimental Item Pools Y1 - 1982 A1 - Sympson, J. B. A1 - Hartmann, L. CY - D. J. Weiss (Ed.). Proceedings of the 1982 Computerized Adaptive Testing Conference (pp. 290-294). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 105 KB} ER - TY - CONF T1 - Legal and political considerations in large-scale adaptive testing T2 - Paper presented at the 23rd conference of the Military Testing Association Y1 - 1982 A1 - B. K. Waters A1 - Lee, G. C. JF - Paper presented at the 23rd conference of the Military Testing Association ER - TY - ABST T1 - Predictive validity of conventional and adaptive tests in an Air Force training environment (Report AFHRL-TR-81-40) Y1 - 1982 A1 - Sympson, J. B. A1 - Weiss, D. J. A1 - Ree, M. J. CY - Brooks Air Force Base TX: Air Force Human Resources Laboratory, Manpower and Personnel Division N1 - #SY82-01 ER - TY - JOUR T1 - Pros and cons of tailored testing: An examination of issues highlighted with an automated testing system JF - International Journal of Man-Machine Studies Y1 - 1982 A1 - Volans, P. J. VL - 17 ER - TY - CHAP T1 - Robustness of adaptive testing to multidimensionality Y1 - 1982 A1 - Weiss, D. J. A1 - Suhadolnik, D. CY - D. J. Weiss (Ed.), Proceedings of the 1982 Item Response Theory and Computerized Adaptive Testing Conference. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 1.42 MB} ER - TY - JOUR T1 - Sequential Testing for Selection JF - Applied Psychological Measurement Y1 - 1982 A1 - Weitzman, R. A.
VL - 6 IS - 3 ER - TY - CHAP T1 - Use of Sequential Testing to Prescreen Prospective Entrants to Military Service Y1 - 1982 A1 - Weitzman, R. A. CY - D. J. Weiss (Ed.), Proceedings of the 1982 Item Response Theory and Computerized Adaptive Testing Conference. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 483 KB} ER - TY - BOOK T1 - Ability measurement, test bias reduction, and psychological reactions to testing as a function of computer adaptive testing versus conventional testing Y1 - 1981 A1 - Orban, J. A. CY - Unpublished doctoral dissertation, Virginia Polytechnic Institute and State University. Dissertation Abstracts International, 1982, 42(10-B), 4233 N1 - #OR81-01 ER - TY - ABST T1 - Adaptive testing without a computer Y1 - 1981 A1 - Friedman, D. A1 - Steinberg, A. A1 - Ree, M. J. CY - Catalog of Selected Documents in Psychology, Nov 1981, 11, 74-75 (Ms. No. 2350). AFHRL Technical Report 80-66. ER - TY - RPRT T1 - A comparison of a Bayesian and a maximum likelihood tailored testing procedure Y1 - 1981 A1 - McKinley, R. L. A1 - Reckase, M. D. JF - Research Report 81-2 PB - University of Missouri, Department of Educational Psychology, Tailored Testing Research Laboratory CY - Columbia MO ER - TY - CONF T1 - A comparison of a maximum likelihood and a Bayesian estimation procedure for tailored testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1981 A1 - Rosso, M. A. A1 - Reckase, M. D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Los Angeles CA N1 - #RO81-01 ER - TY - ABST T1 - A comparison of two methods of interactive testing: Final report Y1 - 1981 A1 - Nicewander, W. A. A1 - Chang, H. S. A1 - Doody, E. N.
CY - National Institute of Education Grant 79-1045 ER - TY - JOUR T1 - Design and implementation of a microcomputer-based adaptive testing system JF - Behavior Research Methods and Instrumentation Y1 - 1981 A1 - Vale, C. D. VL - 13 ER - TY - BOOK T1 - Effect of error in item parameter estimates on adaptive testing (Doctoral dissertation, University of Minnesota) Y1 - 1981 A1 - Crichton, L. I. CY - Dissertation Abstracts International, 42, 06-B N1 - (University Microfilms No. AAD81-25946) ER - TY - JOUR T1 - The Effects of Item Calibration Sample Size and Item Pool Size on Adaptive Testing JF - Applied Psychological Measurement Y1 - 1981 A1 - Ree, M. J. VL - 5 IS - 1 ER - TY - ABST T1 - Factors influencing the psychometric characteristics of an adaptive testing strategy for test batteries (Research Rep. No. 81-4) Y1 - 1981 A1 - Maurelli, V. A. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory N1 - {PDF file, 1.689 MB} ER - TY - ABST T1 - Final report: Computerized adaptive ability testing Y1 - 1981 A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - ABST T1 - Final report: Procedures for criterion referenced tailored testing Y1 - 1981 A1 - Reckase, M. D. CY - Columbia: University of Missouri, Educational Psychology Department ER - TY - JOUR T1 - Optimal item difficulty for the three-parameter normal ogive model JF - Psychometrika Y1 - 1981 A1 - Wolfe, J. H. VL - 46 ER - TY - ABST T1 - Tailored testing, its theory and practice. Part II: Ability and item parameter estimation, multiple ability application, and allied procedures (NPRDC TR-81) Y1 - 1981 A1 - Urry, V. W. 
CY - San Diego CA: Navy Personnel Research and Development Center N1 - Part II: Ability and item parameter estimation, multiple ability application, and allied procedures (NPRDC TR-81) ER - TY - ABST T1 - The use of the sequential probability ratio test in making grade classifications in conjunction with tailored testing (Research Report 81-4) Y1 - 1981 A1 - Reckase, M. D. CY - Columbia MO: University of Missouri, Department of Educational Psychology ER - TY - ABST T1 - A validity comparison of adaptive and conventional strategies for mastery testing (Research Report 81-3) Y1 - 1981 A1 - Kingsbury, G. G. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory N1 - {PDF file, 1.855 MB} ER - TY - CHAP T1 - Adaptive verbal ability testing in a military setting Y1 - 1980 A1 - J. R. McBride CY - D. J. Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 4-15). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. N1 - {PDF file, 635 KB} ER - TY - ABST T1 - An alternate-forms reliability and concurrent validity comparison of Bayesian adaptive and conventional ability tests (Research Report 80-5) Y1 - 1980 A1 - Kingsbury, G. G. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory N1 - {PDF file, 1.11 MB} ER - TY - THES T1 - A comparative evaluation of two Bayesian adaptive ability estimation procedures with a conventional test strategy Y1 - 1980 A1 - Gorman, S. PB - Catholic University of America CY - Washington DC VL - Ph.D.
ER - TY - ABST T1 - A comparison of adaptive, sequential, and conventional testing strategies for mastery decisions (Research Report 80-4) Y1 - 1980 A1 - Kingsbury, G. G. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory N1 - {PDF file, 1.905 MB} ER - TY - CHAP T1 - A comparison of ICC-based adaptive mastery testing and the Waldian probability ratio method Y1 - 1980 A1 - Kingsbury, G. G. A1 - Weiss, D. J. CY - D. J. Weiss (Ed.). Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 120-139). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory N1 - {PDF file, 1.51 MB} ER - TY - CHAP T1 - A comparison of the accuracy of Bayesian adaptive and static tests using a correction for regression Y1 - 1980 A1 - Gorman, S. CY - D. J. Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 35-50). Minneapolis MN: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory. N1 - {PDF file, 735 KB} ER - TY - JOUR T1 - Computer applications in audiology and rehabilitation of the hearing impaired JF - Journal of Communication Disorders Y1 - 1980 A1 - Levitt, H. VL - 13 ER - TY - JOUR T1 - Computer applications to ability testing JF - Association for Educational Data Systems Journal Y1 - 1980 A1 - McKinley, R. L. A1 - Reckase, M. D. VL - 13 ER - TY - ABST T1 - Computerized instructional adaptive testing model: Formulation and validation (AFHRL-TR-79-33, Final Report) Y1 - 1980 A1 - Kalisch, S. J. CY - Brooks Air Force Base TX: Air Force Human Resources Laboratory. Also Catalog of Selected Documents in Psychology, February 1981, 11, 20 (Ms. No. 2217) ER - TY - CHAP T1 - Computerized testing in the German Federal Armed Forces (FAF) Y1 - 1980 A1 - Wildgrube, W. CY - D. J.
Weiss (Ed.), Proceedings of the 1979 Item Response Theory and Computerized Adaptive Testing Conference (pp. 68-77). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. N1 - {PDF file, 595 KB} ER - TY - ABST T1 - Criterion-related validity of adaptive testing strategies (Research Report 80-3) Y1 - 1980 A1 - Thompson, J. G. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory N1 - #TH80-03 {PDF file, 2.708 MB} ER - TY - BOOK T1 - Development and evaluation of an adaptive testing strategy for use in multidimensional interest assessment Y1 - 1980 A1 - Vale, C. D. CY - Unpublished doctoral dissertation, University of Minnesota. Dissertation Abstracts International, 42(11-B), 4248-4249 ER - TY - CHAP T1 - Discussion: Session 1 Y1 - 1980 A1 - B. K. Waters CY - D. J. Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 51-55). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. N1 - #WA80-01 {PDF file, 283 KB} ER - TY - CHAP T1 - Discussion: Session 3 Y1 - 1980 A1 - Novick, M. R. CY - D. J. Weiss (Ed.), Proceedings of the 1979 Item Response Theory and Computerized Adaptive Testing Conference (pp. 140-143). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. N1 - {PDF file, 286 KB} ER - TY - ABST T1 - Effects of computerized adaptive testing on Black and White students (Research Report 79-2) Y1 - 1980 A1 - Pine, S. M. A1 - Church, A. T. A1 - Gialluca, K. A. A1 - Weiss, D. J.
CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program N1 - {PDF file, 2.323 MB} ER - TY - CONF T1 - Effects of program parameters and item pool characteristics on the bias of a three-parameter tailored testing procedure T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1980 A1 - Patience, W. M. A1 - Reckase, M. D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Boston MA, USA ER - TY - ABST T1 - An empirical study of a broad range test of verbal ability Y1 - 1980 A1 - Kreitzberg, C. B. A1 - Jones, D. J. CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - Estimating the reliability of adaptive tests from a single test administration T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1980 A1 - Sympson, J. B. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Boston N1 - (1981 draft copy available.) {PDF file, 7,603 KB} ER - TY - ABST T1 - Final report: Computerized adaptive performance evaluation Y1 - 1980 A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory ER - TY - ABST T1 - Final Report: Computerized adaptive testing, assessment of requirements Y1 - 1980 A1 - Rehab Group Inc. CY - Falls Church VA: Author ER - TY - JOUR T1 - Implied Orders Tailored Testing: Simulation with the Stanford-Binet JF - Applied Psychological Measurement Y1 - 1980 A1 - Cudeck, R. A1 - McCormick, D. J. A1 - N. Cliff VL - 4 IS - 2 ER - TY - CHAP T1 - Individualized testing on the basis of the Rasch model Y1 - 1980 A1 - Fischer, G. H. A1 - Pendl, P. CY - In J. Th.
Van der Kamp, W. F. Langerak, and D. N. M. de Gruijter (Eds.), Psychometrics for educational debates. New York: Wiley. ER - TY - CHAP T1 - A model for computerized adaptive testing related to instructional situations Y1 - 1980 A1 - Kalisch, S. J. CY - D. J. Weiss (Ed.). Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 101-119). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. N1 - {PDF file, 965 KB} ER - TY - JOUR T1 - Operational characteristics of a one-parameter tailored testing procedure JF - Catalog of Selected Documents in Psychology Y1 - 1980 A1 - Patience, W. M. A1 - Reckase, M. D. VL - August 1980 N1 - (Ms. No. 2104) ER - TY - CHAP T1 - Parallel forms reliability and measurement accuracy comparison of adaptive and conventional testing strategies Y1 - 1980 A1 - Johnson, M. J. A1 - Weiss, D. J. CY - D. J. Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 16-34). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. N1 - {PDF file, 918 KB} ER - TY - BOOK T1 - Proceedings of the 1979 Computerized Adaptive Testing Conference Y1 - 1980 A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory ER - TY - JOUR T1 - A simple form of tailored testing JF - British Journal of Educational Psychology Y1 - 1980 A1 - Nisbet, J. A1 - Adams, M. A1 - Arthur, J. VL - 50 ER - TY - CHAP T1 - Some decision procedures for use with tailored testing Y1 - 1980 A1 - Reckase, M. D. CY - D. J. Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 79-100). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory.
ER - TY - CHAP T1 - Some how and which for practical tailored testing Y1 - 1980 A1 - Lord, F. M. CY - L. J. T. van der Kamp, W. F. Langerak and D. N. M. de Gruijter (Eds.), Psychometrics for educational debates (pp. 189-206). New York: John Wiley and Sons. ER - TY - ABST T1 - A successful application of latent trait theory to tailored achievement testing (Research Report 80-1) Y1 - 1980 A1 - McKinley, R. L. A1 - Reckase, M. D. CY - University of Missouri, Department of Educational Psychology, Tailored Testing Research Laboratory ER - TY - CHAP T1 - A validity study of an adaptive test of reading comprehension Y1 - 1980 A1 - Hornke, L. F. A1 - Sauter, M. B. CY - D. J. Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 57-67). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 676 KB} ER - TY - ABST T1 - Adaptive mental testing: The state of the art (Technical Report 423) Y1 - 1979 A1 - J. R. McBride CY - Alexandria VA: U.S. Army Research Institute for the Behavioral and Social Sciences. ER - TY - ABST T1 - An adaptive testing strategy for mastery decisions (Research Report 79-5) Y1 - 1979 A1 - Kingsbury, G. G. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program N1 - {PDF file, 2.146 MB} ER - TY - ABST T1 - Adaptive tests' usefulness for military personnel screening Y1 - 1979 A1 - J. R. McBride CY - In M. Wiskoff, Chair, Military Applications of Computerized Adaptive Testing. Symposium presented at the Annual Convention of the American Psychological Association, New York. ER - TY - ABST T1 - Bayesian sequential design and analysis of dichotomous experiments with special reference to mental testing Y1 - 1979 A1 - Owen, R. J.
CY - Princeton NJ: Educational Testing Service ER - TY - JOUR T1 - A comparison of a standard and a computerized adaptive paradigm in Bekesy fixed-frequency audiometry JF - Journal of Auditory Research Y1 - 1979 A1 - Harris, J. D. A1 - Smith, P. F. VL - 19 ER - TY - ABST T1 - Computerized adaptive testing: The state of the art (ARI Technical Report 423) Y1 - 1979 A1 - J. R. McBride CY - Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences. ER - TY - CONF T1 - Criterion-related validity of conventional and adaptive tests in a military environment T2 - Paper presented at the 1979 Computerized Adaptive Testing Conference Y1 - 1979 A1 - Sympson, J. B. JF - Paper presented at the 1979 Computerized Adaptive Testing Conference CY - Minneapolis MN ER - TY - ABST T1 - The danger of relying solely on diagnostic adaptive testing when prior and subsequent instructional methods are different (CERL Report E-5) Y1 - 1979 A1 - Tatsuoka, K. A1 - Birenbaum, M. CY - Urbana IL: University of Illinois, Computer-Based Education Research Laboratory. N1 - #TA79-01 ER - TY - RPRT T1 - Efficiency of an adaptive inter-subtest branching strategy in the measurement of classroom achievement (Research Report 79-6) Y1 - 1979 A1 - Gialluca, K. A. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - ABST T1 - An evaluation of computerized adaptive testing Y1 - 1979 A1 - J. R. McBride CY - In Proceedings of the 21st Military Testing Association Conference. San Diego, CA: Navy Personnel Research and Development Center. ER - TY - JOUR T1 - Evaluation of Implied Orders as a Basis for Tailored Testing with Simulation Data JF - Applied Psychological Measurement Y1 - 1979 A1 - N. Cliff A1 - Cudeck, R.
A1 - McCormick, D. J. VL - 3 IS - 4 ER - TY - JOUR T1 - Four realizations of pyramidal adaptive testing JF - Programmed Learning and Educational Technology Y1 - 1979 A1 - Hornke, L. F. VL - 16 ER - TY - JOUR T1 - Monte Carlo Evaluation of Implied Orders As a Basis for Tailored Testing JF - Applied Psychological Measurement Y1 - 1979 A1 - Cudeck, R. A1 - McCormick, D. A1 - N. Cliff VL - 3 IS - 1 ER - TY - CONF T1 - Operational characteristics of a Rasch model tailored testing procedure when program parameters and item pool attributes are varied T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1979 A1 - Patience, W. M. A1 - Reckase, M. D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco ER - TY - ABST T1 - Problems in application of latent-trait models to tailored testing (Research Report 79-1) Y1 - 1979 A1 - Koch, W. J. A1 - Reckase, M. D. CY - Columbia MO: University of Missouri, Department of Psychology (also presented at National Council on Measurement in Education, 1979; ERIC No. ED 177 196) ER - TY - BOOK T1 - The Rasch model in computerized personality testing Y1 - 1979 A1 - Kunce, C. S. CY - Ph.D. dissertation, University of Missouri, Columbia, 1979 ER - TY - CONF T1 - Student reaction to computerized adaptive testing in the classroom T2 - Paper presented at the 87th annual meeting of the American Psychological Association Y1 - 1979 A1 - Johnson, M. J. JF - Paper presented at the 87th annual meeting of the American Psychological Association CY - New York N1 - #JO79-01 ER - TY - CONF T1 - An adaptive test designed for paper-and-pencil testing T2 - Presentation to the convention of the Western Psychological Association Y1 - 1978 A1 - J.
R. McBride JF - Presentation to the convention of the Western Psychological Association CY - San Francisco, CA ER - TY - CHAP T1 - Applications of latent trait theory to criterion-referenced testing Y1 - 1978 A1 - J. R. McBride CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis, MN: University of Minnesota. ER - TY - CHAP T1 - Applications of sequential testing procedures to performance testing Y1 - 1978 A1 - Epstein, K. I. A1 - Knerr, C. S. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. ER - TY - JOUR T1 - Combining auditory and visual stimuli in the adaptive testing of speech discrimination JF - Journal of Speech and Hearing Disorders Y1 - 1978 A1 - Steele, J. A. A1 - Binnie, C. A. A1 - Cooper, W. A. VL - 43 ER - TY - BOOK T1 - A comparison of Bayesian and maximum likelihood scoring in a simulated stradaptive test Y1 - 1978 A1 - Maurelli, V. A. CY - Unpublished master's thesis, St. Mary’s University of Texas, San Antonio TX ER - TY - ABST T1 - A comparison of the fairness of adaptive and conventional testing strategies (Research Report 78-1) Y1 - 1978 A1 - Pine, S. M. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - JOUR T1 - Computer-assisted tailored testing: Examinee reactions and evaluation JF - Educational and Psychological Measurement Y1 - 1978 A1 - Schmidt, F. L. A1 - Urry, V. W. A1 - Gugel, J. F. VL - 38 ER - TY - JOUR T1 - Computerized adaptive testing: Principles and directions JF - Computers and Education Y1 - 1978 A1 - Kreitzberg, C. B. A1 - Stocking, M. A1 - Swanson, L.
VL - 2 IS - 4 ER - TY - ABST T1 - A construct validation of adaptive achievement testing (Research Report 78-4) Y1 - 1978 A1 - Bejar, I. I. A1 - Weiss, D. J. CY - Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory ER - TY - ABST T1 - Evaluations of implied orders as a basis for tailored testing using simulations (Technical Report No. 4) Y1 - 1978 A1 - Cliff, N. A. A1 - Cudeck, R. A1 - McCormick, D. CY - Los Angeles CA: University of Southern California, Department of Psychology. N1 - #CL77-04 ER - TY - CONF T1 - A generalization of sequential analysis to decision making with tailored testing T2 - Paper presented at the meeting of the Military Testing Association Y1 - 1978 A1 - Reckase, M. D. JF - Paper presented at the meeting of the Military Testing Association CY - Oklahoma City OK ER - TY - ABST T1 - Implied orders as a basis for tailored testing (Technical Report No. 6) Y1 - 1978 A1 - Cliff, N. A. A1 - Cudeck, R. A1 - McCormick, D. CY - Los Angeles CA: University of Southern California, Department of Psychology. N1 - #CL78-06 ER - TY - ABST T1 - A live tailored testing comparison study of the one- and three-parameter logistic models (Research Report 78-1) Y1 - 1978 A1 - Koch, W. J. A1 - Reckase, M. D. CY - Columbia MO: University of Missouri, Department of Psychology ER - TY - Generic T1 - A model for testing with multidimensional items T2 - Proceedings of the 1977 Computerized Adaptive Testing Conference Y1 - 1978 A1 - Sympson, J. B. JF - Proceedings of the 1977 Computerized Adaptive Testing Conference PB - University of Minnesota, Department of Psychology, Psychometric Methods Program CY - Minneapolis, MN, USA ER - TY - CHAP T1 - Panel discussion: Future directions for computerized adaptive testing Y1 - 1978 A1 - Lord, F. M. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference.
Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. ER - TY - JOUR T1 - Predictive ability of a branching test JF - Educational and Psychological Measurement Y1 - 1978 A1 - Brooks, S. A1 - Hartz, M. A. VL - 38 ER - TY - BOOK T1 - Proceedings of the 1977 Computerized Adaptive Testing Conference Y1 - 1978 A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory ER - TY - JOUR T1 - The stratified adaptive ability test as a tool for personnel selection and placement JF - TIMS Studies in the Management Sciences Y1 - 1978 A1 - Vale, C. D. A1 - Weiss, D. J. VL - 8 ER - TY - JOUR T1 - A stratified adaptive test of verbal ability JF - Japanese Journal of Educational Psychology Y1 - 1978 A1 - Shiba, S. A1 - Noguchi, H. A1 - Haebara, T. VL - 26 ER - TY - CHAP T1 - Adaptive Branching in a Multi-Content Achievement Test Y1 - 1977 A1 - Pennell, R. J. A1 - Harris, D. A. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - CHAP T1 - An adaptive test of arithmetic reasoning Y1 - 1977 A1 - J. R. McBride CY - Proceedings of the Nineteenth Military Testing Association Conference, San Antonio, TX. ER - TY - CHAP T1 - Adaptive testing and the problem of classification Y1 - 1977 A1 - Vale, C. D. CY - D. J. Weiss (Ed.), Applications of computerized adaptive testing (Research Report 77-1). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 28 MB} ER - TY - CHAP T1 - Adaptive Testing Applied to Hierarchically Structured Objectives-Based Programs Y1 - 1977 A1 - Hambleton, R. K. A1 - Eignor, D. R. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference.
Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - ABST T1 - An adaptive testing strategy for achievement test batteries (Research Report 77-6) Y1 - 1977 A1 - Brown, J. M. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program N1 - {PDF file, 2.40 MB} ER - TY - JOUR T1 - Application of tailored testing to achievement measurement JF - Behavior Research Methods and Instrumentation Y1 - 1977 A1 - English, R. A. A1 - Reckase, M. D. A1 - Patience, W. M. VL - 9 ER - TY - BOOK T1 - An application of the Rasch one-parameter logistic model to individual intelligence testing in a tailored testing environment Y1 - 1977 A1 - Ireland, C. M. CY - Dissertation Abstracts International, 37 (9-A), 5766 ER - TY - CHAP T1 - Applications of adaptive testing in measuring achievement and performance Y1 - 1977 A1 - Bejar, I. I. CY - D. J. Weiss (Ed.), Applications of computerized adaptive testing (Research Report 77-1). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 28 MB} ER - TY - ABST T1 - Applications of computerized adaptive testing (Research Report 77-1) Y1 - 1977 A1 - Weiss, D. J. CY - Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory N1 - {PDF file, 3.228 KB} ER - TY - Generic T1 - Applications of sequential testing procedures to performance testing T2 - 1977 Computerized Adaptive Testing Conference Y1 - 1977 A1 - Epstein, K. I. A1 - Knerr, C. S. JF - 1977 Computerized Adaptive Testing Conference PB - University of Minnesota CY - Minneapolis, MN.
USA ER - TY - JOUR T1 - Bayesian tailored testing and the influence of item bank characteristics JF - Applied Psychological Measurement Y1 - 1977 A1 - Jensema, C. J. VL - 1 IS - 1 ER - TY - CHAP T1 - A brief overview of adaptive testing Y1 - 1977 A1 - J. R. McBride CY - D. J. Weiss (Ed.), Applications of computerized adaptive testing (Research Report 77-1). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program N1 - {PDF file, 28 MB} ER - TY - JOUR T1 - A broad-range tailored test of verbal ability JF - Applied Psychological Measurement Y1 - 1977 A1 - Lord, F. M. VL - 1 IS - 1 ER - TY - ABST T1 - Calibration of an item pool for the adaptive measurement of achievement (Research Report 77-5) Y1 - 1977 A1 - Bejar, I. I. A1 - Weiss, D. J. A1 - Kingsbury, G. G. CY - Minneapolis: Department of Psychology, Psychometric Methods Program ER - TY - CHAP T1 - A comparison of conventional and adaptive achievement testing Y1 - 1977 A1 - Bejar, I. I. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. ER - TY - BOOK T1 - A comparison of the classification of students by two methods of administration of a mathematics placement test Y1 - 1977 A1 - Brooks, S. CY - Unpublished doctoral dissertation, Syracuse University, 1977 ER - TY - BOOK T1 - A computer adaptive approach to the measurement of personality variables Y1 - 1977 A1 - Sapinkopf, R. C.
CY - Unpublished doctoral dissertation, University of Maryland, Baltimore ER - TY - JOUR T1 - A computer simulation study of tailored testing strategies for objective-based instructional programs JF - Educational and Psychological Measurement Y1 - 1977 A1 - Spineti, J. P. A1 - Hambleton, R. K. AB - One possible way of reducing the amount of time spent testing in objective-based instructional programs would involve the implementation of a tailored testing strategy. Our purpose was to provide some additional data on the effectiveness of various tailored testing strategies for different testing situations. The three factors of a tailored testing strategy under study with various hypothetical distributions of abilities across two learning hierarchies were test length, mastery cutting score, and starting point. Overall, our simulation results indicate that it is possible to obtain a reduction of more than 50% in testing time without any loss in decision-making accuracy, when compared to a conventional testing procedure, by implementing a tailored testing strategy. In addition, our study of starting points revealed that it was generally best to begin testing in the middle of the learning hierarchy. Finally, we observed a 40% reduction in errors of classification as the number of items for testing each objective was increased from one to five. VL - 37 ER - TY - ABST T1 - Computer-assisted tailored testing: Examinee reactions and evaluation (PB-276 748) Y1 - 1977 A1 - Schmidt, F. L. A1 - Urry, V. W. A1 - Gugel, J. F. CY - Washington DC: U. S. Civil Service Commission, Personnel Research and Development Center. N1 - #SC77-01 ER - TY - CHAP T1 - Computerized Adaptive Testing and Personnel Accessioning System Design Y1 - 1977 A1 - Underwood, M. A. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program.
ER - TY - CHAP T1 - Computerized Adaptive Testing research and development Y1 - 1977 A1 - J. R. McBride CY - H. Taylor (Ed.), Proceedings of the Second Training and Personnel Technology Conference. Washington, DC: Office of the Director of Defense Research and Engineering. ER - TY - CHAP T1 - Computerized Adaptive Testing with a Military Population Y1 - 1977 A1 - Gorman, S. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - JOUR T1 - Description of components in tailored testing JF - Behavior Research Methods and Instrumentation Y1 - 1977 A1 - Patience, W. M. VL - 9 ER - TY - JOUR T1 - Effects of immediate knowledge of results and adaptive testing on ability test performance JF - Applied Psychological Measurement Y1 - 1977 A1 - Betz, N. E. VL - 1 IS - 2 ER - TY - CHAP T1 - Effects of Knowledge of Results and Varying Proportion Correct on Ability Test Performance and Psychological Variables Y1 - 1977 A1 - Prestwood, J. S. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. ER - TY - CHAP T1 - An empirical evaluation of implied orders as a basis for tailored testing Y1 - 1977 A1 - Cliff, N. A. A1 - Cudeck, R. A1 - McCormick, D. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. ER - TY - JOUR T1 - An Empirical Investigation of the Stratified Adaptive Computerized Testing Model JF - Applied Psychological Measurement Y1 - 1977 A1 - B. K.
Waters VL - 1 IS - 1 ER - TY - CHAP T1 - Estimation of latent trait status in adaptive testing Y1 - 1977 A1 - Sympson, J. B. CY - D. J. Weiss (Ed.), Applications of computerized adaptive testing (Research Report 77-1). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program N1 - {PDF file, 28 MB} ER - TY - ABST T1 - Flexilevel adaptive testing paradigm: Validation in technical training Y1 - 1977 A1 - Hansen, D. N. A1 - Ross, S. A1 - Harris, D. A. CY - AFHRL Technical Report 77-35 (I) ER - TY - ABST T1 - Flexilevel adaptive training paradigm: Hierarchical concept structures Y1 - 1977 A1 - Hansen, D. N. A1 - Ross, S. A1 - Harris, D. A. CY - AFHRL Technical Report 77-35 (II) ER - TY - CONF T1 - Four realizations of pyramidal adaptive testing strategies T2 - Paper presented at the Third International Symposium on Educational Testing Y1 - 1977 A1 - Hornke, L. F. JF - Paper presented at the Third International Symposium on Educational Testing CY - University of Leiden, The Netherlands N1 - #HO77-01 ER - TY - CONF T1 - Group tailored tests and some problems of their utilization T2 - Third International Symposium on Educational Testing Y1 - 1977 A1 - Lewy, A. A1 - Doron, R. JF - Third International Symposium on Educational Testing CY - Leyden, The Netherlands ER - TY - CHAP T1 - Implementation of a Model Adaptive Testing System at an Armed Forces Entrance and Examination Station Y1 - 1977 A1 - Ree, M. J. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. ER - TY - CHAP T1 - Implementation of Tailored Testing at the Civil Service Commission Y1 - 1977 A1 - McKillip, R. H. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference.
Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - ABST T1 - An information comparison of conventional and adaptive tests in the measurement of classroom achievement (Research Report 77-7) Y1 - 1977 A1 - Bejar, I. I. A1 - Weiss, D. J. A1 - Gialluca, K. A. CY - Minneapolis: Department of Psychology, Psychometric Methods Program ER - TY - CHAP T1 - A Low-Cost Terminal Usable for Computerized Adaptive Testing Y1 - 1977 A1 - Lamos, J. P. A1 - B. K. Waters CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - CHAP T1 - A model for testing with multidimensional items Y1 - 1977 A1 - Sympson, J. B. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. ER - TY - CHAP T1 - Multi-Content Adaptive Measurement of Achievement Y1 - 1977 A1 - Weiss, D. J. A1 - Brown, J. M CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. ER - TY - CHAP T1 - A Multivariate Model Sampling Procedure and a Method of Multidimensional Tailored Testing Y1 - 1977 A1 - Urry, V. W. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. ER - TY - CHAP T1 - Operational Considerations in Implementing Tailored Testing Y1 - 1977 A1 - Segal, H. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. 
ER - TY - JOUR T1 - Procedures for computerized testing JF - Behavior Research Methods and Instrumentation Y1 - 1977 A1 - Reckase, M. D. VL - 9 ER - TY - ABST T1 - A rapid item search procedure for Bayesian adaptive testing (Research Report 77-4) Y1 - 1977 A1 - Vale, C. D. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - CONF T1 - Real-data simulation of a proposal for tailored testing T2 - Third International Conference on Educational Testing Y1 - 1977 A1 - Killcross, M. C. JF - Third International Conference on Educational Testing CY - Leyden, The Netherlands ER - TY - CHAP T1 - Reduction of Test Bias by Adaptive Testing Y1 - 1977 A1 - Pine, S. M. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. ER - TY - JOUR T1 - Some properties of a Bayesian adaptive ability testing strategy JF - Applied Psychological Measurement Y1 - 1977 A1 - J. R. McBride VL - 1 IS - 1 ER - TY - CHAP T1 - Student attitudes toward tailored testing Y1 - 1977 A1 - Koch, W. R. A1 - Patience, W. M. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. ER - TY - JOUR T1 - TAILOR: A FORTRAN procedure for interactive tailored testing JF - Educational and Psychological Measurement Y1 - 1977 A1 - Cudeck, R. A. A1 - Cliff, N. A. A1 - Kehoe, J. VL - 37 ER - TY - JOUR T1 - TAILOR-APL: An interactive computer program for individual tailored testing JF - Educational and Psychological Measurement Y1 - 1977 A1 - McCormick, D. A1 - Cliff, N. A.
VL - 37 ER - TY - ABST T1 - Tailored testing: A spectacular success for latent trait theory (TS 77-2) Y1 - 1977 A1 - Urry, V. W. CY - Washington DC: U. S. Civil Service Commission, Personnel Research and Development Center ER - TY - JOUR T1 - Tailored testing: A successful application of latent trait theory JF - Journal of Educational Measurement Y1 - 1977 A1 - Urry, V. W. VL - 14 ER - TY - JOUR T1 - A theory of consistency ordering generalizable to tailored testing JF - Psychometrika Y1 - 1977 A1 - Cliff, N. A. ER - TY - ABST T1 - A two-stage testing procedure (Memorandum 403-77) Y1 - 1977 A1 - de Gruijter, D. N. M. CY - University of Leyden, The Netherlands, Educational Research Center ER - TY - JOUR T1 - A Use of the Information Function in Tailored Testing JF - Applied Psychological Measurement Y1 - 1977 A1 - Samejima, F. VL - 1 IS - 2 ER - TY - ABST T1 - Adaptive mental testing: The state of the art (Technical Report 423) Y1 - 1976 A1 - J. R. McBride CY - Washington DC: U.S. Army Research Institute for the Social and Behavioral Sciences. ER - TY - JOUR T1 - Adaptive testing: A Bayesian procedure for the efficient measurement of ability JF - Programmed Learning and Educational Technology Y1 - 1976 A1 - Wood, R. VL - 13 IS - 2 ER - TY - CHAP T1 - Adaptive testing research at Minnesota: Overview, recent results, and future directions Y1 - 1976 A1 - Weiss, D. J. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 24-35). Washington DC: United States Civil Service Commission. N1 - {PDF file, 768 KB} ER - TY - CHAP T1 - Adaptive testing research at Minnesota: Some properties of a Bayesian sequential adaptive mental testing strategy Y1 - 1976 A1 - J. R. McBride CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 36-53). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 960 KB} ER - TY - CHAP T1 - Bandwidth, fidelity, and adaptive tests Y1 - 1976 A1 - J. R.
McBride CY - T. J. McConnell, Jr. (Ed.), CAT/C 2 1975: The second conference on computer-assisted test construction. Atlanta GA: Atlanta Public Schools. N1 - {PDF file, 783 KB} ER - TY - CHAP T1 - Bayesian tailored testing and the influence of item bank characteristics Y1 - 1976 A1 - Jensema, C. J. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 82-89). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 370 KB} ER - TY - CHAP T1 - A broad range tailored test of verbal ability Y1 - 1976 A1 - Lord, F. M. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 75-78). Washington DC: U.S. Government Printing Office. N1 - #LO75-01 {PDF file, 250 KB} ER - TY - CHAP T1 - Computer-assisted testing: An orderly transition from theory to practice Y1 - 1976 A1 - McKillip, R. H. A1 - Urry, V. W. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 95-96). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 191 KB} ER - TY - ABST T1 - Computer-assisted testing with live examinees: A rendezvous with reality (TN 75-3) Y1 - 1976 A1 - Urry, V. W. CY - Washington DC: U. S. Civil Service Commission, Personnel Research and Development Center ER - TY - CHAP T1 - Discussion Y1 - 1976 A1 - Lord, F. M. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 113-117). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 318 KB} ER - TY - CHAP T1 - Discussion Y1 - 1976 A1 - Green, B. F. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 118-119). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 347 KB} ER - TY - CONF T1 - The effect of item pool characteristics on the operation of a tailored testing procedure T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1976 A1 - Reckase, M. D.
JF - Paper presented at the annual meeting of the Psychometric Society CY - Murray Hill NJ ER - TY - CHAP T1 - Effectiveness of the ancillary estimation procedure Y1 - 1976 A1 - Gugel, J. F. A1 - Schmidt, F. L. A1 - Urry, V. W. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 103-106). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 252 KB} ER - TY - ABST T1 - Effects of immediate knowledge of results and adaptive testing on ability test performance (Research Report 76-3) Y1 - 1976 A1 - Betz, N. E. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory ER - TY - ABST T1 - Elements of a basic test theory generalizable to tailored testing Y1 - 1976 A1 - Cliff, N. A. CY - Unpublished manuscript ER - TY - CHAP T1 - An empirical investigation of Weiss' stradaptive testing model Y1 - 1976 A1 - B. K. Waters CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 54-63). Washington DC: U. S. Civil Service Commission. N1 - #WA75-01 {PDF file, 576 KB} ER - TY - THES T1 - An exploratory study of the efficiency of the flexilevel testing procedure Y1 - 1976 A1 - Seguin, S. P. PB - University of Toronto CY - Toronto, Canada VL - Doctoral ER - TY - CHAP T1 - A five-year quest: Is computerized adaptive testing feasible? Y1 - 1976 A1 - Urry, V. W. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 97-102). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 453 KB} ER - TY - CHAP T1 - The graded response model of latent trait theory and tailored testing Y1 - 1976 A1 - Samejima, F. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 5-17). Washington DC: U.S. Government Printing Office.
ER - TY - JOUR T1 - Hardware and software evolution of an adaptive ability measurement system JF - Behavior Research Methods and Instrumentation Y1 - 1976 A1 - DeWitt, L. J. A1 - Weiss, D. J. VL - 8 ER - TY - CHAP T1 - Incomplete orders and computerized testing Y1 - 1976 A1 - Cliff, N. A. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 18-23). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 373 KB} ER - TY - CHAP T1 - Item parameterization procedures for the future Y1 - 1976 A1 - Schmidt, F. L. A1 - Urry, V. W. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 107-112). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 331 KB} ER - TY - ABST T1 - Monte Carlo results from a computer program for tailored testing (Technical Report No. 2) Y1 - 1976 A1 - Cudeck, R. A. A1 - Cliff, N. A. A1 - Reynolds, T. J. A1 - McCormick, D. J. CY - Los Angeles CA: University of Southern California, Department of Psychology. N1 - #CU76-02 ER - TY - CHAP T1 - Opening remarks Y1 - 1976 A1 - Gorham, W. A. CY - W. H. Gorham (Chair), Computers and testing: Steps toward the inevitable conquest (PS 76-1). Symposium presented at the 83rd annual convention of the APA, Chicago IL. Washington DC: U.S. Civil Service Commission, Personnel Research and Development Center ER - TY - CONF T1 - Procedures for computerized testing T2 - Paper presented at the sixth annual meeting of the National Conference on the Use of On-Line Computers in Psychology Y1 - 1976 A1 - Reckase, M. D. JF - Paper presented at the sixth annual meeting of the National Conference on the Use of On-Line Computers in Psychology CY - St. Louis MO N1 - #RE76-01 ER - TY - BOOK T1 - Proceedings of the first conference on computerized adaptive testing Y1 - 1976 A1 - Clark, C. K. CY - Washington DC: U.S.
Government Printing Office N1 - {Complete document: PDF file, 7.494 MB; Table of contents and separate papers} ER - TY - ABST T1 - Psychological effects of immediate knowledge of results and adaptive ability testing (Research Report 76-4) Y1 - 1976 A1 - Betz, N. E. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory ER - TY - CHAP T1 - Reflections on adaptive testing Y1 - 1976 A1 - Hansen, D. N. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 90-94). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 464 KB} ER - TY - ABST T1 - Research on adaptive testing 1973-1976: A review of the literature Y1 - 1976 A1 - J. R. McBride CY - Unpublished manuscript, University of Minnesota ER - TY - ABST T1 - A review of research in tailored testing (Report APRE No. 9/76) Y1 - 1976 A1 - Killcross, M. C. CY - Farnborough, Hants, U. K.: Ministry of Defence, Army Personnel Research Establishment ER - TY - BOOK T1 - Simulation studies of adaptive testing: A comparative evaluation Y1 - 1976 A1 - J. R. McBride CY - Unpublished doctoral dissertation, University of Minnesota, Minneapolis, MN ER - TY - CHAP T1 - Some likelihood functions found in tailored testing Y1 - 1976 A1 - Lord, F. M. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 79-81). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 166 KB} ER - TY - ABST T1 - Some properties of a Bayesian adaptive ability testing strategy (Research Report 76-1) Y1 - 1976 A1 - J. R. McBride A1 - Weiss, D. J. CY - Minneapolis MN: Department of Psychology, Computerized Adaptive Testing Laboratory ER - TY - ABST T1 - Test theory and the public interest Y1 - 1976 A1 - Lord, F.
M. CY - Proceedings of the Educational Testing Service Invitational Conference ER - TY - CHAP T1 - Using computerized tests to add new dimensions to the measurement of abilities which are important for on-job performance: An exploratory study Y1 - 1976 A1 - Cory, C. H. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 64-74). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 632 KB} ER - TY - ABST T1 - A basic test theory generalizable to tailored testing (Technical Report No. 1) Y1 - 1975 A1 - Cliff, N. A. CY - Los Angeles CA: University of Southern California, Department of Psychology. ER - TY - JOUR T1 - A Bayesian sequential procedure for quantal response in the context of adaptive mental testing JF - Journal of the American Statistical Association Y1 - 1975 A1 - Owen, R. J. VL - 70 ER - TY - CONF T1 - Behavior of the maximum likelihood estimate in a simulated tailored testing situation T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1975 A1 - Samejima, F. JF - Paper presented at the annual meeting of the Psychometric Society CY - Iowa City N1 - {PDF file, 698 KB} ER - TY - ABST T1 - Best test design and self-tailored testing (Research Memorandum No. 19) Y1 - 1975 A1 - Wright, B. D. A1 - Douglas, G. A. CY - Chicago: University of Chicago, Department of Education, Statistical Laboratory. ER - TY - ABST T1 - A broad range test of verbal ability (RB-75-5) Y1 - 1975 A1 - Lord, F. M. CY - Princeton NJ: Educational Testing Service ER - TY - JOUR T1 - Complete orders from incomplete data: Interactive ordering and tailored testing JF - Psychological Bulletin Y1 - 1975 A1 - Cliff, N. A. VL - 82 ER - TY - JOUR T1 - Computerized adaptive ability measurement JF - Naval Research Reviews Y1 - 1975 A1 - Weiss, D. J. VL - 28 ER - TY - CHAP T1 - Computerized adaptive trait measurement: Problems and prospects (Research Report 75-5) Y1 - 1975 A1 - Weiss, D. J.
CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. ER - TY - CHAP T1 - Discussion Y1 - 1975 A1 - Bock, R. D. CY - D. J. Weiss (Ed.), Computerized adaptive trait measurement: Problems and Prospects (Research Report 75-5), pp. 46-49. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 414 KB} ER - TY - CHAP T1 - Discussion Y1 - 1975 A1 - Linn, R. L. CY - D. J. Weiss (Ed.), Computerized adaptive trait measurement: Problems and Prospects (Research Report 75-5), pp. 44-46. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 414 KB} ER - TY - CONF T1 - The effect of item choice on ability estimation when using a simple logistic tailored testing model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1975 A1 - Reckase, M. D. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Washington, D.C. ER - TY - ABST T1 - Empirical and simulation studies of flexilevel ability testing (Research Report 75-3) Y1 - 1975 A1 - Betz, N. E. A1 - Weiss, D. J. CY - Minneapolis: Department of Psychology, Psychometric Methods Program ER - TY - ABST T1 - An empirical comparison of two-stage and pyramidal ability testing (Research Report 75-1) Y1 - 1975 A1 - Larkin, K. C. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - CHAP T1 - Evaluating the results of computerized adaptive testing Y1 - 1975 A1 - Sympson, J. B. CY - D. J. Weiss (Ed.), Computerized adaptive trait measurement: Problems and Prospects (Research Report 75-5), pp. 26-31. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 446 KB} ER - TY - CHAP T1 - New types of information and psychological implications Y1 - 1975 A1 - Betz, N. E. CY - D. J. 
Weiss (Ed.), Computerized adaptive trait measurement: Problems and Prospects (Research Report 75-5), pp. 32-43. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 609 KB} ER - TY - CHAP T1 - Scoring adaptive tests Y1 - 1975 A1 - J. R. McBride CY - D. J. Weiss (Ed.), Computerized adaptive trait measurement: Problems and Prospects (Research Report 75-5), pp. 17-25. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 442 KB} ER - TY - JOUR T1 - Sequential testing for instructional classification JF - Journal of Computer-Based Instruction. Y1 - 1975 A1 - Thomas, D. B. VL - 1 ER - TY - ABST T1 - A simulation study of stradaptive ability testing (Research Report 75-6) Y1 - 1975 A1 - Vale, C. D. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - CHAP T1 - Strategies of branching through an item pool Y1 - 1975 A1 - Vale, C. D. CY - D. J. Weiss (Ed.), Computerized adaptive trait measurement: Problems and Prospects (Research Report 75-5), pp. 1-16. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - #VA75-01 {PDF file, 600 KB} ER - TY - ABST T1 - A study of computer-administered stradaptive ability testing (Research Report 75-4) Y1 - 1975 A1 - Vale, C. D. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - CONF T1 - Tailored testing: Maximizing validity and utility for job selection T2 - Paper presented at the 86th Annual Convention of the American Psychological Association. Toronto Y1 - 1975 A1 - Croll, P. R. A1 - Urry, V. W. JF - Paper presented at the 86th Annual Convention of the American Psychological Association. 
Toronto CY - Canada ER - TY - JOUR T1 - An application of latent trait mental test theory JF - British Journal of Mathematical and Statistical Psychology Y1 - 1974 A1 - Jensema, C. J. VL - 27 N1 - #JE74029 ER - TY - CONF T1 - An application of the Rasch simple logistic model to tailored testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1974 A1 - Reckase, M. D. JF - Paper presented at the annual meeting of the American Educational Research Association CY - St. Louis MO ER - TY - CONF T1 - A Bayesian approach in sequential testing T2 - American Educational Research Association Y1 - 1974 A1 - Hsu, T. A1 - Pingel, K. JF - American Educational Research Association CY - Chicago IL ER - TY - BOOK T1 - The comparison of two tailored testing models and the effects of the model's variables on actual loss Y1 - 1974 A1 - Kalisch, S. J. CY - Unpublished doctoral dissertation, Florida State University ER - TY - ABST T1 - A computer software system for adaptive ability measurement (Research Report 74-1) Y1 - 1974 A1 - De Witt, J. J. A1 - Weiss, D. J. CY - Minneapolis MN: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory ER - TY - ABST T1 - Computer-assisted testing: The calibration and evaluation of the verbal ability bank (Technical Study 74-3) Y1 - 1974 A1 - Urry, V. W. CY - Washington DC: U. S. Civil Service Commission, Personnel Research and Development Center ER - TY - ABST T1 - Computer-based adaptive testing models for the Air Force technical training environment: Phase I: Development of a computerized measurement system for Air Force technical training Y1 - 1974 A1 - Hansen, D. N. A1 - Johnson, B. F. A1 - Fagan, R. L. A1 - Tan, P. A1 - Dick, W. CY - JSAS Catalogue of Selected Documents in Psychology, 5, 1-86 (MS No. 882). AFHRL Technical Report 74-48. ER - TY - ABST T1 - Development of a programmed testing system (Technical Paper 259) Y1 - 1974 A1 - Bayroff, A. G. A1 - Ross, R.
M. A1 - Fischl, M. A. CY - Arlington VA: US Army Research Institute for the Behavioral and Social Sciences. (NTIS No. AD A001534) ER - TY - ABST T1 - An empirical investigation of computer-administered pyramidal ability testing (Research Report 74-3) Y1 - 1974 A1 - Larkin, K. C. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program ER - TY - CONF T1 - An empirical investigation of the stability and accuracy of flexilevel tests T2 - Annual meeting of the National Council on Measurement in Education Y1 - 1974 A1 - Kocher, A. T. JF - Annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - BOOK T1 - An empirical investigation of the stratified adaptive computerized testing model for the measurement of human ability Y1 - 1974 A1 - B. K. Waters CY - Unpublished Ph.D. dissertation, Florida State University N1 - #WA74-01 ER - TY - BOOK T1 - An evaluation of the self-scoring flexilevel testing model Y1 - 1974 A1 - Olivier, P. CY - Unpublished dissertation, Florida State University. Dissertation Abstracts International, 35 (7-A), 4257 ER - TY - CHAP T1 - Individualized testing and item characteristic curve theory Y1 - 1974 A1 - Lord, F. M., CY - D. H. Krantz, R. C. Atkinson, R. D. Luce, and P. Suppes (Eds.), Contemporary developments in mathematical psychology (Vol. II). San Francisco: Freeman. ER - TY - JOUR T1 - An interactive computer program for tailored testing based on the one-parameter logistic model JF - Behavior Research Methods and Instrumentation Y1 - 1974 A1 - Reckase, M. D. VL - 6 ER - TY - ABST T1 - Practical methods for redesigning a homogeneous test, also for designing a multilevel test (RB-74-30) Y1 - 1974 A1 - Lord, F.
M., CY - Princeton NJ: Educational Testing Service ER - TY - CHAP T1 - Recent and projected developments in ability testing by computer Y1 - 1974 A1 - J. R. McBride A1 - Weiss, D. J. CY - Earl Jones (Ed.), Symposium Proceedings: Occupational Research and the Navy–Prospectus 1980 (TR-74-14). San Diego, CA: Navy Personnel Research and Development Center. ER - TY - ABST T1 - Simulation studies of two-stage ability testing (Research Report 74-4) Y1 - 1974 A1 - Betz, N. E. A1 - Weiss, D. J. CY - Minneapolis: Department of Psychology, Psychometric Methods Program N1 - {PDF file, 2.92 MB} ER - TY - ABST T1 - Strategies of adaptive ability measurement (Research Report 74-5) Y1 - 1974 A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program N1 - {PDF file, 5.555 MB} ER - TY - JOUR T1 - A tailored testing model employing the beta distribution and conditional difficulties JF - Journal of Computer-Based Instruction Y1 - 1974 A1 - Kalisch, S. J. VL - 1 ER - TY - ABST T1 - A tailored testing model employing the beta distribution (unpublished manuscript) Y1 - 1974 A1 - Kalisch, S. J. CY - Florida State University, Educational Evaluation and Research Design Program ER - TY - CONF T1 - A tailored testing system for selection and allocation in the British Army T2 - Paper presented at the 18th International Congress of Applied Psychology Y1 - 1974 A1 - Killcross, M. C. JF - Paper presented at the 18th International Congress of Applied Psychology CY - Montreal Canada ER - TY - JOUR T1 - Testing and decision-making procedures for selected individualized instruction programs JF - Review of Educational Research Y1 - 1974 A1 - Hambleton, R. K. VL - 10 ER - TY - JOUR T1 - The validity of Bayesian tailored testing JF - Educational and Psychological Measurement Y1 - 1974 A1 - Jensema, C J VL - 34 ER - TY - ABST T1 - A word knowledge item pool for adaptive ability measurement (Research Report 74-2) Y1 - 1974 A1 - J. R. 
McBride A1 - Weiss, D. J. CY - Minneapolis MN: Department of Psychology, Computerized Adaptive Testing Laboratory ER - TY - ABST T1 - Ability measurement: Conventional or adaptive? (Research Report 73-1) Y1 - 1973 A1 - Weiss, D. J. A1 - Betz, N. E. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program N1 - {PDF file, 4.98 MB} ER - TY - CHAP T1 - Computer-based psychological testing Y1 - 1973 A1 - Jones, D. A1 - Weinman, J. CY - A. Elithorn and D. Jones (Eds.), Artificial and human thinking (pp. 83-93). San Francisco CA: Jossey-Bass. ER - TY - ABST T1 - An empirical study of computer-administered two-stage ability testing (Research Report 73-4) Y1 - 1973 A1 - Betz, N. E. A1 - Weiss, D. J. CY - Minneapolis: Department of Psychology, Psychometric Methods Program ER - TY - ABST T1 - Implementation of a Bayesian system for decision analysis in a program of individually prescribed instruction (Research Report No 60) Y1 - 1973 A1 - Ferguson, R. L. A1 - Novick, M. R. CY - Iowa City IA: American College Testing Program N1 - #FE73-01 ER - TY - CONF T1 - An interactive computer program for tailored testing based on the one-parameter logistic model T2 - Paper presented at the National Conference on the Use of On-Line Computers in Psychology Y1 - 1973 A1 - Reckase, M. D. JF - Paper presented at the National Conference on the Use of On-Line Computers in Psychology CY - St. Louis MO ER - TY - BOOK T1 - A multivariate experimental study of three computerized adaptive testing models for the measurement of attitude toward teaching effectiveness Y1 - 1973 A1 - Tam, P. T.-K. CY - Unpublished doctoral dissertation, Florida State University ER - TY - ABST T1 - An overview of tailored testing (unpublished manuscript) Y1 - 1973 A1 - Olivier, P.
CY - Florida State University, Program of Educational Evaluation and Research Design ER - TY - CONF T1 - The potential use of tailored testing for allocation to army employments T2 - NATO Conference on Utilisation of Human Resources Y1 - 1973 A1 - Killcross, M. C. A1 - Cassie, A. JF - NATO Conference on Utilisation of Human Resources CY - Lisbon, Portugal ER - TY - JOUR T1 - Response-contingent testing JF - Review of Educational Research Y1 - 1973 A1 - Wood, R. L. VL - 43 ER - TY - ABST T1 - A review of testing and decision-making procedures (Technical Bulletin No. 15) Y1 - 1973 A1 - Hambleton, R. K. CY - Iowa City IA: American College Testing Program. ER - TY - ABST T1 - The stratified adaptive computerized ability test (Research Report 73-3) Y1 - 1973 A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program N1 - {PDF file, 2.498 MB} ER - TY - JOUR T1 - A tailored testing model employing the beta distribution and conditional difficulties JF - Journal of Computer-Based Instruction Y1 - 1973 A1 - Kalisch, S. J. VL - 1 ER - TY - ABST T1 - An application of latent trait mental test theory to the Washington Pre-College Testing Battery Y1 - 1972 A1 - Jensema, C. J. CY - Unpublished doctoral dissertation, University of Washington N1 - #JE72-01 ER - TY - ABST T1 - Fully adaptive sequential testing: A Bayesian procedure for efficient ability measurement Y1 - 1972 A1 - Wood, R. L. CY - Unpublished manuscript, University of Chicago ER - TY - JOUR T1 - Individual intelligence testing without the examiner: reliability of an automated method JF - Journal of Consulting and Clinical Psychology Y1 - 1972 A1 - Elwood, D. L. A1 - Griffin, H. R. VL - 38 ER - TY - ABST T1 - Individualized testing and item characteristic curve theory (RB-72-50) Y1 - 1972 A1 - Lord, F. M., CY - Princeton NJ: Educational Testing Service ER - TY - BOOK T1 - A modification to Lord’s model for tailored tests Y1 - 1972 A1 - Mussio, J. J.
CY - Unpublished doctoral dissertation, University of Toronto ER - TY - JOUR T1 - Sequential testing for dichotomous decisions. JF - Educational and Psychological Measurement Y1 - 1972 A1 - Linn, R. L. A1 - Rock, D. A. A1 - Cleary, T. A. KW - CCAT KW - CLASSIFICATION Computerized Adaptive Testing KW - sequential probability ratio testing KW - SPRT VL - 32 ER - TY - ABST T1 - The application of item generators for individualizing mathematics testing and instruction (Report 1971/14) Y1 - 1971 A1 - Ferguson, R. L. A1 - Hsu, T. CY - Pittsburgh PA: University of Pittsburgh Learning Research and Development Center ER - TY - JOUR T1 - A comparison of computer-simulated conventional and branching tests JF - Educational and Psychological Measurement Y1 - 1971 A1 - Waters, C. J. A1 - Bayroff, A. G. VL - 31 ER - TY - ABST T1 - A comparison of four methods of selecting items for computer-assisted testing (Technical Bulletin STB 72-5) Y1 - 1971 A1 - Bryson, R. CY - San Diego: Naval Personnel and Training Research Laboratory ER - TY - ABST T1 - Computer assistance for individualizing measurement Y1 - 1971 A1 - Ferguson, R. L. CY - Pittsburgh PA: University of Pittsburgh R and D Center ER - TY - BOOK T1 - Computerized adaptive sequential testing Y1 - 1971 A1 - Wood, R. L. CY - Unpublished doctoral dissertation, University of Chicago ER - TY - ABST T1 - Individualized testing by Bayesian estimation Y1 - 1971 A1 - Urry, V. W. CY - Seattle: University of Washington, Bureau of Testing Project 0171-177 ER - TY - JOUR T1 - A model for computer-assisted criterion-referenced measurement JF - Education Y1 - 1971 A1 - Ferguson, R. L. VL - 81 ER - TY - JOUR T1 - Robbins-Monro procedures for tailored testing JF - Educational and Psychological Measurement Y1 - 1971 A1 - Lord, F. M., VL - 31 ER - TY - JOUR T1 - The self-scoring flexilevel test JF - Journal of Educational Measurement Y1 - 1971 A1 - Lord, F. 
M., VL - 8 ER - TY - ABST T1 - Tailored testing: An application of stochastic approximation (RM 71-2) Y1 - 1971 A1 - Lord, F. M., CY - Princeton NJ: Educational Testing Service ER - TY - JOUR T1 - Tailored testing, an application of stochastic approximation JF - Journal of the American Statistical Association Y1 - 1971 A1 - Lord, F. M., VL - 66 ER - TY - JOUR T1 - A theoretical study of the measurement effectiveness of flexilevel tests JF - Educational and Psychological Measurement Y1 - 1971 A1 - Lord, F. M., VL - 31 ER - TY - JOUR T1 - A theoretical study of two-stage testing JF - Psychometrika Y1 - 1971 A1 - Lord, F. M., VL - 36 ER - TY - JOUR T1 - Adaptive testing of cognitive skills JF - Proceedings of the Annual Convention of the American Psychological Association Y1 - 1970 A1 - Wargo, M. J. VL - 5 (part 1) ER - TY - CHAP T1 - Comments on tailored testing Y1 - 1970 A1 - Green, B. F. CY - W. H. Holtzman (Ed.), Computer-assisted instruction, testing, and guidance (pp. 184-197). New York: Harper and Row. ER - TY - JOUR T1 - Computer assistance for individualizing measurement JF - Computers and Automation Y1 - 1970 A1 - Ferguson, R. L. VL - March 1970 ER - TY - ABST T1 - Computer assistance for individualizing measurement Y1 - 1970 A1 - Ferguson, R. L. CY - Pittsburgh PA: University of Pittsburgh, Learning Research and Development Center ER - TY - CHAP T1 - Individually tailored testing: Discussion Y1 - 1970 A1 - Holtzman, W. H. CY - W. H. Holtzman (Ed.), Computer-assisted instruction, testing, and guidance (pp. 198-200). New York: Harper and Row. N1 - #HO70198 ER - TY - CONF T1 - A model for computer-assisted criterion-referenced measurement T2 - Paper presented at the annual meeting of the American Educational Research Association/National Council on Measurement in Education Y1 - 1970 A1 - Ferguson, R. L.
JF - Paper presented at the annual meeting of the American Educational Research Association/National Council on Measurement in Education CY - Minneapolis MN ER - TY - ABST T1 - The self-scoring flexilevel test (RB-70-43) Y1 - 1970 A1 - Lord, F. M., CY - Princeton NJ: Educational Testing Service ER - TY - ABST T1 - Sequential testing for dichotomous decisions. College Entrance Examination Board Research and Development Report (RDR 69-70, No. 3, and Educational Testing Service RB-70-31) Y1 - 1970 A1 - Linn, R. L. A1 - Rock, D. A. A1 - Cleary, T. A. CY - Princeton NJ: Educational Testing Service. N1 - #LI70-31 ER - TY - CHAP T1 - Some test theory for tailored testing Y1 - 1970 A1 - Lord, F. M., CY - W. H. Holtzman (Ed.), Computer-assisted instruction, testing, and guidance (pp. 139-183). New York: Harper and Row. N1 - #LO70139 ER - TY - JOUR T1 - Automation of psychological testing JF - American Psychologist Y1 - 1969 A1 - Elwood, D. L. VL - 24 ER - TY - ABST T1 - A Bayesian approach to tailored testing (Research Report 69-92) Y1 - 1969 A1 - Owen, R. J. CY - Princeton NJ: Educational Testing Service ER - TY - ABST T1 - Bayesian methods in psychological testing (Research Bulletin RB-69-31) Y1 - 1969 A1 - Novick, M. R. CY - Princeton NJ: Educational Testing Service ER - TY - ABST T1 - Computer-assisted criterion-referenced measurement (Working Paper No 49) Y1 - 1969 A1 - Ferguson, R. L. CY - Pittsburgh PA: University of Pittsburgh, Learning Research and Development Center. (ERIC No. ED 037 089) ER - TY - JOUR T1 - The development and evaluation of several programmed testing methods JF - Educational and Psychological Measurement Y1 - 1969 A1 - Linn, R. L. A1 - Cleary, T. A. VL - 29 ER - TY - BOOK T1 - The development, implementation, and evaluation of a computer-assisted branched test for a program of individually prescribed instruction Y1 - 1969 A1 - Ferguson, R. L. CY - Doctoral dissertation, University of Pittsburgh. Dissertation Abstracts International, 30-09A, 3856.
(University Microfilms No. 70-4530). ER - TY - JOUR T1 - The efficacy of tailored testing JF - Educational Research Y1 - 1969 A1 - Wood, R. L. VL - 11 ER - TY - JOUR T1 - An exploratory study of programmed tests JF - Educational and Psychological Measurement Y1 - 1969 A1 - Cleary, T. A. A1 - Linn, R. L. A1 - Rock, D. A. VL - 28 ER - TY - CONF T1 - Individualized assessment of differential abilities T2 - Paper presented at the 77th annual meeting of the American Psychological Association. Y1 - 1969 A1 - Weiss, D. J. JF - Paper presented at the 77th annual meeting of the American Psychological Association. ER - TY - CHAP T1 - An investigation of computer-based science testing Y1 - 1969 A1 - Hansen, D. N. CY - R. C. Atkinson and H. A. Wilson (Eds.), Computer-assisted instruction: A book of readings. New York: Academic Press. ER - TY - CONF T1 - Psychometric problems with branching tests T2 - Paper presented at the annual meeting of the American Psychological Association. Y1 - 1969 A1 - Bayroff, A. G. JF - Paper presented at the annual meeting of the American Psychological Association. N1 - #BA69-01 ER - TY - ABST T1 - Short tailored tests (RB-69-63) Y1 - 1969 A1 - Stocking, M. L. CY - Princeton NJ: Educational Testing Service ER - TY - JOUR T1 - Use of an on-line computer for psychological testing with the up-and-down method JF - American Psychologist Y1 - 1969 A1 - Kappauf, W. E. VL - 24 ER - TY - BOOK T1 - Computer-assisted testing (Eds.) Y1 - 1968 A1 - Harman, H. H. A1 - Helm, C. E. A1 - Loye, D. E. CY - Princeton NJ: Educational Testing Service ER - TY - ABST T1 - The development and evaluation of several programmed testing methods (Research Bulletin 68-5) Y1 - 1968 A1 - Linn, R. L. A1 - Rock, D. A. A1 - Cleary, T. A. CY - Princeton NJ: Educational Testing Service N1 - #LI68-05 ER - TY - ABST T1 - An investigation of computer-based science testing Y1 - 1968 A1 - Hansen, D. N. A1 - Schwarz, G. 
CY - Tallahassee FL: Florida State University, Institute of Human Learning N1 - #HA68-01 (See published version.) ER - TY - JOUR T1 - Methodological determination of the PEST (parameter estimation by sequential testing) procedure JF - Perception and Psychophysics Y1 - 1968 A1 - Pollack, I. VL - 3 ER - TY - JOUR T1 - Reproduction of total test score through the use of sequential programmed tests JF - Journal of Educational Measurement Y1 - 1968 A1 - Cleary, T. A. A1 - Linn, R. L. A1 - Rock, D. A. VL - 5 ER - TY - ABST T1 - An exploratory study of branching tests (Technical Research Note 188) Y1 - 1967 A1 - Bayroff, A. G. A1 - Seeley, L. C. CY - Washington DC: US Army Behavioral Science Research Laboratory. (NTIS No. AD 655263) ER - TY - CHAP T1 - New light on test strategy from decision theory Y1 - 1966 A1 - Cronbach, L. J. CY - A. Anastasi (Ed.), Testing problems in perspective. Washington DC: American Council on Education. ER - TY - CHAP T1 - Programmed testing in the examinations of the National Board of Medical Examiners Y1 - 1966 A1 - Hubbard, J. P. CY - A. Anastasi (Ed.), Testing problems in perspective. Washington DC: American Council on Education. ER - TY - JOUR T1 - Adaptive testing in an older population JF - Journal of Psychology Y1 - 1965 A1 - Greenwood, D. I. A1 - Taylor, C. VL - 60 ER - TY - JOUR T1 - Feasibility of a programmed testing machine Y1 - 1964 A1 - Bayroff, A. G. CY - US Army Personnel Research Office Research Study 64-3. ER - TY - ABST T1 - Preliminary evaluation of simulated branching tests Y1 - 1964 A1 - Waters, C. J. CY - U.S. Army Personnel Research Office Technical Research Note 140. ER - TY - BOOK T1 - An evaluation of the sequential method of testing Y1 - 1962 A1 - Paterson, J. J.
CY - Unpublished doctoral dissertation, Michigan State University N1 - #PA62-1 University Microfilms Number 63-1748. ER - TY - ABST T1 - Exploratory study of a sequential item test Y1 - 1962 A1 - Seeley, L. C. A1 - Morton, M. A. A1 - Anderson, A. A. CY - U.S. Army Personnel Research Office, Technical Research Note 129. ER - TY - BOOK T1 - An analysis of the application of utility theory to the development of two-stage testing models Y1 - 1961 A1 - Rosenbach, J. H. CY - Unpublished doctoral dissertation, University of Buffalo ER - TY - ABST T1 - Construction of an experimental sequential item test (Research Memorandum 60-1) Y1 - 1960 A1 - Bayroff, A. G. A1 - Thomas, J. J. A1 - Anderson, A. A. CY - Washington DC: Personnel Research Branch, Department of the Army ER - TY - ABST T1 - Progress report on the sequential item test Y1 - 1959 A1 - Krathwohl, D. CY - East Lansing MI: Michigan State University, Bureau of Educational Research ER - TY - ABST T1 - The multi-level experiment: A study of a two-level test system for the College Board Scholastic Aptitude Test Y1 - 1958 A1 - Angoff, W. H. A1 - Huddleston, E. M. CY - Princeton NJ: Educational Testing Service ER - TY - JOUR T1 - The sequential item test JF - American Psychologist Y1 - 1956 A1 - Krathwohl, D. R. A1 - Huyser, R. J. VL - 2 ER - TY - JOUR T1 - An empirical study of the applicability of sequential analysis to item selection Y1 - 1953 A1 - Anastasi, A. VL - 13 ER - TY - JOUR T1 - Sequential analysis with more than two alternative hypotheses, and its relation to discriminant function analysis Y1 - 1950 A1 - Armitage, P. VL - 12 ER - TY - JOUR T1 - Some empirical aspects of the sequential analysis technique as applied to an achievement examination JF - Journal of Experimental Education Y1 - 1950 A1 - Moonan, W. J. VL - 18 ER - TY - JOUR T1 - A clinical study of consecutive and adaptive testing with the revised Stanford-Binet JF - Journal of Consulting Psychology Y1 - 1947 A1 - Hutt, M. L.
VL - 11 ER - TY - JOUR T1 - An application of sequential sampling to testing students JF - Journal of the American Statistical Association Y1 - 1946 A1 - Cowden, D. J. VL - 41 ER - TY - BOOK T1 - A method of measuring the development of the intelligence of young children Y1 - 1915 A1 - Binet, A. A1 - Simon, T. CY - Chicago: Chicago Medical Book Co ER - TY - JOUR T1 - Le développement de l'intelligence chez les enfants JF - L'Année Psychologique Y1 - 1908 A1 - Binet, A. A1 - Simon, T. VL - 14 N1 - In French ER - TY - JOUR T1 - Méthode nouvelle pour le diagnostic du niveau intellectuel des anormaux JF - L'Année Psychologique Y1 - 1905 A1 - Binet, A. A1 - Simon, Th. A. VL - 11 N1 - (also cited as: Application des méthodes nouvelles au diagnostic du niveau intellectuel chez des enfants normaux et anormaux d'hospice et d'école primaire, 245-336.) In French ER - TY - Generic T1 - Microcomputer network for computerized adaptive testing (CAT) Y1 - 1984 A1 - Quan, B. A1 - Park, T. A. A1 - Sandahl, G. A1 - Wolfe, J. H. JF - NPRDC-TR-84-33 PB - San Diego: Navy Personnel Research and Development Center. UR - https://apps.dtic.mil/dtic/tr/fulltext/u2/a140256.pdf ER -