%0 Journal Article %J Applied Psychological Measurement %D 2020 %T A Blocked-CAT Procedure for CD-CAT %A Mehmet Kaplan %A Jimmy de la Torre %X This article introduces a blocked-design procedure for cognitive diagnosis computerized adaptive testing (CD-CAT), which allows examinees to review items and change their answers during test administration. Four blocking versions of the new procedure were proposed. In addition, the impact of several factors, namely, item quality, generating model, block size, and test length, on the classification rates was investigated. Three popular item selection indices in CD-CAT were used and their efficiency compared using the new procedure. An additional study was carried out to examine the potential benefit of item review. The results showed that the new procedure is promising in that allowing item review resulted in only a small loss in attribute classification accuracy under some conditions. Moreover, using a blocked-design CD-CAT is beneficial to the extent that it alleviates the negative impact of test anxiety on examinees’ true performance. %B Applied Psychological Measurement %V 44 %P 49-64 %U https://doi.org/10.1177/0146621619835500 %R 10.1177/0146621619835500 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2020 %T Three Measures of Test Adaptation Based on Optimal Test Information %A G. Gage Kingsbury %A Steven L. Wise %B Journal of Computerized Adaptive Testing %V 8 %P 1-19 %G English %U http://iacat.org/jcat/index.php/jcat/article/view/80/37 %N 1 %R 10.7333/2002-0801001 %0 Journal Article %J Applied Psychological Measurement %D 2019 %T Adaptive Testing With a Hierarchical Item Response Theory Model %A Wenhao Wang %A Neal Kingston %X The hierarchical item response theory (H-IRT) model is very flexible and allows a general factor and subfactors within an overall structure of two or more levels. When an H-IRT model with a large number of dimensions is used for an adaptive test, the computational burden associated with interim scoring and selection of subsequent items is heavy. An alternative approach for any high-dimension adaptive test is to reduce dimensionality for interim scoring and item selection and then revert to full dimensionality for final score reporting, thereby significantly reducing the computational burden. This study compared the accuracy and efficiency of final scoring for multidimensional, local multidimensional, and unidimensional item selection and interim scoring methods, using both simulated and real item pools. The simulation study was conducted under 10 conditions (i.e., five test lengths and two H-IRT models) with a simulated sample of 10,000 students. The study with the real item pool was conducted using item parameters from an actual 45-item adaptive test with a simulated sample of 10,000 students. Results indicate that the theta estimations provided by the local multidimensional and unidimensional item selection and interim scoring methods were nearly as accurate as the theta estimation provided by the multidimensional item selection and interim scoring method, especially in the real item pool study.
In addition, the multidimensional method required the longest computation time and the unidimensional method required the shortest computation time. %B Applied Psychological Measurement %V 43 %P 51-67 %U https://doi.org/10.1177/0146621618765714 %R 10.1177/0146621618765714 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2019 %T How Adaptive Is an Adaptive Test: Are All Adaptive Tests Adaptive? %A Mark Reckase %A Unhee Ju %A Sewon Kim %K computerized adaptive test %K multistage test %K statistical indicators of amount of adaptation %B Journal of Computerized Adaptive Testing %V 7 %P 1-14 %G English %U http://iacat.org/jcat/index.php/jcat/article/view/69/34 %N 1 %R 10.7333/1902-0701001 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2018 %T Adaptive Item Selection Under Matroid Constraints %A Daniel Bengs %A Ulf Brefeld %A Ulf Kröhne %B Journal of Computerized Adaptive Testing %V 6 %P 15-36 %G English %U http://www.iacat.org/jcat/index.php/jcat/article/view/64/32 %N 2 %R 10.7333/1808-0602015 %0 Journal Article %J Applied Psychological Measurement %D 2018 %T Measuring patient-reported outcomes adaptively: Multidimensionality matters! %A Paap, Muirne C. S. %A Kroeze, Karel A. %A Glas, C. A. W. %A Terwee, C. B. %A van der Palen, Job %A Veldkamp, Bernard P. %B Applied Psychological Measurement %R 10.1177/0146621617733954 %0 Journal Article %J Journal of Educational Measurement %D 2018 %T A Top-Down Approach to Designing the Computerized Adaptive Multistage Test %A Luo, Xiao %A Kim, Doyoung %X The top-down approach to designing a multistage test is relatively understudied in the literature and underused in research and practice. This study introduced a route-based top-down design approach that directly sets design parameters at the test level and utilizes an advanced automated test assembly algorithm seeking global optimality. The design process in this approach consists of five sub-processes: (1) route mapping, (2) setting objectives, (3) setting constraints, (4) routing error control, and (5) test assembly. Results from a simulation study confirmed that the assembly, measurement, and routing results of the top-down design eclipsed those of the bottom-up design. Additionally, the top-down design approach provided unique insights into design decisions that could be used to refine the test. Despite these advantages, it is recommended that the top-down and bottom-up approaches be applied in a complementary manner in practice. %B Journal of Educational Measurement %V 55 %P 243-263 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12174 %R 10.1111/jedm.12174 %0 Journal Article %J Educational and Psychological Measurement %D 2017 %T The Development of MST Test Information for the Prediction of Test Performances %A Ryoungsun Park %A Jiseon Kim %A Hyewon Chung %A Barbara G. Dodd %X The current study proposes novel methods to predict multistage testing (MST) performance without conducting simulations. This method, called MST test information, is based on analytic derivation of standard errors of ability estimates across theta levels. We compared standard errors derived analytically to the simulation results to demonstrate the validity of the proposed method in both measurement precision and classification accuracy. The results indicate that the MST test information effectively predicted the performance of MST. In addition, the results of the current study highlighted the relationship among the test construction, MST design factors, and MST performance.
%B Educational and Psychological Measurement %V 77 %P 570-586 %U http://dx.doi.org/10.1177/0013164416662960 %R 10.1177/0013164416662960 %0 Journal Article %J Journal of Educational Measurement %D 2017 %T Dual-Objective Item Selection Criteria in Cognitive Diagnostic Computerized Adaptive Testing %A Kang, Hyeon-Ah %A Zhang, Susu %A Chang, Hua-Hua %X The development of cognitive diagnostic-computerized adaptive testing (CD-CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual-objective CD-CAT that simultaneously addresses examinees' attribute mastery status and overall test performance. The new procedure is based on the Jensen-Shannon (JS) divergence, a symmetrized version of the Kullback-Leibler divergence. We show that the JS divergence resolves the noncomparability problem of the dual information index and has close relationships with Shannon entropy, mutual information, and Fisher information. The performance of the JS divergence is evaluated in simulation studies in comparison with the methods available in the literature. Results suggest that the JS divergence achieves parallel or more precise recovery of latent trait variables compared to the existing methods and maintains practical advantages in computation and item pool usage. %B Journal of Educational Measurement %V 54 %P 165–183 %U http://dx.doi.org/10.1111/jedm.12139 %R 10.1111/jedm.12139 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T From Blueprints to Systems: An Integrated Approach to Adaptive Testing %A Gage Kingsbury %A Tony Zara %K CAT %K integrated approach %K Keynote %X

For years, test blueprints have told test developers how many items and what types of items will be included in a test. Adaptive testing adopted this approach from paper testing, and it is reasonably useful. Unfortunately, 'how many items and what types of items' are not all the elements one should consider when choosing items for an adaptive test. To fill in the gaps, practitioners have developed tools that allow an adaptive test to behave appropriately (e.g., exposure control, content balancing, and item drift procedures). Each of these tools involves a separate process external to the primary item selection process.

The use of these subsidiary processes makes item selection less optimal and makes it difficult to prioritize aspects of selection. This discussion describes systems-based adaptive testing. This approach uses metadata concerning items, test takers and test elements to select items. These elements are weighted by the stakeholders to shape an expanded blueprint designed for adaptive testing. 


%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1CBaAfH4ES7XivmvrMjPeKyFCsFZOpQMJ %0 Journal Article %J Quality of Life Research %D 2017 %T Item usage in a multidimensional computerized adaptive test (MCAT) measuring health-related quality of life %A Paap, Muirne C. S. %A Kroeze, Karel A. %A Terwee, Caroline B. %A van der Palen, Job %A Veldkamp, Bernard P. %B Quality of Life Research %V 26 %P 2909–2918 %U https://doi.org/10.1007/s11136-017-1624-3 %R 10.1007/s11136-017-1624-3 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Multi-stage Testing for a Multi-disciplined End-of-primary-school Test %A Hendrik Straat %A Maaike van Groen %A Wobbe Zijlstra %A Marie-Anne Keizer-Mittelhaëuser %A Michel Lamoré %K mst %K Multidisciplined %K proficiency %X

The Dutch secondary education system consists of five levels: basic, lower, and middle vocational education, general secondary education, and pre-academic education. The individual decision for level of secondary education is based on a combination of the teacher’s judgment and an end-of-primary-school placement test.

This placement test encompasses the measurement of reading, language, mathematics, and writing, with each skill consisting of one to four subdomains. The Dutch end-of-primary-school test is currently administered in two linear 200-item paper-based versions. The two versions differ in difficulty so as to motivate both less able and more able students, and to measure both groups of students precisely. The primary goal of the test is to provide placement advice for the five levels of secondary education. The secondary goal is the assessment of six different fundamental reference levels defined on reading, language, and mathematics. Because of the high-stakes nature of this advice, the Dutch parliament has mandated a change to a multistage test format. A major advantage of multistage testing is that the tailoring of the tests is more strongly related to the ability of the students than to the teacher’s judgment. A separate multistage test is under development for each of the three skills measured by the reference levels to increase the classification accuracy for secondary education placement and to optimally measure the performance on the reference-level-related skills.

This symposium consists of three presentations discussing the challenges in transitioning from a linear paper-based test to a computer-based multistage test within an existing curriculum, as well as the specification of the multistage test to meet its measurement purposes. The transition to a multistage test has to improve both classification accuracy and measurement precision.

First, we describe the Dutch educational system and the role of the end-of-primary-school placement test within this system. Special attention will be paid to the advantages of multistage testing over both linear testing and computerized adaptive testing, and to practical implications of the transition from a linear to a multistage test.

Second, we discuss routing and reporting on the new multi-stage test. Both topics have a major impact on the quality of the placement advice and the reference mastery decisions. Several methods for routing and reporting are compared.

Third, the linear test contains 200 items to cover a broad range of different skills and to obtain a precise measurement of each of those skills separately. Multistage testing creates opportunities to reduce the cognitive burden on the students while maintaining the same quality of placement advice and of the assessment of reference-level mastery. This presentation focuses on the optimal allocation of items to test modules, the optimal number of stages and modules per stage, and test length reduction.


%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1C5ys178p_Wl9eemQuIsI56IxDTck2z8P %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T A New Cognitive Diagnostic Computerized Adaptive Testing for Simultaneously Diagnosing Skills and Misconceptions %A Bor-Chen Kuo %A Chun-Hua Chen %K CD-CAT %K Misconceptions %K Simultaneous diagnosis %X

In educational diagnosis, identifying misconceptions is as important as diagnosing skills. However, traditional cognitive diagnostic computerized adaptive testing (CD-CAT) is usually developed to diagnose skills only. This study proposes a new CD-CAT that can simultaneously diagnose skills and misconceptions. The proposed CD-CAT is based on a recently published cognitive diagnosis model, the simultaneously identifying skills and misconceptions (SISM) model (Kuo, Chen, & de la Torre, in press). A new item selection algorithm is also proposed to achieve high adaptive testing performance. In simulation studies, we compare the new item selection algorithm with three existing item selection methods: the Kullback–Leibler (KL) and posterior-weighted KL (PWKL) methods proposed by Cheng (2009) and the modified PWKL (MPWKL) proposed by Kaplan, de la Torre, and Barrada (2015). The results show that the proposed CD-CAT can efficiently diagnose skills and misconceptions; the accuracy of the new item selection algorithm is close to that of the MPWKL but with less computational burden; and the new algorithm outperforms the KL and PWKL methods in diagnosing skills and misconceptions.

References

Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74(4), 619–632. doi:10.1007/s11336-009-9123-2

Kaplan, M., de la Torre, J., & Barrada, J. R. (2015). New item selection methods for cognitive diagnosis computerized adaptive testing. Applied Psychological Measurement, 39(3), 167–188. doi:10.1177/0146621614554650

Kuo, B.-C., Chen, C.-H., & de la Torre, J. (in press). A cognitive diagnosis model for identifying coexisting skills and misconceptions. Applied Psychological Measurement.


%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2017 %T Projection-Based Stopping Rules for Computerized Adaptive Testing in Licensure Testing %A Luo, Xiao %A Kim, Doyoung %A Dickison, Philip %X The confidence interval (CI) stopping rule is commonly used in licensure settings to make classification decisions with fewer items in computerized adaptive testing (CAT). However, it tends to be less efficient in the near-cut regions of the θ scale, as the CI often fails to be narrow enough for an early termination decision prior to reaching the maximum test length. To solve this problem, this study proposed the projection-based stopping rules that base the termination decisions on the algorithmically projected range of the final θ estimate at the hypothetical completion of the CAT. A simulation study and an empirical study were conducted to show the advantages of the projection-based rules over the CI rule, in which the projection-based rules reduced the test length without jeopardizing critical psychometric qualities of the test, such as the θ estimation and classification precision. Operationally, these rules do not require additional regularization parameters, because the projection is simply a hypothetical extension of the current test within the existing CAT environment. Because these new rules are specifically designed to address the decreased efficiency in the near-cut regions as opposed to the entire scale, the authors recommend using them in conjunction with the CI rule in practice. %B Applied Psychological Measurement %V 42 %P 275-290 %8 2018/06/01 %@ 0146-6216 %U https://doi.org/10.1177/0146621617726790 %N 4 %! Applied Psychological Measurement %0 Journal Article %J Quality of Life Research %D 2017 %T The validation of a computer-adaptive test (CAT) for assessing health-related quality of life in children and adolescents in a clinical sample: study design, methods and first results of the Kids-CAT study %A Barthel, D. %A Otto, C. %A Nolte, S. %A Meyrose, A.-K. %A Fischer, F. %A Devine, J. %A Walter, O. %A Mierke, A. %A Fischer, K. I. %A Thyen, U. %A Klein, M. %A Ankermann, T. %A Rose, M. %A Ravens-Sieberer, U. %X Recently, we developed a computer-adaptive test (CAT) for assessing health-related quality of life (HRQoL) in children and adolescents: the Kids-CAT. It measures five generic HRQoL dimensions. The aims of this article were (1) to present the study design and (2) to investigate its psychometric properties in a clinical setting. %B Quality of Life Research %V 26 %P 1105–1117 %8 May %U https://doi.org/10.1007/s11136-016-1437-9 %R 10.1007/s11136-016-1437-9 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2016 %T Effect of Imprecise Parameter Estimation on Ability Estimation in a Multistage Test in an Automatic Item Generation Context %A Colvin, Kimberly %A Keller, Lisa A %A Robin, Frederic %K Adaptive Testing %K automatic item generation %K errors in item parameters %K item clones %K multistage testing %B Journal of Computerized Adaptive Testing %V 4 %P 1-18 %G English %U http://iacat.org/jcat/index.php/jcat/article/view/59/27 %N 1 %R 10.7333/1608-040101 %0 Journal Article %J Journal of Educational Measurement %D 2016 %T Modeling Student Test-Taking Motivation in the Context of an Adaptive Achievement Test %A Wise, Steven L. %A Kingsbury, G. Gage
%X This study examined the utility of response time-based analyses in understanding the behavior of unmotivated test takers. For the data from an adaptive achievement test, patterns of observed rapid-guessing behavior and item response accuracy were compared to the behavior expected under several types of models that have been proposed to represent unmotivated test-taking behavior. Test taker behavior was found to be inconsistent with these models, with the exception of the effort-moderated model. Effort-moderated scoring was found to both yield scores that were more accurate than those found under traditional scoring, and exhibit improved person fit statistics. In addition, an effort-guided adaptive test was proposed and shown by a simulation study to alleviate item difficulty mistargeting caused by unmotivated test taking. %B Journal of Educational Measurement %V 53 %P 86–105 %U http://dx.doi.org/10.1111/jedm.12102 %R 10.1111/jedm.12102 %0 Journal Article %J Applied Psychological Measurement %D 2016 %T Parameter Drift Detection in Multidimensional Computerized Adaptive Testing Based on Informational Distance/Divergence Measures %A Kang, Hyeon-Ah %A Chang, Hua-Hua %X An informational distance/divergence-based approach is proposed to detect the presence of parameter drift in multidimensional computerized adaptive testing (MCAT). The study presents significance testing procedures for identifying changes in multidimensional item response functions (MIRFs) over time based on informational distance/divergence measures that capture the discrepancy between two probability functions. To approximate the MIRFs from the observed response data, the k-nearest neighbors algorithm is used with the random search method. A simulation study suggests that the distance/divergence-based drift measures perform effectively in identifying the instances of parameter drift in MCAT. They showed moderate power with small samples of 500 examinees and excellent power when the sample size was as large as 1,000. The proposed drift measures also adequately controlled for Type I error at the nominal level under the null hypothesis. %B Applied Psychological Measurement %V 40 %P 534-550 %U http://apm.sagepub.com/content/40/7/534.abstract %R 10.1177/0146621616663676 %0 Journal Article %J Applied Psychological Measurement %D 2016 %T Stochastic Curtailment of Questionnaires for Three-Level Classification: Shortening the CES-D for Assessing Low, Moderate, and High Risk of Depression %A Smits, Niels %A Finkelman, Matthew D. %A Kelderman, Henk %X In clinical assessment, efficient screeners are needed to ensure low respondent burden. In this article, Stochastic Curtailment (SC), a method for efficient computerized testing for classification into two classes for observable outcomes, was extended to three classes. In a post hoc simulation study using the item scores on the Center for Epidemiologic Studies–Depression Scale (CES-D) of a large sample, three versions of SC, SC via Empirical Proportions (SC-EP), SC via Simple Ordinal Regression (SC-SOR), and SC via Multiple Ordinal Regression (SC-MOR), were compared on both respondent burden and classification accuracy. All methods were applied under the regular item order of the CES-D and under an ordering that was optimal in terms of the predictive power of the items. Under the regular item ordering, the three methods were equally accurate, but SC-SOR and SC-MOR needed fewer items.
Under the optimal ordering, additional gains in efficiency were found, but SC-MOR suffered substantially from capitalization on chance. It was concluded that SC-SOR is an efficient and accurate method for clinical screening. Strengths and weaknesses of the methods are discussed. %B Applied Psychological Measurement %V 40 %P 22-36 %U http://apm.sagepub.com/content/40/1/22.abstract %R 10.1177/0146621615592294 %0 Journal Article %J Journal of Educational Measurement %D 2015 %T Assessing Individual-Level Impact of Interruptions During Online Testing %A Sinharay, Sandip %A Wan, Ping %A Choi, Seung W. %A Kim, Dong-In %X With an increase in the number of online tests, the number of interruptions during testing due to unexpected technical issues seems to be on the rise. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. Researchers such as Hill and Sinharay et al. examined the impact of interruptions at an aggregate level. However, there is a lack of research on the assessment of the impact of interruptions at an individual level. We attempt to fill that void. We suggest four methodological approaches, primarily based on statistical hypothesis testing, linear regression, and item response theory, which can provide evidence on the individual-level impact of interruptions. We perform a realistic simulation study to compare the Type I error rate and power of the suggested approaches. We then apply the approaches to data from the 2013 Indiana Statewide Testing for Educational Progress-Plus (ISTEP+) test that experienced interruptions. %B Journal of Educational Measurement %V 52 %P 80–105 %U http://dx.doi.org/10.1111/jedm.12064 %R 10.1111/jedm.12064 %0 Journal Article %J Journal of Educational Measurement %D 2015 %T A Comparison of IRT Proficiency Estimation Methods Under Adaptive Multistage Testing %A Kim, Sooyeon %A Moses, Tim %A Yoo, Hanwook (Henry) %X This inquiry is an investigation of item response theory (IRT) proficiency estimators’ accuracy under multistage testing (MST). We chose a two-stage MST design that includes four modules (one at Stage 1, three at Stage 2) and three difficulty paths (low, middle, high). We assembled various two-stage MST panels (i.e., forms) by manipulating two assembly conditions in each module, such as difficulty level and module length. For each panel, we investigated the accuracy of examinees’ proficiency levels derived from seven IRT proficiency estimators. The choice of Bayesian (prior) versus non-Bayesian (no prior) estimators was of more practical significance than the choice of number-correct versus item-pattern scoring estimators. The Bayesian estimators were slightly more efficient than the non-Bayesian estimators, resulting in smaller overall error. Possible score changes caused by the use of different proficiency estimators would be nonnegligible, particularly for low- and high-performing examinees. %B Journal of Educational Measurement %V 52 %P 70–79 %U http://dx.doi.org/10.1111/jedm.12063 %R 10.1111/jedm.12063 %0 Journal Article %J Educational Measurement: Issues and Practice %D 2015 %T Evaluating Content Alignment in Computerized Adaptive Testing %A Wise, S. L. %A Kingsbury, G. G. %A Webb, N. L. %X The alignment between a test and the content domain it measures represents key evidence for the validation of test score inferences.
Although procedures have been developed for evaluating the content alignment of linear tests, these procedures are not readily applicable to computerized adaptive tests (CATs), which require large item pools and do not use fixed test forms. This article describes the decisions made in the development of CATs that influence and might threaten content alignment. It outlines a process for evaluating alignment that is sensitive to these threats and gives an empirical example of the process. %B Educational Measurement: Issues and Practice %V 34 %N 4 %R http://dx.doi.org/10.1111/emip.12094 %0 Journal Article %J Educational and Psychological Measurement %D 2015 %T Investigation of Response Changes in the GRE Revised General Test %A Liu, Ou Lydia %A Bridgeman, Brent %A Gu, Lixiong %A Xu, Jun %A Kong, Nan %X Research on examinees’ response changes on multiple-choice tests over the past 80 years has yielded some consistent findings, including that most examinees make score gains by changing answers. This study expands the research on response changes by focusing on a high-stakes admissions test—the Verbal Reasoning and Quantitative Reasoning measures of the GRE revised General Test. We analyzed data from 8,538 examinees for Quantitative and 9,140 for Verbal sections who took the GRE revised General Test in 12 countries. The analyses yielded findings consistent with prior research. In addition, as examinees’ ability increases, the benefit of response changing increases. The study yielded significant implications for both test agencies and test takers. Computer adaptive tests often do not allow the test takers to review and revise. Findings from this study confirm the benefit of such features. %B Educational and Psychological Measurement %V 75 %P 1002-1020 %U http://epm.sagepub.com/content/75/6/1002.abstract %R 10.1177/0013164415573988 %0 Journal Article %J Applied Psychological Measurement %D 2015 %T New Item Selection Methods for Cognitive Diagnosis Computerized Adaptive Testing %A Kaplan, Mehmet %A de la Torre, Jimmy %A Barrada, Juan Ramón %X This article introduces two new item selection methods, the modified posterior-weighted Kullback–Leibler index (MPWKL) and the generalized deterministic inputs, noisy “and” gate (G-DINA) model discrimination index (GDI), that can be used in cognitive diagnosis computerized adaptive testing. The efficiency of the new methods is compared with the posterior-weighted Kullback–Leibler (PWKL) item selection index using a simulation study in the context of the G-DINA model. The impact of item quality, generating models, and test termination rules on attribute classification accuracy or test length is also investigated. The results of the study show that the MPWKL and GDI perform very similarly, and have higher correct attribute classification rates or shorter mean test lengths compared with the PWKL. In addition, the GDI has the shortest implementation time among the three indices. The proportion of item usage with respect to the required attributes across the different conditions is also tracked and discussed. %B Applied Psychological Measurement %V 39 %P 167-188 %U http://apm.sagepub.com/content/39/3/167.abstract %R 10.1177/0146621614554650 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2014 %T Cognitive Diagnostic Models and Computerized Adaptive Testing: Two New Item-Selection Methods That Incorporate Response Times %A Finkelman, M. D. %A Kim, W. %A Weissman, A. %A Cook, R.J. 
%B Journal of Computerized Adaptive Testing %V 2 %P 59-76 %G English %U http://www.iacat.org/jcat/index.php/jcat/article/view/43/21 %N 4 %R 10.7333/1412-0204059 %0 Journal Article %J Journal of Educational Measurement %D 2014 %T Determining the Overall Impact of Interruptions During Online Testing %A Sinharay, Sandip %A Wan, Ping %A Whitaker, Mike %A Kim, Dong-In %A Zhang, Litong %A Choi, Seung W. %X

With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees’ scores. There is a lack of research on this topic due to the novelty of the problem. This article is an attempt to fill that void. Several methods, primarily based on propensity score matching, linear regression, and item response theory, were suggested to determine the overall impact of the interruptions on the examinees’ scores. A realistic simulation study shows that the suggested methods have satisfactory Type I error rate and power. Then the methods were applied to data from the Indiana Statewide Testing for Educational Progress-Plus (ISTEP+) test that experienced interruptions in 2013. The results indicate that the interruptions did not have a significant overall impact on the student scores for the ISTEP+ test.

%B Journal of Educational Measurement %V 51 %P 419–440 %U http://dx.doi.org/10.1111/jedm.12052 %R 10.1111/jedm.12052 %0 Journal Article %J Applied Psychological Measurement %D 2014 %T Enhancing Pool Utilization in Constructing the Multistage Test Using Mixed-Format Tests %A Park, Ryoungsun %A Kim, Jiseon %A Chung, Hyewon %A Dodd, Barbara G. %X

This study investigated a new pool utilization method of constructing multistage tests (MST) using the mixed-format test based on the generalized partial credit model (GPCM). MST simulations of a classification test were performed to evaluate the MST design. A linear programming (LP) model was applied to perform MST reassemblies based on the initial MST construction. Three subsequent MST reassemblies were performed. For each reassembly, three test unit replacement ratios (TRRs; 0.22, 0.44, and 0.66) were investigated. The conditions of the three passing rates (30%, 50%, and 70%) were also considered in the classification testing. The results demonstrated that various MST reassembly conditions increased the overall pool utilization rates, while maintaining the desired MST construction. All MST testing conditions performed equally well in terms of the precision of the classification decision.

%B Applied Psychological Measurement %V 38 %P 268-280 %U http://apm.sagepub.com/content/38/4/268.abstract %R 10.1177/0146621613515545 %0 Journal Article %J Applied Psychological Measurement %D 2014 %T Stratified Item Selection and Exposure Control in Unidimensional Adaptive Testing in the Presence of Two-Dimensional Data %A Kalinowski, Kevin E. %A Natesan, Prathiba %A Henson, Robin K. %X

It is not uncommon to use unidimensional item response theory models to estimate ability in multidimensional data with computerized adaptive testing (CAT). The current Monte Carlo study investigated the penalty of this model misspecification in CAT implementations using different item selection methods and exposure control strategies. Three item selection methods—maximum information (MAXI), a-stratification (STRA), and a-stratification with b-blocking (STRB) with and without the Sympson–Hetter (SH) exposure control strategy—were investigated. Calibrating multidimensional items as unidimensional items resulted in inaccurate item parameter estimates. Therefore, MAXI performed better than STRA and STRB in estimating the ability parameters. However, all three methods had relatively large standard errors. SH exposure control had no impact on the number of overexposed items. Existing unidimensional CAT implementations might consider using MAXI only if recalibration as a multidimensional model is too expensive. Otherwise, building a CAT pool by calibrating multidimensional data as unidimensional is not recommended.

%B Applied Psychological Measurement %V 38 %P 563-576 %U http://apm.sagepub.com/content/38/7/563.abstract %R 10.1177/0146621614536768 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2013 %T Item Ordering in Stochastically Curtailed Health Questionnaires With an Observable Outcome %A Finkelman, M. D. %A Kim, W. %A He, Y. %A Lai, A.M. %B Journal of Computerized Adaptive Testing %V 1 %P 38-66 %G en %N 3 %R 10.7333/1304-0103038 %0 Book Section %B Research on PISA. %D 2013 %T Reporting differentiated literacy results in PISA by using multidimensional adaptive testing. %A Frey, A. %A Seitz, N-N. %A Kröhne, U. %B Research on PISA. %I Dordrecht: Springer %G eng %0 Journal Article %J Journal of Higher Education %D 2012 %T Computerized Adaptive Testing for Student Selection to Higher Education %A Kalender, I. %X

The purpose of the present study is to discuss the applicability of a computerized adaptive testing format as an alternative to the current student selection examinations for higher education in Turkey. First, problems associated with the current student selection system are described. These problems exert pressure on students that results in test anxiety, produce measurement practices that are open to criticism, and lessen the credibility of the student selection system. Next, computerized adaptive tests are introduced and the advantages they provide are presented. Then, results of a study that used two research designs (simulation and live testing) are presented. Results revealed that (i) the computerized adaptive format provided a reduction of up to 80% in the number of items given to students compared to the paper-and-pencil format of the student selection examination, and (ii) ability estimations had high reliabilities. Correlations between ability estimations obtained from the simulation and the traditional format were higher than 0.80. At the end of the study, solutions that a computerized adaptive testing implementation provides to the current problems are discussed, along with some issues concerning the application of the CAT format to student selection examinations in Turkey.

%B Journal of Higher Education %G English %& 1 %0 Journal Article %J Archives of General Psychiatry %D 2012 %T Development of a computerized adaptive test for depression %A Robert D. Gibbons %A David J. Weiss %A Paul A. Pilkonis %A Ellen Frank %A Tara Moore %A Jong Bae Kim %A David J. Kupfer %B Archives of General Psychiatry %V 69 %P 1105-1112 %G English %U WWW.ARCHGENPSYCHIATRY.COM %N 11 %R 10.1001/archgenpsychiatry.2012.14 %0 Journal Article %J Applied Psychological Measurement %D 2012 %T An Empirical Evaluation of the Slip Correction in the Four Parameter Logistic Models With Computerized Adaptive Testing %A Yen, Yung-Chin %A Ho, Rong-Guey %A Liao, Wen-Wei %A Chen, Li-Ju %A Kuo, Ching-Chin %X

In a selected-response test, aberrant responses such as careless errors and lucky guesses might cause error in ability estimation because these responses do not actually reflect the knowledge that examinees possess. In a computerized adaptive test (CAT), these aberrant responses could further cause serious estimation error due to dynamic item administration. To enhance the robust performance of CAT against aberrant responses, Barton and Lord proposed the four-parameter logistic (4PL) item response theory (IRT) model. However, most studies relevant to the 4PL IRT model were conducted based on simulation experiments. This study attempts to investigate the performance of the 4PL IRT model as a slip-correction mechanism with an empirical experiment. The results showed that the 4PL IRT model could not only reduce the problematic underestimation of the examinees’ ability introduced by careless mistakes in practical situations but also improve measurement efficiency.

%B Applied Psychological Measurement %V 36 %P 75-87 %U http://apm.sagepub.com/content/36/2/75.abstract %R 10.1177/0146621611432862 %0 Journal Article %J Educational and Psychological Measurement %D 2012 %T Panel Design Variations in the Multistage Test Using the Mixed-Format Tests %A Kim, Jiseon %A Chung, Hyewon %A Dodd, Barbara G. %A Park, Ryoungsun %X

This study compared various panel designs of the multistage test (MST) using mixed-format tests in the context of classification testing. Simulations varied the design of the first-stage module. The first stage was constructed according to three levels of test information functions (TIFs) with three different TIF centers. Additional computerized adaptive test (CAT) conditions provided baseline comparisons. Three passing rate conditions were also included. The various MST conditions using mixed-format tests were constructed properly and performed well. When the levels of TIFs at the first stage were higher, the simulations produced a greater number of correct classifications. CAT with the randomesque-10 procedure yielded comparable results to the MST with increased levels of TIFs. Finally, all MST conditions achieved better test security results compared with CAT’s maximum information conditions.

%B Educational and Psychological Measurement %V 72 %P 574-588 %U http://epm.sagepub.com/content/72/4/574.abstract %R 10.1177/0013164411428977 %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T Continuous Testing (an avenue for CAT research) %A G. Gage Kingsbury %K CAT %K item filter %K item filtration %X

Publishing an Adaptive Test

Problems with Publishing

Research Questions

%B Annual Conference of the International Association for Computerized Adaptive Testing %8 10/2011 %G eng %0 Journal Article %J Journal of Applied Testing Technology %D 2011 %T Creating a K-12 Adaptive Test: Examining the Stability of Item Parameter Estimates and Measurement Scales %A Kingsbury, G. G. %A Wise, S. L. %X

Development of adaptive tests used in K-12 settings requires the creation of stable measurement scales to measure the growth of individual students from one grade to the next, and to measure change in groups from one year to the next. Accountability systems
like No Child Left Behind require stable measurement scales so that accountability has meaning across time. This study examined the stability of the measurement scales used with the Measures of Academic Progress. Difficulty estimates for test questions from the reading and mathematics scales were examined over a period ranging from 7 to 22 years. Results showed high correlations between item difficulty estimates from the time at which they were originally calibrated and the current calibration. The average drift in item difficulty estimates was less than .01 standard deviations. The average impact of change in item difficulty estimates was less than the smallest reported difference on the score scale for two actual tests. The findings of the study indicate that an IRT scale can be stable enough to allow consistent measurement of student achievement.

%B Journal of Applied Testing Technology %V 12 %G English %U http://www.testpublishers.org/journal-of-applied-testing-technology %0 Generic %D 2011 %T Cross-cultural development of an item list for computer-adaptive testing of fatigue in oncological patients %A Giesinger, J. M. %A Petersen, M. A. %A Groenvold, M. %A Aaronson, N. K. %A Arraras, J. I. %A Conroy, T. %A Gamper, E. M. %A Kemmler, G. %A King, M. T. %A Oberguggenberger, A. S. %A Velikova, G. %A Young, T. %A Holzner, B. %A Eortc-Qlg, E. O. %X INTRODUCTION: Within an ongoing project of the EORTC Quality of Life Group, we are developing computerized adaptive test (CAT) measures for the QLQ-C30 scales. These new CAT measures are conceptualised to reflect the same constructs as the QLQ-C30 scales. Accordingly, the Fatigue-CAT is intended to capture physical and general fatigue. METHODS: The EORTC approach to CAT development comprises four phases (literature search, operationalisation, pre-testing, and field testing). Phases I-III are described in detail in this paper. A literature search for fatigue items was performed in major medical databases. After refinement through several expert panels, the remaining items were used as the basis for adapting items and/or formulating new items fitting the EORTC item style. To obtain feedback from patients with cancer, these English items were translated into Danish, French, German, and Spanish and tested in the respective countries. RESULTS: Based on the literature search, a list containing 588 items was generated. After a comprehensive item selection procedure focusing on content, redundancy, item clarity, and item difficulty, a list of 44 fatigue items was generated. Patient interviews (n=52) resulted in 12 revisions of wording and translations. DISCUSSION: The item list developed in phases I-III will be further investigated within a field-testing phase (IV) to examine psychometric characteristics and to fit an item response theory model. The Fatigue CAT based on this item bank will provide scores that are backward-compatible with the original QLQ-C30 fatigue scale. %B Health and Quality of Life Outcomes %7 2011/03/31 %V 9 %P 10 %8 March 29, 2011 %@ 1477-7525 (Electronic)1477-7525 (Linking) %G Eng %M 21447160

The purpose of the present study is to compare ability estimations obtained from a computerized adaptive testing (CAT) procedure with the paper-and-pencil administration results of the Student Selection Examination (SSE) science subtest, considering different ability estimation methods and test termination rules. There are two phases in the present study. In the first phase, a post-hoc simulation was conducted to find out the relationships between examinee ability levels estimated by the CAT and paper-and-pencil versions of the SSE. Maximum Likelihood Estimation and Expected A Posteriori were used as ability estimation methods. Test termination rules were a standard error threshold and a fixed number of items. The second phase implemented a live CAT administration with a group of examinees to investigate the performance of CAT outside a simulated environment. Findings of the post-hoc simulations indicated that CAT could be implemented for the SSE by using the Expected A Posteriori estimation method with a standard error threshold value of 0.30 or higher. The correlation between ability estimates obtained by CAT and the real SSE was found to be 0.95. The mean number of items given to examinees by the CAT was 18.4. The correlation between live CAT and real SSE ability estimations was 0.74. The number of items used for the CAT administration was approximately 50% of the items in the paper-and-pencil SSE science subtest. Results indicated that CAT for the SSE science subtest provided ability estimations with higher reliability using fewer items compared to the paper-and-pencil format.

%B THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF MIDDLE EAST TECHNICAL UNIVERSITY %V Ph.D. %G eng %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T High-throughput Health Status Measurement using CAT in the Era of Personal Genomics: Opportunities and Challenges %A Eswar Krishnan %K CAT %K health applications %K PROMIS %B Annual Conference of the International Association for Computerized Adaptive Testing %G eng %0 Journal Article %J BMC Medical Informatics and Decision Making %D 2011 %T A new adaptive testing algorithm for shortening health literacy assessments %A Kandula, S. %A Ancker, J.S. %A Kaufman, D.R. %A Currie, L.M. %A Qing, Z.-T. %X

 

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3178473/?tool=pmcentrez
%B BMC Medical Informatics and Decision Making %V 11 %G English %N 52 %R 10.1186/1472-6947-11-52 %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Detecting Person Misfit in Adaptive Testing %A Meijer, R. R. %A van Krimpen-Stoop, E. M. L. A. %B Elements of Adaptive Testing %P 315-329 %G eng %& 16 %R 10.1007/978-0-387-85461-8 %0 Journal Article %J Quality of Life Research %D 2010 %T Development of computerized adaptive testing (CAT) for the EORTC QLQ-C30 physical functioning dimension %A Petersen, M. A. %A Groenvold, M. %A Aaronson, N. K. %A Chie, W. C. %A Conroy, T. %A Costantini, A. %A Fayers, P. %A Helbostad, J. %A Holzner, B. %A Kaasa, S. %A Singer, S. %A Velikova, G. %A Young, T. %X PURPOSE: Computerized adaptive test (CAT) methods, based on item response theory (IRT), enable a patient-reported outcome instrument to be adapted to the individual patient while maintaining direct comparability of scores. The EORTC Quality of Life Group is developing a CAT version of the widely used EORTC QLQ-C30. We present the development and psychometric validation of the item pool for the first of the scales, physical functioning (PF). METHODS: Initial developments (including literature search and patient and expert evaluations) resulted in 56 candidate items. Responses to these items were collected from 1,176 patients with cancer from Denmark, France, Germany, Italy, Taiwan, and the United Kingdom. The items were evaluated with regard to psychometric properties. RESULTS: Evaluations showed that 31 of the items could be included in a unidimensional IRT model with acceptable fit and good content coverage, although the pool may lack items at the upper extreme (good PF). There were several findings of significant differential item functioning (DIF). However, the DIF findings appeared to have little impact on the PF estimation. CONCLUSIONS: We have established an item pool for CAT measurement of PF and believe that this CAT instrument will clearly improve the EORTC measurement of PF. %B Quality of Life Research %7 2010/10/26 %V 20 %P 479-490 %@ 1573-2649 (Electronic)0962-9343 (Linking) %G Eng %M 20972628 %0 Journal Article %J Applied Psychological Measurement %D 2010 %T Item Selection and Hypothesis Testing for the Adaptive Measurement of Change %A Finkelman, M. D. %A Weiss, D. J. %A Kim-Kang, G. %K change %K computerized adaptive testing %K individual change %K Kullback–Leibler information %K likelihood ratio %K measuring change %X

Assessing individual change is an important topic in both psychological and educational measurement. An adaptive measurement of change (AMC) method had previously been shown to exhibit greater efficiency in detecting change than conventional nonadaptive methods. However, little work had been done to compare different procedures within the AMC framework. This study introduced a new item selection criterion and two new test statistics for detecting change with AMC that were specifically designed for the paradigm of hypothesis testing. In two simulation sets, the new methods for detecting significant change improved on existing procedures by demonstrating better adherence to Type I error rates and substantially better power for detecting relatively small change. 

%B Applied Psychological Measurement %V 34 %P 238-254 %G eng %N 4 %R 10.1177/0146621609344844 %0 Book Section %D 2009 %T Adaptive item calibration: A process for estimating item parameters within a computerized adaptive test %A Kingsbury, G. G. %X The characteristics of an adaptive test change the characteristics of the field testing that is necessary to add items to an existing measurement scale. The process used to add field-test items to the adaptive test might lead to scale drift or disrupt the test by administering items of inappropriate difficulty. The current study makes use of the transitivity of examinee and item in item response theory to describe a process for adaptive item calibration. In this process an item is successively administered to examinees whose ability levels match the performance of a given field-test item. By treating the item as if it were taking an adaptive test, examinees can be selected who provide the most information about the item at its momentary difficulty level. This should provide a more efficient procedure for estimating item parameters. The process is described within the context of the one-parameter logistic IRT model. The process is then simulated to identify whether it can be more accurate and efficient than random presentation of field-test items to examinees. Results indicated that adaptive item calibration might provide a viable approach to item calibration within the context of an adaptive test. It might be most useful for expanding item pools in settings with small sample sizes or needs for large numbers of items. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T Adequacy of an item pool measuring proficiency in English language to implement a CAT procedure %A Karino, C. A. %A Costa, D. R. %A Laros, J. A. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T A comparison of three methods of item selection for computerized adaptive testing %A Costa, D. R. %A Karino, C. A. %A Moura, F. A. S. %A Andrade, D. F. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T Criterion-related validity of an innovative CAT-based personality measure %A Schneider, R. J. %A McLellan, R. A. %A Kantrowitz, T. M. %A Houston, J. S. %A Borman, W. C. %X This paper describes development and initial criterion-related validation of the PreVisor Computer Adaptive Personality Scales (PCAPS), a computerized adaptive testing-based personality measure that uses an ideal point IRT model based on forced-choice, paired-comparison responses. Based on results from a large consortium study, a composite of six PCAPS scales identified as relevant to the population of interest (first-line supervisors) had an estimated operational validity against an overall job performance criterion of ρ = .25. Uncorrected and corrected criterion-related validity results for each of the six PCAPS scales making up the composite are also reported. Because the PCAPS algorithm computes intermediate scale scores until a stopping rule is triggered, we were able to graph number of statement-pairs presented against criterion-related validities. Results showed generally monotonically increasing functions. However, asymptotic validity levels, or at least a reduction in the rate of increase in slope, were often reached after 5-7 statement-pairs were presented. 
In the case of the composite measure, there was some evidence that validities decreased after about six statement-pairs. A possible explanation for this is provided. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T Developing item variants: An empirical study %A Wendt, A. %A Kao, S. %A Gorham, J. %A Woo, A. %X Large-scale standardized tests have been widely used for educational and licensure testing. In computerized adaptive testing (CAT), one of the practical concerns for maintaining large-scale assessments is to ensure adequate numbers of high-quality items that are required for item pool functioning. Developing items at specific difficulty levels and for certain areas of test plans is a well-known challenge. The purpose of this study was to investigate strategies for varying items that can effectively generate items at targeted difficulty levels and specific test plan areas. Each variant item generation model was developed by decomposing selected source items possessing ideal measurement properties and targeting the desirable content domains. A total of 341 variant items were generated from 72 source items. Data were collected from six pretest periods. Items were calibrated using the Rasch model. Initial results indicate that variant items showed desirable measurement properties. Additionally, compared to an average of approximately 60% of the items passing pretest criteria, an average of 84% of the variant items passed the pretest criteria. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Journal of Pain %D 2009 %T Development and preliminary testing of a computerized adaptive assessment of chronic pain %A Anatchkova, M. D. %A Saris-Baglama, R. N. %A Kosinski, M. %A Bjorner, J. B. %K *Computers %K *Questionnaires %K Activities of Daily Living %K Adaptation, Psychological %K Chronic Disease %K Cohort Studies %K Disability Evaluation %K Female %K Humans %K Male %K Middle Aged %K Models, Psychological %K Outcome Assessment (Health Care) %K Pain Measurement/*methods %K Pain, Intractable/*diagnosis/psychology %K Psychometrics %K Quality of Life %K User-Computer Interface %X The aim of this article is to report the development and preliminary testing of a prototype computerized adaptive test of chronic pain (CHRONIC PAIN-CAT) conducted in 2 stages: (1) evaluation of various item selection and stopping rules through real data-simulated administrations of CHRONIC PAIN-CAT; (2) a feasibility study of the actual prototype CHRONIC PAIN-CAT assessment system conducted in a pilot sample. Item calibrations developed from a US general population sample (N = 782) were used to program a pain severity and impact item bank (k = 45), and real data simulations were conducted to determine a CAT stopping rule. The CHRONIC PAIN-CAT was programmed on a tablet PC using QualityMetric's Dynamic Health Assessment (DYHNA) software and administered to a clinical sample of pain sufferers (n = 100). The CAT was completed in significantly less time than the static (full item bank) assessment (P < .001). On average, 5.6 items were dynamically administered by CAT to achieve a precise score. Scores estimated from the 2 assessments were highly correlated (r = .89), and both assessments discriminated across pain severity levels (P < .001, RV = .95). Patients' evaluations of the CHRONIC PAIN-CAT were favorable.
PERSPECTIVE: This report demonstrates that the CHRONIC PAIN-CAT is feasible for administration in a clinic. The application has the potential to improve pain assessment and help clinicians manage chronic pain. %B Journal of Pain %7 2009/07/15 %V 10 %P 932-943 %8 Sep %@ 1528-8447 (Electronic)1526-5900 (Linking) %G eng %M 19595636 %2 2763618 %0 Journal Article %J Rehabilitation Psychology %D 2009 %T Development of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis %A Forkmann, T. %A Boecker, M. %A Norra, C. %A Eberle, N. %A Kircher, T. %A Schauerte, P. %A Mischke, K. %A Westhofen, M. %A Gauggel, S. %A Wirtz, M. %K Adaptation, Psychological %K Adult %K Aged %K Depressive Disorder/*diagnosis/psychology %K Diagnosis, Computer-Assisted %K Female %K Heart Diseases/*psychology %K Humans %K Male %K Mental Disorders/*psychology %K Middle Aged %K Models, Statistical %K Otorhinolaryngologic Diseases/*psychology %K Personality Assessment/statistics & numerical data %K Personality Inventory/*statistics & numerical data %K Psychometrics/statistics & numerical data %K Questionnaires %K Reproducibility of Results %K Sick Role %X OBJECTIVE: The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. The present study aimed at developing a new item bank that allows for assessing depression in persons with mental and persons with somatic diseases. METHOD: The sample consisted of 161 participants treated for a depressive syndrome, and 206 participants with somatic illnesses (103 cardiologic, 103 otorhinolaryngologic; overall mean age = 44.1 years, SD = 14.0; 44.7% women) to allow for validation of the item bank in both groups. Persons answered a pool of 182 depression items on a 5-point Likert scale. RESULTS: Evaluation of Rasch model fit (infit < 1.3), differential item functioning, dimensionality, local independence, item spread, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 79 items with good psychometric properties. CONCLUSIONS: The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. It might also be useful for researchers who wish to develop new fixed-length scales for the assessment of depression in specific rehabilitation settings. %B Rehabilitation Psychology %7 2009/05/28 %V 54 %P 186-97 %8 May %@ 0090-5550 (Print)0090-5550 (Linking) %G eng %M 19469609 %0 Journal Article %J International Journal for Methods in Psychiatric Research %D 2009 %T Evaluation of a computer-adaptive test for the assessment of depression (D-CAT) in clinical application %A Fliege, H. %A Becker, J. %A Walter, O. B. %A Rose, M. %A Bjorner, J. B. %A Klapp, B. F. %X In the past, a German Computerized Adaptive Test, based on Item Response Theory (IRT), was developed for purposes of assessing the construct depression [Computer-adaptive test for depression (D-CAT)]. This study aims at testing the feasibility and validity of the real computer-adaptive application. The D-CAT, supplied by a bank of 64 items, was administered on personal digital assistants (PDAs) to 423 consecutive patients suffering from psychosomatic and other medical conditions (78 with depression). Items were adaptively administered until a predetermined reliability (r ≥ 0.90) was attained.
For validation purposes, the Hospital Anxiety and Depression Scale (HADS), the Centre for Epidemiological Studies Depression (CES-D) scale, and the Beck Depression Inventory (BDI) were administered. Another sample of 114 patients was evaluated using standardized diagnostic interviews [Composite International Diagnostic Interview (CIDI)]. The D-CAT was quickly completed (mean 74 seconds), well accepted by the patients, and reliable after an average administration of only six items. In 95% of the cases, 10 items or fewer were needed for a reliable score estimate. Correlations between the D-CAT and the HADS, CES-D, and BDI ranged between r = 0.68 and r = 0.77. The D-CAT distinguished between diagnostic groups as well as established questionnaires do. The D-CAT proved to be an efficient, well-accepted, and reliable tool. Discriminative power was comparable to that of other depression measures, while the CAT is shorter and more precise. Item usage raises questions of balancing the item selection for content in the future. Copyright (c) 2009 John Wiley & Sons, Ltd. %B International Journal for Methods in Psychiatric Research %7 2009/02/06 %V 18 %P 233-236 %8 Feb 4 %@ 1049-8931 (Print) %G Eng %M 19194856 %0 Journal Article %J Journal of Clinical Epidemiology %D 2009 %T An evaluation of patient-reported outcomes found computerized adaptive testing was efficient in assessing stress perception %A Kocalevent, R. D. %A Rose, M. %A Becker, J. %A Walter, O. B. %A Fliege, H. %A Bjorner, J. B. %A Kleiber, D. %A Klapp, B. F. %K *Diagnosis, Computer-Assisted %K Adolescent %K Adult %K Aged %K Aged, 80 and over %K Confidence Intervals %K Female %K Humans %K Male %K Middle Aged %K Perception %K Quality of Health Care/*standards %K Questionnaires %K Reproducibility of Results %K Sickness Impact Profile %K Stress, Psychological/*diagnosis/psychology %K Treatment Outcome %X OBJECTIVES: This study aimed to develop and evaluate a first computerized adaptive test (CAT) for the measurement of stress perception (Stress-CAT), in terms of the two dimensions: exposure to stress and stress reaction. STUDY DESIGN AND SETTING: Item response theory modeling was performed using a two-parameter model (Generalized Partial Credit Model). The evaluation of the Stress-CAT comprised a simulation study and real clinical application. A total of 1,092 psychosomatic patients (N1) were studied. Two hundred simulees (N2) were generated for a simulated response data set. Then the Stress-CAT was given to n=116 inpatients (N3), together with established stress questionnaires as validity criteria. RESULTS: The final banks included n=38 stress exposure items and n=31 stress reaction items. In the first simulation study, CAT scores could be estimated with a high measurement precision (SE<0.32; rho>0.90) using 7.0+/-2.3 (M+/-SD) stress reaction items and 11.6+/-1.7 stress exposure items. The second simulation study reanalyzed real patient data (N1) and showed an average use of items of 5.6+/-2.1 for the dimension stress reaction and 10.0+/-4.9 for the dimension stress exposure. Convergent validity showed significantly high correlations. CONCLUSIONS: The Stress-CAT is short and precise, potentially lowering the response burden of patients in clinical decision making. %B Journal of Clinical Epidemiology %7 2008/07/22 %V 62 %P 278-287 %@ 1878-5921 (Electronic)0895-4356 (Linking) %G eng %M 18639439 %0 Book Section %D 2009 %T Features of J-CAT (Japanese Computerized Adaptive Test) %A Imai, S. %A Ito, S. %A Nakamura, Y. %A Kikuchi, K. %A Akagi, Y. %A Nakasono, H.
%A Honda, A. %A Hiramura, T. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T Item selection and hypothesis testing for the adaptive measurement of change %A Finkelman, M. %A Weiss, D. J. %A Kim-Kang, G. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Journal of Rheumatology %D 2009 %T Progress in assessing physical function in arthritis: PROMIS short forms and computerized adaptive testing %A Fries, J.F. %A Cella, D. %A Rose, M. %A Krishnan, E. %A Bruce, B. %K *Disability Evaluation %K *Outcome Assessment (Health Care) %K Arthritis/diagnosis/*physiopathology %K Health Surveys %K Humans %K Prognosis %K Reproducibility of Results %X OBJECTIVE: Assessing self-reported physical function/disability with the Health Assessment Questionnaire Disability Index (HAQ) and other instruments has become central in arthritis research. Item response theory (IRT) and computerized adaptive testing (CAT) techniques can increase reliability and statistical power. IRT-based instruments can improve measurement precision substantially over a wider range of disease severity. These modern methods were applied and the magnitude of improvement was estimated. METHODS: A 199-item physical function/disability item bank was developed by distilling 1865 items to 124, including Legacy Health Assessment Questionnaire (HAQ) and Physical Function-10 items, and improving precision through qualitative and quantitative evaluation in over 21,000 subjects, which included about 1500 patients with rheumatoid arthritis and osteoarthritis. Four new instruments, (A) Patient-Reported Outcomes Measurement Information (PROMIS) HAQ, which evolved from the original (Legacy) HAQ; (B) "best" PROMIS 10; (C) 20-item static (short) forms; and (D) simulated PROMIS CAT, which sequentially selected the most informative item, were compared with the HAQ. RESULTS: Online and mailed administration modes yielded similar item and domain scores. The HAQ and PROMIS HAQ 20-item scales yielded greater information content versus other scales in patients with more severe disease. The "best" PROMIS 20-item scale outperformed the other 20-item static forms over a broad range of 4 standard deviations. The 10-item simulated PROMIS CAT outperformed all other forms. CONCLUSION: Improved items and instruments yielded better information. The PROMIS HAQ is currently available and considered validated. The new PROMIS short forms, after validation, are likely to represent further improvement. CAT-based physical function/disability assessment offers superior performance over static forms of equal length. %B Journal of Rheumatology %7 2009/09/10 %V 36 %P 2061-2066 %8 Sep %@ 0315-162X (Print)0315-162X (Linking) %G eng %M 19738214 %0 Journal Article %J Zeitschrift für Psychologie / Journal of Psychology %D 2008 %T Adaptive measurement of individual change %A Kim-Kang, G. %A Weiss, D. J. %B Zeitschrift für Psychologie / Journal of Psychology %V 216 %P 49-58 %G eng %R 10.1027/0044-3409.216.1.49 %0 Journal Article %J Educational and Psychological Measurement %D 2008 %T Computer-Based and Paper-and-Pencil Administration Mode Effects on a Statewide End-of-Course English Test %A Kim, Do-Hong %A Huynh, Huynh %X
The current study compared student performance between paper-and-pencil testing (PPT) and computer-based testing (CBT) on a large-scale statewide end-of-course English examination. Analyses were conducted at both the item and test levels. The overall results suggest that scores obtained from PPT and CBT were comparable. However, at the content domain level, a rather large difference in the reading comprehension section suggests that the reading comprehension test may be more affected by the test administration mode. Results from the confirmatory factor analysis suggest that the administration mode did not alter the construct of the test.
%B Educational and Psychological Measurement %V 68 %P 554-570 %U http://epm.sagepub.com/content/68/4/554.abstract %R 10.1177/0013164407310132 %0 Journal Article %J Spine %D 2008 %T Computerized adaptive testing in back pain: Validation of the CAT-5D-QOL %A Kopec, J. A. %A Badii, M. %A McKenna, M. %A Lima, V. D. %A Sayre, E. C. %A Dvorak, M. %K *Disability Evaluation %K *Health Status Indicators %K *Quality of Life %K Adult %K Aged %K Algorithms %K Back Pain/*diagnosis/psychology %K British Columbia %K Diagnosis, Computer-Assisted/*standards %K Feasibility Studies %K Female %K Humans %K Internet %K Male %K Middle Aged %K Predictive Value of Tests %K Questionnaires/*standards %K Reproducibility of Results %X STUDY DESIGN: We have conducted an outcome instrument validation study. OBJECTIVE: Our objective was to develop a computerized adaptive test (CAT) to measure 5 domains of health-related quality of life (HRQL) and assess its feasibility, reliability, validity, and efficiency. SUMMARY OF BACKGROUND DATA: Kopec and colleagues have recently developed item response theory based item banks for 5 domains of HRQL relevant to back pain and suitable for CAT applications. The domains are Daily Activities (DAILY), Walking (WALK), Handling Objects (HAND), Pain or Discomfort (PAIN), and Feelings (FEEL). METHODS: An adaptive algorithm was implemented in a web-based questionnaire administration system. The questionnaire included CAT-5D-QOL (5 scales), Modified Oswestry Disability Index (MODI), Roland-Morris Disability Questionnaire (RMDQ), SF-36 Health Survey, and standard clinical and demographic information. Participants were outpatients treated for mechanical back pain at a referral center in Vancouver, Canada. RESULTS: A total of 215 patients completed the questionnaire and 84 completed a retest. On average, patients answered 5.2 items per CAT-5D-QOL scale. Reliability ranged from 0.83 (FEEL) to 0.92 (PAIN) and was 0.92 for the MODI, RMDQ, and Physical Component Summary (PCS-36). The ceiling effect was 0.5% for PAIN compared with 2% for MODI and 5% for RMQ. The CAT-5D-QOL scales correlated as anticipated with other measures of HRQL and discriminated well according to the level of satisfaction with current symptoms, duration of the last episode, sciatica, and disability compensation. The average relative discrimination index was 0.87 for PAIN, 0.67 for DAILY and 0.62 for WALK, compared with 0.89 for MODI, 0.80 for RMDQ, and 0.59 for PCS-36. CONCLUSION: The CAT-5D-QOL is feasible, reliable, valid, and efficient in patients with back pain. This methodology can be recommended for use in back pain research and should improve outcome assessment, facilitate comparisons across studies, and reduce patient burden. %B Spine %7 2008/05/23 %V 33 %P 1384-90 %8 May 20 %@ 1528-1159 (Electronic)0362-2436 (Linking) %G eng %M 18496353 %0 Conference Paper %B Joint Meeting on Adolescent Treatment Effectiveness %D 2008 %T Developing a progressive approach to using the GAIN in order to reduce the duration and cost of assessment with the GAIN short screener, Quick and computer adaptive testing %A Dennis, M. L. %A Funk, R. %A Titus, J. %A Riley, B. B. %A Hosman, S. %A Kinne, S. %B Joint Meeting on Adolescent Treatment Effectiveness %C Washington D.C., USA %8 2008 %G eng %( 2008 %) ADDED 1 Aug 2008 %F 205795 %0 Journal Article %J Depression and Anxiety %D 2008 %T Functioning and validity of a computerized adaptive test to measure anxiety (A CAT) %A Becker, J. %A Fliege, H. %A Kocalevent, R. D. %A Bjorner, J. B. %A Rose, M. 
%A Walter, O. B. %A Klapp, B. F. %X Background: The aim of this study was to evaluate the Computerized Adaptive Test to measure anxiety (A-CAT), a patient-reported outcome questionnaire that uses computerized adaptive testing to measure anxiety. Methods: The A-CAT builds on an item bank of 50 items that has been built using conventional item analyses and item response theory analyses. The A-CAT was administered on Personal Digital Assistants to n=357 patients diagnosed and treated at the department of Psychosomatic Medicine and Psychotherapy, Charité Berlin, Germany. For validation purposes, two subgroups of patients (n=110 and 125) answered the A-CAT along with established anxiety and depression questionnaires. Results: The A-CAT was fast to complete (on average in 2 min, 38 s) and a precise item response theory based CAT score (reliability>.9) could be estimated after 4–41 items. On average, the CAT displayed 6 items (SD=4.2). Convergent validity of the A-CAT was supported by correlations to existing tools (Hospital Anxiety and Depression Scale-A, Beck Anxiety Inventory, Berliner Stimmungs-Fragebogen A/D, and State Trait Anxiety Inventory: r=.56–.66); discriminant validity between diagnostic groups was higher for the A-CAT than for other anxiety measures. Conclusions: The German A-CAT is an efficient, reliable, and valid tool for assessing anxiety in patients suffering from anxiety disorders and other conditions with significant potential for initial assessment and long-term treatment monitoring. Future research directions are to explore content balancing of the item selection algorithm of the CAT, to norm the tool to a healthy sample, and to develop practical cutoff scores. Depression and Anxiety, 2008. © 2008 Wiley-Liss, Inc. %B Depression and Anxiety %V 25 %P E182-E194 %@ 1520-6394 %G eng %0 Journal Article %J Zeitschrift für Psychologie %D 2008 %T ICAT: An adaptive testing procedure for the identification of idiosyncratic knowledge patterns %A Kingsbury, G. G. %A Houser, R.L. %K computerized adaptive testing %X
Traditional adaptive tests provide an efficient method for estimating student achievement levels by adjusting the characteristics of the test questions to match the performance of each student. These traditional adaptive tests are not designed to identify idiosyncratic knowledge patterns. As students move through their education, they learn content in any number of different ways related to their learning style and cognitive development. This may result in a student having different achievement levels from one content area to another within a domain of content. This study investigates whether such idiosyncratic knowledge patterns exist. It discusses the differences between idiosyncratic knowledge patterns and multidimensionality. Finally, it proposes an adaptive testing procedure that can be used to identify a student's areas of strength and weakness more efficiently than current adaptive testing approaches. The findings of the study indicate that a fairly large number of students may have test results that are influenced by their idiosyncratic knowledge patterns. The findings suggest that these patterns persist across time for a large number of students, and that the differences in student performance between content areas within a subject domain are large enough to allow them to be useful in instruction. Given the existence of idiosyncratic patterns of knowledge, the proposed testing procedure may enable us to provide more useful information to teachers. It should also allow us to differentiate between idiosyncratic patterns of knowledge and important multidimensionality in the testing data.
%B Zeitschrift für Psychologie %V 216 %P 40-48 %G eng %0 Journal Article %J BMC Musculoskeletal Disorders %D 2008 %T An initial application of computerized adaptive testing (CAT) for measuring disability in patients with low back pain %A Elhan, A. H. %A Oztuna, D. %A Kutlay, S. %A Kucukdeveci, A. A. %A Tennant, A. %X ABSTRACT: BACKGROUND: Recent approaches to outcome measurement involving Computerized Adaptive Testing (CAT) offer an approach for measuring disability in low back pain (LBP) in a way that can reduce the burden upon patient and professional. The aim of this study was to explore the potential of CAT in LBP for measuring disability as defined in the International Classification of Functioning, Disability and Health (ICF) which includes impairments, activity limitation, and participation restriction. METHODS: 266 patients with low back pain answered questions from a range of widely used questionnaires. An exploratory factor analysis (EFA) was used to identify disability dimensions which were then subjected to Rasch analysis. Reliability was tested by internal consistency and person separation index (PSI). Discriminant validity of disability levels was evaluated by Spearman correlation coefficient (r), intraclass correlation coefficient [ICC(2,1)] and the Bland-Altman approach. A CAT was developed for each dimension, and the results checked against simulated and real applications from a further 133 patients. RESULTS: Factor analytic techniques identified two dimensions named "body functions" and "activity-participation". After deletion of some items for failure to fit the Rasch model, the remaining items were mostly free of Differential Item Functioning (DIF) for age and gender. Reliability exceeded 0.90 for both dimensions. The disability levels generated using all items and those obtained from the real CAT application were highly correlated (i.e. >0.97 for both dimensions). On average, 19 and 14 items were needed to estimate the precise disability levels using the initial CAT for the first and second dimension. However, a marginal increase in the standard error of the estimate across successive iterations substantially reduced the number of items required to make an estimate. CONCLUSIONS: Using a combination approach of EFA and Rasch analysis this study has shown that it is possible to calibrate items onto a single metric in a way that can be used to provide the basis of a CAT application. Thus there is an opportunity to obtain a wide variety of information to evaluate the biopsychosocial model in its more complex forms, without necessarily increasing the burden of information collection for patients. %B BMC Musculoskeletal Disorders %7 2008/12/20 %V 9 %P 166 %8 Dec 18 %@ 1471-2474 (Electronic) %G Eng %M 19094219 %0 Journal Article %J Psychiatric Services %D 2008 %T Using computerized adaptive testing to reduce the burden of mental health assessment %A Gibbons, R. D. %A Weiss, D. J. %A Kupfer, D. J. %A Frank, E. %A Fagiolini, A. %A Grochocinski, V. J. %A Bhaumik, D. K. %A Stover, A. %A Bock, R. D. %A Immekus, J. C.
%K *Diagnosis, Computer-Assisted %K *Questionnaires %K Adolescent %K Adult %K Aged %K Agoraphobia/diagnosis %K Anxiety Disorders/diagnosis %K Bipolar Disorder/diagnosis %K Female %K Humans %K Male %K Mental Disorders/*diagnosis %K Middle Aged %K Mood Disorders/diagnosis %K Obsessive-Compulsive Disorder/diagnosis %K Panic Disorder/diagnosis %K Phobic Disorders/diagnosis %K Reproducibility of Results %K Time Factors %X OBJECTIVE: This study investigated the combination of item response theory and computerized adaptive testing (CAT) for psychiatric measurement as a means of reducing the burden of research and clinical assessments. METHODS: Data were from 800 participants in outpatient treatment for a mood or anxiety disorder; they completed 616 items of the 626-item Mood and Anxiety Spectrum Scales (MASS) at two times. The first administration was used to design and evaluate a CAT version of the MASS by using post hoc simulation. The second confirmed the functioning of CAT in live testing. RESULTS: Tests of competing models based on item response theory supported the scale's bifactor structure, consisting of a primary dimension and four group factors (mood, panic-agoraphobia, obsessive-compulsive, and social phobia). Both simulated and live CAT showed a 95% average reduction (585 items) in items administered (24 and 30 items, respectively) compared with administration of the full MASS. The correlation between scores on the full MASS and the CAT version was .93. For the mood disorder subscale, differences in scores between two groups of depressed patients--one with bipolar disorder and one without--on the full scale and on the CAT showed effect sizes of .63 (p<.003) and 1.19 (p<.001) standard deviation units, respectively, indicating better discriminant validity for CAT. CONCLUSIONS: Instead of using small fixed-length tests, clinicians can create item banks with a large item pool, and a small set of the items most relevant for a given individual can be administered with no loss of information, yielding a dramatic reduction in administration time and patient and clinician burden. %B Psychiatric Services %7 2008/04/02 %V 59 %P 361-8 %8 Apr %@ 1075-2730 (Print) %G eng %M 18378832 %0 Book Section %D 2007 %T Comparison of computerized adaptive testing and classical methods for measuring individual change %A Kim-Kang, G. %A Weiss, D. J. %C D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Quality of Life Research %D 2007 %T Development and evaluation of a computer adaptive test for “Anxiety” (Anxiety-CAT) %A Walter, O. B. %A Becker, J. %A Bjorner, J. B. %A Fliege, H. %A Klapp, B. F. %A Rose, M. %B Quality of Life Research %V 16 %P 143-155 %G eng %0 Book Section %D 2007 %T ICAT: An adaptive testing procedure to allow the identification of idiosyncratic knowledge patterns %A Kingsbury, G. G. %A Houser, R.L. %C D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Journal of Clinical Epidemiology %D 2006 %T An evaluation of a patient-reported outcomes found computerized adaptive testing was efficient in assessing osteoarthritis impact %A Kosinski, M. %A Bjorner, J. %A Warejr, J. %A Sullivan, E. %A Straus, W. %X BACKGROUND AND OBJECTIVES: Evaluate a patient-reported outcomes questionnaire that uses computerized adaptive testing (CAT) to measure the impact of osteoarthritis (OA) on functioning and well-being. 
MATERIALS AND METHODS: OA patients completed 37 questions about the impact of OA on physical, social and role functioning, emotional well-being, and vitality. Questionnaire responses were calibrated and scored using item response theory, and two scores were estimated: a Total-OA score based on patients' responses to all 37 questions, and a simulated CAT-OA score where the computer selected and scored the five most informative questions for each patient. Agreement between Total-OA and CAT-OA scores was assessed using correlations. Discriminant validity of Total-OA and CAT-OA scores was assessed with analysis of variance. Criterion measures included OA pain and severity, patient global assessment, and missed work days. RESULTS: Simulated CAT-OA and Total-OA scores correlated highly (r = 0.96). Both Total-OA and simulated CAT-OA scores discriminated significantly between patients differing on the criterion measures. F-statistics across criterion measures ranged from 39.0 (P < .001) to 225.1 (P < .001) for the Total-OA score, and from 40.5 (P < .001) to 221.5 (P < .001) for the simulated CAT-OA score. CONCLUSIONS: CAT methods produce valid and precise estimates of the impact of OA on functioning and well-being with significant reduction in response burden. %B Journal of Clinical Epidemiology %V 59 %P 715-723 %@ 08954356 %G eng %0 Journal Article %J Applied Measurement in Education %D 2005 %T Constructing a Computerized Adaptive Test for University Applicants With Disabilities %A Moshinsky, Avital %A Kazin, Cathrael %B Applied Measurement in Education %V 18 %P 381-405 %U http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1804_3 %R 10.1207/s15324818ame1804_3 %0 Journal Article %J Quality of Life Research %D 2005 %T Development of a computer-adaptive test for depression (D-CAT) %A Fliege, H. %A Becker, J. %A Walter, O. B. %A Bjorner, J. B. %A Klapp, B. F. %A Rose, M. %B Quality of Life Research %V 14 %P 2277–2291 %G eng %0 Journal Article %J Alcoholism: Clinical & Experimental Research %D 2005 %T Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire %A Kahler, C. W. %A Strong, D. R. %A Read, J. P. %K Psychometrics %K Substance-Related Disorders %X Background: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias. Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis.
An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample. Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items. Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students. To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided. (C) 2005 Research Society on Alcoholism %B Alcoholism: Clinical & Experimental Research %V 29 %P 1180-1189 %G eng %0 Report %D 2005 %T The use of person-fit statistics in computerized adaptive testing %A Meijer, R. R. %A van Krimpen-Stoop, E. M. L. A. %B LSAC Research Report Series %I Law School Admission Council %C Newton, PA. USA %8 September, 2005 %@ Computerized Testing Report 97-14 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2004 %T Computer adaptive testing and the No Child Left Behind Act %A Kingsbury, G. G. %A Hauser, C. %B Paper presented at the annual meeting of the American Educational Research Association %C San Diego CA %G eng %0 Journal Article %J BMC Psychiatry %D 2004 %T Computerized adaptive measurement of depression: A simulation study %A Gardner, W. %A Shear, K. %A Kelleher, K. J. %A Pajer, K. A. %A Mammen, O. %A Buysse, D. %A Frank, E. %K *Computer Simulation %K Adult %K Algorithms %K Area Under Curve %K Comparative Study %K Depressive Disorder/*diagnosis/epidemiology/psychology %K Diagnosis, Computer-Assisted/*methods/statistics & numerical data %K Factor Analysis, Statistical %K Female %K Humans %K Internet %K Male %K Mass Screening/methods %K Patient Selection %K Personality Inventory/*statistics & numerical data %K Pilot Projects %K Prevalence %K Psychiatric Status Rating Scales/*statistics & numerical data %K Psychometrics %K Research Support, Non-U.S. Gov't %K Research Support, U.S. Gov't, P.H.S. %K Severity of Illness Index %K Software %X Background: Efficient, accurate instruments for measuring depression are increasingly important in clinical practice. We developed a computerized adaptive version of the Beck Depression Inventory (BDI). We examined its efficiency and its usefulness in identifying Major Depressive Episodes (MDE) and in measuring depression severity. Methods: Subjects were 744 participants in research studies in which each subject completed both the BDI and the SCID. In addition, 285 patients completed the Hamilton Depression Rating Scale. Results: The adaptive BDI had an AUC as an indicator of a SCID diagnosis of MDE of 88%, equivalent to the full BDI. The adaptive BDI asked fewer questions than the full BDI (5.6 versus 21 items).
The adaptive latent depression score correlated r = .92 with the BDI total score, and the latent depression score correlated more highly with the Hamilton (r = .74) than the BDI total score did (r = .70). Conclusions: Adaptive testing for depression may provide greatly increased efficiency without loss of accuracy in identifying MDE or in measuring depression severity. %B BMC Psychiatry %V 4 %P 13-23 %G eng %M 15132755 %0 Book Section %D 2004 %T Computerized adaptive testing and item banking %A Bjorner, J. B. %A Kosinski, M. %A Ware, J. E., Jr. %C P. M. Fayers and R. D. Hays (Eds.) Assessing Quality of Life. Oxford: Oxford University Press. %G eng %0 Journal Article %J Applied Psychological Measurement %D 2004 %T Computerized adaptive testing with multiple-form structures %A Armstrong, R. D. %A Jones, D. H. %A Koppel, N. B. %A Pashley, P. J. %K computerized adaptive testing %K Law School Admission Test %K multiple-form structure %K testlets %X A multiple-form structure (MFS) is an ordered collection or network of testlets (i.e., sets of items). An examinee's progression through the network of testlets is dictated by the correctness of an examinee's answers, thereby adapting the test to his or her trait level. The collection of paths through the network yields the set of all possible test forms, allowing test specialists the opportunity to review them before they are administered. Also, limiting the exposure of an individual MFS to a specific period of time can enhance test security. This article provides an overview of methods that have been developed to generate parallel MFSs. The approach is applied to the assembly of an experimental computerized Law School Admission Test (LSAT). (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Applied Psychological Measurement %I Sage Publications: US %V 28 %P 147-164 %@ 0146-6216 (Print) %U http://apm.sagepub.com/content/28/3/147.abstract %R 10.1177/0146621604263652 %G eng %M 2004-13800-001
%0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2004 %T Score comparability of short forms and computerized adaptive testing: Simulation study with the activity measure for post-acute care %A Haley, S. M. %A Coster, W. J. %A Andres, P. L. %A Kosinski, M. %A Ni, P. %K Boston %K Factor Analysis, Statistical %K Humans %K Outcome Assessment (Health Care)/*methods %K Prospective Studies %K Questionnaires/standards %K Rehabilitation/*standards %K Subacute Care/*standards %X OBJECTIVE: To compare simulated short-form and computerized adaptive testing (CAT) scores to scores obtained from complete item sets for each of the 3 domains of the Activity Measure for Post-Acute Care (AM-PAC). DESIGN: Prospective study. SETTING: Six postacute health care networks in the greater Boston metropolitan area, including inpatient acute rehabilitation, transitional care units, home care, and outpatient services. PARTICIPANTS: A convenience sample of 485 adult volunteers who were receiving skilled rehabilitation services. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Inpatient and community-based short forms and CAT applications were developed for each of 3 activity domains (physical & movement, personal care & instrumental, applied cognition) using item pools constructed from new items and items from existing postacute care instruments. RESULTS: Simulated CAT scores correlated highly with score estimates from the total item pool in each domain (4- and 6-item CAT r range, .90-.95; 10-item CAT r range, .96-.98). Scores on the 10-item short forms constructed for inpatient and community settings also provided good estimates of the AM-PAC item pool scores for the physical & movement and personal care & instrumental domains, but were less consistent in the applied cognition domain. Confidence intervals around individual scores were greater in the short forms than for the CATs. CONCLUSIONS: Accurate scoring estimates for AM-PAC domains can be obtained with either the setting-specific short forms or the CATs. The strong relationship between CAT and item pool scores can be attributed to the CAT's ability to select specific items to match individual responses. The CAT may have additional advantages over short forms in practicality, efficiency, and the potential for providing more precise scoring estimates for individuals. %B Archives of Physical Medicine and Rehabilitation %7 2004/04/15 %V 85 %P 661-6 %8 Apr %@ 0003-9993 (Print) %G eng %M 15083444 %0 Journal Article %J Quality of Life Research %D 2004 %T Validating the German computerized adaptive test for anxiety on healthy sample (A-CAT) %A Becker, J. %A Walter, O. B. %A Fliege, H. %A Bjorner, J. B. %A Kocalevent, R. D. %A Schmid, G. %A Klapp, B. F. %A Rose, M. %B Quality of Life Research %V 13 %P 1515 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Calibrating CAT pools and online pretest items using nonparametric and adjusted marginal maximum likelihood methods %A Krass, I. A. %A Williams, B. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Quality of Life Research %D 2003 %T Calibration of an item pool for assessing the burden of headaches: an application of item response theory to the Headache Impact Test (HIT) %A Bjorner, J. B.
%A Kosinski, M. %A Ware, J. E., Jr. %K *Cost of Illness %K *Decision Support Techniques %K *Sickness Impact Profile %K Adolescent %K Adult %K Aged %K Comparative Study %K Disability Evaluation %K Factor Analysis, Statistical %K Headache/*psychology %K Health Surveys %K Human %K Longitudinal Studies %K Middle Aged %K Migraine/psychology %K Models, Psychological %K Psychometrics/*methods %K Quality of Life/*psychology %K Software %K Support, Non-U.S. Gov't %X BACKGROUND: Measurement of headache impact is important in clinical trials, case detection, and the clinical monitoring of patients. Computerized adaptive testing (CAT) of headache impact has potential advantages over traditional fixed-length tests in terms of precision, relevance, real-time quality control and flexibility. OBJECTIVE: To develop an item pool that can be used for a computerized adaptive test of headache impact. METHODS: We analyzed responses to four well-known tests of headache impact from a population-based sample of recent headache sufferers (n = 1016). We used confirmatory factor analysis for categorical data and analyses based on item response theory (IRT). RESULTS: In factor analyses, we found very high correlations between the factors hypothesized by the original test constructors, both within and between the original questionnaires. These results suggest that a single score of headache impact is sufficient. We established a pool of 47 items which fitted the generalized partial credit IRT model. By simulating a computerized adaptive health test we showed that an adaptive test of only five items had a very high concordance with the score based on all items and that different worst-case item selection scenarios did not lead to bias. CONCLUSION: We have established a headache impact item pool that can be used in CAT of headache impact. %B Quality of Life Research %V 12 %P 913-933 %G eng %M 14661767 %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2003 %T A comparison of item exposure control procedures using a CAT system based on the generalized partial credit model %A Burt, W. M %A Kim, S.-J %A Davis, L. L. %A Dodd, B. G. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago IL %G eng %0 Journal Article %J Quality of Life Research %D 2003 %T The feasibility of applying item response theory to measures of migraine impact: a re-analysis of three clinical studies %A Bjorner, J. B. %A Kosinski, M. %A Ware, J. E., Jr. %K *Sickness Impact Profile %K Adolescent %K Adult %K Aged %K Comparative Study %K Cost of Illness %K Factor Analysis, Statistical %K Feasibility Studies %K Female %K Human %K Male %K Middle Aged %K Migraine/*psychology %K Models, Psychological %K Psychometrics/instrumentation/*methods %K Quality of Life/*psychology %K Questionnaires %K Support, Non-U.S. Gov't %X BACKGROUND: Item response theory (IRT) is a powerful framework for analyzing multi-item scales and is central to the implementation of computerized adaptive testing. OBJECTIVES: To explain the use of IRT to examine measurement properties and to apply IRT to a questionnaire for measuring migraine impact--the Migraine Specific Questionnaire (MSQ). METHODS: Data from three clinical studies that employed the MSQ-version 1 were analyzed by confirmatory factor analysis for categorical data and by IRT modeling. RESULTS: Confirmatory factor analyses showed very high correlations between the factors hypothesized by the original test constructors.
Further, high item loadings on one common factor suggest that migraine impact may be adequately assessed by only one score. IRT analyses of the MSQ were feasible and provided several suggestions as to how to improve the items and in particular the response choices. Out of 15 items, 13 showed adequate fit to the IRT model. In general, IRT scores were strongly associated with the scores proposed by the original test developers and with the total item sum score. Analysis of response consistency showed that more than 90% of the patients answered consistently according to a unidimensional IRT model. For the remaining patients, scores on the dimension of emotional function were less strongly related to the overall IRT scores that mainly reflected role limitations. Such response patterns can be detected easily using response consistency indices. Analysis of test precision across score levels revealed that the MSQ was most precise at one standard deviation worse than the mean impact level for migraine patients who are not in treatment. Thus, gains in test precision can be achieved by developing items aimed at less severe levels of migraine impact. CONCLUSIONS: IRT proved useful for analyzing the MSQ. The approach warrants further testing in a more comprehensive item pool for headache impact that would enable computerized adaptive testing. %B Quality of Life Research %V 12 %P 887-902 %G eng %M 14661765 %0 Book Section %B New developments in psychometrics %D 2003 %T Item selection in polytomous CAT %A Veldkamp, B. P. %E A. Okada %E K. Shigemasu %E Y. Kano %E J. Meulman %K computerized adaptive testing %B New developments in psychometrics %I Psychometric Society, Springer %C Tokyo, Japan %P 207–214 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Recalibration of IRT item parameters in CAT: Sparse data matrices and missing data treatments %A Harmes, J. C. %A Parshall, C. G. %A Kromrey, J. D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Psychometrika %D 2003 %T Using response times to detect aberrant responses in computerized adaptive testing %A van der Linden, W. J. %A van Krimpen-Stoop, E. M. L. A. %K Adaptive Testing %K Behavior %K Computer Assisted Testing %K computerized adaptive testing %K Models %K person Fit %K Prediction %K Reaction Time %X A lognormal model for response times is used to check response times for aberrances in examinee behavior on computerized adaptive tests. Both classical procedures and Bayesian posterior predictive checks are presented. For a fixed examinee, responses and response times are independent; checks based on response times thus offer information independent of the results of checks on response patterns. Empirical examples of the use of classical and Bayesian checks for detecting two different types of aberrances in response times are presented. The detection rates for the Bayesian checks outperformed those for the classical checks, but at the cost of higher false-alarm rates. A guideline for the choice between the two types of checks is offered. %B Psychometrika %V 68 %P 251-265 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2002 %T Detection of person misfit in computerized adaptive tests with polytomous items %A van Krimpen-Stoop, E. M. L. A. %A Meijer, R. R.
%X Item scores that do not fit an assumed item response theory model may cause the latent trait value to be inaccurately estimated. For a computerized adaptive test (CAT) using dichotomous items, several person-fit statistics for detecting misfitting item score patterns have been proposed. Both for paper-and-pencil (P&P) tests and CATs, detection of person misfit with polytomous items is hardly explored. In this study, the nominal and empirical null distributions of the standardized log-likelihood statistic for polytomous items are compared both for P&P tests and CATs. Results showed that the empirical distribution of this statistic differed from the assumed standard normal distribution for both P&P tests and CATs. Second, a new person-fit statistic based on the cumulative sum (CUSUM) procedure from statistical process control was proposed. By means of simulated data, critical values were determined that can be used to classify a pattern as fitting or misfitting. The effectiveness of the CUSUM to detect simulees with item preknowledge was investigated. Detection rates using the CUSUM were high for realistic numbers of disclosed items. %B Applied Psychological Measurement %V 26 %P 164-180 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T An empirical comparison of achievement level estimates from adaptive tests and paper-and-pencil tests %A Kingsbury, G. G. %K computerized adaptive testing %C New Orleans LA %G eng %0 Book Section %B Item generation for test development %D 2002 %T Generating abstract reasoning items with cognitive theory %A Embretson, S. E. %E P. Kyllonen %K Cognitive Processes %K Measurement %K Reasoning %K Test Construction %K Test Items %K Test Validity %K Theories %X (From the chapter) Developed and evaluated a generative system for abstract reasoning items based on cognitive theory. The cognitive design system approach was applied to generate matrix completion problems. Study 1 involved developing the cognitive theory with 191 college students who were administered Set I and Set II of the Advanced Progressive Matrices. Study 2 examined item generation by cognitive theory. Study 3 explored the psychometric properties and construct representation of abstract reasoning test items with 728 young adults. Five structurally equivalent forms of Abstract Reasoning Test (ART) items were prepared from the generated item bank and administered to the Ss. In Study 4, the nomothetic span of construct validity of the generated items was examined with 728 young adults who were administered ART items, and 217 young adults who were administered ART items and the Advanced Progressive Matrices. Results indicate the matrix completion items were effectively generated by the cognitive design system approach. (PsycINFO Database Record (c) 2005 APA ) %B Item generation for test development %I Lawrence Erlbaum Associates, Inc. %C Mahwah, N.J. USA %P 219-250 %G eng %0 Journal Article %J Medical Care %D 2002 %T Multidimensional adaptive testing for mental health problems in primary care %A Gardner, W. %A Kelleher, K. J.
%A Pajer, K. A. %K Adolescent %K Child %K Child Behavior Disorders/*diagnosis %K Child Health Services/*organization & administration %K Factor Analysis, Statistical %K Female %K Humans %K Linear Models %K Male %K Mass Screening/*methods %K Parents %K Primary Health Care/*organization & administration %X OBJECTIVES: Efficient and accurate instruments for assessing child psychopathology are increasingly important in clinical practice and research. For example, screening in primary care settings can identify children and adolescents with disorders that may otherwise go undetected. However, primary care offices are notorious for the brevity of visits and screening must not burden patients or staff with long questionnaires. One solution is to shorten assessment instruments, but dropping questions typically makes an instrument less accurate. An alternative is adaptive testing, in which a computer selects the items to be asked of a patient based on the patient's previous responses. This research used a simulation to test a child mental health screen based on this technology. RESEARCH DESIGN: Using half of a large sample of data, a computerized version was developed of the Pediatric Symptom Checklist (PSC), a parental-report psychosocial problem screen. With the unused data, a simulation was conducted to determine whether the Adaptive PSC can reproduce the results of the full PSC with greater efficiency. SUBJECTS: PSCs were completed by parents on 21,150 children seen in a national sample of primary care practices. RESULTS: Four latent psychosocial problem dimensions were identified through factor analysis: internalizing problems, externalizing problems, attention problems, and school problems. A simulated adaptive test measuring these traits asked an average of 11.6 questions per patient, and asked five or fewer questions for 49% of the sample. There was high agreement between the adaptive test and the full (35-item) PSC: only 1.3% of screening decisions were discordant (kappa = 0.93). This agreement was higher than that obtained using a comparable length (12-item) short-form PSC (3.2% of decisions discordant; kappa = 0.84). CONCLUSIONS: Multidimensional adaptive testing may be an accurate and efficient technology for screening for mental health problems in primary care settings. %B Medical Care %7 2002/09/10 %V 40 %P 812-23 %8 Sep %@ 0025-7079 (Print)0025-7079 (Linking) %G eng %M 12218771 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2001 %T Application of score information for CAT pool development and its connection with "likelihood test information" %A Krass, I. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Journal Article %J Language Learning and Technology %D 2001 %T Concerns with computerized adaptive oral proficiency assessment. A commentary on "Comparing examinee attitudes toward computer-assisted and other oral proficiency assessments": Response to the Norris Commentary %A Norris, J. M. %A Kenyon, D. M. %A Malabonga, V. %X Responds to an article on computerized adaptive second language (L2) testing, expressing concerns about the appropriateness of such tests for informing language educators about the language skills of L2 learners and users and fulfilling the intended purposes and achieving the desired consequences of language test use. The authors of the original article respond.
(Author/VWL) %B Language Learning and Technology %V 5 %P 95-108 %G eng %M EJ625007 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2001 %T CUSUM-based person-fit statistics for adaptive testing %A van Krimpen-Stoop, E. M. L. A. %A Meijer, R. R. %B Journal of Educational and Behavioral Statistics %V 26 %P 199-218 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2001 %T Nearest neighbors, simple strata, and probabilistic parameters: An empirical comparison of methods for item exposure control in CATs %A Parshall, C. G. %A Kromrey, J. D. %A Harmes, J. C. %A Sentovich, C. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Generic %D 2001 %T Online item parameter recalibration: Application of missing data treatments to overcome the effects of sparse data conditions in a computerized adaptive version of the MCAT %A Harmes, J. C. %A Kromrey, J. D. %A Parshall, C. G. %C Unpublished manuscript %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2001 %T Using response times to detect aberrant behavior in computerized adaptive testing %A van der Linden, W. J. %A van Krimpen-Stoop, E. M. L. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Change in distribution of latent ability with item position in CAT sequence %A Krass, I. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Book Section %B Development of Computerised Middle School Achievement Tests %D 2000 %T Computer-adaptive testing: A methodology whose time has come %A Linacre, J. M. %E Kang, U. %E Jean, E. %E Linacre, J. M. %K computerized adaptive testing %B Development of Computerised Middle School Achievement Tests %I MESA %C Chicago, IL. USA %V 69 %G eng %0 Generic %D 2000 %T Computerized adaptive rating scales (CARS): Development and evaluation of the concept %A Borman, W. C. %A Hanson, M. A. %A Kubisiak, U. C. %A Buck, D. E. %C (Institute Rep No. 350). Tampa FL: Personnel Decisions Research Institute. %G eng %0 Book Section %D 2000 %T Detecting person misfit in adaptive testing using statistical process control techniques %A van Krimpen-Stoop, E. M. L. A. %A Meijer, R. R. %K person Fit %E W. J. van der Linden %E C. A. W. Glas %B Computerized Adaptive Testing: Theory and Practice %I Kluwer %C Dordrecht, The Netherlands %P 201-219 %G eng %0 Generic %D 2000 %T Detection of person misfit in computerized adaptive testing with polytomous items (Research Report 00-01) %A van Krimpen-Stoop, E. M. L. A. %A Meijer, R. R.
%C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Generic %D 2000 %T Estimating item parameters from classical indices for item pool development with a computerized classification test (Research Report 2000-4) %A Huang, C.-Y. %A Kalohn, J. C. %A Lin, C.-J. %A Spray, J. A. %C Iowa City IA: ACT, Inc. %G eng %0 Journal Article %J Florida Journal of Educational Research %D 2000 %T Item exposure control in computer-adaptive testing: The use of freezing to augment stratification %A Parshall, C. %A Harmes, J. C. %A Kromrey, J. D. %B Florida Journal of Educational Research %V 40 %P 28-52 %G eng %0 Journal Article %J Dissertation Abstracts International Section A: Humanities and Social Sciences %D 2000 %T Lagrangian relaxation for constrained curve-fitting with binary variables: Applications in educational testing %A Koppel, N. B. %K Analysis %K Educational Measurement %K Mathematical Modeling %K Statistical %X This dissertation offers a mathematical programming approach to curve fitting with binary variables. Various Lagrangian Relaxation (LR) techniques are applied to constrained curve fitting. Applications in educational testing with respect to test assembly are utilized. In particular, techniques are applied to both static exams (i.e., conventional paper-and-pencil (P&P)) and adaptive exams (i.e., a hybrid computerized adaptive test (CAT) called a multiple-forms structure (MFS)). This dissertation focuses on the development of mathematical models to represent these test assembly problems as constrained curve-fitting problems with binary variables and on solution techniques for the test development. Mathematical programming techniques are used to generate parallel test forms with item characteristics based on item response theory. A binary variable is used to represent whether or not an item is present on a form. The problem of creating a test form is modeled as a network flow problem with additional constraints. In order to meet the target information and the test characteristic curves, a Lagrangian relaxation heuristic is applied to the problem. The Lagrangian approach works by multiplying the constraint by a "Lagrange multiplier" and adding it to the objective. By systematically varying the multiplier, the test form curves approach the targets. This dissertation explores modifications to Lagrangian Relaxation as it is applied to classical paper-and-pencil exams. For the P&P exams, LR techniques are also utilized to include additional practical constraints in the network problem, which limit the item selection. An MFS is a type of computerized adaptive test, a hybrid of a standard CAT and a P&P exam. The concept of an MFS is introduced in this dissertation, as well as the application of LR to constructing parallel MFSs. The approach is applied to the Law School Admission Test for the assembly of the conventional P&P test as well as an experimental computerized test using MFSs.
%B Dissertation Abstracts International Section A: Humanities and Social Sciences %V 61 %P 1063 %G eng %0 Journal Article %J Psicologica %D 2000 %T Practical issues in developing and maintaining a computerized adaptive testing program %A Wise, S. L. %A Kingsbury, G. G. %B Psicologica %V 21 %P 135-155 %0 Journal Article %J Medical Care %D 2000 %T Response to Hays et al and McHorney and Cohen: Practical implications of item response theory and computerized adaptive testing: A brief summary of ongoing studies of widely used headache impact scales %A Ware, J. E., Jr. %A Bjorner, J. B. %A Kosinski, M. %B Medical Care %V 38 %P 73-82 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Sufficient simplicity or comprehensive complexity? A comparison of probabilistic and stratification methods of exposure control %A Parshall, C. G. %A Kromrey, J. D. %A Hogarty, K. Y. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Test security and item exposure control for computer-based testing %A Kalohn, J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago %G eng %0 Generic %D 2000 %T Using response times to detect aberrant behavior in computerized adaptive testing (Research Report 00-09) %A van der Linden, W. J. %A van Krimpen-Stoop, E. M. L. A. %C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Automated flawed item detection and graphical item use in on-line calibration of CAT-ASVAB %A Krass, I. A. %A Thomasson, G. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T A comparison of conventional and adaptive testing procedures for making single-point decisions %A Kingsbury, G. G. %A Zara, A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Conference Paper %B Annual Meeting of the National Council on Measurement in Education %D 1999 %T Computerized testing – Issues and applications (Mini-course manual) %A Parshall, C. %A Davey, T. %A Spray, J. %A Kalohn, J. %B Annual Meeting of the National Council on Measurement in Education %C Montreal %G eng %0 Generic %D 1999 %T CUSUM-based person-fit statistics for adaptive testing (Research Report 99-05) %A van Krimpen-Stoop, E. M. L. A. %A Meijer, R. R.
%C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Book Section %D 1999 %T Developing computerized adaptive tests for school children %A Kingsbury, G. G. %A Houser, R. L. %C F. Drasgow and J. B. Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 93-115). Mahwah NJ: Erlbaum. %G eng %0 Journal Article %J Quality of Life Newsletter %D 1999 %T Dynamic health assessments: The search for more practical and more precise outcomes measures %A Ware, J. E., Jr. %A Bjorner, J. B. %A Kosinski, M. %B Quality of Life Newsletter %P 11-13 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1999 %T The effect of model misspecification on classification decisions made using a computerized test %A Kalohn, J. C. %A Spray, J. A. %K computerized adaptive testing %X Many computerized testing algorithms require the fitting of some item response theory (IRT) model to examinees' responses to facilitate item selection, the determination of test stopping rules, and classification decisions. Some IRT models are thought to be particularly useful for small-volume certification programs that wish to make the transition to computerized adaptive testing (CAT). The 1-parameter logistic model (1-PLM) is usually assumed to require a smaller sample size than the 3-parameter logistic model (3-PLM) for item parameter calibrations. This study examined the effects of model misspecification on the precision of the decisions made using the sequential probability ratio test. For this comparison, the 1-PLM was used to estimate item parameters, even though the items' characteristics were represented by a 3-PLM. Results demonstrate that the 1-PLM produced considerably more decision errors under simulation conditions similar to a real testing environment, compared to the true model and to a fixed-form standard reference set of items. %B Journal of Educational Measurement %V 36 %P 47-59 %G eng %0 Journal Article %J Academic Medicine %D 1999 %T Evaluating the usefulness of computerized adaptive testing for medical in-course assessment %A Kreiter, C. D. %A Ferguson, K. %A Gruppen, L. D. %K *Automation %K *Education, Medical, Undergraduate %K Educational Measurement/*methods %K Humans %K Internal Medicine/*education %K Likelihood Functions %K Psychometrics/*methods %K Reproducibility of Results %X PURPOSE: This study investigated the feasibility of converting an existing computer-administered, in-course internal medicine test to an adaptive format. METHOD: A 200-item internal medicine extended matching test was used for this research. Parameters were estimated with commercially available software with responses from 621 examinees. A specially developed simulation program was used to retrospectively estimate the efficiency of the computer-adaptive exam format. RESULTS: It was found that the average test length could be shortened by almost half with measurement precision approximately equal to that of the full 200-item paper-and-pencil test. However, computer-adaptive testing with this item bank provided little advantage for examinees at the upper end of the ability continuum. An examination of classical item statistics and IRT item statistics suggested that adding more difficult items might extend the advantage to this group of examinees.
CONCLUSIONS: Medical item banks presently used for in-course assessment might be advantageously employed in adaptive testing. However, it is important to evaluate the match between the items and the measurement objective of the test before implementing this format. %B Academic Medicine %7 1999/10/28 %V 74 %P 1125-8 %8 Oct %@ 1040-2446 (Print) %G eng %M 10536635 %! Acad Med %0 Conference Paper %B Paper presented at the annual meeting of the *?*. %D 1999 %T Formula score and direct optimization algorithms in CAT ASVAB on-line calibration %A Levine, M. V. %A Krass, I. A. %B Paper presented at the annual meeting of the *?*. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1999 %T Item exposure in adaptive tests: An empirical investigation of control strategies %A Parshall, C. %A Hogarty, K. %A Kromrey, J. %B Paper presented at the annual meeting of the Psychometric Society %C Lawrence KS %G eng %0 Journal Article %J Applied Psychological Measurement %D 1999 %T The null distribution of person-fit statistics for conventional and adaptive tests %A van Krimpen-Stoop, E. M. L. A. %A Meijer, R. R. %B Applied Psychological Measurement %V 23 %P 327-345 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T A procedure to compare conventional and adaptive testing procedures for making single-point decisions %A Kingsbury, G. G. %A Zara, A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Standard errors of proficiency estimates in stratum scored CAT %A Kingsbury, G. G. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Journal Article %J Educational Assessment %D 1999 %T Threats to score comparability with applications to performance assessments and computerized adaptive tests %A Kolen, M. J. %X Develops a conceptual framework that addresses score comparability for performance assessments, adaptive tests, paper-and-pencil tests, and alternate item pools for computerized tests. Outlines testing situation aspects that might threaten score comparability and describes procedures for evaluating the degree of score comparability. Suggests ways to minimize threats to comparability. %B Educational Assessment %V 6 %P 73-96 %G eng %M EJ604330 %0 Journal Article %J American Journal of Occupational Therapy %D 1999 %T The use of Rasch analysis to produce scale-free measurement of functional ability %A Velozo, C. A. %A Kielhofner, G. %A Lai, J-S. %K *Activities of Daily Living %K Disabled Persons/*classification %K Human %K Occupational Therapy/*methods %K Predictive Value of Tests %K Questionnaires/standards %K Sensitivity and Specificity %X Innovative applications of Rasch analysis can lead to solutions for traditional measurement problems and can produce new assessment applications in occupational therapy and health care practice.
First, Rasch analysis is a mechanism that translates scores across similar functional ability assessments, thus enabling the comparison of functional ability outcomes measured by different instruments. This will allow for the meaningful tracking of functional ability outcomes across the continuum of care. Second, once the item-difficulty order of an instrument or item bank is established by Rasch analysis, computerized adaptive testing can be used to target items to the patient's ability level, reducing assessment length by as much as one half. More importantly, Rasch analysis can provide the foundation for "equiprecise" measurement, or the potential to have precise measurement across all levels of functional ability. The use of Rasch analysis to create scale-free measurement of functional ability demonstrates how this methodology can be used in practical applications of clinical and outcome assessment. %B American Journal of Occupational Therapy %V 53 %P 83-90 %G eng %M 9926224 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1998 %T Application of direct optimization for on-line calibration in computerized adaptive testing %A Krass, I. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Conference Paper %B Paper presented at the 13th annual conference of the Society for Industrial and Organizational Psychology %D 1998 %T Computerized adaptive rating scales that measure contextual performance %A Borman, W. C. %A Hanson, M. A. %A Motowidlo, S. J. %A Drasgow, F. %A Foster, L. %A Kubisiak, U. C. %B Paper presented at the 13th annual conference of the Society for Industrial and Organizational Psychology %C Dallas TX %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1998 %T Effect of item selection on item exposure rates within a computerized classification test %A Kalohn, J. C. %A Spray, J. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Generic %D 1998 %T Person fit based on statistical process control in an adaptive testing environment (Research Report 98-13) %A van Krimpen-Stoop, E. M. L. A. %A Meijer, R. R. %C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Generic %D 1998 %T The relationship between computer familiarity and performance on computer-based TOEFL test tasks (Research Report 98-08) %A Taylor, C. %A Jamieson, J. %A Eignor, D. R. %A Kirsch, I. %C Princeton NJ: Educational Testing Service %G eng %0 Generic %D 1998 %T Simulating the null distribution of person-fit statistics for conventional and adaptive tests (Research Report 98-02) %A Meijer, R. R. %A van Krimpen-Stoop, E. M. L. A. %C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Generic %D 1998 %T Statistical tests for person misfit in computerized adaptive testing (Research Report 98-01) %A Glas, C. A. W. %A Meijer, R. R. %A van Krimpen-Stoop, E. M. L. A. %C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1997 %T Evaluating comparability in computerized adaptive testing: A theoretical framework with an example %A Wang, T. %A Kolen, M. J. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago %G eng %0 Conference Paper %B Paper presented at the 62nd annual meeting of the Psychometric Society %D 1997 %T Getting more precision on computer adaptive testing %A Krass, I. A. %B Paper presented at the 62nd annual meeting of the Psychometric Society %C University of Tennessee, Knoxville, TN %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T Item pool development and maintenance %A Kingsbury, G. G. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T Some questions that must be addressed to develop and maintain an item pool for use in an adaptive test %A Kingsbury, G. G. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Book Section %D 1997 %T Validation of the experimental CAT-ASVAB system %A Segall, D. O. %A Moreno, K. E. %A Kieckhaefer, W. F. %A Vicino, F. L. %A McBride, J. R. %C W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation. Washington, DC: American Psychological Association. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1996 %T Item review and adaptive testing %A Kingsbury, G. G. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New York %G eng %0 Journal Article %J Applied Psychological Measurement %D 1995 %T Computerized adaptive testing with polytomous items %A Dodd, B. G. %A De Ayala, R. J. %A Koch, W. R. %X Discusses polytomous item response theory models and the research that has been conducted to investigate a variety of possible operational procedures (item bank, item selection, trait estimation, stopping rule) for polytomous model-based computerized adaptive testing (PCAT). Studies are reviewed that compared PCAT systems based on competing item response theory models that are appropriate for the same measurement objective, as well as applications of PCAT in marketing and educational psychology. Directions for future research using PCAT are suggested. %B Applied Psychological Measurement %V 19 %P 5-22 %G eng %N 1 %0 Conference Paper %B Paper presented at the Annual Meeting of the Psychometric Society %D 1995 %T The effect of model misspecification on classification decisions made using a computerized test: 3-PLM vs. 1-PLM (and UIRT versus MIRT) %A Spray, J. A. %A Kalohn, J. C. %A Schulz, M. %A Fleer, P. Jr.
%B Paper presented at the Annual Meeting of the Psychometric Society %C Minneapolis, MN %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1995 %T The influence of examinee test-taking motivation in computerized adaptive testing %A Kim, J. %A McLean, J. E. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Francisco CA %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1995 %T An investigation of procedures for computerized adaptive testing using the successive intervals Rasch model %A Koch, W. R. %A Dodd, B. G. %B Educational and Psychological Measurement %V 55 %P 976-990 %G eng %0 Book Section %D 1995 %T Prerequisite relationships for the adaptive assessment of knowledge %A Dowling, C. E. %A Kaluscha, R. %C Greer, J. (Ed.), Proceedings of AIED'95, 7th World Conference on Artificial Intelligence in Education (pp. 43-50). Washington, DC: AACE. %G eng %0 Journal Article %J Dissertation Abstracts International Section A: Humanities & Social Sciences %D 1994 %T Monte Carlo simulation comparison of two-stage testing and computerized adaptive testing %A Kim, H-O. %K computerized adaptive testing %B Dissertation Abstracts International Section A: Humanities & Social Sciences %V 54 %P 2548 %G eng %0 Journal Article %J Educational Measurement: Issues and Practice %D 1993 %T Assessing the utility of item response models: Computerized adaptive testing %A Kingsbury, G. G. %A Houser, R. L. %K computerized adaptive testing %B Educational Measurement: Issues and Practice %V 12 %P 21-27 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1993 %T Computerized adaptive testing using the partial credit model: Effects of item pool characteristics and different stopping rules %A Dodd, B. G. %A Koch, W. R. %A De Ayala, R. J. %X Simulated datasets were used to research the effects of the systematic variation of three major variables on the performance of computerized adaptive testing (CAT) procedures for the partial credit model. The three variables studied were the stopping rule for terminating the CATs, item pool size, and the distribution of the difficulty of the items in the pool. Results indicated that the standard error stopping rule performed better across the variety of CAT conditions than the minimum information stopping rule. In addition, it was found that item pools that consisted of as few as 30 items were adequate for CAT, provided that the item pool was of medium difficulty. The implications of these findings for implementing CAT systems based on the partial credit model are discussed. %B Educational and Psychological Measurement %V 53 %P 61-77 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Mid-South Educational Research Association %D 1993 %T Individual differences in computerized adaptive testing %A Kim, J. %B Paper presented at the annual meeting of the Mid-South Educational Research Association %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1993 %T An investigation of restricted self-adapted testing %A Wise, S. L. %A Kingsbury, G. G. %A Houser, R. L.
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C Atlanta GA %G eng %0 Conference Paper %B Paper presented at the meeting of the National Council on Measurement in Education %D 1993 %T Monte Carlo simulation comparison of two-stage testing and computerized adaptive testing %A Kim, H. %A Plake, B. S. %B Paper presented at the meeting of the National Council on Measurement in Education %C Atlanta, GA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1993 %T A practical examination of the use of free-response questions in computerized adaptive testing %A Kingsbury, G. G. %A Houser, R. L. %B Paper presented at the annual meeting of the American Educational Research Association %C Atlanta GA %G eng %0 Journal Article %J Applied Measurement in Education %D 1992 %T A comparison of the partial credit and graded response models in computerized adaptive testing %A De Ayala, R. J. %A Dodd, B. G. %A Koch, W. R. %B Applied Measurement in Education %V 5 %P 17-34 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1992 %T A comparison of the performance of simulated hierarchical and linear testlets %A Wainer, H. %A Kaplan, B. %A Lewis, C. %B Journal of Educational Measurement %V 29 %P 243-251 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1992 %T Estimation of ability level by using only observable quantities in adaptive testing %A Kirisci, L. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1992 %T Scaling of two-stage adaptive test configurations for achievement testing %A Hendrickson, A. B. %A Kolen, M. J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Journal Article %J Journal of Educational Measurement %D 1991 %T Building algebra testlets: A comparison of hierarchical and linear structures %A Wainer, H. %A Lewis, C. %A Kaplan, B. %A Braswell, J. %B Journal of Educational Measurement %V 8 %P xxx-xxx %G eng %0 Journal Article %J Dissertation Abstracts International %D 1991 %T A comparison of paper-and-pencil, computer-administered, computerized feedback, and computerized adaptive testing methods for classroom achievement testing %A Kuan, Tsung Hao %K computerized adaptive testing %B Dissertation Abstracts International %V 52 %P 1719 %G eng %0 Journal Article %J Applied Measurement in Education %D 1991 %T A comparison of procedures for content-sensitive item selection in computerized adaptive tests %A Kingsbury, G. G. %A Zara, A. %B Applied Measurement in Education %V 4 %P 241-261 %G eng %0 Generic %D 1991 %T Some empirical guidelines for building testlets (Technical Report 91-56) %A Wainer, H. %A Kaplan, B. %A Lewis, C. %C Princeton NJ: Educational Testing Service, Program Statistics Research %G eng %0 Journal Article %J Educational Measurement: Issues and Practice %D 1990 %T Adapting adaptive testing: Using the MicroCAT Testing System in a local school district %A Kingsbury, G. G.
%B Educational Measurement: Issues and Practice %V 9 (2) %P 3-6 %G eng %0 Generic %D 1990 %T An adaptive algebra test: A testlet-based, hierarchically structured test with validity-based scoring %A Wainer, H. %A Lewis, C. %A Kaplan, B. %A Braswell, J. %C ETS Technical Report 90-92 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1990 %T Assessing the utility of item response models: Computerized adaptive testing %A Kingsbury, G. G. %A Houser, R. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Boston MA %G eng %0 Journal Article %J Measurement and Evaluation in Counseling and Development %D 1990 %T Computerized adaptive measurement of attitudes %A Koch, W. R. %A Dodd, B. G. %A Fitzpatrick, S. J. %B Measurement and Evaluation in Counseling and Development %V 23 %P 20-30 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1990 %T A simulation and comparison of flexilevel and Bayesian computerized adaptive testing %A De Ayala, R. J. %A Dodd, B. G. %A Koch, W. R. %K computerized adaptive testing %X Computerized adaptive testing (CAT) is a testing procedure that adapts an examination to an examinee's ability by administering only items of appropriate difficulty for the examinee. In this study, the authors compared Lord's flexilevel testing procedure (flexilevel CAT) with an item response theory-based CAT using Bayesian estimation of ability (Bayesian CAT). Three flexilevel CATs, which differed in test length (36, 18, and 11 items), and three Bayesian CATs were simulated; the Bayesian CATs differed from one another in the standard error of estimate (SEE) used for terminating the test (0.25, 0.10, and 0.05). Results showed that the flexilevel 36- and 18-item CATs produced ability estimates that may be considered as accurate as those of the Bayesian CAT with SEE = 0.10 and comparable to the Bayesian CAT with SEE = 0.05. The authors discuss the implications for classroom testing and for item response theory-based CAT. %B Journal of Educational Measurement %V 27 %P 227-239 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1989 %T Adaptive and conventional versions of the DAT: The first complete test battery comparison %A Henly, S. J. %A Klebe, K. J. %A McBride, J. R. %A Cudeck, R. %B Applied Psychological Measurement %V 13 %P 363-371 %G eng %N 4 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1989 %T Assessing the impact of using item parameter estimates obtained from paper-and-pencil testing for computerized adaptive testing %A Kingsbury, G. G. %A Houser, R. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Francisco %G eng %0 Book Section %D 1989 %T Die Optimierung der Meßgenauigkeit beim branched adaptiven Testen [Optimization of measurement precision for branched-adaptive testing] %A Kubinger, K. D. %C K. D. Kubinger (Ed.), Moderne Testtheorie: Ein Abriß samt neuesten Beiträgen [Modern test theory: Overview and new issues] (pp. 187-218). Weinheim, Germany: Beltz.
%G eng %0 Journal Article %J Applied Measurement in Education %D 1989 %T An investigation of procedures for computerized adaptive testing using partial credit scoring %A Koch, W. R. %A Dodd, B. G. %B Applied Measurement in Education %V 2 %P 335-357 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1989 %T Operational characteristics of adaptive testing procedures using the graded response model %A Dodd, B. G. %A Koch, W. R. %A De Ayala, R. J. %B Applied Psychological Measurement %V 13 %P 129-143 %G eng %N 2 %0 Journal Article %J Applied Measurement in Education %D 1989 %T Procedures for selecting items for computerized adaptive tests %A Kingsbury, G. G. %A Zara, A. %B Applied Measurement in Education %V 2 %P 359-375 %G eng %0 Journal Article %J Journal of Personality Assessment %D 1989 %T Tailored interviewing: An application of item response theory for personality measurement %A Kamakura, W. A. %A Balasubramanian, S. K. %B Journal of Personality Assessment %V 53 %P 502-519 %G eng %0 Journal Article %J School Psychology Review %D 1988 %T Assessment of academic skills of learning disabled students with classroom microcomputers %A Watkins, M. W. %A Kush, J. C. %B School Psychology Review %V 17 %P 81-88 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1988 %T A comparison of achievement level estimates from computerized adaptive testing and paper-and-pencil testing %A Kingsbury, G. G. %A Houser, R. L. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1988 %T Computerized adaptive attitude measurement: A comparison of the graded response and rating scale models %A Dodd, B. G. %A Koch, W. R. %A De Ayala, R. J. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans %G eng %0 Journal Article %J Technological Horizons in Education %D 1988 %T Computerized adaptive testing: A four-year-old pilot study shows that CAT can work %A Kingsbury, G. G. %A et al. %B Technological Horizons in Education %V 16 (4) %P 73-76 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1988 %T A predictive analysis approach to adaptive testing %A Kirisci, L. %A Hsu, T.-C. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Book Section %D 1988 %T On a Rasch-model-based test for non-computerized adaptive testing %A Kubinger, K. D. %C Langeheine, R. and Rost, J. (Eds.), Latent trait and latent class models. New York: Plenum Press. %G eng %0 Journal Article %J Journal of Educational Measurement %D 1987 %T CATS, testlets, and test construction: A rationale for putting test developers back into CAT %A Wainer, H. %A Kiely, G. L.
%B Journal of Educational Measurement %V 32 %P 185-202 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1987 %T Computerized adaptive testing: A comparison of the nominal response model and the three-parameter logistic model %A De Ayala, R. J. %A Koch, W. R. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Washington DC %G eng %0 Generic %D 1987 %T Functional and design specifications for the National Council of State Boards of Nursing adaptive testing system %A Zara, A. %A Bosma, J. %A Kaplan, R. %C Unpublished manuscript %G eng %0 Journal Article %J Journal of Educational Measurement %D 1987 %T Item clusters and computerized adaptive testing: A case for testlets %A Wainer, H. %A Kiely, G. L. %B Journal of Educational Measurement %V 24 %P 185-201 %N 3 %0 Generic %D 1986 %T CATs, testlets, and test construction: A rationale for putting test developers back into CAT (Technical Report 86-71) %A Wainer, H. %A Kiely, G. L. %C Princeton NJ: Educational Testing Service, Program Statistics Research %G eng %0 Generic %D 1986 %T College Board computerized placement tests: Validation of an adaptive test of basic skills (Research Report 86-29) %A Ward, W. C. %A Kline, R. G. %A Flaugher, J. %C Princeton NJ: Educational Testing Service. %G eng %0 Book Section %D 1986 %T Computerized adaptive testing: A pilot project %A Kingsbury, G. G. %C W. C. Ryan (Ed.), Proceedings: NECC 86, National Educational Computing Conference (pp. 172-176). Eugene OR: University of Oregon, International Council on Computers in Education. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1986 %T Operational characteristics of adaptive testing procedures using partial credit scoring %A Koch, W. R. %A Dodd, B. G. %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco CA %G eng %0 Journal Article %J Dissertation Abstracts International %D 1985 %T Adaptive self-referenced testing as a procedure for the measurement of individual change due to instruction: A comparison of the reliabilities of change estimates obtained from conventional and adaptive testing procedures %A Kingsbury, G. G. %K computerized adaptive testing %B Dissertation Abstracts International %V 45 %P 3057 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1985 %T ALPHATAB: A lookup table for Bayesian computerized adaptive testing %A De Ayala, R. J. %A Koch, W. R. %B Applied Psychological Measurement %V 9 %P 326 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1985 %T Computerized adaptive attitude measurement %A Koch, W. R. %A Dodd, B. G. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago %G eng %0 Journal Article %J Journal of Consulting and Clinical Psychology %D 1985 %T Current developments and future directions in computerized personality assessment %A Butcher, J. N. %A Keller, L. S. %A Bacon, S. F. %X Although computer applications in personality assessment have burgeoned rapidly in recent years, the majority of these uses capitalize on the computer's speed, accuracy, and memory capacity rather than its potential for the development of new, flexible assessment strategies.
A review of current examples of computer usage in personality assessment reveals wide acceptance of automated clerical tasks such as test scoring and even test administration. The computer is also assuming tasks previously reserved for expert clinicians, such as writing narrative interpretive reports from test results. All of these functions represent automation of established assessment devices and interpretive strategies. The possibility also exists of harnessing some of the computer's unique adaptive capabilities to alter standard devices and even develop new ones. Three proposed strategies for developing computerized adaptive personality tests are described, with the conclusion that the computer's potential in this area justifies a call for further research efforts. %B Journal of Consulting and Clinical Psychology %V 53 %P 803-815 %G eng %M 00004730-198512000-00007 %0 Conference Paper %B Proceedings of the 27th Annual Conference of the Military Testing Association %D 1985 %T A validity study of the computerized adaptive testing version of the Armed Services Vocational Aptitude Battery %A Moreno, K. E. %A Segall, D. O. %A Kieckhaefer, W. F. %B Proceedings of the 27th Annual Conference of the Military Testing Association %G eng %0 Book %D 1984 %T Adaptive self-referenced testing as a procedure for the measurement of individual change due to instruction: A comparison of the reliabilities of change estimates obtained from conventional and adaptive testing procedures %A Kingsbury, G. G. %C Unpublished doctoral dissertation, University of Minnesota, Minneapolis %G eng %0 Journal Article %J Applied Psychological Measurement %D 1984 %T Item Location Effects and Their Implications for IRT Equating and Adaptive Testing %A Kingston, N. M. %A Dorans, N. J. %B Applied Psychological Measurement %V 8 %P 147-154 %G English %N 2 %0 Generic %D 1983 %T Alternate forms reliability and concurrent validity of adaptive and conventional tests with military recruits %A Kiely, G. L. %A Zara, A. %A Weiss, D. J. %C Minneapolis MN: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory %G eng %0 Book Section %D 1983 %T A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure %A Kingsbury, G. G. %A Weiss, D. J. %C D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 257-283). New York: Academic Press. %G eng %0 Generic %D 1981 %T A validity comparison of adaptive and conventional strategies for mastery testing (Research Report 81-3) %A Kingsbury, G. G. %A Weiss, D. J.
%C Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory %G eng %0 Generic %D 1980 %T An alternate-forms reliability and concurrent validity comparison of Bayesian adaptive and conventional ability tests (Research Report 80-5) %A Kingsbury, G. G. %A Weiss, D. J. %C Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory %G eng %0 Generic %D 1980 %T A comparison of adaptive, sequential, and conventional testing strategies for mastery decisions (Research Report 80-4) %A Kingsbury, G. G. %A Weiss, D. J. %C Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory %G eng %0 Book Section %D 1980 %T A comparison of ICC-based adaptive mastery testing and the Waldian probability ratio method %A Kingsbury, G. G. %A Weiss, D. J. %C D. J. Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 120-139). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory %G eng %0 Generic %D 1980 %T Computerized instructional adaptive testing model: Formulation and validation (AFHRL-TR-79-33, Final Report) %A Kalisch, S. J. %C Brooks Air Force Base TX: Air Force Human Resources Laboratory. Also Catalog of Selected Documents in Psychology, February 1981, 11, 20 (Ms. No. 2217) %G eng %0 Generic %D 1980 %T An empirical study of a broad range test of verbal ability %A Kreitzberg, C. B. %A Jones, D. J. %C Princeton NJ: Educational Testing Service %G eng %0 Book Section %D 1980 %T A model for computerized adaptive testing related to instructional situations %A Kalisch, S. J. %C D. J. Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 101-119). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. %G eng %0 Generic %D 1979 %T An adaptive testing strategy for mastery decisions (Research Report 79-5) %A Kingsbury, G. G. %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Generic %D 1979 %T Problems in application of latent-trait models to tailored testing (Research Report 79-1) %A Koch, W. J. %A Reckase, M. D. %C Columbia MO: University of Missouri, Department of Psychology (also presented at the annual meeting of the National Council on Measurement in Education, 1979; ERIC No. ED 177 196) %G eng %0 Book %D 1979 %T The Rasch model in computerized personality testing %A Kunce, C. S. %C Ph.D. dissertation, University of Missouri, Columbia, 1979 %G eng %0 Book Section %D 1978 %T Applications of sequential testing procedures to performance testing %A Epstein, K. I. %A Knerr, C. S. %C D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. %G eng %0 Journal Article %J Computers and Education %D 1978 %T Computerized adaptive testing: Principles and directions %A Kreitzberg, C. B. %A Stocking, M. %A Swanson, L.
%B Computers and Education %V 2 %P 319-329 %G eng %N 4 %0 Generic %D 1978 %T A live tailored testing comparison study of the one- and three-parameter logistic models (Research Report 78-1) %A Koch, W. J. %A Reckase, M. D. %C Columbia MO: University of Missouri, Department of Psychology %G eng %0 Conference Proceedings %B 1977 Computerized Adaptive Testing Conference %D 1977 %T Applications of sequential testing procedures to performance testing %A Epstein, K. I. %A Knerr, C. S. %B 1977 Computerized Adaptive Testing Conference %I University of Minnesota %C Minneapolis, MN. USA %G eng %0 Generic %D 1977 %T Calibration of an item pool for the adaptive measurement of achievement (Research Report 77-5) %A Bejar, I. I. %A Weiss, D. J. %A Kingsbury, G. G. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Conference Paper %B Third International Conference on Educational Testing %D 1977 %T Real-data simulation of a proposal for tailored testing %A Killcross, M. C. %B Third International Conference on Educational Testing %C Leyden, The Netherlands %8 06/1977 %G eng %0 Book Section %D 1977 %T Student attitudes toward tailored testing %A Koch, W. R. %A Patience, W. M. %C D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1977 %T TAILOR: A FORTRAN procedure for interactive tailored testing %A Cudeck, R. A. %A Cliff, N. A. %A Kehoe, J. %B Educational and Psychological Measurement %V 37 %P 767-769 %G eng %0 Generic %D 1976 %T A review of research in tailored testing (Report APRE No. 9/76) %A Killcross, M. C. %C Farnborough, Hants, UK: Ministry of Defence, Army Personnel Research Establishment %G eng %0 Book %D 1974 %T The comparison of two tailored testing models and the effects of the models' variables on actual loss %A Kalisch, S. J. %C Unpublished doctoral dissertation, Florida State University %G eng %0 Conference Paper %B Annual meeting of the National Council on Measurement in Education %D 1974 %T An empirical investigation of the stability and accuracy of flexilevel tests %A Kocher, A. T. %B Annual meeting of the National Council on Measurement in Education %C Chicago IL %8 03/1974 %G eng %0 Journal Article %J Journal of Computer-Based Instruction %D 1974 %T A tailored testing model employing the beta distribution and conditional difficulties %A Kalisch, S. J. %B Journal of Computer-Based Instruction %V 1 %P 22-28 %G eng %0 Generic %D 1974 %T A tailored testing model employing the beta distribution (unpublished manuscript) %A Kalisch, S. J. %C Florida State University, Educational Evaluation and Research Design Program %G eng %0 Conference Paper %B Paper presented at the 18th International Congress of Applied Psychology %D 1974 %T A tailored testing system for selection and allocation in the British Army %A Killcross, M. C. %B Paper presented at the 18th International Congress of Applied Psychology %C Montreal Canada %G eng %0 Conference Paper %B NATO Conference on Utilisation of Human Resources %D 1973 %T The potential use of tailored testing for allocation to army employments %A Killcross, M. C. %A Cassie, A. %B NATO Conference on Utilisation of Human Resources %C Lisbon, Portugal %8 06/1973 %G eng
%0 Journal Article %J American Psychologist %D 1969 %T Use of an on-line computer for psychological testing with the up-and-down method %A Kappauf, W. E. %B American Psychologist %V 24 %P 207-211 %G eng %0 Generic %D 1959 %T Progress report on the sequential item test %A Krathwohl, D. %C East Lansing MI: Michigan State University, Bureau of Educational Research %G eng %0 Journal Article %J American Psychologist %D 1956 %T The sequential item test %A Krathwohl, D. R. %A Huyser, R. J. %B American Psychologist %V 11 %P 419 %G eng