TY - JOUR T1 - Development of a Computerized Adaptive Test for Anxiety Based on the Dutch–Flemish Version of the PROMIS Item Bank JF - Assessment Y1 - In Press A1 - Gerard Flens A1 - Niels Smits A1 - Caroline B. Terwee A1 - Joost Dekker A1 - Irma Huijbrechts A1 - Philip Spinhoven A1 - Edwin de Beurs AB - We used the Dutch–Flemish version of the USA PROMIS adult V1.0 item bank for Anxiety as input for developing a computerized adaptive test (CAT) to measure the entire latent anxiety continuum. First, psychometric analysis of a combined clinical and general population sample (N = 2,010) showed that the 29-item bank has psychometric properties that are required for a CAT administration. Second, a post hoc CAT simulation showed efficient and highly precise measurement, with an average number of 8.64 items for the clinical sample, and 9.48 items for the general population sample. Furthermore, the accuracy of our CAT version was highly similar to that of the full item bank administration, both in final score estimates and in distinguishing clinical subjects from persons without a mental health disorder. We discuss the future directions and limitations of CAT development with the Dutch–Flemish version of the PROMIS Anxiety item bank. UR - https://doi.org/10.1177/1073191117746742 ER - TY - JOUR T1 - Time-Efficient Adaptive Measurement of Change JF - Journal of Computerized Adaptive Testing Y1 - 2019 A1 - Matthew Finkelman A1 - Chun Wang KW - adaptive measurement of change KW - computerized adaptive testing KW - Fisher information KW - item selection KW - response-time modeling AB -

The adaptive measurement of change (AMC) refers to the use of computerized adaptive testing (CAT) at multiple occasions to efficiently assess a respondent’s improvement, decline, or sameness from occasion to occasion. Whereas previous AMC research focused on administering the most informative item to a respondent at each stage of testing, the current research proposes the use of Fisher information per time unit as an item selection procedure for AMC. The latter procedure incorporates not only the amount of information provided by a given item but also the expected amount of time required to complete it. In a simulation study, the use of Fisher information per time unit item selection resulted in a lower false positive rate in the majority of conditions studied, and a higher true positive rate in all conditions studied, compared to item selection via Fisher information without accounting for the expected time taken. Future directions of research are suggested.
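
As an illustration of the selection rule described above, the following is a minimal sketch and not the authors' implementation. It assumes a 2PL response model for Fisher information and a lognormal response-time model for the expected completion time; the abstract specifies neither. The function names and item parameters (`a`, `b`, `alpha`, `beta`) are hypothetical.

```python
import math

def fisher_info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta (assumed model)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def expected_time_lognormal(tau, alpha, beta):
    """Expected response time when ln T ~ Normal(beta - tau, 1 / alpha**2).

    tau is the respondent's speed, beta the item's time intensity and
    alpha the item's time discrimination (assumed lognormal model).
    """
    return math.exp(beta - tau + 1.0 / (2.0 * alpha ** 2))

def select_item(theta, tau, items, use_time=True):
    """Return the index of the item that maximizes the chosen criterion.

    With use_time=True the criterion is Fisher information per expected
    time unit; otherwise it is plain Fisher information.
    """
    def score(item):
        info = fisher_info_2pl(theta, item["a"], item["b"])
        if not use_time:
            return info
        return info / expected_time_lognormal(tau, item["alpha"], item["beta"])

    return max(range(len(items)), key=lambda i: score(items[i]))

# Hypothetical two-item pool: item 0 is more informative but much slower.
pool = [
    {"a": 1.5, "b": 0.0, "alpha": 2.0, "beta": 5.0},
    {"a": 1.2, "b": 0.0, "alpha": 2.0, "beta": 3.5},
]
print(select_item(0.0, 0.0, pool, use_time=False))
print(select_item(0.0, 0.0, pool, use_time=True))
```

In this toy pool the two criteria disagree: plain Fisher information favors the slower, more discriminating item, while information per time unit favors the quicker one.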

VL - 7 UR - http://iacat.org/jcat/index.php/jcat/article/view/73/35 IS - 2 ER - TY - JOUR T1 - Factors Affecting the Classification Accuracy and Average Length of a Variable-Length Cognitive Diagnostic Computerized Test JF - Journal of Computerized Adaptive Testing Y1 - 2018 A1 - Huebner, Alan A1 - Finkelman, Matthew D. A1 - Weissman, Alexander VL - 6 UR - http://iacat.org/jcat/index.php/jcat/article/view/55/30 IS - 1 ER - TY - JOUR T1 - Implementing Three CATs Within Eighteen Months JF - Journal of Computerized Adaptive Testing Y1 - 2018 A1 - Christian Spoden A1 - Andreas Frey A1 - Raphael Bernhardt VL - 6 UR - http://iacat.org/jcat/index.php/jcat/article/view/70/33 IS - 3 ER - TY - JOUR T1 - Latent Class Analysis of Recurrent Events in Problem-Solving Items JF - Applied Psychological Measurement Y1 - 2018 A1 - Haochen Xu A1 - Guanhua Fang A1 - Yunxiao Chen A1 - Jingchen Liu A1 - Zhiliang Ying AB - Computer-based assessment of complex problem-solving abilities is becoming more and more popular. In such an assessment, the entire problem-solving process of an examinee is recorded, providing detailed information about the individual, such as behavioral patterns, speed, and learning trajectory. The problem-solving processes are recorded in a computer log file which is a time-stamped documentation of events related to task completion. As opposed to cross-sectional response data from traditional tests, process data in log files are massive and irregularly structured, calling for effective exploratory data analysis methods. Motivated by a specific complex problem-solving item “Climate Control” in the 2012 Programme for International Student Assessment, the authors propose a latent class analysis approach to analyzing the events occurred in the problem-solving processes. The exploratory latent class analysis yields meaningful latent classes. Simulation studies are conducted to evaluate the proposed approach. 
VL - 42 UR - https://doi.org/10.1177/0146621617748325 ER - TY - CONF T1 - Adaptive Item and Feedback Selection in Personalized Learning with a Network Approach T2 - IACAT 2017 Conference Y1 - 2017 A1 - Nikky van Buuren A1 - Hendrik Straat A1 - Theo Eggen A1 - Jean-Paul Fox KW - feedback selection KW - item selection KW - network approach KW - personalized learning AB -

Personalized learning is a term used to describe educational systems that adapt student-specific curriculum sequencing, pacing, and presentation based on each student’s unique background, knowledge, preferences, interests, and learning goals (Chen, 2008; Netcoh, 2016). The technological approach to personalized learning provides data-driven models to incorporate these adaptations automatically. Examples of applications include online learning systems, educational games, and revision-aid systems. In this study we introduce Bayesian networks as a methodology to implement an adaptive framework within a personalized learning environment. Existing ideas from Computerized Adaptive Testing (CAT) with Item Response Theory (IRT), where choices about content provision are based on maximizing information, are related to the goals of personalized learning environments. Personalized learning entails other goals besides efficient ability estimation by maximizing information, such as an adaptive configuration of preferences and feedback to the student. These considerations will be discussed and their application in networks will be illustrated.

Adaptivity in Personalized Learning. Standard CATs focus on selecting items that provide maximum information about the ability of an individual at a certain point in time (Van der Linden & Glas, 2000). For testing in which learning is the main goal, Eggen (2012) explored alternative adaptive item selection methods. The adaptive choices made in personalized learning applications require additional adaptivity with respect to the following aspects: the moment of feedback, the kind of feedback, and the possibility for students to actively influence the learning process.

Bayesian Networks and Personalized Learning. Personalized learning aims at constructing a framework that incorporates all the aspects mentioned above. The goal of this framework is therefore not only to retrieve ability estimates by choosing items on maximum information, but also to allow these other factors to play a role. Plajner and Vomlel (2016) have already applied Bayesian Networks to adaptive testing, selecting items with the help of entropy reduction. Almond et al. (2015) provide a reference work on Bayesian Networks in educational assessment. Both acknowledge the potential of the method in terms of features such as modularity options to build finer-grained models. IRT does not easily allow modeling sub-skills or gathering information at a fine-grained level, because it generally assumes one underlying trait. The local independence assumption in IRT implies an interest mainly in the student’s overall ability on the subject of interest. When the goal is to improve students’ learning, we are not just interested in efficiently arriving at their test score on a global subject. One wants a model that is able to map educational problems and talents in detail over the whole educational program, while allowing for dependency between items. At a given moment in time, some topics may be better mastered than others, and this is exactly what we want to get out of a model. The possibility to model flexible structures, to estimate abilities at a very detailed level for sub-skills, and to easily incorporate other variables such as feedback makes Bayesian Networks a very promising method for making adaptive choices in personalized learning. This research shows how item and feedback selection can be performed with the help of Bayesian Networks. A possibility for student involvement is also introduced and evaluated.

References

Almond, R. G., Mislevy, R. J., Steinberg, L. S., Yan, D., & Williamson, D. M. (2015). Bayesian Networks in Educational Assessment. New York: Springer Science+Business Media. http://doi.org/10.1007/978-0-387-98138-3

Eggen, T. J. H. M. (2012). Computerized Adaptive Testing Item Selection in Computerized Adaptive Learning Systems. In T. J. H. M. Eggen & B. P. Veldkamp (Eds.), Psychometrics in Practice at RCEC. Enschede: RCEC.

Netcoh, S. (2016, March). What Do You Mean by ‘Personalized Learning’? Crosscutting Conversations in Education – Research, Reflections & Practice. Blog post.

Plajner, M., & Vomlel, J. (2016). Student Skill Models in Adaptive Testing. In Proceedings of the Eighth International Conference on Probabilistic Graphical Models (pp. 403-414).

Van der Linden, W. J., & Glas, C. A. (2000). Computerized adaptive testing: Theory and practice. Dordrecht: Kluwer Academic Publishers.

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - JOUR T1 - Is a Computerized Adaptive Test More Motivating Than a Fixed-Item Test? JF - Applied Psychological Measurement Y1 - 2017 A1 - Guangming Ling A1 - Yigal Attali A1 - Bridgid Finn A1 - Elizabeth A. Stone AB - Computer adaptive tests provide important measurement advantages over traditional fixed-item tests, but research on the psychological reactions of test takers to adaptive tests is lacking. In particular, it has been suggested that test-taker engagement, and possibly test performance as a consequence, could benefit from the control that adaptive tests have on the number of test items examinees answer correctly. However, previous research on this issue found little support for this possibility. This study expands on previous research by examining this issue in the context of a mathematical ability assessment and by considering the possible effect of immediate feedback of response correctness on test engagement, test anxiety, time on task, and test performance. Middle school students completed a mathematics assessment under one of three test type conditions (fixed, adaptive, or easier adaptive) and either with or without immediate feedback about the correctness of responses. Results showed little evidence for test type effects. The easier adaptive test resulted in higher engagement and lower anxiety than either the adaptive or fixed-item tests; however, no significant differences in performance were found across test types, although performance was significantly higher across all test types when students received immediate feedback. In addition, these effects were not related to ability level, as measured by the state assessment achievement levels. The possibility that test experiences in adaptive tests may not in practice be significantly different than in fixed-item tests is raised and discussed to explain the results of this and previous studies. 
VL - 41 UR - https://doi.org/10.1177/0146621617707556 ER - TY - JOUR T1 - Development of a Computer Adaptive Test for Depression Based on the Dutch-Flemish Version of the PROMIS Item Bank JF - Evaluation & the Health Professions Y1 - 2017 A1 - Gerard Flens A1 - Niels Smits A1 - Caroline B. Terwee A1 - Joost Dekker A1 - Irma Huijbrechts A1 - Edwin de Beurs AB - We developed a Dutch-Flemish version of the patient-reported outcomes measurement information system (PROMIS) adult V1.0 item bank for depression as input for computerized adaptive testing (CAT). As item bank, we used the Dutch-Flemish translation of the original PROMIS item bank (28 items) and additionally translated 28 U.S. depression items that failed to make the final U.S. item bank. Through psychometric analysis of a combined clinical and general population sample (N = 2,010), 8 added items were removed. With the final item bank, we performed several CAT simulations to assess the efficiency of the extended (48 items) and the original item bank (28 items), using various stopping rules. Both item banks resulted in highly efficient and precise measurement of depression and showed high similarity between the CAT simulation scores and the full item bank scores. We discuss the implications of using each item bank and stopping rule for further CAT development. VL - 40 UR - https://doi.org/10.1177/0163278716684168 ER - TY - JOUR T1 - Heuristic Constraint Management Methods in Multidimensional Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2017 A1 - Sebastian Born A1 - Andreas Frey AB - Although multidimensional adaptive testing (MAT) has been proven to be highly advantageous with regard to measurement efficiency when several highly correlated dimensions are measured, there are few operational assessments that use MAT. This may be due to issues of constraint management, which is more complex in MAT than it is in unidimensional adaptive testing. 
Very few studies have examined the performance of existing constraint management methods (CMMs) in MAT. The present article focuses on the effectiveness of two promising heuristic CMMs in MAT for varying levels of imposed constraints and for various correlations between the measured dimensions. Through a simulation study, the multidimensional maximum priority index (MMPI) and multidimensional weighted penalty model (MWPM), as an extension of the weighted penalty model, are examined with regard to measurement precision and constraint violations. The results show that both CMMs are capable of addressing complex constraints in MAT. However, measurement precision losses were found to differ between the MMPI and MWPM. While the MMPI appears to be more suitable for use in assessment situations involving few to a moderate number of constraints, the MWPM should be used when numerous constraints are involved. VL - 77 UR - http://dx.doi.org/10.1177/0013164416643744 ER - TY - JOUR T1 - The validation of a computer-adaptive test (CAT) for assessing health-related quality of life in children and adolescents in a clinical sample: study design, methods and first results of the Kids-CAT study JF - Quality of Life Research Y1 - 2017 A1 - Barthel, D. A1 - Otto, C. A1 - Nolte, S. A1 - Meyrose, A.-K. A1 - Fischer, F. A1 - Devine, J. A1 - Walter, O. A1 - Mierke, A. A1 - Fischer, K. I. A1 - Thyen, U. A1 - Klein, M. A1 - Ankermann, T. A1 - Rose, M. A1 - Ravens-Sieberer, U. AB - Recently, we developed a computer-adaptive test (CAT) for assessing health-related quality of life (HRQoL) in children and adolescents: the Kids-CAT. It measures five generic HRQoL dimensions. The aims of this article were (1) to present the study design and (2) to investigate its psychometric properties in a clinical setting. 
VL - 26 UR - https://doi.org/10.1007/s11136-016-1437-9 ER - TY - JOUR T1 - On Computing the Key Probability in the Stochastically Curtailed Sequential Probability Ratio Test JF - Applied Psychological Measurement Y1 - 2016 A1 - Huebner, Alan R. A1 - Finkelman, Matthew D. AB - The Stochastically Curtailed Sequential Probability Ratio Test (SCSPRT) is a termination criterion for computerized classification tests (CCTs) that has been shown to be more efficient than the well-known Sequential Probability Ratio Test (SPRT). The performance of the SCSPRT depends on computing the probability that at a given stage in the test, an examinee’s current interim classification status will not change before the end of the test. Previous work discusses two methods of computing this probability, an exact method in which all potential responses to remaining items are considered and an approximation based on the central limit theorem (CLT) requiring less computation. Generally, the CLT method should be used early in the test when the number of remaining items is large, and the exact method is more appropriate at later stages of the test when few items remain. However, there is currently a dearth of information as to the performance of the SCSPRT when using the two methods. For the first time, the exact and CLT methods of computing the crucial probability are compared in a simulation study to explore whether there is any effect on the accuracy or efficiency of the CCT. The article is focused toward practitioners and researchers interested in using the SCSPRT as a termination criterion in an operational CCT. VL - 40 UR - http://apm.sagepub.com/content/40/2/142.abstract ER - TY - JOUR T1 - Stochastic Curtailment of Questionnaires for Three-Level Classification: Shortening the CES-D for Assessing Low, Moderate, and High Risk of Depression JF - Applied Psychological Measurement Y1 - 2016 A1 - Smits, Niels A1 - Finkelman, Matthew D. 
A1 - Kelderman, Henk AB - In clinical assessment, efficient screeners are needed to ensure low respondent burden. In this article, Stochastic Curtailment (SC), a method for efficient computerized testing for classification into two classes for observable outcomes, was extended to three classes. In a post hoc simulation study using the item scores on the Center for Epidemiologic Studies–Depression Scale (CES-D) of a large sample, three versions of SC, SC via Empirical Proportions (SC-EP), SC via Simple Ordinal Regression (SC-SOR), and SC via Multiple Ordinal Regression (SC-MOR), were compared on both respondent burden and classification accuracy. All methods were applied under the regular item order of the CES-D and under an ordering that was optimal in terms of the predictive power of the items. Under the regular item ordering, the three methods were equally accurate, but SC-SOR and SC-MOR needed fewer items. Under the optimal ordering, additional gains in efficiency were found, but SC-MOR suffered from capitalization on chance substantially. It was concluded that SC-SOR is an efficient and accurate method for clinical screening. Strengths and weaknesses of the methods are discussed. VL - 40 UR - http://apm.sagepub.com/content/40/1/22.abstract ER - TY - JOUR T1 - Comparing Simple Scoring With IRT Scoring of Personality Measures: The Navy Computer Adaptive Personality Scales JF - Applied Psychological Measurement Y1 - 2015 A1 - Oswald, Frederick L. A1 - Shaw, Amy A1 - Farmer, William L. AB -

This article analyzes data from U.S. Navy sailors (N = 8,956), with the central measure being the Navy Computer Adaptive Personality Scales (NCAPS). Analyses and results from this article extend and qualify those from previous research efforts by examining the properties of the NCAPS and its adaptive structure in more detail. Specifically, this article examines item exposure rates, the efficiency of item use based on item response theory (IRT)–based Expected A Posteriori (EAP) scoring, and a comparison of IRT-EAP scoring with much more parsimonious scoring methods that appear to work just as well (stem-level scoring and dichotomous scoring). The cutting-edge nature of adaptive personality testing will necessitate a series of future efforts like this: to examine the benefits of adaptive scoring schemes and novel measurement methods continually, while pushing testing technology further ahead.

VL - 39 UR - http://apm.sagepub.com/content/39/2/144.abstract ER - TY - JOUR T1 - Stochastic Curtailment in Adaptive Mastery Testing: Improving the Efficiency of Confidence Interval–Based Stopping Rules JF - Applied Psychological Measurement Y1 - 2015 A1 - Sie, Haskell A1 - Finkelman, Matthew D. A1 - Bartroff, Jay A1 - Thompson, Nathan A. AB - A well-known stopping rule in adaptive mastery testing is to terminate the assessment once the examinee’s ability confidence interval lies entirely above or below the cut-off score. This article proposes new procedures that seek to improve such a variable-length stopping rule by coupling it with curtailment and stochastic curtailment. Under the new procedures, test termination can occur earlier if the probability is high enough that the current classification decision remains the same should the test continue. Computation of this probability utilizes normality of an asymptotically equivalent version of the maximum likelihood ability estimate. In two simulation sets, the new procedures showed a substantial reduction in average test length while maintaining similar classification accuracy to the original method. VL - 39 UR - http://apm.sagepub.com/content/39/4/278.abstract ER - TY - JOUR T1 - Utilizing Response Times in Computerized Classification Testing JF - Applied Psychological Measurement Y1 - 2015 A1 - Sie, Haskell A1 - Finkelman, Matthew D. A1 - Riley, Barth A1 - Smits, Niels AB - A well-known approach in computerized mastery testing is to combine the Sequential Probability Ratio Test (SPRT) stopping rule with item selection to maximize Fisher information at the mastery threshold. This article proposes a new approach in which a time limit is defined for the test and examinees’ response times are considered in both item selection and test termination. Item selection is performed by maximizing Fisher information per time unit, rather than Fisher information itself. 
The test is terminated once the SPRT makes a classification decision, the time limit is exceeded, or there is no remaining item that has a high enough probability of being answered before the time limit. In a simulation study, the new procedure showed a substantial reduction in average testing time while slightly improving classification accuracy compared with the original method. In addition, the new procedure reduced the percentage of examinees who exceeded the time limit. VL - 39 UR - http://apm.sagepub.com/content/39/5/389.abstract ER - TY - JOUR T1 - Cognitive Diagnostic Models and Computerized Adaptive Testing: Two New Item-Selection Methods That Incorporate Response Times JF - Journal of Computerized Adaptive Testing Y1 - 2014 A1 - Finkelman, M. D. A1 - Kim, W. A1 - Weissman, A. A1 - Cook, R.J. VL - 2 UR - http://www.iacat.org/jcat/index.php/jcat/article/view/43/21 IS - 4 ER - TY - JOUR T1 - A Comparison of Computerized Classification Testing and Computerized Adaptive Testing in Clinical Psychology JF - Journal of Computerized Adaptive Testing Y1 - 2013 A1 - Smits, N. A1 - Finkelman, M. D. VL - 1 IS - 2 ER - TY - JOUR T1 - A Comparison of Exposure Control Procedures in CAT Systems Based on Different Measurement Models for Testlets JF - Applied Measurement in Education Y1 - 2013 A1 - Boyd, Aimee M. A1 - Dodd, Barbara A1 - Fitzpatrick, Steven VL - 26 UR - http://www.tandfonline.com/doi/abs/10.1080/08957347.2013.765434 ER - TY - JOUR T1 - Item Ordering in Stochastically Curtailed Health Questionnaires With an Observable Outcome JF - Journal of Computerized Adaptive Testing Y1 - 2013 A1 - Finkelman, M. D. A1 - Kim, W. A1 - He, Y. A1 - Lai, A.M. VL - 1 IS - 3 ER - TY - CHAP T1 - Reporting differentiated literacy results in PISA by using multidimensional adaptive testing. T2 - Research on PISA. Y1 - 2013 A1 - Frey, A. A1 - Seitz, N-N. A1 - Kröhne, U. JF - Research on PISA. 
PB - Dordrecht: Springer ER - TY - JOUR T1 - A Semiparametric Model for Jointly Analyzing Response Times and Accuracy in Computerized Testing JF - Journal of Educational and Behavioral Statistics Y1 - 2013 A1 - Wang, Chun A1 - Fan, Zhewen A1 - Chang, Hua-Hua A1 - Douglas, Jeffrey A. AB -

The item response times (RTs) collected from computerized testing represent an underutilized type of information about items and examinees. In addition to knowing the examinees’ responses to each item, we can investigate the amount of time examinees spend on each item. Current models for RTs mainly focus on parametric models, which have the advantage of conciseness, but may suffer from reduced flexibility to fit real data. We propose a semiparametric approach, specifically, the Cox proportional hazards model with a latent speed covariate to model the RTs, embedded within the hierarchical framework proposed by van der Linden to model the RTs and response accuracy simultaneously. This semiparametric approach combines the flexibility of nonparametric modeling with the brevity and interpretability of parametric modeling. A Markov chain Monte Carlo method for parameter estimation is given and may be used with sparse data obtained by computerized adaptive testing. Both simulation studies and real data analysis are carried out to demonstrate the applicability of the new model.

VL - 38 UR - http://jeb.sagepub.com/cgi/content/abstract/38/4/381 ER - TY - CHAP T1 - Adaptives Testen [Adaptive testing]. T2 - Testtheorie und Fragebogenkonstruktion Y1 - 2012 A1 - Frey, A. JF - Testtheorie und Fragebogenkonstruktion PB - Heidelberg: Springer CY - Berlin ER - TY - JOUR T1 - Development of a computerized adaptive test for depression JF - Archives of General Psychiatry Y1 - 2012 A1 - Robert D. Gibbons A1 - David J. Weiss A1 - Paul A. Pilkonis A1 - Ellen Frank A1 - Tara Moore A1 - Jong Bae Kim A1 - David J. Kupfer VL - 69 UR - WWW.ARCHGENPSYCHIATRY.COM IS - 11 ER - TY - JOUR T1 - Multistage Computerized Adaptive Testing With Uniform Item Exposure JF - Applied Measurement in Education Y1 - 2012 A1 - Edwards, Michael C. A1 - Flora, David B. A1 - Thissen, David VL - 25 UR - http://www.tandfonline.com/doi/abs/10.1080/08957347.2012.660363 ER - TY - JOUR T1 - Hypothetical use of multidimensional adaptive testing for the assessment of student achievement in PISA. JF - Educational and Psychological Measurement Y1 - 2011 A1 - Frey, A. A1 - Seitz, N-N. VL - 71 ER - TY - CONF T1 - The Use of Decision Trees for Adaptive Item Selection and Score Estimation T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Barth B. Riley A1 - Rodney Funk A1 - Michael L. Dennis A1 - Richard D. Lennox A1 - Matthew Finkelman KW - adaptive item selection KW - CAT KW - decision tree AB -

Post hoc simulations were conducted to compare the relative efficiency and precision of decision trees (using CHAID and CART) vs. IRT-based CAT.

Conclusions

Decision tree methods were more efficient than CAT. However:

CAT selects items based on two criteria: item location relative to the current estimate of theta, and item discrimination.

Decision trees select items that best discriminate between groups defined by the total score.

CAT is optimal only when trait level is well estimated.

Findings suggest that combining decision tree item selection followed by CAT item selection may be advantageous.

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - Development of computerized adaptive testing (CAT) for the EORTC QLQ-C30 physical functioning dimension JF - Quality of Life Research Y1 - 2010 A1 - Petersen, M. A. A1 - Groenvold, M. A1 - Aaronson, N. K. A1 - Chie, W. C. A1 - Conroy, T. A1 - Costantini, A. A1 - Fayers, P. A1 - Helbostad, J. A1 - Holzner, B. A1 - Kaasa, S. A1 - Singer, S. A1 - Velikova, G. A1 - Young, T. AB - PURPOSE: Computerized adaptive test (CAT) methods, based on item response theory (IRT), enable a patient-reported outcome instrument to be adapted to the individual patient while maintaining direct comparability of scores. The EORTC Quality of Life Group is developing a CAT version of the widely used EORTC QLQ-C30. We present the development and psychometric validation of the item pool for the first of the scales, physical functioning (PF). METHODS: Initial developments (including literature search and patient and expert evaluations) resulted in 56 candidate items. Responses to these items were collected from 1,176 patients with cancer from Denmark, France, Germany, Italy, Taiwan, and the United Kingdom. The items were evaluated with regard to psychometric properties. RESULTS: Evaluations showed that 31 of the items could be included in a unidimensional IRT model with acceptable fit and good content coverage, although the pool may lack items at the upper extreme (good PF). There were several findings of significant differential item functioning (DIF). However, the DIF findings appeared to have little impact on the PF estimation. CONCLUSIONS: We have established an item pool for CAT measurement of PF and believe that this CAT instrument will clearly improve the EORTC measurement of PF. VL - 20 SN - 1573-2649 (Electronic) 0962-9343 (Linking) N1 - Qual Life Res. 2010 Oct 23.
ER - TY - JOUR T1 - Item Selection and Hypothesis Testing for the Adaptive Measurement of Change JF - Applied Psychological Measurement Y1 - 2010 A1 - Finkelman, M. D. A1 - Weiss, D. J. A1 - Kim-Kang, G. KW - change KW - computerized adaptive testing KW - individual change KW - Kullback–Leibler information KW - likelihood ratio KW - measuring change AB -

Assessing individual change is an important topic in both psychological and educational measurement. An adaptive measurement of change (AMC) method had previously been shown to exhibit greater efficiency in detecting change than conventional nonadaptive methods. However, little work had been done to compare different procedures within the AMC framework. This study introduced a new item selection criterion and two new test statistics for detecting change with AMC that were specifically designed for the paradigm of hypothesis testing. In two simulation sets, the new methods for detecting significant change improved on existing procedures by demonstrating better adherence to Type I error rates and substantially better power for detecting relatively small change. 

VL - 34 IS - 4 ER - TY - JOUR T1 - Multidimensionale adaptive Kompetenzdiagnostik: Ergebnisse zur Messeffizienz [Multidimensional adaptive testing of competencies: Results regarding measurement efficiency]. JF - Zeitschrift für Pädagogik Y1 - 2010 A1 - Frey, A. A1 - Seitz, N-N. VL - 56 ER - TY - JOUR T1 - Variations on Stochastic Curtailment in Sequential Mastery Testing JF - Applied Psychological Measurement Y1 - 2010 A1 - Finkelman, Matthew David AB -

In sequential mastery testing (SMT), assessment via computer is used to classify examinees into one of two mutually exclusive categories. Unlike paper-and-pencil tests, SMT has the capability to use variable-length stopping rules. One approach to shortening variable-length tests is stochastic curtailment, which halts examination if the probability of changing classification decisions is low. The estimation of such a probability is therefore a critical component of a stochastically curtailed test. This article examines several variations on stochastic curtailment where the key probability is estimated more aggressively than the standard formulation, resulting in additional savings in average test length (ATL). In two simulation sets, the variations successfully reduced the ATL, and in many cases the average loss, compared with the standard formulation.
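
The core idea of stochastic curtailment can be sketched as follows. This is a deliberately simplified illustration, not the formulation studied in the article: it assumes a fixed-length test scored by number correct with a pass mark `cut`, and an assumed common success probability `p` on every remaining item. All names and the threshold `gamma` are hypothetical.

```python
from math import comb

def prob_final_pass(n_items, cut, k_done, n_correct, p):
    """P(final number-correct score >= cut) given the responses so far.

    Remaining items are treated as independent Bernoulli(p) trials, which
    is a simplifying assumption made for this sketch.
    """
    remaining = n_items - k_done
    needed = max(cut - n_correct, 0)
    # If needed > remaining the range is empty and the probability is 0.
    return sum(
        comb(remaining, j) * p ** j * (1.0 - p) ** (remaining - j)
        for j in range(needed, remaining + 1)
    )

def curtail(n_items, cut, k_done, n_correct, p, gamma=0.95):
    """Stop early when the eventual decision is predictable enough."""
    pass_prob = prob_final_pass(n_items, cut, k_done, n_correct, p)
    if pass_prob >= gamma:
        return "pass"
    if pass_prob <= 1.0 - gamma:
        return "fail"
    return "continue"

# Hypothetical 20-item test with a pass mark of 12 correct.
print(curtail(20, 12, k_done=15, n_correct=12, p=0.6))
print(curtail(20, 12, k_done=15, n_correct=5, p=0.6))
print(curtail(20, 12, k_done=10, n_correct=6, p=0.6))
```

Lowering `gamma` makes the rule stop earlier at the cost of more decisions that differ from the full-length test, which is the trade-off between test length and loss that the abstract refers to.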

VL - 34 UR - http://apm.sagepub.com/content/34/1/27.abstract ER - TY - JOUR T1 - A Conditional Exposure Control Method for Multidimensional Adaptive Testing JF - Journal of Educational Measurement Y1 - 2009 A1 - Matthew Finkelman A1 - Nering, Michael L. A1 - Roussos, Louis A. AB -

In computerized adaptive testing (CAT), ensuring the security of test items is a crucial practical consideration. A common approach to reducing item theft is to define maximum item exposure rates, i.e., to limit the proportion of examinees to whom a given item can be administered. Numerous methods for controlling exposure rates have been proposed for tests employing the unidimensional 3-PL model. The present article explores the issues associated with controlling exposure rates when a multidimensional item response theory (MIRT) model is utilized and exposure rates must be controlled conditional upon ability. This situation is complicated by the exponentially increasing number of possible ability values in multiple dimensions. The article introduces a new procedure, called the generalized Stocking-Lewis method, that controls the exposure rate for students of comparable ability as well as with respect to the overall population. A realistic simulation set compares the new method with three other approaches: Kullback-Leibler information with no exposure control, Kullback-Leibler information with unconditional Sympson-Hetter exposure control, and random item selection.

VL - 46 UR - http://dx.doi.org/10.1111/j.1745-3984.2009.01070.x ER - TY - JOUR T1 - A conditional exposure control method for multidimensional adaptive testing JF - Journal of Educational Measurement Y1 - 2009 A1 - Finkelman, M. A1 - Nering, M. L. A1 - Roussos, L. A. VL - 46 ER - TY - JOUR T1 - Constraint-Weighted a-Stratification for Computerized Adaptive Testing With Nonstatistical Constraints JF - Educational and Psychological Measurement Y1 - 2009 A1 - Ying Cheng A1 - Chang, Hua-Hua A1 - Douglas, Jeffrey A1 - Fanmin Guo AB -

a-stratification is a method that utilizes items with small discrimination (a) parameters early in an exam and those with higher a values when more is learned about the ability parameter. It can achieve much better item usage than the maximum information criterion (MIC). To make a-stratification more practical and more widely applicable, a method for weighting the item selection process in a-stratification as a means of satisfying multiple test constraints is proposed. This method is studied in simulation against an analogous method without stratification as well as a-stratification using descending- rather than ascending-a procedures. In addition, a variation of a-stratification that allows for unbalanced usage of a parameters is included in the study to examine the trade-off between efficiency and exposure control. Finally, MIC and randomized item selection are included as baseline measures. Results indicate that the weighting mechanism successfully addresses the constraints, that stratification helps to a great extent in balancing exposure rates, and that the ascending-a design improves measurement precision.
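
The ascending-a idea described above can be illustrated with a minimal sketch of plain a-stratification, without the constraint weighting this article proposes. The item dictionaries with keys `id`, `a`, and `b`, the equal-size strata, and the rule of matching difficulty `b` to the current ability estimate within a stratum are assumptions made for illustration; they are not details given in the abstract.

```python
def build_strata(items, n_strata):
    """Sort items by ascending a and split them into n_strata near-equal groups."""
    ordered = sorted(items, key=lambda it: it["a"])
    size, extra = divmod(len(ordered), n_strata)
    strata, start = [], 0
    for k in range(n_strata):
        # The first `extra` strata receive one additional item each.
        end = start + size + (1 if k < extra else 0)
        strata.append(ordered[start:end])
        start = end
    return strata

def select_item(strata, stage, theta_hat, used):
    """Pick the unused item in this stage's stratum whose b is closest to theta_hat."""
    candidates = [it for it in strata[stage] if it["id"] not in used]
    if not candidates:
        raise ValueError("stratum exhausted")
    return min(candidates, key=lambda it: abs(it["b"] - theta_hat))
```

Early stages draw from the low-a strata, so highly discriminating items are held back until the ability estimate has stabilized.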

VL - 69 UR - http://epm.sagepub.com/content/69/1/35.abstract ER - TY - JOUR T1 - Development of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis JF - Rehabilitation Psychology Y1 - 2009 A1 - Forkmann, T. A1 - Boecker, M. A1 - Norra, C. A1 - Eberle, N. A1 - Kircher, T. A1 - Schauerte, P. A1 - Mischke, K. A1 - Westhofen, M. A1 - Gauggel, S. A1 - Wirtz, M. KW - Adaptation, Psychological KW - Adult KW - Aged KW - Depressive Disorder/*diagnosis/psychology KW - Diagnosis, Computer-Assisted KW - Female KW - Heart Diseases/*psychology KW - Humans KW - Male KW - Mental Disorders/*psychology KW - Middle Aged KW - Models, Statistical KW - Otorhinolaryngologic Diseases/*psychology KW - Personality Assessment/statistics & numerical data KW - Personality Inventory/*statistics & numerical data KW - Psychometrics/statistics & numerical data KW - Questionnaires KW - Reproducibility of Results KW - Sick Role AB - OBJECTIVE: The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. The present study aimed at developing a new item bank that allows for assessing depression in persons with mental and persons with somatic diseases. METHOD: The sample consisted of 161 participants treated for a depressive syndrome, and 206 participants with somatic illnesses (103 cardiologic, 103 otorhinolaryngologic; overall mean age = 44.1 years, SD =14.0; 44.7% women) to allow for validation of the item bank in both groups. Persons answered a pool of 182 depression items on a 5-point Likert scale. RESULTS: Evaluation of Rasch model fit (infit < 1.3), differential item functioning, dimensionality, local independence, item spread, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 79 items with good psychometric properties. 
CONCLUSIONS: The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. It might also be useful for researchers who wish to develop new fixed-length scales for the assessment of depression in specific rehabilitation settings. VL - 54 SN - 0090-5550 (Print); 0090-5550 (Linking) N1 - Forkmann, Thomas; Boecker, Maren; Norra, Christine; Eberle, Nicole; Kircher, Tilo; Schauerte, Patrick; Mischke, Karl; Westhofen, Martin; Gauggel, Siegfried; Wirtz, Markus; Research Support, Non-U.S. Gov't; United States; Rehabilitation psychology; Rehabil Psychol. 2009 May;54(2):186-97. ER - TY - JOUR T1 - Diagnostic classification models and multidimensional adaptive testing: A commentary on Rupp and Templin. JF - Measurement: Interdisciplinary Research and Perspectives Y1 - 2009 A1 - Frey, A. A1 - Carstensen, C. H. VL - 7 ER - TY - JOUR T1 - Effekte des adaptiven Testens auf die Motivation zur Testbearbeitung [Effects of adaptive testing on test taking motivation]. JF - Diagnostica Y1 - 2009 A1 - Frey, A. A1 - Hartig, J. A1 - Moosbrugger, H. VL - 55 ER - TY - JOUR T1 - Evaluation of a computer-adaptive test for the assessment of depression (D-CAT) in clinical application JF - International Journal for Methods in Psychiatric Research Y1 - 2009 A1 - Fliege, H. A1 - Becker, J. A1 - Walter, O. B. A1 - Rose, M. A1 - Bjorner, J. B. A1 - Klapp, B. F. AB - In the past, a German Computerized Adaptive Test, based on Item Response Theory (IRT), was developed for purposes of assessing the construct depression [Computer-adaptive test for depression (D-CAT)]. This study aims at testing the feasibility and validity of the real computer-adaptive application. The D-CAT, supplied by a bank of 64 items, was administered on personal digital assistants (PDAs) to 423 consecutive patients suffering from psychosomatic and other medical conditions (78 with depression).
Items were adaptively administered until a predetermined reliability (r ≥ 0.90) was attained. For validation purposes, the Hospital Anxiety and Depression Scale (HADS), the Centre for Epidemiological Studies Depression (CES-D) scale, and the Beck Depression Inventory (BDI) were administered. Another sample of 114 patients was evaluated using standardized diagnostic interviews [Composite International Diagnostic Interview (CIDI)]. The D-CAT was quickly completed (mean 74 seconds), well accepted by the patients and reliable after an average administration of only six items. In 95% of the cases, 10 items or fewer were needed for a reliable score estimate. Correlations between the D-CAT and the HADS, CES-D, and BDI ranged between r = 0.68 and r = 0.77. The D-CAT distinguished between diagnostic groups as well as established questionnaires do. The D-CAT proved an efficient, well accepted and reliable tool. Discriminative power was comparable to other depression measures, whereby the CAT is shorter and more precise. Item usage raises questions of balancing the item selection for content in the future. Copyright (c) 2009 John Wiley & Sons, Ltd. VL - 18 SN - 1049-8931 (Print) N1 - Journal article; International journal of methods in psychiatric research; Int J Methods Psychiatr Res. 2009 Feb 4. ER - TY - JOUR T1 - An evaluation of patient-reported outcomes found computerized adaptive testing was efficient in assessing stress perception JF - Journal of Clinical Epidemiology Y1 - 2009 A1 - Kocalevent, R. D. A1 - Rose, M. A1 - Becker, J. A1 - Walter, O. B. A1 - Fliege, H. A1 - Bjorner, J. B. A1 - Kleiber, D. A1 - Klapp, B. F.
KW - *Diagnosis, Computer-Assisted KW - Adolescent KW - Adult KW - Aged KW - Aged, 80 and over KW - Confidence Intervals KW - Female KW - Humans KW - Male KW - Middle Aged KW - Perception KW - Quality of Health Care/*standards KW - Questionnaires KW - Reproducibility of Results KW - Sickness Impact Profile KW - Stress, Psychological/*diagnosis/psychology KW - Treatment Outcome AB - OBJECTIVES: This study aimed to develop and evaluate a first computerized adaptive test (CAT) for the measurement of stress perception (Stress-CAT), in terms of the two dimensions: exposure to stress and stress reaction. STUDY DESIGN AND SETTING: Item response theory modeling was performed using a two-parameter model (Generalized Partial Credit Model). The evaluation of the Stress-CAT comprised a simulation study and real clinical application. A total of 1,092 psychosomatic patients (N1) were studied. Two hundred simulees (N2) were generated for a simulated response data set. Then the Stress-CAT was given to n=116 inpatients (N3), together with established stress questionnaires as validity criteria. RESULTS: The final banks included n=38 stress exposure items and n=31 stress reaction items. In the first simulation study, CAT scores could be estimated with a high measurement precision (SE<0.32; rho>0.90) using 7.0±2.3 (M±SD) stress reaction items and 11.6±1.7 stress exposure items. The second simulation study reanalyzed real patient data (N1) and showed an average use of items of 5.6±2.1 for the dimension stress reaction and 10.0±4.9 for the dimension stress exposure. Convergent validity showed significantly high correlations. CONCLUSIONS: The Stress-CAT is short and precise, potentially lowering the response burden of patients in clinical decision making.
VL - 62 SN - 1878-5921 (Electronic); 0895-4356 (Linking) N1 - Kocalevent, Ruya-Daniela; Rose, Matthias; Becker, Janine; Walter, Otto B; Fliege, Herbert; Bjorner, Jakob B; Kleiber, Dieter; Klapp, Burghard F; Evaluation Studies; United States; Journal of clinical epidemiology; J Clin Epidemiol. 2009 Mar;62(3):278-87, 287.e1-3. Epub 2008 Jul 18. ER - TY - CHAP T1 - Item selection and hypothesis testing for the adaptive measurement of change Y1 - 2009 A1 - Finkelman, M. A1 - Weiss, D. J. A1 - Kim-Kang, G. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 228 KB} ER - TY - JOUR T1 - Measuring global physical health in children with cerebral palsy: Illustration of a multidimensional bi-factor model and computerized adaptive testing JF - Quality of Life Research Y1 - 2009 A1 - Haley, S. M. A1 - Ni, P. A1 - Dumas, H. M. A1 - Fragala-Pinkham, M. A. A1 - Hambleton, R. K. A1 - Montpetit, K. A1 - Bilodeau, N. A1 - Gorton, G. E. A1 - Watson, K. A1 - Tucker, C. A. KW - *Computer Simulation KW - *Health Status KW - *Models, Statistical KW - Adaptation, Psychological KW - Adolescent KW - Cerebral Palsy/*physiopathology KW - Child KW - Child, Preschool KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Massachusetts KW - Pennsylvania KW - Questionnaires KW - Young Adult AB - PURPOSE: The purposes of this study were to apply a bi-factor model for the determination of test dimensionality and a multidimensional CAT using computer simulations of real data for the assessment of a new global physical health measure for children with cerebral palsy (CP). METHODS: Parent respondents of 306 children with cerebral palsy were recruited from four pediatric rehabilitation hospitals and outpatient clinics. We compared confirmatory factor analysis results across four models: (1) one-factor unidimensional; (2) two-factor multidimensional (MIRT); (3) bi-factor MIRT with fixed slopes; and (4) bi-factor MIRT with varied slopes.
We tested whether the general and content (fatigue and pain) person score estimates could discriminate across severity and types of CP, and whether score estimates from a simulated CAT were similar to estimates based on the total item bank, and whether they correlated as expected with external measures. RESULTS: Confirmatory factor analysis suggested separate pain and fatigue sub-factors; all 37 items were retained in the analyses. From the bi-factor MIRT model with fixed slopes, the full item bank scores discriminated across levels of severity and types of CP, and compared favorably to external instruments. CAT scores based on 10- and 15-item versions accurately captured the global physical health scores. CONCLUSIONS: The bi-factor MIRT CAT application, especially the 10- and 15-item versions, yielded accurate global physical health scores that discriminated across known severity groups and types of CP, and correlated as expected with concurrent measures. The CATs have potential for collecting complex data on the physical health of children with CP in an efficient manner. VL - 18 SN - 0962-9343 (Print); 0962-9343 (Linking) N1 - Haley, Stephen M; Ni, Pengsheng; Dumas, Helene M; Fragala-Pinkham, Maria A; Hambleton, Ronald K; Montpetit, Kathleen; Bilodeau, Nathalie; Gorton, George E; Watson, Kyle; Tucker, Carole A; K02 HD045354-01A1/HD/NICHD NIH HHS/United States; K02 HD45354-01A1/HD/NICHD NIH HHS/United States; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Netherlands; Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation; Qual Life Res. 2009 Apr;18(3):359-70. Epub 2009 Feb 17. U2 - 2692519 ER - TY - JOUR T1 - Multidimensional adaptive testing in educational and psychological measurement: Current state and future challenges JF - Studies in Educational Evaluation Y1 - 2009 A1 - Frey, A. A1 - Seitz, N-N.
AB - The paper gives an overview of multidimensional adaptive testing (MAT) and evaluates its applicability in educational and psychological testing. The approach of Segall (1996) is described as a general framework for MAT. The main advantage of MAT is its capability to increase measurement efficiency. In simulation studies conceptualizing situations typical to large scale assessments, the number of presented items was reduced by MAT by about 30–50% compared to unidimensional adaptive testing and by about 70% compared to fixed item testing holding measurement precision constant. Empirical results underline these findings. Before MAT is used routinely some open questions should be answered first. After that, MAT represents a very promising approach to highly efficient simultaneous testing of multiple competencies. VL - 35 SN - 0191491X ER - TY - JOUR T1 - Progress in assessing physical function in arthritis: PROMIS short forms and computerized adaptive testing JF - Journal of Rheumatology Y1 - 2009 A1 - Fries, J.F. A1 - Cella, D. A1 - Rose, M. A1 - Krishnan, E. A1 - Bruce, B. KW - *Disability Evaluation KW - *Outcome Assessment (Health Care) KW - Arthritis/diagnosis/*physiopathology KW - Health Surveys KW - Humans KW - Prognosis KW - Reproducibility of Results AB - OBJECTIVE: Assessing self-reported physical function/disability with the Health Assessment Questionnaire Disability Index (HAQ) and other instruments has become central in arthritis research. Item response theory (IRT) and computerized adaptive testing (CAT) techniques can increase reliability and statistical power. IRT-based instruments can improve measurement precision substantially over a wider range of disease severity. These modern methods were applied and the magnitude of improvement was estimated. 
METHODS: A 199-item physical function/disability item bank was developed by distilling 1865 items to 124, including Legacy Health Assessment Questionnaire (HAQ) and Physical Function-10 items, and improving precision through qualitative and quantitative evaluation in over 21,000 subjects, which included about 1500 patients with rheumatoid arthritis and osteoarthritis. Four new instruments, (A) Patient-Reported Outcomes Measurement Information System (PROMIS) HAQ, which evolved from the original (Legacy) HAQ; (B) "best" PROMIS 10; (C) 20-item static (short) forms; and (D) simulated PROMIS CAT, which sequentially selected the most informative item, were compared with the HAQ. RESULTS: Online and mailed administration modes yielded similar item and domain scores. The HAQ and PROMIS HAQ 20-item scales yielded greater information content versus other scales in patients with more severe disease. The "best" PROMIS 20-item scale outperformed the other 20-item static forms over a broad range of 4 standard deviations. The 10-item simulated PROMIS CAT outperformed all other forms. CONCLUSION: Improved items and instruments yielded better information. The PROMIS HAQ is currently available and considered validated. The new PROMIS short forms, after validation, are likely to represent further improvement. CAT-based physical function/disability assessment offers superior performance over static forms of equal length. VL - 36 SN - 0315-162X (Print); 0315-162X (Linking) N1 - Fries, James F; Cella, David; Rose, Matthias; Krishnan, Eswar; Bruce, Bonnie; U01 AR052158/AR/NIAMS NIH HHS/United States; U01 AR52177/AR/NIAMS NIH HHS/United States; Consensus Development Conference; Research Support, N.I.H., Extramural; Canada; The Journal of rheumatology; J Rheumatol. 2009 Sep;36(9):2061-6. ER - TY - JOUR T1 - Validation of the MMPI-2 computerized adaptive version (MMPI-2-CA) in a correctional intake facility JF - Psychological Services Y1 - 2009 A1 - Forbey, J. D. A1 - Ben-Porath, Y. S. A1 - Gartland, D.
AB - Computerized adaptive testing in personality assessment can improve efficiency by significantly reducing the number of items administered to answer an assessment question. The time savings afforded by this technique could be of particular benefit in settings where large numbers of psychological screenings are conducted, such as correctional facilities. In the current study, item and time savings, as well as the test–retest and extratest correlations associated with an audio augmented administration of all the scales of the Minnesota Multiphasic Personality Inventory (MMPI)-2 Computerized Adaptive (MMPI-2-CA) are reported. Participants include 366 men, ages 18 to 62 years (M = 33.04, SD = 10.40), undergoing intake into a large Midwestern state correctional facility. Results of the current study indicate considerable item and corresponding time savings for the MMPI-2-CA compared to conventional administration of the test, as well as comparability in terms of test–retest and correlations with external measures. Future directions of adaptive personality testing are discussed. VL - 6 SN - 1939-148X ER - TY - JOUR T1 - Assessing self-care and social function using a computer adaptive testing version of the pediatric evaluation of disability inventory JF - Archives of Physical Medicine and Rehabilitation Y1 - 2008 A1 - Coster, W. J. A1 - Haley, S. M. A1 - Ni, P. A1 - Dumas, H. M. A1 - Fragala-Pinkham, M. A. 
KW - *Disability Evaluation KW - *Social Adjustment KW - Activities of Daily Living KW - Adolescent KW - Age Factors KW - Child KW - Child, Preschool KW - Computer Simulation KW - Cross-Over Studies KW - Disabled Children/*rehabilitation KW - Female KW - Follow-Up Studies KW - Humans KW - Infant KW - Male KW - Outcome Assessment (Health Care) KW - Reference Values KW - Reproducibility of Results KW - Retrospective Studies KW - Risk Factors KW - Self Care/*standards/trends KW - Sex Factors KW - Sickness Impact Profile AB - OBJECTIVE: To examine score agreement, validity, precision, and response burden of a prototype computer adaptive testing (CAT) version of the self-care and social function scales of the Pediatric Evaluation of Disability Inventory compared with the full-length version of these scales. DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics; community-based day care, preschool, and children's homes. PARTICIPANTS: Children with disabilities (n=469) and 412 children with no disabilities (analytic sample); 38 children with disabilities and 35 children without disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from prototype CAT applications of each scale using 15-, 10-, and 5-item stopping rules; scores from the full-length self-care and social function scales; time (in seconds) to complete assessments and respondent ratings of burden. RESULTS: Scores from both computer simulations and field administration of the prototype CATs were highly consistent with scores from full-length administration (r range, .94-.99). 
Using computer simulation of retrospective data, discriminant validity and sensitivity to change of the CATs closely approximated that of the full-length scales, especially when the 15- and 10-item stopping rules were applied. In the cross-validation study the time to administer both CATs was 4 minutes, compared with over 16 minutes to complete the full-length scales. CONCLUSIONS: Self-care and social function score estimates from CAT administration are highly comparable with those obtained from full-length scale administration, with small losses in validity and precision and substantial decreases in administration time. VL - 89 SN - 1532-821X (Electronic); 0003-9993 (Linking) N1 - Coster, Wendy J; Haley, Stephen M; Ni, Pengsheng; Dumas, Helene M; Fragala-Pinkham, Maria A; K02 HD45354-01A1/HD/NICHD NIH HHS/United States; R41 HD052318-01A1/HD/NICHD NIH HHS/United States; R43 HD42388-01/HD/NICHD NIH HHS/United States; Comparative Study; Research Support, N.I.H., Extramural; United States; Archives of physical medicine and rehabilitation; Arch Phys Med Rehabil. 2008 Apr;89(4):622-9. U2 - 2666276 ER - TY - CONF T1 - Developing a progressive approach to using the GAIN in order to reduce the duration and cost of assessment with the GAIN short screener, Quick and computer adaptive testing T2 - Joint Meeting on Adolescent Treatment Effectiveness Y1 - 2008 A1 - Dennis, M. L. A1 - Funk, R. A1 - Titus, J. A1 - Riley, B. B. A1 - Hosman, S. A1 - Kinne, S. JF - Joint Meeting on Adolescent Treatment Effectiveness CY - Washington D.C., USA N1 - ProCite field[6]: Paper presented at the ER - TY - JOUR T1 - Functioning and validity of a computerized adaptive test to measure anxiety (A-CAT) JF - Depression and Anxiety Y1 - 2008 A1 - Becker, J. A1 - Fliege, H. A1 - Kocalevent, R. D. A1 - Bjorner, J. B. A1 - Rose, M. A1 - Walter, O. B. A1 - Klapp, B. F.
AB - Background: The aim of this study was to evaluate the Computerized Adaptive Test to measure anxiety (A-CAT), a patient-reported outcome questionnaire that uses computerized adaptive testing to measure anxiety. Methods: The A-CAT builds on an item bank of 50 items that has been built using conventional item analyses and item response theory analyses. The A-CAT was administered on Personal Digital Assistants to n=357 patients diagnosed and treated at the department of Psychosomatic Medicine and Psychotherapy, Charité Berlin, Germany. For validation purposes, two subgroups of patients (n=110 and 125) answered the A-CAT along with established anxiety and depression questionnaires. Results: The A-CAT was fast to complete (on average in 2 min, 38 s) and a precise item response theory based CAT score (reliability>.9) could be estimated after 4–41 items. On average, the CAT displayed 6 items (SD=4.2). Convergent validity of the A-CAT was supported by correlations to existing tools (Hospital Anxiety and Depression Scale-A, Beck Anxiety Inventory, Berliner Stimmungs-Fragebogen A/D, and State Trait Anxiety Inventory: r=.56–.66); discriminant validity between diagnostic groups was higher for the A-CAT than for other anxiety measures. Conclusions: The German A-CAT is an efficient, reliable, and valid tool for assessing anxiety in patients suffering from anxiety disorders and other conditions with significant potential for initial assessment and long-term treatment monitoring. Future research directions are to explore content balancing of the item selection algorithm of the CAT, to norm the tool to a healthy sample, and to develop practical cutoff scores. Depression and Anxiety, 2008. © 2008 Wiley-Liss, Inc. VL - 25 SN - 1520-6394 ER - TY - JOUR T1 - Modern sequential analysis and its application to computerized adaptive testing JF - Psychometrika Y1 - 2008 A1 - Bartroff, J. A1 - Finkelman, M. A1 - Lai, T. L. 
AB - After a brief review of recent advances in sequential analysis involving sequential generalized likelihood ratio tests, we discuss their use in psychometric testing and extend the asymptotic optimality theory of these sequential tests to the case of sequentially generated experiments, of particular interest in computerized adaptive testing. We then show how these methods can be used to design adaptive mastery tests, which are asymptotically optimal and are also shown to provide substantial improvements over currently used sequential and fixed length tests. VL - 73 ER - TY - JOUR T1 - A Strategy for Controlling Item Exposure in Multidimensional Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2008 A1 - Lee, Yi-Hsuan A1 - Ip, Edward H. A1 - Fuh, Cheng-Der AB -

Although computerized adaptive tests have enjoyed tremendous growth, solutions for important problems remain unavailable. One problem is the control of item exposure rate. Because adaptive algorithms are designed to select optimal items, they choose items with high discriminating power. Thus, these items are selected more often than others, leading to both overexposure and underutilization of some parts of the item pool. Overused items are often compromised, creating a security problem that could threaten the validity of a test. Building on a previously proposed stratification scheme to control the exposure rate for one-dimensional tests, the authors extend their method to multidimensional tests. A strategy is proposed based on stratification in accordance with a functional of the vector of the discrimination parameter, which can be implemented with minimal computational overhead. Both theoretical and empirical validation studies are provided. Empirical results indicate significant improvement over the commonly used method of controlling exposure rate that requires only a reasonable sacrifice in efficiency.

VL - 68 UR - http://epm.sagepub.com/content/68/2/215.abstract ER - TY - JOUR T1 - Using computerized adaptive testing to reduce the burden of mental health assessment JF - Psychiatric Services Y1 - 2008 A1 - Gibbons, R. D. A1 - Weiss, D. J. A1 - Kupfer, D. J. A1 - Frank, E. A1 - Fagiolini, A. A1 - Grochocinski, V. J. A1 - Bhaumik, D. K. A1 - Stover, A. A1 - Bock, R. D. A1 - Immekus, J. C. KW - *Diagnosis, Computer-Assisted KW - *Questionnaires KW - Adolescent KW - Adult KW - Aged KW - Agoraphobia/diagnosis KW - Anxiety Disorders/diagnosis KW - Bipolar Disorder/diagnosis KW - Female KW - Humans KW - Male KW - Mental Disorders/*diagnosis KW - Middle Aged KW - Mood Disorders/diagnosis KW - Obsessive-Compulsive Disorder/diagnosis KW - Panic Disorder/diagnosis KW - Phobic Disorders/diagnosis KW - Reproducibility of Results KW - Time Factors AB - OBJECTIVE: This study investigated the combination of item response theory and computerized adaptive testing (CAT) for psychiatric measurement as a means of reducing the burden of research and clinical assessments. METHODS: Data were from 800 participants in outpatient treatment for a mood or anxiety disorder; they completed 616 items of the 626-item Mood and Anxiety Spectrum Scales (MASS) at two times. The first administration was used to design and evaluate a CAT version of the MASS by using post hoc simulation. The second confirmed the functioning of CAT in live testing. RESULTS: Tests of competing models based on item response theory supported the scale's bifactor structure, consisting of a primary dimension and four group factors (mood, panic-agoraphobia, obsessive-compulsive, and social phobia). Both simulated and live CAT showed a 95% average reduction (585 items) in items administered (24 and 30 items, respectively) compared with administration of the full MASS. The correlation between scores on the full MASS and the CAT version was .93. 
For the mood disorder subscale, differences in scores between two groups of depressed patients--one with bipolar disorder and one without--on the full scale and on the CAT showed effect sizes of .63 (p<.003) and 1.19 (p<.001) standard deviation units, respectively, indicating better discriminant validity for CAT. CONCLUSIONS: Instead of using small fixed-length tests, clinicians can create item banks with a large item pool, and a small set of the items most relevant for a given individual can be administered with no loss of information, yielding a dramatic reduction in administration time and patient and clinician burden. VL - 59 SN - 1075-2730 (Print) N1 - Gibbons, Robert D; Weiss, David J; Kupfer, David J; Frank, Ellen; Fagiolini, Andrea; Grochocinski, Victoria J; Bhaumik, Dulal K; Stover, Angela; Bock, R Darrell; Immekus, Jason C; R01-MH-30915/MH/United States NIMH; R01-MH-66302/MH/United States NIMH; Research Support, N.I.H., Extramural; United States; Psychiatric services (Washington, D.C.); Psychiatr Serv. 2008 Apr;59(4):361-8. ER - TY - JOUR T1 - Using item banks to construct measures of patient reported outcomes in clinical trials: investigator perceptions JF - Clinical Trials Y1 - 2008 A1 - Flynn, K. E. A1 - Dombeck, C. B. A1 - DeWitt, E. M. A1 - Schulman, K. A. A1 - Weinfurt, K. P. AB - BACKGROUND: Item response theory (IRT) promises more sensitive and efficient measurement of patient-reported outcomes (PROs) than traditional approaches; however, the selection and use of PRO measures from IRT-based item banks differ from current methods of using PRO measures. PURPOSE: To anticipate barriers to the adoption of IRT item banks into clinical trials. METHODS: We conducted semistructured telephone or in-person interviews with 42 clinical researchers who published results from clinical trials in the Journal of the American Medical Association, the New England Journal of Medicine, or other leading clinical journals from July 2005 through May 2006.
Interviews included a brief tutorial on IRT item banks. RESULTS: After the tutorial, 39 of 42 participants understood the novel products available from an IRT item bank, namely customized short forms and computerized adaptive testing. Most participants (38/42) thought that item banks could be useful in their clinical trials, but they mentioned several potential barriers to adoption, including economic and logistical constraints, concerns about whether item banks are better than current PRO measures, concerns about how to convince study personnel or statisticians to use item banks, concerns about FDA or sponsor acceptance, and the lack of availability of item banks validated in specific disease populations. LIMITATIONS: Selection bias might have led to more positive responses to the concept of item banks in clinical trials. CONCLUSIONS: Clinical investigators are open to a new method of PRO measurement offered in IRT item banks, but bank developers must address investigator and stakeholder concerns before widespread adoption can be expected. VL - 5 SN - 1740-7745 (Print) N1 - Flynn, Kathryn E; Dombeck, Carrie B; DeWitt, Esi Morgan; Schulman, Kevin A; Weinfurt, Kevin P; 5U01AR052186/AR/NIAMS NIH HHS/United States; Research Support, N.I.H., Extramural; England; Clinical trials (London, England); Clin Trials. 2008;5(6):575-86. ER - TY - JOUR T1 - On using stochastic curtailment to shorten the SPRT in sequential mastery testing JF - Journal of Educational and Behavioral Statistics Y1 - 2008 A1 - Finkelman, M. D. VL - 33 ER - TY - JOUR T1 - The Wald–Wolfowitz Theorem Is Violated in Sequential Mastery Testing JF - Sequential Analysis Y1 - 2008 A1 - Finkelman, M. VL - 27 ER - TY - JOUR T1 - Applying item response theory and computer adaptive testing: The challenges for health outcomes assessment JF - Quality of Life Research Y1 - 2007 A1 - Fayers, P. M.
AB - OBJECTIVES: We review the papers presented at the NCI/DIA conference, to identify areas of controversy and uncertainty, and to highlight those aspects of item response theory (IRT) and computer adaptive testing (CAT) that require theoretical or empirical research in order to justify their application to patient reported outcomes (PROs). BACKGROUND: IRT and CAT offer exciting potential for the development of a new generation of PRO instruments. However, most of the research into these techniques has been in non-healthcare settings, notably in education. Educational tests are very different from PRO instruments, and consequently problematic issues arise when adapting IRT and CAT to healthcare research. RESULTS: Clinical scales differ appreciably from educational tests, and symptoms have characteristics distinctly different from examination questions. This affects the transferring of IRT technology. Particular areas of concern when applying IRT to PROs include inadequate software, difficulties in selecting models and communicating results, insufficient testing of local independence and other assumptions, and a need of guidelines for estimating sample size requirements. Similar concerns apply to differential item functioning (DIF), which is an important application of IRT. Multidimensional IRT is likely to be advantageous only for closely related PRO dimensions. CONCLUSIONS: Although IRT and CAT provide appreciable potential benefits, there is a need for circumspection. Not all PRO scales are necessarily appropriate targets for this methodology. Traditional psychometric methods, and especially qualitative methods, continue to have an important role alongside IRT. Research should be funded to address the specific concerns that have been identified. VL - 16 SN - 0962-9343 (Print) N1 - Fayers, Peter M; Netherlands; Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation; Qual Life Res. 2007;16 Suppl 1:187-94.
Epub 2007 Apr 7. ER - TY - JOUR T1 - Computerized adaptive personality testing: A review and illustration with the MMPI-2 Computerized Adaptive Version JF - Psychological Assessment Y1 - 2007 A1 - Forbey, J. D. A1 - Ben-Porath, Y. S. KW - Adolescent KW - Adult KW - Diagnosis, Computer-Assisted/*statistics & numerical data KW - Female KW - Humans KW - Male KW - MMPI/*statistics & numerical data KW - Personality Assessment/*statistics & numerical data KW - Psychometrics/statistics & numerical data KW - Reference Values KW - Reproducibility of Results AB - Computerized adaptive testing in personality assessment can improve efficiency by significantly reducing the number of items administered to answer an assessment question. Two approaches have been explored for adaptive testing in computerized personality assessment: item response theory and the countdown method. In this article, the authors review the literature on each and report the results of an investigation designed to explore the utility, in terms of item and time savings, and validity, in terms of correlations with external criterion measures, of an expanded countdown method-based research version of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2), the MMPI-2 Computerized Adaptive Version (MMPI-2-CA). Participants were 433 undergraduate college students (170 men and 263 women). Results indicated considerable item savings and corresponding time savings for the adaptive testing modalities compared with a conventional computerized MMPI-2 administration. Furthermore, computerized adaptive administration yielded comparable results to computerized conventional administration of the MMPI-2 in terms of both test scores and their validity. Future directions for computerized adaptive personality testing are discussed. VL - 19 SN - 1040-3590 (Print) N1 - Forbey, Johnathan DBen-Porath, Yossef SResearch Support, Non-U.S. Gov'tUnited StatesPsychological assessmentPsychol Assess. 2007 Mar;19(1):14-24. 
ER - TY - JOUR T1 - Development and evaluation of a computer adaptive test for “Anxiety” (Anxiety-CAT) JF - Quality of Life Research Y1 - 2007 A1 - Walter, O. B. A1 - Becker, J. A1 - Bjorner, J. B. A1 - Fliege, H. A1 - Klapp, B. F. A1 - Rose, M. VL - 16 ER - TY - JOUR T1 - The effect of including pretest items in an operational computerized adaptive test: Do different ability examinees spend different amounts of time on embedded pretest items? JF - Educational Assessment Y1 - 2007 A1 - Ferdous, A. A. A1 - Plake, B. S. A1 - Chang, S-R. KW - ability KW - operational computerized adaptive test KW - pretest items KW - time AB - The purpose of this study was to examine the effect of pretest items on response time in an operational, fixed-length, time-limited computerized adaptive test (CAT). These pretest items are embedded within the CAT, but unlike the operational items, are not tailored to the examinee's ability level. If examinees with higher ability levels need less time to complete these items than do their counterparts with lower ability levels, they will have more time to devote to the operational test questions. Data were from a graduate admissions test that was administered worldwide. Data from both quantitative and verbal sections of the test were considered. For the verbal section, examinees in the lower ability groups spent systematically more time on their pretest items than did those in the higher ability groups, though for the quantitative section the differences were less clear. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Lawrence Erlbaum: US VL - 12 SN - 1062-7197 (Print); 1532-6977 (Electronic) ER - TY - JOUR T1 - Hypothetischer Einsatz adaptiven Testens bei der Messung von Bildungsstandards in Mathematik [Hypothetical use of adaptive testing for the measurement of educational standards in mathematics] . JF - Zeitschrift für Erziehungswissenschaft Y1 - 2007 A1 - Frey, A. A1 - Ehmke, T. 
VL - 8 ER - TY - JOUR T1 - Improving patient reported outcomes using item response theory and computerized adaptive testing JF - Journal of Rheumatology Y1 - 2007 A1 - Chakravarty, E. F. A1 - Bjorner, J. B. A1 - Fries, J. F. KW - *Rheumatic Diseases/physiopathology/psychology KW - Clinical Trials KW - Data Interpretation, Statistical KW - Disability Evaluation KW - Health Surveys KW - Humans KW - International Cooperation KW - Outcome Assessment (Health Care)/*methods KW - Patient Participation/*methods KW - Research Design/*trends KW - Software AB - OBJECTIVE: Patient reported outcomes (PRO) are considered central outcome measures for both clinical trials and observational studies in rheumatology. More sophisticated statistical models, including item response theory (IRT) and computerized adaptive testing (CAT), will enable critical evaluation and reconstruction of currently utilized PRO instruments to improve measurement precision while reducing item burden on the individual patient. METHODS: We developed a domain hierarchy encompassing the latent trait of physical function/disability from the more general to most specific. Items collected from 165 English-language instruments were evaluated by a structured process including trained raters, modified Delphi expert consensus, and then patient evaluation. Each item in the refined data bank will undergo extensive analysis using IRT to evaluate response functions and measurement precision. CAT will allow for real-time questionnaires of potentially smaller numbers of questions tailored directly to each individual's level of physical function. RESULTS: Physical function/disability domain comprises 4 subdomains: upper extremity, trunk, lower extremity, and complex activities. Expert and patient review led to consensus favoring use of present-tense "capability" questions using a 4- or 5-item Likert response construct over past-tense "performance" items. 
Floor and ceiling effects, attribution of disability, and standardization of response categories were also addressed. CONCLUSION: By applying statistical techniques of IRT through use of CAT, existing PRO instruments may be improved to reduce questionnaire burden on the individual patients while increasing measurement precision that may ultimately lead to reduced sample size requirements for costly clinical trials. VL - 34 SN - 0315-162X (Print) N1 - Chakravarty, Eliza FBjorner, Jakob BFries, James FAr052158/ar/niamsConsensus Development ConferenceResearch Support, N.I.H., ExtramuralCanadaThe Journal of rheumatologyJ Rheumatol. 2007 Jun;34(6):1426-31. ER - TY - JOUR T1 - The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years JF - Medical Care Y1 - 2007 A1 - Cella, D. A1 - Yount, S. A1 - Rothrock, N. A1 - Gershon, R. C. A1 - Cook, K. F. A1 - Reeve, B. A1 - Ader, D. A1 - Fries, J.F. A1 - Bruce, B. A1 - Rose, M. AB - The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) Roadmap initiative (www.nihpromis.org) is a 5-year cooperative group program of research designed to develop, validate, and standardize item banks to measure patient-reported outcomes (PROs) relevant across common medical conditions. In this article, we will summarize the organization and scientific activity of the PROMIS network during its first 2 years. VL - 45 ER - TY - CHAP T1 - Up-and-down procedures for approximating optimal designs using person-response functions Y1 - 2007 A1 - Sheng, Y. A1 - Flournoy, N. A1 - Osterlind, S. J. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. 
N1 - {PDF file, 1,042 KB} ER - TY - JOUR T1 - Measurement precision and efficiency of multidimensional computer adaptive testing of physical functioning using the pediatric evaluation of disability inventory JF - Archives of Physical Medicine and Rehabilitation Y1 - 2006 A1 - Haley, S. M. A1 - Ni, P. A1 - Ludlow, L. H. A1 - Fragala-Pinkham, M. A. KW - *Disability Evaluation KW - *Pediatrics KW - Adolescent KW - Child KW - Child, Preschool KW - Computers KW - Disabled Persons/*classification/rehabilitation KW - Efficiency KW - Humans KW - Infant KW - Outcome Assessment (Health Care) KW - Psychometrics KW - Self Care AB - OBJECTIVE: To compare the measurement efficiency and precision of a multidimensional computer adaptive testing (M-CAT) application to a unidimensional CAT (U-CAT) comparison using item bank data from 2 of the functional skills scales of the Pediatric Evaluation of Disability Inventory (PEDI). DESIGN: Using existing PEDI mobility and self-care item banks, we compared the stability of item calibrations and model fit between unidimensional and multidimensional Rasch models and compared the efficiency and precision of the U-CAT- and M-CAT-simulated assessments to a random draw of items. SETTING: Pediatric rehabilitation hospital and clinics. PARTICIPANTS: Clinical and normative samples. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Not applicable. RESULTS: The M-CAT had greater levels of precision and efficiency than the separate mobility and self-care U-CAT versions when using a similar number of items for each PEDI subdomain. Equivalent estimation of mobility and self-care scores can be achieved with a 25% to 40% item reduction with the M-CAT compared with the U-CAT. CONCLUSIONS: M-CAT applications appear to have both precision and efficiency advantages compared with separate U-CAT assessments when content subdomains have a high correlation. 
Practitioners may also realize interpretive advantages of reporting test score information for each subdomain when separate clinical inferences are desired. VL - 87 SN - 0003-9993 (Print) N1 - Haley, Stephen MNi, PengshengLudlow, Larry HFragala-Pinkham, Maria AK02 hd45354-01/hd/nichdResearch Support, N.I.H., ExtramuralResearch Support, Non-U.S. Gov'tUnited StatesArchives of physical medicine and rehabilitationArch Phys Med Rehabil. 2006 Sep;87(9):1223-9. ER - TY - JOUR T1 - Multidimensional computerized adaptive testing of the EORTC QLQ-C30: basic developments and evaluations JF - Quality of Life Research Y1 - 2006 A1 - Petersen, M. A. A1 - Groenvold, M. A1 - Aaronson, N. K. A1 - Fayers, P. A1 - Sprangers, M. A1 - Bjorner, J. B. KW - *Quality of Life KW - *Self Disclosure KW - Adult KW - Female KW - Health Status KW - Humans KW - Male KW - Middle Aged KW - Questionnaires/*standards KW - User-Computer Interface AB - OBJECTIVE: Self-report questionnaires are widely used to measure health-related quality of life (HRQOL). Ideally, such questionnaires should be adapted to the individual patient and at the same time scores should be directly comparable across patients. This may be achieved using computerized adaptive testing (CAT). Usually, CAT is carried out for a single domain at a time. However, many HRQOL domains are highly correlated. Multidimensional CAT may utilize these correlations to improve measurement efficiency. We investigated the possible advantages and difficulties of multidimensional CAT. STUDY DESIGN AND SETTING: We evaluated multidimensional CAT of three scales from the EORTC QLQ-C30: the physical functioning, emotional functioning, and fatigue scales. Analyses utilised a database with 2958 European cancer patients. 
RESULTS: It was possible to obtain scores for the three domains with five to seven items administered using multidimensional CAT that were very close to the scores obtained using all 12 items and with no or little loss of measurement precision. CONCLUSION: The findings suggest that multidimensional CAT may significantly improve measurement precision and efficiency and encourage further research into multidimensional CAT. Particularly, the estimation of the model underlying the multidimensional CAT and the conceptual aspects need further investigations. VL - 15 SN - 0962-9343 (Print) N1 - Petersen, Morten AaGroenvold, MogensAaronson, NeilFayers, PeterSprangers, MirjamBjorner, Jakob BEuropean Organisation for Research and Treatment of Cancer Quality of Life GroupResearch Support, Non-U.S. Gov'tNetherlandsQuality of life research : an international journal of quality of life aspects of treatment, care and rehabilitationQual Life Res. 2006 Apr;15(3):315-29. ER - TY - JOUR T1 - Sensitivity of a computer adaptive assessment for measuring functional mobility changes in children enrolled in a community fitness programme JF - Clin Rehabil Y1 - 2006 A1 - Haley, S. M. A1 - Fragala-Pinkham, M. A. A1 - Ni, P. VL - 20 ER - TY - THES T1 - Validitätssteigerungen durch adaptives Testen [Increasing validity by adaptive testing]. Y1 - 2006 A1 - Frey, A. ER - TY - JOUR T1 - Assessing Mobility in Children Using a Computer Adaptive Testing Version of the Pediatric Evaluation of Disability Inventory JF - Archives of Physical Medicine and Rehabilitation Y1 - 2005 A1 - Haley, S. A1 - Raczek, A. A1 - Coster, W. A1 - Dumas, H. A1 - Fragalapinkham, M. VL - 86 SN - 00039993 ER - TY - JOUR T1 - Assessing mobility in children using a computer adaptive testing version of the pediatric evaluation of disability inventory JF - Archives of Physical Medicine and Rehabilitation Y1 - 2005 A1 - Haley, S. M. A1 - Raczek, A. E. A1 - Coster, W. J. A1 - Dumas, H. M. A1 - Fragala-Pinkham, M. A. 
KW - *Computer Simulation KW - *Disability Evaluation KW - Adolescent KW - Child KW - Child, Preschool KW - Cross-Sectional Studies KW - Disabled Children/*rehabilitation KW - Female KW - Humans KW - Infant KW - Male KW - Outcome Assessment (Health Care)/*methods KW - Rehabilitation Centers KW - Rehabilitation/*standards KW - Sensitivity and Specificity AB - OBJECTIVE: To assess score agreement, validity, precision, and response burden of a prototype computerized adaptive testing (CAT) version of the Mobility Functional Skills Scale (Mob-CAT) of the Pediatric Evaluation of Disability Inventory (PEDI) as compared with the full 59-item version (Mob-59). DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; and cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics, community-based day care, preschool, and children's homes. PARTICIPANTS: Four hundred sixty-nine children with disabilities and 412 children with no disabilities (analytic sample); 41 children without disabilities and 39 with disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from a prototype Mob-CAT application and versions using 15-, 10-, and 5-item stopping rules; scores from the Mob-59; and number of items and time (in seconds) to administer assessments. RESULTS: Mob-CAT scores from both computer simulations (intraclass correlation coefficient [ICC] range, .94-.99) and field administrations (ICC=.98) were in high agreement with scores from the Mob-59. Using computer simulations of retrospective data, discriminant validity, and sensitivity to change of the Mob-CAT closely approximated that of the Mob-59, especially when using the 15- and 10-item stopping rule versions of the Mob-CAT. 
The Mob-CAT used no more than 15% of the items for any single administration, and required 20% of the time needed to administer the Mob-59. CONCLUSIONS: Comparable score estimates for the PEDI mobility scale can be obtained from CAT administrations, with losses in validity and precision for shorter forms, but with a considerable reduction in administration time. VL - 86 SN - 0003-9993 (Print) N1 - Haley, Stephen MRaczek, Anastasia ECoster, Wendy JDumas, Helene MFragala-Pinkham, Maria AK02 hd45354-01a1/hd/nichdR43 hd42388-01/hd/nichdResearch Support, N.I.H., ExtramuralResearch Support, U.S. Gov't, P.H.S.United StatesArchives of physical medicine and rehabilitationArch Phys Med Rehabil. 2005 May;86(5):932-9. ER - TY - JOUR T1 - A computer adaptive testing approach for assessing physical functioning in children and adolescents JF - Developmental Medicine and Child Neurology Y1 - 2005 A1 - Haley, S. M. A1 - Ni, P. A1 - Fragala-Pinkham, M. A. A1 - Skrinar, A. M. A1 - Corzo, D. KW - *Computer Systems KW - Activities of Daily Living KW - Adolescent KW - Age Factors KW - Child KW - Child Development/*physiology KW - Child, Preschool KW - Computer Simulation KW - Confidence Intervals KW - Demography KW - Female KW - Glycogen Storage Disease Type II/physiopathology KW - Health Status Indicators KW - Humans KW - Infant KW - Infant, Newborn KW - Male KW - Motor Activity/*physiology KW - Outcome Assessment (Health Care)/*methods KW - Reproducibility of Results KW - Self Care KW - Sensitivity and Specificity AB - The purpose of this article is to demonstrate: (1) the accuracy and (2) the reduction in amount of time and effort in assessing physical functioning (self-care and mobility domains) of children and adolescents using computer-adaptive testing (CAT). A CAT algorithm selects questions directly tailored to the child's ability level, based on previous responses. 
Using a CAT algorithm, a simulation study was used to determine the number of items necessary to approximate the score of a full-length assessment. We built simulated CAT (5-, 10-, 15-, and 20-item versions) for self-care and mobility domains and tested their accuracy in a normative sample (n=373; 190 males, 183 females; mean age 6y 11mo [SD 4y 2m], range 4mo to 14y 11mo) and a sample of children and adolescents with Pompe disease (n=26; 21 males, 5 females; mean age 6y 1mo [SD 3y 10mo], range 5mo to 14y 10mo). Results indicated that comparable score estimates (based on computer simulations) to the full-length tests can be achieved in a 20-item CAT version for all age ranges and for normative and clinical samples. No more than 13 to 16% of the items in the full-length tests were needed for any one administration. These results support further consideration of using CAT programs for accurate and efficient clinical assessments of physical functioning. VL - 47 SN - 0012-1622 (Print) N1 - Haley, Stephen MNi, PengshengFragala-Pinkham, Maria ASkrinar, Alison MCorzo, DeyaniraComparative StudyResearch Support, Non-U.S. Gov'tEnglandDevelopmental medicine and child neurologyDev Med Child Neurol. 2005 Feb;47(2):113-20. ER - TY - JOUR T1 - Computerized Adaptive Testing With the Partial Credit Model: Estimation Procedures, Population Distributions, and Item Pool Characteristics JF - Applied Psychological Measurement Y1 - 2005 A1 - Gorin, Joanna S. A1 - Dodd, Barbara G. A1 - Fitzpatrick, Steven J. A1 - Shieh, Yann Yann AB -

The primary purpose of this research is to examine the impact of estimation methods, actual latent trait distributions, and item pool characteristics on the performance of a simulated computerized adaptive testing (CAT) system. In this study, three estimation procedures are compared for accuracy of estimation: maximum likelihood estimation (MLE), expected a posteriori (EAP), and Warm's weighted likelihood estimation (WLE). Some research has shown that MLE and EAP perform equally well under certain conditions in polytomous CAT systems, such that they match the actual latent trait distribution. However, little research has compared these methods when prior estimates of distributions are extremely poor. In general, it appears that MLE, EAP, and WLE procedures perform equally well when using an optimal item pool. However, the use of EAP procedures may be advantageous under nonoptimal testing conditions when the item pool is not appropriately matched to the examinees.

VL - 29 UR - http://apm.sagepub.com/content/29/6/433.abstract ER - TY - JOUR T1 - Computerized adaptive testing with the partial credit model: Estimation procedures, population distributions, and item pool characteristics JF - Applied Psychological Measurement Y1 - 2005 A1 - Gorin, J. A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. A1 - Shieh, Y. Y. VL - 29 ER - TY - JOUR T1 - Development of a computer-adaptive test for depression (D-CAT) JF - Quality of Life Research Y1 - 2005 A1 - Fliege, H. A1 - Becker, J. A1 - Walter, O. B. A1 - Bjorner, J. B. A1 - Klapp, B. F. A1 - Rose, M. VL - 14 ER - TY - JOUR T1 - The promise of PROMIS: using item response theory to improve assessment of patient-reported outcomes JF - Clinical and Experimental Rheumatology Y1 - 2005 A1 - Fries, J.F. A1 - Bruce, B. A1 - Cella, D. KW - computerized adaptive testing AB - PROMIS (Patient-Reported-Outcomes Measurement Information System) is an NIH Roadmap network project intended to improve the reliability, validity, and precision of PROs and to provide definitive new instruments that will exceed the capabilities of classic instruments and enable improved outcome measurement for clinical research across all NIH institutes. Item response theory (IRT) measurement models now permit us to transition conventional health status assessment into an era of item banking and computerized adaptive testing (CAT). Item banking uses IRT measurement models and methods to develop item banks from large pools of items from many available questionnaires. IRT allows the reduction and improvement of items and assembles domains of items which are unidimensional and not excessively redundant. CAT provides a model-driven algorithm and software to iteratively select the most informative remaining item in a domain until a desired degree of precision is obtained. Through these approaches the number of patients required for a clinical trial may be reduced while holding statistical power constant. 
PROMIS tools, expected to improve precision and enable assessment at the individual patient level which should broaden the appeal of PROs, will begin to be available to the general medical community in 2008. VL - 23 ER - TY - JOUR T1 - Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire JF - Alcoholism: Clinical & Experimental Research Y1 - 2005 A1 - Kahler, C. W. A1 - Strong, D. R. A1 - Read, J. P. A1 - De Boeck, P. A1 - Wilson, M. A1 - Acton, G. S. A1 - Palfai, T. P. A1 - Wood, M. D. A1 - Mehta, P. D. A1 - Neale, M. C. A1 - Flay, B. R. A1 - Conklin, C. A. A1 - Clayton, R. R. A1 - Tiffany, S. T. A1 - Shiffman, S. A1 - Krueger, R. F. A1 - Nichol, P. E. A1 - Hicks, B. M. A1 - Markon, K. E. A1 - Patrick, C. J. A1 - Iacono, William G. A1 - McGue, Matt A1 - Langenbucher, J. W. A1 - Labouvie, E. A1 - Martin, C. S. A1 - Sanjuan, P. M. A1 - Bavly, L. A1 - Kirisci, L. A1 - Chung, T. A1 - Vanyukov, M. A1 - Dunn, M. A1 - Tarter, R. A1 - Handel, R. W. A1 - Ben-Porath, Y. S. A1 - Watt, M. KW - Psychometrics KW - Substance-Related Disorders AB - Background: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias., Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis. 
An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample., Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items., Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students. To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided., (C)2005Research Society on AlcoholismAn important, sometimes controversial feature of all psychological phenomena is whether they are categorical or dimensional. A conceptual and psychometric framework is described for distinguishing whether the latent structure behind manifest categories (e.g., psychiatric diagnoses, attitude groups, or stages of development) is category-like or dimension-like. Being dimension-like requires (a) within-category heterogeneity and (b) between-category quantitative differences. Being category-like requires (a) within-category homogeneity and (b) between-category qualitative differences. The relation between this classification and abrupt versus smooth differences is discussed. Hybrid structures are possible. 
Being category-like is itself a matter of degree; the authors offer a formalized framework to determine this degree. Empirical applications to personality disorders, attitudes toward capital punishment, and stages of cognitive development illustrate the approach., (C) 2005 by the American Psychological AssociationThe authors conducted Rasch model ( G. Rasch, 1960) analyses of items from the Young Adult Alcohol Problems Screening Test (YAAPST; S. C. Hurlbut & K. J. Sher, 1992) to examine the relative severity and ordering of alcohol problems in 806 college students. Items appeared to measure a single dimension of alcohol problem severity, covering a broad range of the latent continuum. Items fit the Rasch model well, with less severe symptoms reliably preceding more severe symptoms in a potential progression toward increasing levels of problem severity. However, certain items did not index problem severity consistently across demographic subgroups. A shortened, alternative version of the YAAPST is proposed, and a norm table is provided that allows for a linking of total YAAPST scores to expected symptom expression., (C) 2004 by the American Psychological AssociationA didactic on latent growth curve modeling for ordinal outcomes is presented. The conceptual aspects of modeling growth with ordinal variables and the notion of threshold invariance are illustrated graphically using a hypothetical example. The ordinal growth model is described in terms of 3 nested models: (a) multivariate normality of the underlying continuous latent variables (yt) and its relationship with the observed ordinal response pattern (Yt), (b) threshold invariance over time, and (c) growth model for the continuous latent variable on a common scale. Algebraic implications of the model restrictions are derived, and practical aspects of fitting ordinal growth models are discussed with the help of an empirical example and Mx script ( M. C. Neale, S. M. Boker, G. Xie, & H. H. Maes, 1999). 
The necessary conditions for the identification of growth models with ordinal data and the methodological implications of the model of threshold invariance are discussed., (C) 2004 by the American Psychological AssociationRecent research points toward the viability of conceptualizing alcohol problems as arrayed along a continuum. Nevertheless, modern statistical techniques designed to scale multiple problems along a continuum (latent trait modeling; LTM) have rarely been applied to alcohol problems. This study applies LTM methods to data on 110 problems reported during in-person interviews of 1,348 middle-aged men (mean age = 43) from the general population. The results revealed a continuum of severity linking the 110 problems, ranging from heavy and abusive drinking, through tolerance and withdrawal, to serious complications of alcoholism. These results indicate that alcohol problems can be arrayed along a dimension of severity and emphasize the relevance of LTM to informing the conceptualization and assessment of alcohol problems., (C) 2004 by the American Psychological AssociationItem response theory (IRT) is supplanting classical test theory as the basis for measures development. This study demonstrated the utility of IRT for evaluating DSM-IV diagnostic criteria. Data on alcohol, cannabis, and cocaine symptoms from 372 adult clinical participants interviewed with the Composite International Diagnostic Interview-Expanded Substance Abuse Module (CIDI-SAM) were analyzed with Mplus ( B. Muthen & L. Muthen, 1998) and MULTILOG ( D. Thissen, 1991) software. Tolerance and legal problems criteria were dropped because of poor fit with a unidimensional model. Item response curves, test information curves, and testing of variously constrained models suggested that DSM-IV criteria in the CIDI-SAM discriminate between only impaired and less impaired cases and may not be useful to scale case severity. 
IRT can be used to study the construct validity of DSM-IV diagnoses and to identify diagnostic criteria with poor performance., (C) 2004 by the American Psychological AssociationThis study examined the psychometric characteristics of an index of substance use involvement using item response theory. The sample consisted of 292 men and 140 women who qualified for a Diagnostic and Statistical Manual of Mental Disorders (3rd ed., rev.; American Psychiatric Association, 1987) substance use disorder (SUD) diagnosis and 293 men and 445 women who did not qualify for a SUD diagnosis. The results indicated that men had a higher probability of endorsing substance use compared with women. The index significantly predicted health, psychiatric, and psychosocial disturbances as well as level of substance use behavior and severity of SUD after a 2-year follow-up. Finally, this index is a reliable and useful prognostic indicator of the risk for SUD and the medical and psychosocial sequelae of drug consumption., (C) 2002 by the American Psychological AssociationComparability, validity, and impact of loss of information of a computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 140 Veterans Affairs hospital patients. The countdown method ( Butcher, Keller, & Bacon, 1985) was used to adaptively administer Scales L (Lie) and F (Frequency), the 10 clinical scales, and the 15 content scales. Participants completed the MMPI-2 twice, in 1 of 2 conditions: computerized conventional test-retest, or computerized conventional-computerized adaptive. Mean profiles and test-retest correlations across modalities were comparable. Correlations between MMPI-2 scales and criterion measures supported the validity of the countdown method, although some attenuation of validity was suggested for certain health-related items. Loss of information incurred with this mode of adaptive testing has minimal impact on test validity. 
Item and time savings were substantial., (C) 1999 by the American Psychological Association VL - 29 N1 - Miscellaneous Article ER - TY - JOUR T1 - Computerized adaptive measurement of depression: A simulation study JF - BMC Psychiatry Y1 - 2004 A1 - Gardner, W. A1 - Shear, K. A1 - Kelleher, K. J. A1 - Pajer, K. A. A1 - Mammen, O. A1 - Buysse, D. A1 - Frank, E. KW - *Computer Simulation KW - Adult KW - Algorithms KW - Area Under Curve KW - Comparative Study KW - Depressive Disorder/*diagnosis/epidemiology/psychology KW - Diagnosis, Computer-Assisted/*methods/statistics & numerical data KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Internet KW - Male KW - Mass Screening/methods KW - Patient Selection KW - Personality Inventory/*statistics & numerical data KW - Pilot Projects KW - Prevalence KW - Psychiatric Status Rating Scales/*statistics & numerical data KW - Psychometrics KW - Research Support, Non-U.S. Gov't KW - Research Support, U.S. Gov't, P.H.S. KW - Severity of Illness Index KW - Software AB - Background: Efficient, accurate instruments for measuring depression are increasingly important in clinical practice. We developed a computerized adaptive version of the Beck Depression Inventory (BDI). We examined its efficiency and its usefulness in identifying Major Depressive Episodes (MDE) and in measuring depression severity. Methods: Subjects were 744 participants in research studies in which each subject completed both the BDI and the SCID. In addition, 285 patients completed the Hamilton Depression Rating Scale. Results: The adaptive BDI had an AUC as an indicator of a SCID diagnosis of MDE of 88%, equivalent to the full BDI. The adaptive BDI asked fewer questions than the full BDI (5.6 versus 21 items). 
The adaptive latent depression score correlated r = .92 with the BDI total score and the latent depression score correlated more highly with the Hamilton (r = .74) than the BDI total score did (r = .70). Conclusions: Adaptive testing for depression may provide greatly increased efficiency without loss of accuracy in identifying MDE or in measuring depression severity. VL - 4 ER - TY - JOUR T1 - Kann die Konfundierung von Konzentrationsleistung und Aktivierung durch adaptives Testen mit dem FAKT vermieden werden? [Avoiding the confounding of concentration performance and activation by adaptive testing with the FACT] JF - Zeitschrift für Differentielle und Diagnostische Psychologie Y1 - 2004 A1 - Frey, A. A1 - Moosbrugger, H. KW - Adaptive Testing KW - Computer Assisted Testing KW - Concentration KW - Performance KW - Testing computerized adaptive testing AB - The study investigates the effect of computerized adaptive testing strategies on the confounding of concentration performance with activation. A sample of 54 participants was administered 1 out of 3 versions (2 adaptive, 1 non-adaptive) of the computerized Frankfurt Adaptive Concentration Test FACT (Moosbrugger & Heyden, 1997) at three subsequent points in time. During the test administration changes in activation (electrodermal activity) were recorded. The results pinpoint a confounding of concentration performance with activation for the non-adaptive test version, but not for the adaptive test versions (p = .01). Thus, adaptive FACT testing strategies can remove the confounding of concentration performance with activation, thereby increasing the discriminant validity. In conclusion, an attention-focusing-hypothesis is formulated to explain the observed effect. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 25 ER - TY - JOUR T1 - Validating the German computerized adaptive test for anxiety on healthy sample (A-CAT) JF - Quality of Life Research Y1 - 2004 A1 - Becker, J. A1 - Walter, O. B. 
A1 - Fliege, H. A1 - Bjorner, J. B. A1 - Kocalevent, R. D. A1 - Schmid, G. A1 - Klapp, B. F. A1 - Rose, M. VL - 13 ER - TY - ABST T1 - An adaptation of stochastic curtailment to truncate Wald’s SPRT in computerized adaptive testing Y1 - 2003 A1 - Finkelman, M. AB -

Computerized adaptive testing (CAT) has been shown to increase efficiency in educational measurement. One common application of CAT is to classify students as either proficient or not proficient in ability. A truncated form of Wald's sequential probability ratio test (SPRT), in which examination is halted after a prespecified number of questions, has been proposed to provide a diagnosis of proficiency. This article studies the further truncation provided by stochastic curtailment, where an exam is stopped early if completion of the remaining questions would be unlikely to alter the classification of the examinee. In a simulation study presented, the increased truncation is shown to offer substantial improvement in test length with only a slight decrease in accuracy.

PB - National Center for Research on Evaluation, Standards, and Student Testing CY - Los Angeles ER - TY - CONF T1 - A comparison of exposure control procedures in CAT systems based on different measurement models for testlets using the verbal reasoning section of the MCAT T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Boyd, A. M A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 405 KB} ER - TY - CONF T1 - The evaluation of exposure control procedures for an operational CAT. T2 - Paper presented at the Annual Meeting of the American Educational Research Association Y1 - 2003 A1 - French, B. F. A1 - Thompson, T. T. JF - Paper presented at the Annual Meeting of the American Educational Research Association CY - Chicago IL ER - TY - JOUR T1 - An examination of exposure control and content balancing restrictions on item selection in CATs using the partial credit model JF - Journal of Applied Measurement Y1 - 2003 A1 - Davis, L. L. A1 - Pastor, D. A. A1 - Dodd, B. G. A1 - Chiang, C. A1 - Fitzpatrick, S. J. KW - *Computers KW - *Educational Measurement KW - *Models, Theoretical KW - Automation KW - Decision Making KW - Humans KW - Reproducibility of Results AB - The purpose of the present investigation was to systematically examine the effectiveness of the Sympson-Hetter technique and rotated content balancing relative to no exposure control and no content rotation conditions in a computerized adaptive testing system (CAT) based on the partial credit model. A series of simulated fixed and variable length CATs were run using two data sets generated to multiple content areas for three sizes of item pools. The 2 (exposure control) X 2 (content rotation) X 2 (test length) X 3 (item pool size) X 2 (data sets) yielded a total of 48 conditions. 
Results show that while both procedures can be used with no deleterious effect on measurement precision, the gains in exposure control, pool utilization, and item overlap appear quite modest. Difficulties involved with setting the exposure control parameters in small item pools make questionable the utility of the Sympson-Hetter technique with similar item pools. VL - 4 N1 - 1529-7713 Journal Article ER - TY - JOUR T1 - Statistical detection and estimation of differential item functioning in computerized adaptive testing JF - Dissertation Abstracts International: Section B: The Sciences & Engineering Y1 - 2003 A1 - Feng, X. AB - Differential item functioning (DIF) is an important issue in large scale standardized testing. DIF refers to the unexpected difference in item performances among groups of equally proficient examinees, usually classified by ethnicity or gender. Its presence could seriously affect the validity of inferences drawn from a test. Various statistical methods have been proposed to detect and estimate DIF. This dissertation addresses DIF analysis in the context of computerized adaptive testing (CAT), whose item selection algorithm adapts to the ability level of each individual examinee. In a CAT, a DIF item may be more consequential and more detrimental because fewer items are administered in a CAT than in a traditional paper-and-pencil test and because the remaining sequence of items presented to examinees depends in part on their responses to the DIF item. Consequently, an efficient, stable and flexible method to detect and estimate CAT DIF becomes necessary and increasingly important. We propose simultaneous implementations of online calibration and DIF testing. The idea is to perform online calibration of an item of interest separately in the focal and reference groups. Under any specific parametric IRT model, we can use the (online) estimated latent traits as covariates and fit a nonlinear regression model to each of the two groups. 
Because the estimated latent traits, not the true ones, are used, the regression fit has to adjust for the covariate "measurement errors". It turns out that this situation fits nicely into the framework of nonlinear error-in-variable modelling, which has been extensively studied in statistical literature. We develop two bias-correction methods using asymptotic expansion and conditional score theory. After correcting the bias caused by measurement error, one can perform a significance test to detect DIF with the parameter estimates for different groups. This dissertation also discusses some general techniques to handle measurement error modelling with different IRT models, including the three-parameter normal ogive model and polytomous response models. Several methods of estimating DIF are studied as well. Large sample properties are established to justify the proposed methods. Extensive simulation studies show that the resulting methods perform well in terms of Type-I error rate control, accuracy in estimating DIF and power against both unidirectional and crossing DIF. (PsycINFO Database Record (c) 2004 APA, all rights reserved). VL - 64 ER - TY - CONF T1 - A further study on adjusting CAT item selection starting point for individual examinees T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Fan, M. A1 - Zhu. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA N1 - #FA02-01 ER - TY - CONF T1 - An investigation of procedures for estimating error indexes in proficiency estimation in CAT T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Shyu, C.-Y. A1 - Fan, M. A1 - Thompson, T. A1 - Hsu, Y. 
JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA ER - TY - ABST T1 - A strategy for controlling item exposure in multidimensional computerized adaptive testing Y1 - 2002 A1 - Lee, Y. H. A1 - Ip, E.H. A1 - Fuh, C.D. CY - Available from http://www3. tat.sinica.edu.tw/library/c_tec_rep/c-2002-11.pdf ER - TY - CHAP T1 - The work ahead: A psychometric infrastructure for computerized adaptive tests T2 - Computer-based tests: Building the foundation for future assessment Y1 - 2002 A1 - F Drasgow ED - M. P. Potenza ED - J. J. Freemer ED - W. C. Ward KW - Adaptive Testing KW - Computer Assisted Testing KW - Educational KW - Measurement KW - Psychometrics AB - (From the chapter) Considers the past and future of computerized adaptive tests and computer-based tests and looks at issues and challenges confronting a testing program as it implements and operates a computer-based test. Recommendations for testing programs from The National Council of Measurement in Education Ad Hoc Committee on Computerized Adaptive Test Disclosure are appended. (PsycINFO Database Record (c) 2005 APA ) JF - Computer-based tests: Building the foundation for future assessment PB - Lawrence Erlbaum Associates, Inc. CY - Mahwah, N.J. USA N1 - Using Smart Source ParsingComputer-based testing: Building the foundation for future assessments. (pp. 1-35). Mahwah, NJ : Lawrence Erlbaum Associates, Publishers. xi, 326 pp ER - TY - CONF T1 - An investigation of procedures for estimating error indexes in proficiency estimation in a realistic second-order equitable CAT environment T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Shyu, C.-Y. A1 - Fan, M. A1 - Thompson, T, A1 - Hsu. 
JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA ER - TY - CONF T1 - An investigation of the impact of items that exhibit mild DIF on ability estimation in CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Jennings, J. A. A1 - Dodd, B. G. A1 - Fitzpatrick, S. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - CONF T1 - Assembling parallel item pools for computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2000 A1 - Wang, T. A1 - Fan, M. Yi, Q. A1 - Ban, J. C. A1 - Zhu, D. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans N1 - #WA00-02 ER - TY - BOOK T1 - Computerized adaptive testing: A primer (2nd edition) Y1 - 2000 A1 - Wainer, H., A1 - Dorans, N. A1 - Eignor, D. R. A1 - Flaugher, R. A1 - Green, B. F. A1 - Mislevy, R. A1 - Steinberg, L. A1 - Thissen, D. CY - Hillsdale, N. J. : Lawrence Erlbaum Associates ER - TY - CONF T1 - An examination of exposure control and content balancing restrictions on item selection in CATs using the partial credit model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2000 A1 - Davis, L. L. A1 - Pastor, D. A. A1 - Dodd, B. G. A1 - Chiang, C. A1 - Fitzpatrick, S. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA ER - TY - CHAP T1 - Item pools Y1 - 2000 A1 - Flaugher, R. CY - Wainer, H. (2000). Computerized adaptive testing: a primer. Mahwah, NJ: Erlbaum. ER - TY - JOUR T1 - A real data simulation of computerized adaptive administration of the MMPI-A JF - Computers in Human Behavior Y1 - 2000 A1 - Forbey, J. D. A1 - Handel, R. W. A1 - Ben-Porath, Y. S. 
VL - 16 ER - TY - JOUR T1 - A real data simulation of computerized adaptive administration of the MMPI-A JF - Computers in Human Behavior Y1 - 2000 A1 - Forbey, J. D. A1 - Handel, R. W. A1 - Ben-Porath, Y. S. AB - A real data simulation of computerized adaptive administration of the Minnesota Multiphasic Inventory-Adolescent (MMPI-A) was conducted using item responses from three groups of participants. The first group included 196 adolescents (age range 14-18) tested at a midwestern residential treatment facility for adolescents. The second group was the normative sample used in the standardization of the MMPI-A (Butcher, Williams, Graham, Archer, Tellegen, Ben-Porath, & Kaemmer, 1992. Minnesota Multiphasic Inventory-Adolescent (MMPI-A): manual for administration, scoring, and interpretation. Minneapolis: University of Minnesota Press.). The third group was the clinical sample used in the validation of the MMPI-A (Williams & Butcher, 1989. An MMPI study of adolescents: I. Empirical validation of the study's scales. Personality assessment, 1, 251-259.). The MMPI-A data for each group of participants were run through a modified version of the MMPI-2 adaptive testing computer program (Roper, Ben-Porath & Butcher, 1995. Comparability and validity of computerized adaptive testing with the MMPI-2. Journal of Personality Assessment, 65, 358-371.). To determine the optimal amount of item savings, each group's MMPI-A item responses were used to simulate three different orderings of the items: (1) from least to most frequently endorsed in the keyed direction; (2) from least to most frequently endorsed in the keyed direction with the first 120 items rearranged into their booklet order; and (3) all items in booklet order. The mean number of items administered for each group was computed for both classification and full-scale elevations for T-score cut-off values of 60 and 65. 
Substantial item administration savings were achieved for all three groups, and the mean number of items saved ranged from 50 items (10.7% of the administered items) to 123 items (26.4% of the administered items), depending upon the T-score cut-off, classification method (i.e. classification only or full-scale elevation), and group. (C) 2000 Elsevier Science Ltd. All rights reserved. VL - 16 ER - TY - CONF T1 - Specific information item selection for adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Davey, T. A1 - Fan, M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans ER - TY - CONF T1 - Constructing adaptive tests to parallel conventional programs T2 - Paper presented at the annual meeting of the National council on Measurement in Education Y1 - 1999 A1 - Fan, M. A1 - Thompson, T. A1 - Davey, T. JF - Paper presented at the annual meeting of the National council on Measurement in Education CY - Montreal N1 - #FA99-01 ER - TY - JOUR T1 - Evaluating the usefulness of computerized adaptive testing for medical in-course assessment JF - Academic Medicine Y1 - 1999 A1 - Kreiter, C. D. A1 - Ferguson, K. A1 - Gruppen, L. D. KW - *Automation KW - *Education, Medical, Undergraduate KW - Educational Measurement/*methods KW - Humans KW - Internal Medicine/*education KW - Likelihood Functions KW - Psychometrics/*methods KW - Reproducibility of Results AB - PURPOSE: This study investigated the feasibility of converting an existing computer-administered, in-course internal medicine test to an adaptive format. METHOD: A 200-item internal medicine extended matching test was used for this research. Parameters were estimated with commercially available software with responses from 621 examinees. A specially developed simulation program was used to retrospectively estimate the efficiency of the computer-adaptive exam format. 
RESULTS: It was found that the average test length could be shortened by almost half with measurement precision approximately equal to that of the full 200-item paper-and-pencil test. However, computer-adaptive testing with this item bank provided little advantage for examinees at the upper end of the ability continuum. An examination of classical item statistics and IRT item statistics suggested that adding more difficult items might extend the advantage to this group of examinees. CONCLUSIONS: Medical item banks presently used for incourse assessment might be advantageously employed in adaptive testing. However, it is important to evaluate the match between the items and the measurement objective of the test before implementing this format. VL - 74 SN - 1040-2446 (Print) N1 - Kreiter, C DFerguson, KGruppen, L DUnited statesAcademic medicine : journal of the Association of American Medical CollegesAcad Med. 1999 Oct;74(10):1125-8. JO - Acad Med ER - TY - JOUR T1 - Examinee judgments of changes in item difficulty: Implications for item review in computerized adaptive testing JF - Applied Measurement in Education Y1 - 1999 A1 - Wise, S. L. A1 - Finney, S. J., A1 - Enders, C. K. A1 - Freeman, S.A. A1 - Severance, D.D. VL - 12 ER - TY - CHAP T1 - Alternatives for scoring computerized adaptive tests T2 - Computer-based testing Y1 - 1998 A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. ED - J. J. Fremer ED - W. C. Ward JF - Computer-based testing PB - Lawrence Erlbaum Associates, Inc. CY - Mahwah, N.J., USA ER - TY - CONF T1 - Alternatives for scoring computerized adaptive tests T2 - Paper presented at an Educational Testing Service-sponsored colloquium entitled Computer-based testing: Building the foundations for future assessments Y1 - 1998 A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. 
JF - Paper presented at an Educational Testing Service-sponsored colloquium entitled Computer-based testing: Building the foundations for future assessments CY - Philadelphia PA ER - TY - CONF T1 - Computerized adaptive rating scales that measure contextual performance T2 - Paper presented at the 13th annual conference of the Society for Industrial and Organizational Psychology Y1 - 1998 A1 - Borman, W. C. A1 - Hanson, M. A. A1 - Motowidlo, S. J. A1 - F Drasgow A1 - Foster, L A1 - Kubisiak, U. C. JF - Paper presented at the 13th annual conference of the Society for Industrial and Organizational Psychology CY - Dallas TX ER - TY - JOUR T1 - Testing word knowledge by telephone to estimate general cognitive aptitude using an adaptive test JF - Intelligence Y1 - 1998 A1 - Legree, P. J. A1 - Fischl, M. A A1 - Gade, P. A. A1 - Wilson, M. VL - 26 ER - TY - CONF T1 - The accuracy of examinee judgments of relative item difficulty: Implication for computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Wise, S. L. A1 - Freeman, S.A. A1 - Finney, S. J. A1 - Enders, C. K. A1 - Severance, D.D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago ER - TY - JOUR T1 - The effect of population distribution and method of theta estimation on computerized adaptive testing (CAT) using the rating scale model JF - Educational & Psychological Measurement Y1 - 1997 A1 - Chen, S-K. A1 - Hou, L. Y. A1 - Fitzpatrick, S. J. A1 - Dodd, B. G. KW - computerized adaptive testing AB - Investigated the effect of population distribution on maximum likelihood estimation (MLE) and expected a posteriori estimation (EAP) in a simulation study of computerized adaptive testing (CAT) based on D. Andrich's (1978) rating scale model. 
Comparisons were made among MLE and EAP with a normal prior distribution and EAP with a uniform prior distribution within 2 data sets: one generated using a normal trait distribution and the other using a negatively skewed trait distribution. Descriptive statistics, correlations, scattergrams, and accuracy indices were used to compare the different methods of trait estimation. The EAP estimation with a normal prior or uniform prior yielded results similar to those obtained with MLE, even though the prior did not match the underlying trait distribution. An additional simulation study based on real data suggested that more work is needed to determine the optimal number of quadrature points for EAP in CAT based on the rating scale model. The choice between MLE and EAP for particular measurement situations is discussed. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 57 N1 - Sage Publications, US ER - TY - JOUR T1 - The effect of population distribution and methods of theta estimation on computerized adaptive testing (CAT) using the rating scale model JF - Educational and Psychological Measurement Y1 - 1997 A1 - Chen, S. A1 - Hou, L. A1 - Fitzpatrick, S. J. A1 - Dodd, B. VL - 57 ER - TY - ABST T1 - Linking scores for computer-adaptive and paper-and-pencil administrations of the SAT (Research Report No 97-12) Y1 - 1997 A1 - Lawrence, I. A1 - Feigenbaum, M. CY - Princeton NJ: Educational Testing Service N1 - #LA97-12 ER - TY - JOUR T1 - On-line performance assessment using rating scales JF - Journal of Outcomes Measurement Y1 - 1997 A1 - Stahl, J. A1 - Shumway, R. A1 - Bergstrom, B. A1 - Fisher, A. 
KW - *Outcome Assessment (Health Care) KW - *Rehabilitation KW - *Software KW - *Task Performance and Analysis KW - Activities of Daily Living KW - Humans KW - Microcomputers KW - Psychometrics KW - Psychomotor Performance AB - The purpose of this paper is to report on the development of the on-line performance assessment instrument--the Assessment of Motor and Process Skills (AMPS). Issues that will be addressed in the paper include: (a) the establishment of the scoring rubric and its implementation in an extended Rasch model, (b) training of raters, (c) validation of the scoring rubric and procedures for monitoring the internal consistency of raters, and (d) technological implementation of the assessment instrument in a computerized program. VL - 1 N1 - 1090-655X (Print)Journal Article ER - TY - CONF T1 - Relationship of response latency to test design, examinee ability, and item difficulty in computer-based test administration T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Swanson, D. B. A1 - Featherman, C. M. A1 - Case, A. M. A1 - Luecht, RM A1 - Nungester, R. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CONF T1 - Effects of answer feedback and test anxiety on the psychometric and motivational characteristics of computer-adaptive and self-adaptive vocabulary tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education. Y1 - 1996 A1 - Vispoel, W. P. A1 - Brunsman, B. A1 - Forte, E. A1 - Bleiler, T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education. 
N1 - #VI96-01 ER - TY - CONF T1 - Effects of answer review and test anxiety on the psychometric and motivational characteristics of computer-adaptive and self-adaptive vocabulary tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1996 A1 - Vispoel, W. A1 - Forte, E. A1 - Boo, J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New York ER - TY - CONF T1 - The effects of methods of theta estimation, prior distribution, and number of quadrature points on CAT using the graded response model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1996 A1 - Hou, L. A1 - Chen, S. A1 - Dodd. B. G. A1 - Fitzpatrick, S. J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York NY ER - TY - CONF T1 - Effects of randomesque item selection on CAT item exposure rates and proficiency estimation under 1- and 2-PL models T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1996 A1 - Featherman, C. M. A1 - Subhiyah, R. G. A1 - Hadadi, A. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York ER - TY - JOUR T1 - Methodologic trends in the healthcare professions: computer adaptive and computer simulation testing JF - Nurse Education Y1 - 1996 A1 - Forker, J. E. A1 - McDonald, M. E. KW - *Clinical Competence KW - *Computer Simulation KW - Computer-Assisted Instruction/*methods KW - Educational Measurement/*methods KW - Humans AB - Assessing knowledge and performance on computer is rapidly becoming a common phenomenon in testing and measurement. Computer adaptive testing presents an individualized test format in accordance with the examinee's ability level. 
The efficiency of the testing process enables a more precise estimate of performance, often with fewer items than traditional paper-and-pencil testing methodologies. Computer simulation testing involves performance-based, or authentic, assessment of the examinee's clinical decision-making abilities. The authors discuss the trends in assessing performance through computerized means and the application of these methodologies to community-based nursing practice. VL - 21 SN - 0363-3624 (Print)0363-3624 (Linking) N1 - Forker, J EMcDonald, M EUnited statesNurse educatorNurse Educ. 1996 Jul-Aug;21(4):13-4. ER - TY - CONF T1 - Multidimensional computer adaptive testing T2 - Paper presented at the Annual Meeting of the American Educational Research Association Y1 - 1996 A1 - Fan, M. A1 - Hsu, Y. JF - Paper presented at the Annual Meeting of the American Educational Research Association CY - New York NY N1 - #FA96-02 ER - TY - CONF T1 - New algorithms for item selection and exposure and proficiency estimation under 1- and 2-PL models T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1996 A1 - Featherman, C. M. A1 - Subhiyah, R. G. A1 - Hadadi, A. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York ER - TY - CONF T1 - Utility of Fisher information, global information and different starting abilities in mini CAT T2 - Paper presented at the Annual Meeting of the National Council on Measurement in Education Y1 - 1996 A1 - Fan, M. A1 - Hsu, Y. 
JF - Paper presented at the Annual Meeting of the National Council on Measurement in Education CY - New York NY N1 - #FA96-01 ER - TY - JOUR T1 - Assessment of scaled score consistency in adaptive testing from a multidimensional item response theory perspective JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 1995 A1 - Fan, Miechu KW - computerized adaptive testing AB - The purpose of this study was twofold: (a) to examine whether the unidimensional adaptive testing estimates are comparable for different ability levels of examinees when the true examinee-item interaction is correctly modeled using a compensatory multidimensional item response theory (MIRT) model; and (b) to investigate the effects of adaptive testing estimation when the procedure of item selection of computerized adaptive testing (CAT) is controlled by either content-balancing or selecting the most informative item in a user specified direction at the current estimate of unidimensional ability. A series of Monte Carlo simulations were conducted in this study. Deviation from the reference composite angle was used as an index of the theta1,theta2-composite consistency across the different levels of unidimensional CAT estimates. In addition, the effect of the content-balancing item selection procedure and the fixed-direction item selection procedure were compared across the different ability levels. The characteristics of item selection, test information and the relationship between unidimensional and multidimensional models were also investigated. In addition to employing statistical analysis to examine the robustness of the CAT procedure violations of unidimensionality, this research also included graphical analyses to present the results. 
The results were summarized as follows: (a) the reference angles for the no-control-item-selection method were disparate across the unidimensional ability groups; (b) the unidimensional CAT estimates from the content-balancing item selection method did not offer much improvement; (c) the fixed-direction-item selection method did provide greater consistency for the unidimensional CAT estimates across the different levels of ability; (d) and, increasing the CAT test length did not provide greater score scale consistency. Based on the results of this study, the following conclusions were drawn: (a) without any controlling (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 55 ER - TY - JOUR T1 - Computer-adaptive testing: A new breed of assessment JF - Journal of the American Dietetic Association Y1 - 1995 A1 - Ruiz, B. A1 - Fitz, P. A. A1 - Lewis, C. A1 - Reidy, C. VL - 95 ER - TY - CONF T1 - The effect of ability estimation for polytomous CAT in different item selection procedures T2 - Paper presented at the Annual meeting of the Psychometric Society Y1 - 1995 A1 - Fan, M. A1 - Hsu, Y. JF - Paper presented at the Annual meeting of the Psychometric Society CY - Minneapolis MN ER - TY - CONF T1 - The effect of model misspecification on classification decisions made using a computerized test: 3-PLM vs. 1PLM (and UIRT versus MIRT) T2 - Paper presented at the Annual Meeting of the Psychometric Society Y1 - 1995 A1 - Spray, J. A. A1 - Kalohn, J.C. A1 - Schulz, M. A1 - Fleer, P. Jr. 
JF - Paper presented at the Annual Meeting of the Psychometric Society CY - Minneapolis, MN N1 - #SP95-01 ER - TY - CONF T1 - The effect of population distribution and methods of theta estimation on CAT using the rating scale model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1995 A1 - Chen, S. A1 - Hou, L. A1 - Fitzpatrick, S. J. A1 - Dodd, B. G. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco ER - TY - JOUR T1 - A study of psychologically optimal level of item difficulty JF - Shinrigaku Kenkyu Y1 - 1995 A1 - Fujimori, S. KW - *Adaptation, Psychological KW - *Psychological Tests KW - Adult KW - Female KW - Humans KW - Male AB - For the purpose of selecting items in a test, this study presented a viewpoint of psychologically optimal difficulty level, as well as measurement efficiency, of items. A paper-and-pencil test (P & P) composed of hard, moderate and easy subtests was administered to 298 students at a university. A computerized adaptive test (CAT) was also administered to 79 students. The items of both tests were selected from Shiba's Word Meaning Comprehension Test, for which the estimates of parameters of two-parameter item response model were available. The results of P & P research showed that the psychologically optimal success level would be such that the proportion of right answers is somewhere between .75 and .85. A similar result was obtained from CAT research, where the proportion of about .8 might be desirable. Traditionally a success rate of .5 has been recommended in adaptive testing. In this study, however, it was suggested that the items of such level would be too hard psychologically for many examinees. VL - 65 SN - 0021-5236 (Print)0021-5236 (Linking) N1 - Fujimori, SClinical TrialControlled Clinical TrialEnglish AbstractJapanShinrigaku kenkyu : The Japanese journal of psychologyShinrigaku Kenkyu. 1995 Feb;65(6):446-53. 
ER - TY - CONF T1 - Pinpointing PRAXIS I CAT characteristics through simulation procedures T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1994 A1 - Eignor, D. R. A1 - Folk, V.G. A1 - Li, M.-Y. A1 - Stocking, M. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA ER - TY - JOUR T1 - Computerized adaptive testing in instructional settings JF - Educational Technology Research and Development Y1 - 1993 A1 - Welch, R. E. A1 - Frick, T. VL - 41(3) ER - TY - JOUR T1 - Moving in a new direction: Computerized adaptive testing (CAT) JF - Nursing Management Y1 - 1993 A1 - Jones-Dickson, C. A1 - Dorsey, D. A1 - Campbell-Warnock, J. A1 - Fields, F. KW - *Computers KW - Accreditation/methods KW - Educational Measurement/*methods KW - Licensure, Nursing KW - United States VL - 24 SN - 0744-6314 (Print) N1 - Jones-Dickson, C; Dorsey, D; Campbell-Warnock, J; Fields, F. United States. Nursing management. Nurs Manage. 1993 Jan;24(1):80, 82. ER - TY - JOUR T1 - Computerized adaptive mastery tests as expert systems JF - Journal of Educational Computing Research Y1 - 1992 A1 - Frick, T. W. VL - 8(2) ER - TY - JOUR T1 - Computerized adaptive testing for NCLEX-PN JF - Journal of Practical Nursing Y1 - 1992 A1 - Fields, F. A. KW - *Licensure KW - *Programmed Instruction KW - Educational Measurement/*methods KW - Humans KW - Nursing, Practical/*education VL - 42 SN - 0022-3867 (Print) N1 - Fields, F A. United States. The Journal of practical nursing. J Pract Nurs. 1992 Jun;42(2):8-10. ER - TY - CONF T1 - Student attitudes toward computer-adaptive test administration T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1992 A1 - Baghi, H. A1 - Ferrara, S. F. A1 - Gabrys, R. 
JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA ER - TY - CONF T1 - Applications of computer-adaptive testing in Maryland T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1991 A1 - Baghi, H. A1 - Gabrys, R. A1 - Ferrara, S. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL ER - TY - CONF T1 - The use of the graded response model in computerized adaptive testing of the attitudes to science scale T2 - annual meeting of the American Educational Research Association Y1 - 1991 A1 - Foong, Y-Y. A1 - Lam, T-L. AB - The graded response model for two-stage testing was applied to an attitudes toward science scale using real-data simulation. The 48-item scale was administered to 920 students at a grade-8 equivalent in Singapore. A two-stage 16-item computerized adaptive test was developed. In two-stage testing an initial, or routing, test is followed by a second-stage testlet of greater or lesser difficulty based on performance. A conventional test of the same length as the adaptive two-stage test was selected from the 48-item pool. Responses to the conventional test, the routing test, and a testlet were simulated. The algorithm of E. Balas (1965) and the multidimensional knapsack problem of optimization theory were used in test development. The simulation showed the efficiency and accuracy of the two-stage test with the graded response model in estimating attitude trait levels, as evidenced by better results from the two-stage test than its conventional counterpart and the reduction to one-third of the length of the original measure. Six tables and three graphs are included. 
(SLD) JF - annual meeting of the American Educational Research Association CY - Chicago, IL USA ER - TY - JOUR T1 - A comparison of three decision models for adapting the length of computer-based mastery tests JF - Journal of Educational Computing Research Y1 - 1990 A1 - Frick, T. W. VL - 6 IS - 4 ER - TY - JOUR T1 - Computerized adaptive measurement of attitudes JF - Measurement and Evaluation in Counseling and Development Y1 - 1990 A1 - Koch, W. R. A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. VL - 23 ER - TY - BOOK T1 - Computerized adaptive testing: A primer (Eds.) Y1 - 1990 A1 - Wainer, H. A1 - Dorans, N. J. A1 - Flaugher, R. A1 - Green, B. F. A1 - Mislevy, R. J. A1 - Steinberg, L. A1 - Thissen, D. CY - Hillsdale NJ: Erlbaum ER - TY - JOUR T1 - Adaptive Estimation When the Unidimensionality Assumption of IRT is Violated JF - Applied Psychological Measurement Y1 - 1989 A1 - Folk, V.G. A1 - Green, B. F. VL - 13 IS - 4 ER - TY - JOUR T1 - Bayesian adaptation during computer-based tests and computer-guided practice exercises JF - Journal of Educational Computing Research Y1 - 1989 A1 - Frick, T. W. VL - 5(1) ER - TY - ABST T1 - A comparison of an expert systems approach to computerized adaptive testing and an IRT model Y1 - 1989 A1 - Frick, T. W. CY - Unpublished manuscript (submitted to American Educational Research Journal) ER - TY - CONF T1 - EXSPRT: An expert systems approach to computer-based adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1989 A1 - Frick, T. W. A1 - Plew, G.T. A1 - Luk, H.-K. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco 
ER - TY - ABST T1 - College Board computerized placement tests: Validation of an adaptive test of basic skills (Research Report 86-29) Y1 - 1986 A1 - W. C. Ward A1 - Kline, R. G. A1 - Flaugher, J. CY - Princeton NJ: Educational Testing Service. ER - TY - ABST T1 - Adaptive testing without a computer Y1 - 1981 A1 - Friedman, D. A1 - Steinberg, A. A1 - Ree, M. J. CY - Catalog of Selected Documents in Psychology, Nov 1981, 11, 74-75 (Ms. No. 2350). AFHRL Technical Report 80-66. ER - TY - CHAP T1 - Individualized testing on the basis of the Rasch model Y1 - 1980 A1 - Fischer, G. H. A1 - Pendl, P. CY - In J. Th. Van der Kamp, W. F. Langerak, and D. N. M. de Gruijter (Eds.). Psychometrics for educational debates. New York: Wiley. ER - TY - ABST T1 - Computer-based adaptive testing models for the Air Force technical training environment: Phase I: Development of a computerized measurement system for Air Force technical training Y1 - 1974 A1 - Hansen, D. N. A1 - Johnson, B. F. A1 - Fagan, R. L. A1 - Tan, P. A1 - Dick, W. CY - JSAS Catalogue of Selected Documents in Psychology, 5, 1-86 (MS No. 882). AFHRL Technical Report 74-48. ER - TY - ABST T1 - Development of a programmed testing system (Technical Paper 259) Y1 - 1974 A1 - Bayroff, A. G. A1 - Ross, R. M. A1 - Fischl, M. A. CY - Arlington VA: US Army Research Institute for the Behavioral and Social Sciences. (NTIS No. AD A001534) ER - TY - ABST T1 - Implementation of a Bayesian system for decision analysis in a program of individually prescribed instruction (Research Report No 60) Y1 - 1973 A1 - Ferguson, R. L. A1 - Novick, M. R. CY - Iowa City IA: American College Testing Program N1 - #FE73-01 ER - TY - ABST T1 - The application of item generators for individualizing mathematics testing and instruction (Report 1971/14) Y1 - 1971 A1 - Ferguson, R. L. A1 - Hsu, T. 
CY - Pittsburgh PA: University of Pittsburgh Learning Research and Development Center ER - TY - ABST T1 - Computer assistance for individualizing measurement Y1 - 1971 A1 - Ferguson, R. L. CY - Pittsburgh PA: University of Pittsburgh R and D Center ER - TY - JOUR T1 - A model for computer-assisted criterion-referenced measurement JF - Education Y1 - 1971 A1 - Ferguson, R. L. VL - 81 ER - TY - JOUR T1 - Computer assistance for individualizing measurement JF - Computers and Automation Y1 - 1970 A1 - Ferguson, R. L. VL - March 1970 ER - TY - ABST T1 - Computer assistance for individualizing measurement Y1 - 1970 A1 - Ferguson, R. L. CY - Pittsburgh PA: University of Pittsburgh, Learning Research and Development Center ER - TY - CONF T1 - A model for computer-assisted criterion-referenced measurement T2 - Paper presented at the annual meeting of the American Educational Research Association/National Council on Measurement in Education Y1 - 1970 A1 - Ferguson, R. L. JF - Paper presented at the annual meeting of the American Educational Research Association/National Council on Measurement in Education CY - Minneapolis MN ER - TY - ABST T1 - Computer-assisted criterion-referenced measurement (Working Paper No 49) Y1 - 1969 A1 - Ferguson, R. L. CY - Pittsburgh PA: University of Pittsburgh, Learning Research and Development Center. (ERIC No. ED 037 089) ER - TY - BOOK T1 - The development, implementation, and evaluation of a computer-assisted branched test for a program of individually prescribed instruction Y1 - 1969 A1 - Ferguson, R. L. CY - Doctoral dissertation, University of Pittsburgh. Dissertation Abstracts International, 30-09A, 3856. (University Microfilms No. 70-4530). ER -