TY - JOUR T1 - Development of a Computerized Adaptive Test for Anxiety Based on the Dutch–Flemish Version of the PROMIS Item Bank JF - Assessment Y1 - In Press A1 - Gerard Flens A1 - Niels Smits A1 - Caroline B. Terwee A1 - Joost Dekker A1 - Irma Huijbrechts A1 - Philip Spinhoven A1 - Edwin de Beurs AB - We used the Dutch–Flemish version of the USA PROMIS adult V1.0 item bank for Anxiety as input for developing a computerized adaptive test (CAT) to measure the entire latent anxiety continuum. First, psychometric analysis of a combined clinical and general population sample (N = 2,010) showed that the 29-item bank has psychometric properties that are required for a CAT administration. Second, a post hoc CAT simulation showed efficient and highly precise measurement, with an average number of 8.64 items for the clinical sample, and 9.48 items for the general population sample. Furthermore, the accuracy of our CAT version was highly similar to that of the full item bank administration, both in final score estimates and in distinguishing clinical subjects from persons without a mental health disorder. We discuss the future directions and limitations of CAT development with the Dutch–Flemish version of the PROMIS Anxiety item bank. UR - https://doi.org/10.1177/1073191117746742 ER - TY - JOUR T1 - A Dynamic Stratification Method for Improving Trait Estimation in Computerized Adaptive Testing Under Item Exposure Control JF - Applied Psychological Measurement Y1 - 2020 A1 - Jyun-Hong Chen A1 - Hsiu-Yi Chao A1 - Shu-Ying Chen AB - When computerized adaptive testing (CAT) is under stringent item exposure control, the precision of trait estimation will substantially decrease. A new item selection method, the dynamic Stratification method based on Dominance Curves (SDC), which is aimed at improving trait estimation, is proposed to mitigate this problem. 
The objective function of the SDC in item selection is to maximize the sum of test information for all examinees rather than maximizing item information for individual examinees at a single-item administration, as in conventional CAT. To achieve this objective, the SDC uses dominance curves to stratify an item pool into strata with the number being equal to the test length to precisely and accurately increase the quality of the administered items as the test progresses, reducing the likelihood that a high-discrimination item will be administered to an examinee whose ability is not close to the item difficulty. Furthermore, the SDC incorporates a dynamic process for on-the-fly item–stratum adjustment to optimize the use of quality items. Simulation studies were conducted to investigate the performance of the SDC in CAT under item exposure control at different levels of severity. According to the results, the SDC can efficiently improve trait estimation in CAT through greater precision and more accurate trait estimation than those generated by other methods (e.g., the maximum Fisher information method) in most conditions. VL - 44 UR - https://doi.org/10.1177/0146621619843820 ER - TY - JOUR T1 - Developing Multistage Tests Using D-Scoring Method JF - Educational and Psychological Measurement Y1 - 2019 A1 - Kyung (Chris) T. Han A1 - Dimiter M. Dimitrov A1 - Faisal Al-Mashary AB - The D-scoring method for scoring and equating tests with binary items proposed by Dimitrov offers some of the advantages of item response theory, such as item-level difficulty information and score computation that reflects the item difficulties, while retaining the merits of classical test theory such as the simplicity of number correct score computation and relaxed requirements for model sample sizes. Because of its unique combination of those merits, the D-scoring method has seen quick adoption in the educational and psychological measurement field. 
Because item-level difficulty information is available with the D-scoring method and item difficulties are reflected in test scores, it conceptually makes sense to use the D-scoring method with adaptive test designs such as multistage testing (MST). In this study, we developed and compared several versions of the MST mechanism using the D-scoring approach and also proposed and implemented a new framework for conducting MST simulation under the D-scoring method. Our findings suggest that the score recovery performance under MST with D-scoring was promising, as it retained score comparability across different MST paths. We found that MST using the D-scoring method can achieve improvements in measurement precision and efficiency over linear-based tests that use the D-scoring method. VL - 79 UR - https://doi.org/10.1177/0013164419841428 ER - TY - CONF T1 - Developing a CAT: An Integrated Perspective T2 - IACAT 2017 Conference Y1 - 2017 A1 - Nathan Thompson KW - CAT Development KW - integrated approach AB -

Most resources on computerized adaptive testing (CAT) tend to focus on psychometric aspects such as mathematical formulae for item selection or ability estimation. However, development of a CAT assessment requires a holistic view of project management, financials, content development, product launch and branding, and more. This presentation will develop such a holistic view, which serves several purposes, including providing a framework for validity, estimating costs and ROI, and making better decisions regarding the psychometric aspects.

Thompson and Weiss (2011) presented a 5-step model for developing computerized adaptive tests (CATs). This model will be presented and discussed as the core of this holistic framework, then applied to real-life examples. While most CAT research focuses on developing new quantitative algorithms, this presentation is instead intended to help researchers evaluate and select algorithms that are most appropriate for their needs. It is therefore ideal for practitioners who are familiar with the basics of item response theory and CAT and wish to explore how they might apply these methodologies to improve their assessments.

Steps include:

1. Feasibility, applicability, and planning studies

2. Develop item bank content or utilize existing bank

3. Pretest and calibrate item bank

4. Determine specifications for final CAT

5. Publish live CAT

For example, Step 1 includes simulation studies that estimate item bank requirements; these estimates can be used to determine content development costs, which in turn can be integrated into an estimated project cost and timeline. Such information is vital in determining whether the CAT should even be developed in the first place.

References

Thompson, N. A., & Weiss, D. J. (2011). A Framework for the Development of Computerized Adaptive Tests. Practical Assessment, Research & Evaluation, 16(1). Retrieved from http://pareonline.net/getvn.asp?v=16&n=1.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1Jv8bpH2zkw5TqSMi03e5JJJ98QtXf-Cv ER - TY - JOUR T1 - Development of a Computer Adaptive Test for Depression Based on the Dutch-Flemish Version of the PROMIS Item Bank JF - Evaluation & the Health Professions Y1 - 2017 A1 - Gerard Flens A1 - Niels Smits A1 - Caroline B. Terwee A1 - Joost Dekker A1 - Irma Huijbrechts A1 - Edwin de Beurs AB - We developed a Dutch-Flemish version of the patient-reported outcomes measurement information system (PROMIS) adult V1.0 item bank for depression as input for computerized adaptive testing (CAT). As item bank, we used the Dutch-Flemish translation of the original PROMIS item bank (28 items) and additionally translated 28 U.S. depression items that failed to make the final U.S. item bank. Through psychometric analysis of a combined clinical and general population sample (N = 2,010), 8 added items were removed. With the final item bank, we performed several CAT simulations to assess the efficiency of the extended (48 items) and the original item bank (28 items), using various stopping rules. Both item banks resulted in highly efficient and precise measurement of depression and showed high similarity between the CAT simulation scores and the full item bank scores. We discuss the implications of using each item bank and stopping rule for further CAT development. VL - 40 UR - https://doi.org/10.1177/0163278716684168 ER - TY - CONF T1 - The Development of a Web-Based CAT in China T2 - IACAT 2017 Conference Y1 - 2017 A1 - Chongli Liang A1 - Danjun Wang A1 - Dan Zhou A1 - Peida Zhan KW - China KW - Web-Based CAT AB -

Cognitive ability assessment is widely used as a recruitment tool in hiring potential employees. Traditional cognitive ability tests face threats from item exposure and long administration times. In China especially, campus recruitment places a premium on short testing times and anti-cheating measures. Beisen, the largest domestic online assessment software provider, developed a web-based CAT for cognitive ability that assesses verbal, quantitative, logical, and spatial ability, in order to shorten testing time, improve assessment accuracy, and reduce threats from cheating and faking in online ability testing. The web-based test is convenient for examinees, who can access it simply by logging in to the test website at any time and place from any Internet-enabled device (e.g., laptops, iPads, and smartphones).

We designed the CAT following standard steps: establishing the item bank, setting the starting point, selecting items, scoring, and terminating the test. Additionally, we paid close attention to administering the test via the web. For the CAT procedures, we employed online calibration to establish a stable and expanding item bank, and integrated maximum Fisher information, the α-stratified strategy, and randomization for item selection and for controlling item exposure. Fixed-length and variable-length strategies were combined to terminate the test. To deliver fluid web-based testing, we employed cloud computing techniques and carefully designed each computing process. Distributed computation was used for scoring, executing EAP estimation and item selection at high speed. Caching all items on the servers in advance helps shorten the process of loading items to examinees' devices. Horizontally scalable cloud servers cope with high concurrency. The heavy computation in item selection was reduced to looking up items in a precomputed information matrix table.

We examined average accuracy, bank usage, and computing performance under both laboratory and live testing conditions. In a test of almost 28,000 examinees, we found that average bank usage was 50%, and that 80% of tests terminated at a test information of 10, with an average of 9.6. Under high concurrency, testing was unhindered, and the scoring and item selection process took an average of only 0.23 s per examinee.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - JOUR T1 - The Development of MST Test Information for the Prediction of Test Performances JF - Educational and Psychological Measurement Y1 - 2017 A1 - Ryoungsun Park A1 - Jiseon Kim A1 - Hyewon Chung A1 - Barbara G. Dodd AB - The current study proposes novel methods to predict multistage testing (MST) performance without conducting simulations. This method, called MST test information, is based on analytic derivation of standard errors of ability estimates across theta levels. We compared standard errors derived analytically to the simulation results to demonstrate the validity of the proposed method in both measurement precision and classification accuracy. The results indicate that the MST test information effectively predicted the performance of MST. In addition, the results of the current study highlighted the relationship among the test construction, MST design factors, and MST performance. VL - 77 UR - http://dx.doi.org/10.1177/0013164416662960 ER - TY - CONF T1 - DIF-CAT: Doubly Adaptive CAT Using Subgroup Information to Improve Measurement Precision T2 - IACAT 2017 Conference Y1 - 2017 A1 - Joy Wang A1 - David J. Weiss A1 - Chun Wang KW - DIF-CAT KW - Doubly Adaptive CAT KW - Measurement Precision KW - subgroup information AB -

Differential item functioning (DIF) is usually regarded as a test fairness issue in high-stakes tests; in low-stakes tests, it is more of an accuracy problem. Nevertheless, the same treatment, deleting items that demonstrate significant DIF, is still employed in low-stakes tests. When political concerns are not paramount, as in low-stakes tests and instruments that are not used to make decisions about people, deleting items might not be optimal. Computerized adaptive testing (CAT) is increasingly used in low-stakes testing. The DIF-CAT method evaluated in this research is designed to cope with DIF in a CAT environment. Using this method, item parameters are estimated separately for the focal group and the reference group in a DIF study; CATs are then administered based on the different sets of item parameters for the focal and reference groups.

To evaluate the performance of the DIF-CAT procedure, it was compared in a simulation study to (1) deleting all the DIF items from a CAT bank and (2) ignoring DIF. A 300-item flat item bank and a 300-item peaked item bank were simulated using the three-parameter logistic IRT model with D = 1.7. Forty percent of the items in each bank showed DIF. The DIF size was 0.5 on b and/or a, while the original b ranged from -3 to 3 and a ranged from 0.3 to 2.1. Three types of DIF were considered: (1) uniform DIF caused by differences in b, (2) non-uniform DIF caused by differences in a, and (3) non-uniform DIF caused by differences in both a and b. Five hundred normally distributed simulees in each of the reference and focal groups were used in item parameter recalibration. In the Delete DIF method, only DIF-free items were calibrated. In the Ignore DIF method, all the items were calibrated using all simulees without differentiating the groups. In the DIF-CAT method, the DIF-free items were used as anchor items to estimate the item parameters for the focal and reference groups, and the item parameters from recalibration were used. All simulees used the same item parameters in the Delete and Ignore methods; in the DIF-CAT method, CATs for simulees within the two groups used group-specific item parameters. In the CAT stage, 100 simulees were generated for each of the reference and focal groups at each of six discrete θ levels ranging from -2.5 to 2.5. CAT test length was fixed at 40 items. Bias, average absolute difference, RMSE, standard error of θ estimates, and person fit were used to compare the performance of the DIF methods. DIF item usage was also recorded for the Ignore method and the DIF-CAT method.
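The response model used in the simulation can be sketched as follows. This is a generic three-parameter logistic (3PL) function with the D = 1.7 scaling constant, not the authors' code, and the parameter values in the usage line are illustrative only.

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL IRT model:
    c + (1 - c) / (1 + exp(-D * a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# At theta = b, the probability is halfway between the guessing
# parameter c and 1.
print(p_3pl(theta=0.0, a=1.0, b=0.0, c=0.2))  # ≈ 0.6
```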

Generally, the DIF-CAT method outperformed both the Delete method and the Ignore method in dealing with DIF items in CAT. The Delete method, which is the most frequently used method for handling DIF, performed the worst of the three methods in a CAT environment, as reflected in multiple indices of measurement precision. Even the Ignore method, which simply left DIF items in the item bank, provided θ estimates of higher precision than the Delete method. This poor performance of the Delete method was probably due to reduction in size of the item bank available for each CAT.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1Gu4FR06qM5EZNp_Ns0Kt3HzBqWAv3LPy ER - TY - JOUR T1 - Dual-Objective Item Selection Criteria in Cognitive Diagnostic Computerized Adaptive Testing JF - Journal of Educational Measurement Y1 - 2017 A1 - Kang, Hyeon-Ah A1 - Zhang, Susu A1 - Chang, Hua-Hua AB - The development of cognitive diagnostic-computerized adaptive testing (CD-CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual-objective CD-CAT that simultaneously addresses examinees' attribute mastery status and overall test performance. The new procedure is based on the Jensen-Shannon (JS) divergence, a symmetrized version of the Kullback-Leibler divergence. We show that the JS divergence resolves the noncomparability problem of the dual information index and has close relationships with Shannon entropy, mutual information, and Fisher information. The performance of the JS divergence is evaluated in simulation studies in comparison with the methods available in the literature. Results suggest that the JS divergence achieves parallel or more precise recovery of latent trait variables compared to the existing methods and maintains practical advantages in computation and item pool usage. VL - 54 UR - http://dx.doi.org/10.1111/jedm.12139 ER - TY - JOUR T1 - Detecting Item Preknowledge in Computerized Adaptive Testing Using Information Theory and Combinatorial Optimization JF - Journal of Computerized Adaptive Testing Y1 - 2014 A1 - Belov, D. I. KW - combinatorial optimization KW - hypothesis testing KW - item preknowledge KW - Kullback-Leibler divergence KW - simulated annealing. 
KW - test security VL - 2 UR - http://www.iacat.org/jcat/index.php/jcat/article/view/36/18 IS - 3 ER - TY - JOUR T1 - Determining the Overall Impact of Interruptions During Online Testing JF - Journal of Educational Measurement Y1 - 2014 A1 - Sinharay, Sandip A1 - Wan, Ping A1 - Whitaker, Mike A1 - Kim, Dong-In A1 - Zhang, Litong A1 - Choi, Seung W. AB -

With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees’ scores. There is a lack of research on this topic due to the novelty of the problem. This article is an attempt to fill that void. Several methods, primarily based on propensity score matching, linear regression, and item response theory, were suggested to determine the overall impact of the interruptions on the examinees’ scores. A realistic simulation study shows that the suggested methods have satisfactory Type I error rate and power. Then the methods were applied to data from the Indiana Statewide Testing for Educational Progress-Plus (ISTEP+) test that experienced interruptions in 2013. The results indicate that the interruptions did not have a significant overall impact on the student scores for the ISTEP+ test.

VL - 51 UR - http://dx.doi.org/10.1111/jedm.12052 ER - TY - JOUR T1 - Deriving Stopping Rules for Multidimensional Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2013 A1 - Wang, Chun A1 - Chang, Hua-Hua A1 - Boughton, Keith A. AB -

Multidimensional computerized adaptive testing (MCAT) is able to provide a vector of ability estimates for each examinee, which could be used to provide a more informative profile of an examinee’s performance. The current literature on MCAT focuses on fixed-length tests, which can generate less accurate results for those examinees whose abilities are quite different from the average difficulty level of the item bank when there are only a limited number of items in the item bank. Therefore, instead of stopping the test at a predetermined fixed test length, the authors use a more informative stopping criterion that is directly related to measurement accuracy. Specifically, this research derives four stopping rules that either quantify the measurement precision of the ability vector (i.e., minimum determinant rule [D-rule], minimum eigenvalue rule [E-rule], and maximum trace rule [T-rule]) or quantify the amount of available information carried by each item (i.e., maximum Kullback–Leibler divergence rule [K-rule]). The simulation results showed that all four stopping rules successfully terminated the test when the mean squared error of ability estimation was within a desired range, regardless of examinees’ true abilities. It was found that when using the D-, E-, or T-rule, examinees with extreme abilities tended to have tests that were twice as long as the tests received by examinees with moderate abilities. However, the test length difference with the K-rule is not very dramatic, indicating that the K-rule may not be very sensitive to measurement precision. In all cases, the cutoff value for each stopping rule needs to be adjusted on a case-by-case basis to find an optimal solution.
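The precision-based rules above can be sketched for the two-dimensional case as follows. This is an illustration, not the authors' implementation: it assumes a symmetric 2x2 Fisher information matrix accumulated over the administered items, and the cutoff values shown are hypothetical (as the article notes, they must be tuned case by case).

```python
import math

def check_stopping_rules_2d(info, d_cut, e_cut, t_cut):
    """Evaluate the D-, E-, and T-rules for a 2-dimensional ability
    vector, given its symmetric 2x2 Fisher information matrix
    [[i11, i12], [i12, i22]]: each rule is satisfied once its
    statistic reaches the corresponding cutoff."""
    (i11, i12), (_, i22) = info
    det = i11 * i22 - i12 * i12  # D-rule statistic (determinant)
    tr = i11 + i22               # T-rule statistic (trace)
    # Smallest eigenvalue of a symmetric 2x2 matrix (E-rule statistic).
    min_eig = (tr - math.sqrt(tr * tr - 4.0 * det)) / 2.0
    return {"D-rule": det >= d_cut,
            "E-rule": min_eig >= e_cut,
            "T-rule": tr >= t_cut}

# Hypothetical information matrix after some administered items:
info = [[4.0, 0.0], [0.0, 9.0]]
print(check_stopping_rules_2d(info, d_cut=30.0, e_cut=3.0, t_cut=12.0))
# {'D-rule': True, 'E-rule': True, 'T-rule': True}
```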

VL - 37 UR - http://apm.sagepub.com/content/37/2/99.abstract ER - TY - JOUR T1 - Detecting Local Item Dependence in Polytomous Adaptive Data JF - Journal of Educational Measurement Y1 - 2012 A1 - Mislevy, Jessica L. A1 - Rupp, André A. A1 - Harring, Jeffrey R. AB -

A rapidly expanding arena for item response theory (IRT) is in attitudinal and health-outcomes survey applications, often with polytomous items. In particular, there is interest in computer adaptive testing (CAT). Meeting model assumptions is necessary to realize the benefits of IRT in this setting, however. Although initial investigations of local item dependence have been conducted both for polytomous items in fixed-form settings and for dichotomous items in CAT settings, there have been no publications applying local item dependence detection methodology to polytomous items in CAT despite its central importance to these applications. The current research uses a simulation study to investigate the extension of widely used pairwise statistics, Yen's Q3 statistic and Pearson's X2 statistic, in this context. The simulation design and results are contextualized throughout with a real item bank of this type from the Patient-Reported Outcomes Measurement Information System (PROMIS).
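The pairwise residual logic behind Yen's Q3 statistic can be sketched as follows (a generic illustration, not the study's simulation code): for each item pair, correlate the examinees' residuals, i.e., observed score minus model-expected score.

```python
def q3(obs_i, obs_j, exp_i, exp_j):
    """Yen's Q3: Pearson correlation between two items' residuals
    (observed minus model-expected scores) across examinees.
    Values near 0 suggest local independence holds for the pair."""
    d_i = [o - e for o, e in zip(obs_i, exp_i)]
    d_j = [o - e for o, e in zip(obs_j, exp_j)]
    n = len(d_i)
    mi, mj = sum(d_i) / n, sum(d_j) / n
    cov = sum((x - mi) * (y - mj) for x, y in zip(d_i, d_j))
    var_i = sum((x - mi) ** 2 for x in d_i)
    var_j = sum((y - mj) ** 2 for y in d_j)
    return cov / (var_i * var_j) ** 0.5

# Two items whose residuals move together, suggesting local dependence:
print(q3([1, 0, 1, 0], [1, 0, 1, 0], [0.5] * 4, [0.5] * 4))  # 1.0
```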

VL - 49 UR - http://dx.doi.org/10.1111/j.1745-3984.2012.00165.x ER - TY - JOUR T1 - Development of a computerized adaptive test for depression JF - Archives of General Psychiatry Y1 - 2012 A1 - Robert D. Gibbons A1 - David .J. Weiss A1 - Paul A. Pilkonis A1 - Ellen Frank A1 - Tara Moore A1 - Jong Bae Kim A1 - David J. Kupfer VL - 69 UR - WWW.ARCHGENPSYCHIATRY.COM IS - 11 ER - TY - JOUR T1 - Design of a Computer-Adaptive Test to Measure English Literacy and Numeracy in the Singapore Workforce: Considerations, Benefits, and Implications JF - Journal of Applied Testing Technology Y1 - 2011 A1 - Jacobsen, J. A1 - Ackermann, R. A1 - Egüez, J. A1 - Ganguli, D. A1 - Rickard, P. A1 - Taylor, L. AB -

A computer adaptive test (CAT) is a delivery methodology that serves the larger goals of the assessment system in which it is embedded. A thorough analysis of the assessment system for which a CAT is being designed is critical to ensure that the delivery platform is appropriate and addresses all relevant complexities. As such, a CAT engine must be designed to conform to the validity and reliability of the overall system. This design takes the form of adherence to the assessment goals and objectives of the adaptive assessment system. When the assessment is adapted for use in another country, consideration must be given to any necessary revisions, including content differences. This article addresses these considerations while drawing, in part, on the process followed in the development of the CAT delivery system designed to test English language workplace skills for the Singapore Workforce Development Agency. Topics include item creation and selection, calibration of the item pool, analysis and testing of the psychometric properties, and reporting and interpretation of scores. The characteristics and benefits of the CAT delivery system are detailed, as well as implications for testing programs considering the use of a CAT delivery system.

VL - 12 UR - http://www.testpublishers.org/journal-of-applied-testing-technology IS - 1 ER - TY - CONF T1 - Detecting DIF between Conventional and Computerized Adaptive Testing: A Monte Carlo Study T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Barth B. Riley A1 - Adam C. Carle KW - 95% Credible Interval KW - CAT KW - DIF KW - differential item function KW - modified robust Z statistic KW - Monte Carlo methodologies AB -

Two procedures, the Modified Robust Z statistic and the 95% Credible Interval, were compared in a Monte Carlo study. Both procedures evidenced adequate control of false-positive DIF results.

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - CHAP T1 - Designing and Implementing a Multistage Adaptive Test: The Uniform CPA Exam T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Melican, G.J. A1 - Breithaupt, K A1 - Zhang, Y. JF - Elements of Adaptive Testing ER - TY - CHAP T1 - Designing Item Pools for Adaptive Testing T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Veldkamp, B. P. A1 - van der Linden, W. J. JF - Elements of Adaptive Testing ER - TY - JOUR T1 - Designing item pools to optimize the functioning of a computerized adaptive test JF - Psychological Test and Assessment Modeling Y1 - 2010 A1 - Reckase, M. D. AB - Computerized adaptive testing (CAT) is a testing procedure that can result in improved precision for a specified test length or reduced test length with no loss of precision. However, these attractive psychometric features of CATs are only achieved if appropriate test items are available for administration. This set of test items is commonly called an “item pool.” This paper discusses the optimal characteristics for an item pool that will lead to the desired properties for a CAT. Then, a procedure is described for designing the statistical characteristics of the item parameters for an optimal item pool within an item response theory framework. Because true optimality is impractical, methods for achieving practical approximations to optimality are described. The results of this approach are shown for an operational testing program including comparisons to the results from the item pool currently used in that testing program.Key VL - 52 SN - 2190-0507 ER - TY - CHAP T1 - Detecting Person Misfit in Adaptive Testing T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Meijer, R. R. A1 - van Krimpen-Stoop, E. M. L. A. 
JF - Elements of Adaptive Testing ER - TY - JOUR T1 - Detection of aberrant item score patterns in computerized adaptive testing: An empirical example using the CUSUM JF - Personality and Individual Differences Y1 - 2010 A1 - Egberink, I. J. L. A1 - Meijer, R. R. A1 - Veldkamp, B. P. A1 - Schakel, L. A1 - Smid, N. G. KW - CAT KW - computerized adaptive testing KW - CUSUM approach KW - person Fit AB - The scalability of individual trait scores on a computerized adaptive test (CAT) was assessed through investigating the consistency of individual item score patterns. A sample of N = 428 persons completed a personality CAT as part of a career development procedure. To detect inconsistent item score patterns, we used a cumulative sum (CUSUM) procedure. Combined information from the CUSUM, other personality measures, and interviews showed that similar estimated trait values may have a different interpretation.Implications for computer-based assessment are discussed. VL - 48 SN - 01918869 ER - TY - JOUR T1 - Deterioro de parámetros de los ítems en tests adaptativos informatizados: estudio con eCAT [Item parameter drift in computerized adaptive testing: Study with eCAT] JF - Psicothema Y1 - 2010 A1 - Abad, F. J. A1 - Olea, J. A1 - Aguado, D. A1 - Ponsoda, V. A1 - Barrada, J KW - *Software KW - Educational Measurement/*methods/*statistics & numerical data KW - Humans KW - Language AB -

Item parameter drift in computerized adaptive testing: Study with eCAT. This study describes the parameter drift analysis conducted on eCAT (a computerized adaptive test to assess the written English level of Spanish speakers). The original calibration of the item bank (N = 3,224) was compared to a new calibration obtained from the data provided by most eCAT operative administrations (N = 7,254). A Differential Item Functioning (DIF) study was conducted between the original and the new calibrations. The impact that the new parameters have on the trait level estimates was obtained by simulation. Results show that parameter drift is found especially for the a and c parameters, that an important number of bank items show DIF, and that the parameter change has a moderate impact on the θ estimates of examinees with a high level of English. It is therefore recommended to replace the original estimates with the new set.

VL - 22 SN - 0214-9915 (Print)0214-9915 (Linking) N1 - Abad, Francisco JOlea, JulioAguado, DavidPonsoda, VicenteBarrada, Juan REnglish AbstractSpainPsicothemaPsicothema. 2010 May;22(2):340-7. ER - TY - JOUR T1 - Development and evaluation of a confidence-weighting computerized adaptive testing JF - Educational Technology & Society Y1 - 2010 A1 - Yen, Y. C. A1 - Ho, R. G. A1 - Chen, L. J. A1 - Chou, K. Y. A1 - Chen, Y. L. VL - 13(3) ER - TY - JOUR T1 - Development and validation of patient-reported outcome measures for sleep disturbance and sleep-related impairments JF - Sleep Y1 - 2010 A1 - Buysse, D. J. A1 - Yu, L. A1 - Moul, D. E. A1 - Germain, A. A1 - Stover, A. A1 - Dodds, N. E. A1 - Johnston, K. L. A1 - Shablesky-Cade, M. A. A1 - Pilkonis, P. A. KW - *Outcome Assessment (Health Care) KW - *Self Disclosure KW - Adult KW - Aged KW - Aged, 80 and over KW - Cross-Sectional Studies KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Middle Aged KW - Psychometrics KW - Questionnaires KW - Reproducibility of Results KW - Sleep Disorders/*diagnosis KW - Young Adult AB - STUDY OBJECTIVES: To develop an archive of self-report questions assessing sleep disturbance and sleep-related impairments (SRI), to develop item banks from this archive, and to validate and calibrate the item banks using classic validation techniques and item response theory analyses in a sample of clinical and community participants. DESIGN: Cross-sectional self-report study. SETTING: Academic medical center and participant homes. PARTICIPANTS: One thousand nine hundred ninety-three adults recruited from an Internet polling sample and 259 adults recruited from medical, psychiatric, and sleep clinics. INTERVENTIONS: None. MEASUREMENTS AND RESULTS: This study was part of PROMIS (Patient-Reported Outcomes Information System), a National Institutes of Health Roadmap initiative. 
Self-report item banks were developed through an iterative process of literature searches, collecting and sorting items, expert content review, qualitative patient research, and pilot testing. Internal consistency, convergent validity, and exploratory and confirmatory factor analysis were examined in the resulting item banks. Factor analyses identified 2 preliminary item banks, sleep disturbance and SRI. Item response theory analyses and expert content review narrowed the item banks to 27 and 16 items, respectively. Validity of the item banks was supported by moderate to high correlations with existing scales and by significant differences in sleep disturbance and SRI scores between participants with and without sleep disorders. CONCLUSIONS: The PROMIS sleep disturbance and SRI item banks have excellent measurement properties and may prove to be useful for assessing general aspects of sleep and SRI with various groups of patients and interventions. VL - 33 SN - 0161-8105 (Print)0161-8105 (Linking) N1 - Buysse, Daniel JYu, LanMoul, Douglas EGermain, AnneStover, AngelaDodds, Nathan EJohnston, Kelly LShablesky-Cade, Melissa APilkonis, Paul AAR052155/AR/NIAMS NIH HHS/United StatesU01AR52155/AR/NIAMS NIH HHS/United StatesU01AR52158/AR/NIAMS NIH HHS/United StatesU01AR52170/AR/NIAMS NIH HHS/United StatesU01AR52171/AR/NIAMS NIH HHS/United StatesU01AR52177/AR/NIAMS NIH HHS/United StatesU01AR52181/AR/NIAMS NIH HHS/United StatesU01AR52186/AR/NIAMS NIH HHS/United StatesResearch Support, N.I.H., ExtramuralValidation StudiesUnited StatesSleepSleep. 2010 Jun 1;33(6):781-92. U2 - 2880437 ER - TY - JOUR T1 - Development of computerized adaptive testing (CAT) for the EORTC QLQ-C30 physical functioning dimension JF - Quality of Life Research Y1 - 2010 A1 - Petersen, M. A. A1 - Groenvold, M. A1 - Aaronson, N. K. A1 - Chie, W. C. A1 - Conroy, T. A1 - Costantini, A. A1 - Fayers, P. A1 - Helbostad, J. A1 - Holzner, B. A1 - Kaasa, S. A1 - Singer, S. A1 - Velikova, G. A1 - Young, T. 
AB - PURPOSE: Computerized adaptive test (CAT) methods, based on item response theory (IRT), enable a patient-reported outcome instrument to be adapted to the individual patient while maintaining direct comparability of scores. The EORTC Quality of Life Group is developing a CAT version of the widely used EORTC QLQ-C30. We present the development and psychometric validation of the item pool for the first of the scales, physical functioning (PF). METHODS: Initial developments (including literature search and patient and expert evaluations) resulted in 56 candidate items. Responses to these items were collected from 1,176 patients with cancer from Denmark, France, Germany, Italy, Taiwan, and the United Kingdom. The items were evaluated with regard to psychometric properties. RESULTS: Evaluations showed that 31 of the items could be included in a unidimensional IRT model with acceptable fit and good content coverage, although the pool may lack items at the upper extreme (good PF). There were several findings of significant differential item functioning (DIF). However, the DIF findings appeared to have little impact on the PF estimation. CONCLUSIONS: We have established an item pool for CAT measurement of PF and believe that this CAT instrument will clearly improve the EORTC measurement of PF. VL - 20 SN - 1573-2649 (Electronic); 0962-9343 (Linking) N1 - Qual Life Res. 2010 Oct 23. ER - TY - CHAP T1 - Developing item variants: An empirical study Y1 - 2009 A1 - Wendt, A. A1 - Kao, S. A1 - Gorham, J. A1 - Woo, A. AB - Large-scale standardized tests have been widely used for educational and licensure testing. In computerized adaptive testing (CAT), one of the practical concerns for maintaining large-scale assessments is to ensure adequate numbers of high-quality items that are required for item pool functioning. Developing items at specific difficulty levels and for certain areas of test plans is a well-known challenge.
The purpose of this study was to investigate strategies for varying items that can effectively generate items at targeted difficulty levels and specific test plan areas. Each variant item generation model was developed by decomposing selected source items possessing ideal measurement properties and targeting the desirable content domains. 341 variant items were generated from 72 source items. Data were collected from six pretest periods. Items were calibrated using the Rasch model. Initial results indicate that variant items showed desirable measurement properties. Additionally, compared to an average of approximately 60% of the items passing pretest criteria, an average of 84% of the variant items passed the pretest criteria. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 194 KB} ER - TY - JOUR T1 - Development and preliminary testing of a computerized adaptive assessment of chronic pain JF - Journal of Pain Y1 - 2009 A1 - Anatchkova, M. D. A1 - Saris-Baglama, R. N. A1 - Kosinski, M. A1 - Bjorner, J. B. KW - *Computers KW - *Questionnaires KW - Activities of Daily Living KW - Adaptation, Psychological KW - Chronic Disease KW - Cohort Studies KW - Disability Evaluation KW - Female KW - Humans KW - Male KW - Middle Aged KW - Models, Psychological KW - Outcome Assessment (Health Care) KW - Pain Measurement/*methods KW - Pain, Intractable/*diagnosis/psychology KW - Psychometrics KW - Quality of Life KW - User-Computer Interface AB - The aim of this article is to report the development and preliminary testing of a prototype computerized adaptive test of chronic pain (CHRONIC PAIN-CAT) conducted in 2 stages: (1) evaluation of various item selection and stopping rules through real data-simulated administrations of CHRONIC PAIN-CAT; (2) a feasibility study of the actual prototype CHRONIC PAIN-CAT assessment system conducted in a pilot sample. 
Item calibrations developed from a US general population sample (N = 782) were used to program a pain severity and impact item bank (k = 45), and real data simulations were conducted to determine a CAT stopping rule. The CHRONIC PAIN-CAT was programmed on a tablet PC using QualityMetric's Dynamic Health Assessment (DYHNA) software and administered to a clinical sample of pain sufferers (n = 100). The CAT was completed in significantly less time than the static (full item bank) assessment (P < .001). On average, 5.6 items were dynamically administered by CAT to achieve a precise score. Scores estimated from the 2 assessments were highly correlated (r = .89), and both assessments discriminated across pain severity levels (P < .001, RV = .95). Patients' evaluations of the CHRONIC PAIN-CAT were favorable. PERSPECTIVE: This report demonstrates that the CHRONIC PAIN-CAT is feasible for administration in a clinic. The application has the potential to improve pain assessment and help clinicians manage chronic pain. VL - 10 SN - 1528-8447 (Electronic); 1526-5900 (Linking) N1 - Anatchkova, Milena D; Saris-Baglama, Renee N; Kosinski, Mark; Bjorner, Jakob B. 1R43AR052251-01A1/AR/NIAMS NIH HHS/United States. Evaluation Studies. Research Support, N.I.H., Extramural. United States. The journal of pain : official journal of the American Pain Society. J Pain. 2009 Sep;10(9):932-43. U2 - 2763618 ER - TY - JOUR T1 - Development of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis JF - Rehabilitation Psychology Y1 - 2009 A1 - Forkmann, T. A1 - Boecker, M. A1 - Norra, C. A1 - Eberle, N. A1 - Kircher, T. A1 - Schauerte, P. A1 - Mischke, K. A1 - Westhofen, M. A1 - Gauggel, S. A1 - Wirtz, M.
KW - Adaptation, Psychological KW - Adult KW - Aged KW - Depressive Disorder/*diagnosis/psychology KW - Diagnosis, Computer-Assisted KW - Female KW - Heart Diseases/*psychology KW - Humans KW - Male KW - Mental Disorders/*psychology KW - Middle Aged KW - Models, Statistical KW - Otorhinolaryngologic Diseases/*psychology KW - Personality Assessment/statistics & numerical data KW - Personality Inventory/*statistics & numerical data KW - Psychometrics/statistics & numerical data KW - Questionnaires KW - Reproducibility of Results KW - Sick Role AB - OBJECTIVE: The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. The present study aimed at developing a new item bank that allows for assessing depression in persons with mental and persons with somatic diseases. METHOD: The sample consisted of 161 participants treated for a depressive syndrome, and 206 participants with somatic illnesses (103 cardiologic, 103 otorhinolaryngologic; overall mean age = 44.1 years, SD =14.0; 44.7% women) to allow for validation of the item bank in both groups. Persons answered a pool of 182 depression items on a 5-point Likert scale. RESULTS: Evaluation of Rasch model fit (infit < 1.3), differential item functioning, dimensionality, local independence, item spread, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 79 items with good psychometric properties. CONCLUSIONS: The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. It might also be useful for researchers who wish to develop new fixed-length scales for the assessment of depression in specific rehabilitation settings. 
VL - 54 SN - 0090-5550 (Print); 0090-5550 (Linking) N1 - Forkmann, Thomas; Boecker, Maren; Norra, Christine; Eberle, Nicole; Kircher, Tilo; Schauerte, Patrick; Mischke, Karl; Westhofen, Martin; Gauggel, Siegfried; Wirtz, Markus. Research Support, Non-U.S. Gov't. United States. Rehabilitation psychology. Rehabil Psychol. 2009 May;54(2):186-97. ER - TY - JOUR T1 - Diagnostic classification models and multidimensional adaptive testing: A commentary on Rupp and Templin. JF - Measurement: Interdisciplinary Research and Perspectives Y1 - 2009 A1 - Frey, A. A1 - Carstensen, C. H. VL - 7 ER - TY - JOUR T1 - Direct and Inverse Problems of Item Pool Design for Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2009 A1 - Belov, Dmitry I. A1 - Armstrong, Ronald D. AB -

The recent literature on computerized adaptive testing (CAT) has developed methods for creating CAT item pools from a large master pool. Each CAT pool is designed as a set of nonoverlapping forms reflecting the skill levels of an assumed population of test takers. This article presents a Monte Carlo method to obtain these CAT pools and discusses its advantages over existing methods. Also, a new problem is considered that finds a population ability density function best matching the master pool. An analysis of the solution to this new problem provides testing organizations with effective guidance for maintaining their master pools. Computer experiments with a pool of Law School Admission Test items and its assembly constraints are presented.

VL - 69 UR - http://epm.sagepub.com/content/69/4/533.abstract ER - TY - CONF T1 - Developing a progressive approach to using the GAIN in order to reduce the duration and cost of assessment with the GAIN short screener, Quick and computer adaptive testing T2 - Joint Meeting on Adolescent Treatment Effectiveness Y1 - 2008 A1 - Dennis, M. L. A1 - Funk, R. A1 - Titus, J. A1 - Riley, B. B. A1 - Hosman, S. A1 - Kinne, S. JF - Joint Meeting on Adolescent Treatment Effectiveness CY - Washington D.C., USA ER - TY - JOUR T1 - The D-optimality item selection criterion in the early stage of CAT: A study with the graded response model JF - Journal of Educational and Behavioral Statistics Y1 - 2008 A1 - Passos, V. L. A1 - Berger, M. P. F. A1 - Tan, F. E. S. KW - computerized adaptive testing KW - D optimality KW - item selection AB - During the early stage of computerized adaptive testing (CAT), item selection criteria based on Fisher’s information often produce less stable latent trait estimates than the Kullback-Leibler global information criterion. Robustness against early stage instability has been reported for the D-optimality criterion in a polytomous CAT with the Nominal Response Model and is shown herein to be reproducible for the Graded Response Model. For comparative purposes, the A-optimality and the global information criteria are also applied. Their item selection is investigated as a function of test progression and item bank composition. The results indicate how the selection of specific item parameters underlies the criteria performances evaluated via accuracy and precision of estimation. In addition, the criteria item exposure rates are compared, without the use of any exposure controlling measure.
On account of stability, precision, accuracy, numerical simplicity, and, less evidently, item exposure rate, the D-optimality criterion can be recommended for CAT. VL - 33 ER - TY - JOUR T1 - The design and evaluation of a computerized adaptive test on mobile devices JF - Computers & Education Y1 - 2007 A1 - Triantafillou, E. A1 - Georgiadou, E. A1 - Economides, A. A. VL - 49 ER - TY - CHAP T1 - The design of p-optimal item banks for computerized adaptive tests Y1 - 2007 A1 - Reckase, M. D. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 211 KB} ER - TY - CHAP T1 - Designing optimal item pools for computerized adaptive tests with Sympson-Hetter exposure control Y1 - 2007 A1 - Gu, L. A1 - Reckase, M. D. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing N1 - {PDF file, 3 MB} ER - TY - CHAP T1 - Designing templates based on a taxonomy of innovative items Y1 - 2007 A1 - Parshall, C. G. A1 - Harmes, J. C. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 149 KB} ER - TY - JOUR T1 - Detecting Differential Speededness in Multistage Testing JF - Journal of Educational Measurement Y1 - 2007 A1 - van der Linden, Wim J. A1 - Breithaupt, Krista A1 - Chuah, Siang Chee A1 - Zhang, Yanwei AB -

A potential undesirable effect of multistage testing is differential speededness, which happens if some of the test takers run out of time because they receive subtests with items that are more time intensive than others. This article shows how a probabilistic response-time model can be used for estimating differences in time intensities and speed between subtests and test takers and detecting differential speededness. An empirical data set for a multistage test in the computerized CPA Exam was used to demonstrate the procedures. Although the more difficult subtests appeared to have items that were more time intensive than the easier subtests, an analysis of the residual response times did not reveal any significant differential speededness because the time limit appeared to be appropriate. In a separate analysis, within each of the subtests, we found minor but consistent patterns of residual times that are believed to be due to a warm-up effect, that is, test takers spending more time on the initial items than they actually need.

VL - 44 UR - http://dx.doi.org/10.1111/j.1745-3984.2007.00030.x ER - TY - JOUR T1 - Developing tailored instruments: item banking and computerized adaptive assessment JF - Quality of Life Research Y1 - 2007 A1 - Bjorner, J. B. A1 - Chang, C-H. A1 - Thissen, D. A1 - Reeve, B. B. KW - *Health Status KW - *Health Status Indicators KW - *Mental Health KW - *Outcome Assessment (Health Care) KW - *Quality of Life KW - *Questionnaires KW - *Software KW - Algorithms KW - Factor Analysis, Statistical KW - Humans KW - Models, Statistical KW - Psychometrics AB - Item banks and Computerized Adaptive Testing (CAT) have the potential to greatly improve the assessment of health outcomes. This review describes the unique features of item banks and CAT and discusses how to develop item banks. In CAT, a computer selects the items from an item bank that are most relevant for and informative about the particular respondent; thus optimizing test relevance and precision. Item response theory (IRT) provides the foundation for selecting the items that are most informative for the particular respondent and for scoring responses on a common metric. The development of an item bank is a multi-stage process that requires a clear definition of the construct to be measured, good items, a careful psychometric analysis of the items, and a clear specification of the final CAT. The psychometric analysis needs to evaluate the assumptions of the IRT model such as unidimensionality and local independence; that the items function the same way in different subgroups of the population; and that there is an adequate fit between the data and the chosen item response models. Also, interpretation guidelines need to be established to help the clinical application of the assessment. Although medical research can draw upon expertise from educational testing in the development of item banks and CAT, the medical field also encounters unique opportunities and challenges. 
VL - 16 SN - 0962-9343 (Print) N1 - Bjorner, Jakob Bue; Chang, Chih-Hung; Thissen, David; Reeve, Bryce B. 1R43NS047763-01/NS/United States NINDS; AG015815/AG/United States NIA. Research Support, N.I.H., Extramural. Netherlands. Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation. Qual Life Res. 2007;16 Suppl 1:95-108. Epub 2007 Feb 15. ER - TY - JOUR T1 - Development and evaluation of a computer adaptive test for “Anxiety” (Anxiety-CAT) JF - Quality of Life Research Y1 - 2007 A1 - Walter, O. B. A1 - Becker, J. A1 - Bjorner, J. B. A1 - Fliege, H. A1 - Klapp, B. F. A1 - Rose, M. VL - 16 ER - TY - CHAP T1 - The development of a computerized adaptive test for integrity Y1 - 2007 A1 - Egberink, I. J. L. A1 - Veldkamp, B. P. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 290 KB} ER - TY - CHAP T1 - Development of a multiple-component CAT for measuring foreign language proficiency (SIMTEST) Y1 - 2007 A1 - Sumbling, M. A1 - Sanz, P. A1 - Viladrich, M. C. A1 - Doval, E. A1 - Riera, L. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 258 KB} ER - TY - CHAP T1 - Designing computerized adaptive tests Y1 - 2006 A1 - Davey, T. A1 - Pitoniak, M. J. CY - S. M. Downing and T. M. Haladyna (Eds.), Handbook of test development. New Jersey: Lawrence Erlbaum Associates. ER - TY - JOUR T1 - Data pooling and analysis to build a preliminary item bank: an example using bowel function in prostate cancer JF - Evaluation and the Health Professions Y1 - 2005 A1 - Eton, D. T. A1 - Lai, J. S. A1 - Cella, D. A1 - Reeve, B. B. A1 - Talcott, J. A. A1 - Clark, J. A. A1 - McPherson, C. P. A1 - Litwin, M. S. A1 - Moinpour, C. M.
KW - *Quality of Life KW - *Questionnaires KW - Adult KW - Aged KW - Data Collection/methods KW - Humans KW - Intestine, Large/*physiopathology KW - Male KW - Middle Aged KW - Prostatic Neoplasms/*physiopathology KW - Psychometrics KW - Research Support, Non-U.S. Gov't KW - Statistics, Nonparametric AB - Assessing bowel function (BF) in prostate cancer can help determine therapeutic trade-offs. We determined the components of BF commonly assessed in prostate cancer studies as an initial step in creating an item bank for clinical and research application. We analyzed six archived data sets representing 4,246 men with prostate cancer. Thirty-one items from validated instruments were available for analysis. Items were classified into domains (diarrhea, rectal urgency, pain, bleeding, bother/distress, and other) then subjected to conventional psychometric and item response theory (IRT) analyses. Items fit the IRT model if the ratio between observed and expected item variance was between 0.60 and 1.40. Four of 31 items had inadequate fit in at least one analysis. Poorly fitting items included bleeding (2), rectal urgency (1), and bother/distress (1). A fifth item assessing hemorrhoids was poorly correlated with other items. Our analyses supported four related components of BF: diarrhea, rectal urgency, pain, and bother/distress. VL - 28 N1 - 0163-2787 (Print). Journal Article. ER - TY - JOUR T1 - Design and evaluation of an XML-based platform-independent computerized adaptive testing system JF - IEEE Transactions on Education Y1 - 2005 A1 - Ho, R.-G. A1 - Yen, Y.-C. VL - 48 IS - 2 ER - TY - JOUR T1 - Development of a computer-adaptive test for depression (D-CAT) JF - Quality of Life Research Y1 - 2005 A1 - Fliege, H. A1 - Becker, J. A1 - Walter, O. B. A1 - Bjorner, J. B. A1 - Klapp, B. F. A1 - Rose, M. VL - 14 ER - TY - CHAP T1 - The development of the adaptive item language assessment (AILA) for mixed-ability students Y1 - 2005 A1 - Giouroglou, H. A1 - Economides, A. A.
CY - Proceedings E-Learn 2005 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, 643-650, Vancouver, Canada, AACE, October 2005. ER - TY - JOUR T1 - Dynamic assessment of health outcomes: Time to let the CAT out of the bag? JF - Health Services Research Y1 - 2005 A1 - Cook, K. F. A1 - O'Malley, K. J. A1 - Roddey, T. S. KW - computer adaptive testing KW - Item Response Theory KW - self reported health outcomes AB - Background: The use of item response theory (IRT) to measure self-reported outcomes has burgeoned in recent years. Perhaps the most important application of IRT is computer-adaptive testing (CAT), a measurement approach in which the selection of items is tailored for each respondent. Objective. To provide an introduction to the use of CAT in the measurement of health outcomes, describe several IRT models that can be used as the basis of CAT, and discuss practical issues associated with the use of adaptive scaling in research settings. Principal Points: The development of a CAT requires several steps that are not required in the development of a traditional measure including identification of "starting" and "stopping" rules. CAT's most attractive advantage is its efficiency. Greater measurement precision can be achieved with fewer items. Disadvantages of CAT include the high cost and level of technical expertise required to develop a CAT. Conclusions: Researchers, clinicians, and patients benefit from the availability of psychometrically rigorous measures that are not burdensome. CAT outcome measures hold substantial promise in this regard, but their development is not without challenges. 
PB - Blackwell Publishing: United Kingdom VL - 40 SN - 0017-9124 (Print); 1475-6773 (Electronic) ER - TY - CONF T1 - Detecting exposed test items in computer-based testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2004 A1 - Han, N. A1 - Hambleton, R. K. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA N1 - {PDF file, 1.245 MB} ER - TY - CONF T1 - Developing tailored instruments: Item banking and computerized adaptive assessment T2 - Paper presented at the conference “Advances in Health Outcomes Measurement: Exploring the Current State and the Future of Item Response Theory, Item Banks, and Computer-Adaptive Testing” Y1 - 2004 A1 - Bjorner, J. B. JF - Paper presented at the conference “Advances in Health Outcomes Measurement: Exploring the Current State and the Future of Item Response Theory, Item Banks, and Computer-Adaptive Testing” CY - Bethesda MD N1 - {PDF file, 406 KB} ER - TY - CONF T1 - Developing tailored instruments: Item banking and computerized adaptive assessment T2 - Paper presented at the conference “Advances in Health Outcomes Measurement: Exploring the Current State and the Future of Item Response Theory, Item Banks, and Computer-Adaptive Testing” Y1 - 2004 A1 - Chang, C-H. JF - Paper presented at the conference “Advances in Health Outcomes Measurement: Exploring the Current State and the Future of Item Response Theory, Item Banks, and Computer-Adaptive Testing” CY - Bethesda MD N1 - {PDF file, 181 KB} ER - TY - JOUR T1 - The development and evaluation of a software prototype for computer-adaptive testing JF - Computers and Education Y1 - 2004 A1 - Lilley, M A1 - Barker, T A1 - Britton, C KW - computerized adaptive testing VL - 43 ER - TY - JOUR T1 - Developing an initial physical function item bank from existing sources JF - Journal of Applied Measurement Y1 - 2003 A1 - Bode, R. K. A1 - Cella, D. A1 - Lai, J. S. A1 - Heinemann, A. W.
KW - *Databases KW - *Sickness Impact Profile KW - Adaptation, Psychological KW - Data Collection KW - Humans KW - Neoplasms/*physiopathology/psychology/therapy KW - Psychometrics KW - Quality of Life/*psychology KW - Research Support, U.S. Gov't, P.H.S. KW - United States AB - The objective of this article is to illustrate incremental item banking using health-related quality of life data collected from two samples of patients receiving cancer treatment. The kinds of decisions one faces in establishing an item bank for computerized adaptive testing are also illustrated. Pre-calibration procedures include: identifying common items across databases; creating a new database with data from each pool; reverse-scoring "negative" items; identifying rating scales used in items; identifying pivot points in each rating scale; pivot anchoring items at comparable rating scale categories; and identifying items in each instrument that measure the construct of interest. A series of calibrations were conducted in which a small proportion of new items were added to the common core and misfitting items were identified and deleted until an initial item bank has been developed. VL - 4 N1 - 1529-7713. Journal Article. ER - TY - JOUR T1 - Development and psychometric evaluation of the Flexilevel Scale of Shoulder Function (FLEX-SF) JF - Medical Care (in press) Y1 - 2003 A1 - Cook, K. F. A1 - Roddey, T. S. A1 - Gartsman, G. M. A1 - Olson, S. L. N1 - #CO03-01 ER - TY - CONF T1 - Development of the Learning Potential Computerised Adaptive Test (LPCAT) T2 - Unpublished manuscript. Y1 - 2003 A1 - De Beer, M. JF - Unpublished manuscript. N1 - {PDF file, 563 KB} ER - TY - JOUR T1 - Development, reliability, and validity of a computerized adaptive version of the Schedule for Nonadaptive and Adaptive Personality JF - Dissertation Abstracts International: Section B: The Sciences & Engineering Y1 - 2003 A1 - Simms, L. J.
AB - Computerized adaptive testing (CAT) and Item Response Theory (IRT) techniques were applied to the Schedule for Nonadaptive and Adaptive Personality (SNAP) to create a more efficient measure with little or no cost to test reliability or validity. The SNAP includes 15 factor analytically derived and relatively unidimensional traits relevant to personality disorder. IRT item parameters were calibrated on item responses from a sample of 3,995 participants who completed the traditional paper-and-pencil (P&P) SNAP in a variety of university, community, and patient settings. Computerized simulations were conducted to test various adaptive testing algorithms, and the results informed the construction of the CAT version of the SNAP (SNAP-CAT). A validation study of the SNAP-CAT was conducted on a sample of 413 undergraduates who completed the SNAP twice, separated by one week. Participants were randomly assigned to one of four groups who completed (1) a modified P&P version of the SNAP (SNAP-PP) twice (n = 106), (2) the SNAP-PP first and the SNAP-CAT second (n = 105), (3) the SNAP-CAT first and the SNAP-PP second (n = 102), and (4) the SNAP-CAT twice (n = 100). Results indicated that the SNAP-CAT was 58% and 60% faster than the traditional P&P version, at Times 1 and 2, respectively, and mean item savings across scales were 36% and 37%, respectively. These savings came with minimal cost to reliability or validity, and the two test forms were largely equivalent. Descriptive statistics, rank-ordering of scores, internal factor structure, and convergent/discriminant validity were highly comparable across testing modes and methods of scoring, and very few differences between forms replicated across testing sessions. In addition, participants overwhelmingly preferred the computerized version to the P&P version. 
However, several specific problems were identified for the Self-harm and Propriety scales of the SNAP-CAT that appeared to be broadly related to IRT calibration difficulties. Reasons for these anomalous findings are discussed, and follow-up studies are suggested. Despite these specific problems, the SNAP-CAT appears to be a viable alternative to the traditional P&P SNAP. VL - 63 ER - TY - JOUR T1 - Data sparseness and on-line pretest item calibration-scaling methods in CAT JF - Journal of Educational Measurement Y1 - 2002 A1 - Ban, J-C. A1 - Hanson, B. A. A1 - Yi, Q. A1 - Harris, D. J. KW - Computer Assisted Testing KW - Educational Measurement KW - Item Response Theory KW - Maximum Likelihood KW - Methodology KW - Scaling (Testing) KW - Statistical Data AB - Compared and evaluated 3 on-line pretest item calibration-scaling methods (the marginal maximum likelihood estimate with 1 expectation maximization [EM] cycle [OEM] method, the marginal maximum likelihood estimate with multiple EM cycles [MEM] method, and M. L. Stocking's Method B) in terms of item parameter recovery when the item responses to the pretest items in the pool are sparse. Simulations of computerized adaptive tests were used to evaluate the results yielded by the three methods. The MEM method produced the smallest average total error in parameter estimation, and the OEM method yielded the largest total error. VL - 39 ER - TY - JOUR T1 - Detection of person misfit in computerized adaptive tests with polytomous items JF - Applied Psychological Measurement Y1 - 2002 A1 - van Krimpen-Stoop, E. M. L. A. A1 - Meijer, R. R. AB - Item scores that do not fit an assumed item response theory model may cause the latent trait value to be inaccurately estimated.
For a computerized adaptive test (CAT) using dichotomous items, several person-fit statistics for detecting misfitting item score patterns have been proposed. Both for paper-and-pencil (P&P) tests and CATs, detection of person misfit with polytomous items has hardly been explored. In this study, the nominal and empirical null distributions of the standardized log-likelihood statistic for polytomous items are compared both for P&P tests and CATs. Results showed that the empirical distribution of this statistic differed from the assumed standard normal distribution for both P&P tests and CATs. Second, a new person-fit statistic based on the cumulative sum (CUSUM) procedure from statistical process control was proposed. By means of simulated data, critical values were determined that can be used to classify a pattern as fitting or misfitting. The effectiveness of the CUSUM to detect simulees with item preknowledge was investigated. Detection rates using the CUSUM were high for realistic numbers of disclosed items. VL - 26 ER - TY - CONF T1 - Developing tailored instruments: Item banking and computerized adaptive assessment T2 - Paper presented at the conference “Advances in Health Outcomes Measurement” Y1 - 2002 A1 - Thissen, D. JF - Paper presented at the conference “Advances in Health Outcomes Measurement” CY - Bethesda, Maryland, June 23-25 N1 - {PDF file, 170 KB} ER - TY - CONF T1 - The development and evaluation of a computer-adaptive testing application for English language T2 - Paper presented at the 2002 Computer-Assisted Testing Conference Y1 - 2002 A1 - Lilley, M A1 - Barker, T JF - Paper presented at the 2002 Computer-Assisted Testing Conference CY - United Kingdom N1 - {PDF file, 308 KB} ER - TY - JOUR T1 - Development of an index of physical functional health status in rehabilitation JF - Archives of Physical Medicine and Rehabilitation Y1 - 2002 A1 - Hart, D. L. A1 - Wright, B. D.
KW - *Health Status Indicators KW - *Rehabilitation Centers KW - Adolescent KW - Adult KW - Aged KW - Aged, 80 and over KW - Female KW - Health Surveys KW - Humans KW - Male KW - Middle Aged KW - Musculoskeletal Diseases/*physiopathology/*rehabilitation KW - Nervous System Diseases/*physiopathology/*rehabilitation KW - Physical Fitness/*physiology KW - Recovery of Function/physiology KW - Reproducibility of Results KW - Retrospective Studies AB - OBJECTIVE: To describe (1) the development of an index of physical functional health status (FHS) and (2) its hierarchical structure, unidimensionality, reproducibility of item calibrations, and practical application. DESIGN: Rasch analysis of existing data sets. SETTING: A total of 715 acute, orthopedic outpatient centers and 62 long-term care facilities in 41 states participating with Focus On Therapeutic Outcomes, Inc. PATIENTS: A convenience sample of 92,343 patients (40% male; mean age +/- standard deviation [SD], 48+/-17y; range, 14-99y) seeking rehabilitation between 1993 and 1999. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Patients completed self-report health status surveys at admission and discharge. The Medical Outcomes Study 36-Item Short-Form Health Survey's physical functioning scale (PF-10) is the foundation of the physical FHS. The Oswestry Low Back Pain Disability Questionnaire, Neck Disability Index, Lysholm Knee Questionnaire, items pertinent to patients with upper-extremity impairments, and items pertinent to patients with more involved neuromusculoskeletal impairments were cocalibrated into the PF-10. RESULTS: The final FHS item bank contained 36 items (patient separation, 2.3; root mean square measurement error, 5.9; mean square +/- SD infit, 0.9+/-0.5; outfit, 0.9+/-0.9). Analyses supported empirical item hierarchy, unidimensionality, reproducibility of item calibrations, and content and construct validity of the FHS-36. 
CONCLUSIONS: Results support the reliability and validity of FHS-36 measures in the present sample. Analyses show the potential for a dynamic, computer-controlled, adaptive survey for FHS assessment applicable for group analysis and clinical decision making for individual patients. VL - 83 N1 - 0003-9993 (Print) Journal Article ER - TY - CONF T1 - The Development of STAR Early Literacy T2 - Presentation to the 32nd Annual National Conference on Large-Scale Assessment Y1 - 2002 A1 - J. R. McBride JF - Presentation to the 32nd Annual National Conference on Large-Scale Assessment CY - Desert Springs CA ER - TY - THES T1 - Development, reliability, and validity of a computerized adaptive version of the Schedule for Nonadaptive and Adaptive Personality Y1 - 2002 A1 - Simms, L. J. CY - Unpublished Ph.D. dissertation, University of Iowa, Iowa City, Iowa ER - TY - CONF T1 - Data sparseness and online pretest calibration/scaling methods in CAT T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Ban, J. A1 - Hanson, B. A. A1 - Yi, Q. A1 - Harris, D. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle N1 - (Also ACT Research Report 2002-1) ER - TY - CONF T1 - Deriving a stopping rule for sequential adaptive tests T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Grabovsky, I. A1 - Chang, Hua-Hua A1 - Ying, Z. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA N1 - {PDF file, 111 KB} ER - TY - ABST T1 - Detection of misfitting item-score patterns in computerized adaptive testing Y1 - 2001 A1 - Stoop, E. M. L. A. CY - Enschede, The Netherlands: Febodruk B.V. N1 - #ST01-01 ER - TY - BOOK T1 - Development and evaluation of test assembly procedures for computerized adaptive testing Y1 - 2001 A1 - Robin, F.
CY - Unpublished doctoral dissertation, University of Massachusetts, Amherst ER - TY - JOUR T1 - Development of an adaptive multimedia program to collect patient health data JF - American Journal of Preventive Medicine Y1 - 2001 A1 - Sutherland, L. A. A1 - Campbell, M. A1 - Ornstein, K. A1 - Wildemuth, B. A1 - Lobach, D. VL - 21 ER - TY - ABST T1 - The Development of STAR Early Literacy: A report of the School Renaissance Institute. Y1 - 2001 A1 - School-Renaissance-Institute CY - Madison, WI: Author. ER - TY - JOUR T1 - Developments in measurement of persons and items by means of item response models JF - Behaviormetrika Y1 - 2001 A1 - Sijtsma, K. KW - Cognitive KW - Computer Assisted Testing KW - Item Response Theory KW - Models KW - Nonparametric Statistical Tests KW - Processes AB - This paper starts with a general introduction to measurement of hypothetical constructs typical of the social and behavioral sciences. After the stages ranging from theory through operationalization and item domain to preliminary test or questionnaire have been treated, the general assumptions of item response theory are discussed. The family of parametric item response models for dichotomous items is introduced and it is explained how parameters for respondents and items are estimated from the scores collected from a sample of respondents who took the test or questionnaire. Next, the family of nonparametric item response models is explained, followed by the 3 classes of item response models for polytomous item scores (e.g., rating scale scores). Then, to what degree the mean item score and the unweighted sum of item scores for persons are useful for measuring items and persons in the context of item response theory is discussed. Methods for fitting parametric and nonparametric models to data are briefly discussed.
Finally, the main applications of item response models are discussed, which include equating and item banking, computerized and adaptive testing, research into differential item functioning, person fit research, and cognitive modeling. (PsycINFO Database Record (c) 2005 APA ) VL - 28 ER - TY - JOUR T1 - Differences between self-adapted and computerized adaptive tests: A meta-analysis JF - Journal of Educational Measurement Y1 - 2001 A1 - Pitkin, A. K. A1 - Vispoel, W. P. KW - Adaptive Testing KW - Computer Assisted Testing KW - Scores computerized adaptive testing KW - Test KW - Test Anxiety AB - Self-adapted testing has been described as a variation of computerized adaptive testing that reduces test anxiety and thereby enhances test performance. The purpose of this study was to gain a better understanding of these proposed effects of self-adapted tests (SATs); meta-analysis procedures were used to estimate differences between SATs and computerized adaptive tests (CATs) in proficiency estimates and post-test anxiety levels across studies in which these two types of tests have been compared. After controlling for measurement error the results showed that SATs yielded proficiency estimates that were 0.12 standard deviation units higher and post-test anxiety levels that were 0.19 standard deviation units lower than those yielded by CATs. The authors speculate about possible reasons for these differences and discuss advantages and disadvantages of using SATs in operational settings. (PsycINFO Database Record (c) 2005 APA ) VL - 38 ER - TY - CHAP T1 - Designing item pools for computerized adaptive testing T2 - Computerized adaptive testing: Theory and practice Y1 - 2000 A1 - Veldkamp, B. P. A1 - van der Linden, W. J. 
JF - Computerized adaptive testing: Theory and practice PB - Kluwer Academic Publishers CY - Dordrecht, The Netherlands ER - TY - CHAP T1 - Detecting person misfit in adaptive testing using statistical process control techniques T2 - Computerized adaptive testing: Theory and practice Y1 - 2000 A1 - van Krimpen-Stoop, E. M. L. A. A1 - Meijer, R. R. KW - person Fit JF - Computerized adaptive testing: Theory and practice PB - Kluwer Academic Publishers CY - Dordrecht, The Netherlands ER - TY - CONF T1 - Detecting test-takers who have memorized items in computerized-adaptive testing and multi-stage testing: A comparison T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Patsula, L. N. A1 - McLeod, L. D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA ER - TY - JOUR T1 - Detection of known items in adaptive testing with a statistical quality control method JF - Journal of Educational and Behavioral Statistics Y1 - 2000 A1 - Veerkamp, W. J. J. A1 - Glas, C. A. W. VL - 25 ER - TY - ABST T1 - Detection of person misfit in computerized adaptive testing with polytomous items (Research Report 00-01) Y1 - 2000 A1 - van Krimpen-Stoop, E. M. L. A. A1 - Meijer, R. R. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - ABST T1 - Development and evaluation of test assembly procedures for computerized adaptive testing (Laboratory of Psychometric and Evaluative Methods Research Report No. 391) Y1 - 2000 A1 - Robin, F. CY - Amherst MA: University of Massachusetts, School of Education.
ER - TY - JOUR T1 - The development of a computerized version of Vandenberg's mental rotation test and the effect of visuo-spatial working memory loading JF - Dissertation Abstracts International Section A: Humanities and Social Sciences Y1 - 2000 A1 - Strong, S. D. KW - Computer Assisted Testing KW - Mental Rotation KW - Short Term Memory KW - computerized adaptive testing KW - Test Construction KW - Test Validity KW - Visuospatial Memory AB - This dissertation focused on the generation and evaluation of web-based versions of Vandenberg's Mental Rotation Test. Memory and spatial visualization theory were explored in relation to the addition of a visuo-spatial working memory component. Analysis of the data determined that there was a significant difference between scores on the MRT Computer and MRT Memory test. The addition of a visuo-spatial working memory component did significantly affect results at the .05 alpha level. Reliability and discrimination estimates were higher on the MRT Memory version. The computerization of the paper and pencil version of the MRT did not significantly affect scores but did affect the time required to complete the test. The population utilized in the quasi-experiment consisted of 107 university students from eight institutions in engineering graphics related courses. The subjects completed two researcher-developed, Web-based versions of Vandenberg's Mental Rotation Test and the original paper and pencil version of the Mental Rotation Test. One version of the test included a visuo-spatial working memory loading. Significant contributions of this study included developing and evaluating computerized versions of Vandenberg's Mental Rotation Test. Previous versions of Vandenberg's Mental Rotation Test did not take advantage of the ability of the computer to incorporate an interaction factor, such as a visuo-spatial working memory loading, into the test.
The addition of an interaction factor results in a more discriminating test, which will lend itself well to computerized adaptive testing practices. Educators in engineering graphics related disciplines should strongly consider the use of spatial visualization tests to aid in establishing the effects of modern computer systems on fundamental design/drafting skills. Regular testing of spatial visualization skills will assist in the creation of a more relevant curriculum. Computerized tests which are valid and reliable will assist in making this task feasible. (PsycINFO Database Record (c) 2005 APA) VL - 60 ER - TY - JOUR T1 - Diagnostische Programme in der Demenzfrüherkennung: Der Adaptive Figurenfolgen-Lerntest (ADAFI) [Diagnostic programs in the early detection of dementia: The Adaptive Figure Series Learning Test (ADAFI)] JF - Zeitschrift für Gerontopsychologie & -Psychiatrie Y1 - 2000 A1 - Schreiber, M. D. A1 - Schneider, R. J. A1 - Schweizer, A. A1 - Beckmann, J. F. A1 - Baltissen, R. KW - Adaptive Testing KW - At Risk Populations KW - Computer Assisted Diagnosis KW - Dementia AB - The aim of this study was to examine the ability of the computerized Adaptive Figure Series Learning Test (ADAFI) to differentiate among old subjects at risk for dementia and old healthy controls. Several studies on the subject of measuring the intellectual potential (cognitive plasticity) of old subjects have shown the usefulness of the fluid intelligence type of task used in the ADAFI (completion of figure series) for this differentiation. Because the ADAFI has been developed as a Diagnostic Program it is able to counter some critical issues in those studies. It was shown a) that distinct differences between both groups are revealed by the ADAFI, b) that the prediction of the cognitive health status of individual subjects is quite good (sensitivity: 80%, specificity: 90%), and c) that the prediction of the cognitive health status with tests of processing speed and working memory is worse than with the ADAFI. The results indicate that the ADAFI might be a promising plasticity-oriented tool for the measurement of cognitive decline in the elderly, and thus might be useful for the early detection of dementia. VL - 13 ER - TY - JOUR T1 - Does adaptive testing violate local independence? JF - Psychometrika Y1 - 2000 A1 - Mislevy, R. J.
A1 - Chang, Hua-Hua VL - 65 ER - TY - ABST T1 - Designing item pools for computerized adaptive testing (Research Report 99-03) Y1 - 1999 A1 - Veldkamp, B. P. A1 - van der Linden, W. J. CY - Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis ER - TY - JOUR T1 - Detecting item memorization in the CAT environment JF - Applied Psychological Measurement Y1 - 1999 A1 - McLeod, L. D. A1 - Lewis, C. VL - 23 ER - TY - CONF T1 - Detecting items that have been memorized in the CAT environment T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - McLeod, L. D. A1 - Schnipke, D. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CHAP T1 - Developing computerized adaptive tests for school children Y1 - 1999 A1 - Kingsbury, G. G. A1 - Houser, R. L. CY - F. Drasgow and J. B. Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 93-115). Mahwah NJ: Erlbaum. ER - TY - CONF T1 - The development and cognitive evaluation of an audio-assisted computer-adaptive test for eighth-grade mathematics T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Williams, V. S. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CHAP T1 - Development and introduction of a computer adaptive Graduate Record Examination General Test Y1 - 1999 A1 - Mills, C. N. CY - F. Drasgow and J. B. Olson-Buchanan (Eds.). Innovations in computerized assessment (pp. 117-135). Mahwah NJ: Erlbaum. ER - TY - CHAP T1 - The development of a computerized adaptive selection system for computer programmers in a financial services company Y1 - 1999 A1 - Zickar, M. J. A1 - Overton, R. C. A1 - Taylor, L. R. A1 - Harms, H. J. CY - F. Drasgow and J. B.
Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 7-33). Mahwah NJ: Erlbaum. ER - TY - JOUR T1 - The development of an adaptive test for placement in French JF - Studies in Language Testing Y1 - 1999 A1 - Laurier, M. VL - 10 ER - TY - CHAP T1 - Development of the computerized adaptive testing version of the Armed Services Vocational Aptitude Battery Y1 - 1999 A1 - Segall, D. O. A1 - Moreno, K. E. CY - F. Drasgow and J. Olson-Buchanan (Eds.). Innovations in computerized assessment. Mahwah NJ: Erlbaum. ER - TY - JOUR T1 - Dynamic health assessments: The search for more practical and more precise outcomes measures JF - Quality of Life Newsletter Y1 - 1999 A1 - Ware, J. E., Jr. A1 - Bjorner, J. B. A1 - Kosinski, M. N1 - {PDF file, 75 KB} ER - TY - CONF T1 - Developing, maintaining, and renewing the item inventory to support computer-based testing T2 - Paper presented at the colloquium Y1 - 1998 A1 - Way, W. D. A1 - Steffen, M. A1 - Anderson, G. S. JF - Paper presented at the colloquium CY - Computer-Based Testing: Building the Foundation for Future Assessments, Philadelphia PA ER - TY - ABST T1 - Development and evaluation of online calibration procedures (TCN 96-216) Y1 - 1998 A1 - Levine, M. L. A1 - Williams. CY - Champaign IL: Algorithm Design and Measurement Services, Inc. ER - TY - ABST T1 - Does adaptive testing violate local independence? (Research Report 98-33) Y1 - 1998 A1 - Mislevy, R. J. A1 - Chang, Hua-Hua CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - Detecting misbehaving items in a CAT environment T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Swygert, K. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago, IL ER - TY - CONF T1 - Detection of aberrant response patterns in CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - van der Linden, W. J.
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - JOUR T1 - Developing and scoring an innovative computerized writing assessment JF - Journal of Educational Measurement Y1 - 1997 A1 - Davey, T. A1 - Godwin, J., A1 - Mittelholz, D. VL - 34 ER - TY - JOUR T1 - Diagnostic adaptive testing: Effects of remedial instruction as empirical validation JF - Journal of Educational Measurement Y1 - 1997 A1 - Tatsuoka, K. K. A1 - Tatsuoka, M. M. VL - 34 ER - TY - JOUR T1 - The distribution of indexes of person fit within the computerized adaptive testing environment JF - Applied Psychological Measurement Y1 - 1997 A1 - Nering, M. L. KW - Adaptive Testing KW - Computer Assisted Testing KW - Fit KW - Person Environment AB - The extent to which a trait estimate represents the underlying latent trait of interest can be estimated by using indexes of person fit. Several statistical methods for indexing person fit have been proposed to identify nonmodel-fitting response vectors. These person-fit indexes have generally been found to follow a standard normal distribution for conventionally administered tests. The present investigation found that within the context of computerized adaptive testing (CAT) these indexes tended not to follow a standard normal distribution. As the item pool became less discriminating, as the CAT termination criterion became less stringent, and as the number of items in the pool decreased, the distributions of the indexes approached a standard normal distribution. It was determined that under these conditions the indexes' distributions approached standard normal distributions because more items were being administered. However, even when over 50 items were administered in a CAT the indexes were distributed in a fashion that was different from what was expected. 
(PsycINFO Database Record (c) 2006 APA ) VL - 21 N1 - Journal; Peer Reviewed Journal ER - TY - JOUR T1 - Dispelling myths about the new NCLEX exam JF - Recruitment, Retention, and Restructuring Report Y1 - 1996 A1 - Johnson, S. H. KW - *Educational Measurement KW - *Licensure KW - Humans KW - Nursing Staff KW - Personnel Selection KW - United States AB - The new computerized NCLEX system is working well. Most new candidates, employers, and board of nursing representatives like the computerized adaptive testing system and the fast report of results. But, among the candidates themselves some myths have grown which cause them needless anxiety. VL - 9 N1 - Journal Article ER - TY - JOUR T1 - Dynamic scaling: An ipsative procedure using techniques from computer adaptive testing JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 1996 A1 - Berg, S. R. KW - computerized adaptive testing AB - The purpose of this study was to create a prototype method for scaling items using computer adaptive testing techniques and to demonstrate the method with a working model program. The method can be used to scale items, rank individuals with respect to the scaled items, and to re-scale the items with respect to the individuals' responses. When using this prototype method, the items to be scaled are part of a database that contains not only the items, but measures of how individuals respond to each item. After completion of all presented items, the individual is assigned an overall scale value which is then compared with each item responded to, and an individual "error" term is stored with each item. After several individuals have responded to the items, the item error terms are used to revise the placement of the scaled items. This revision feature allows the natural adaptation of one general list to reflect subgroup differences, for example, differences among geographic areas or ethnic groups. 
It also provides easy revision and limited authoring of the scale items by the computer program administrator. This study addressed the methodology, the instrumentation needed to handle the scale-item administration, data recording, item error analysis, and scale-item database editing required by the method, and the behavior of a prototype vocabulary test in use. Analyses were made of item ordering, response profiles, item stability, reliability and validity. Although slow, the movement of unordered words used as items in the prototype program was accurate as determined by comparison with an expert word ranking. Person scores obtained by multiple administrations of the prototype test were reliable and correlated at .94 with a commercial paper-and-pencil vocabulary test, while holding a three-to-one speed advantage in administration. Although based upon self-report data, dynamic scaling instruments like the model vocabulary test could be very useful for self-assessment, for pre (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 56 ER - TY - CONF T1 - Does cheating on CAT pay: Not T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1995 A1 - Gershon, R. C. A1 - Bergstrom, B. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco N1 - ERIC ED 392 844 ER - TY - ABST T1 - DIF analysis for pretest items in computer-adaptive testing (Educational Testing Service Research Rep No RR 94-33) Y1 - 1994 A1 - Zwick, R. A1 - Thayer, D. T. A1 - Wingersky, M. CY - Princeton NJ: Educational Testing Service. N1 - #ZW94-33 ER - TY - ABST T1 - Deriving comparable scores for computer adaptive and conventional tests: An example using the SAT Y1 - 1993 A1 - Eignor, D. R. CY - Princeton NJ: Educational Testing Service N1 - #EI93-55 (Also presented at the 1993 National Council on Measurement in Education meeting in Atlanta GA.)
ER - TY - JOUR T1 - The development and evaluation of a computerized adaptive test of tonal memory JF - Journal of Research in Music Education Y1 - 1993 A1 - Vispoel, W. P. VL - 41 ER - TY - JOUR T1 - The development and evaluation of a system for computerized adaptive testing JF - Dissertation Abstracts International Y1 - 1992 A1 - de la Torre Sanchez, R. KW - computerized adaptive testing VL - 52 ER - TY - CHAP T1 - The development of alternative operational concepts Y1 - 1992 A1 - J. R. McBride A1 - Curran, L. T. CY - Proceedings of the 34th Annual Conference of the Military Testing Association. San Diego, CA: Navy Personnel Research and Development Center. ER - TY - CONF T1 - Differential item functioning analysis for computer-adaptive tests and other IRT-scored measures T2 - Paper presented at the annual meeting of the Military Testing Association Y1 - 1992 A1 - Zwick, R. JF - Paper presented at the annual meeting of the Military Testing Association CY - San Diego CA ER - TY - CONF T1 - The development and evaluation of a computerized adaptive testing system T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1991 A1 - De la Torre, R. A1 - Vispoel, W. P. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL N1 - ERIC No. ED 338 711 ER - TY - CONF T1 - Development and evaluation of hierarchical testlets in two-stage tests using integer linear programming T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1991 A1 - Lam, T. L. A1 - Goong, Y. Y. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL ER - TY - CONF T1 - Dichotomous search strategies for computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association. Y1 - 1990 A1 - Xiao, B.
JF - Paper presented at the annual meeting of the American Educational Research Association. ER - TY - CHAP T1 - Die Optimierung der Meßgenauigkeit beim branched adaptiven Testen [Optimization of measurement precision in branched adaptive testing] Y1 - 1989 A1 - Kubinger, K. D. CY - K. D. Kubinger (Ed.), Moderne Testtheorie: Ein Abriß samt neuesten Beiträgen [Modern test theory: Overview and new issues] (pp. 187-218). Weinheim, Germany: Beltz. ER - TY - CONF T1 - The development and evaluation of a microcomputerized adaptive placement testing system for college mathematics T2 - Paper(s) presented at the annual meeting(s) of the American Educational Research Association Y1 - 1988 A1 - Hsu, T.-C. A1 - Shermis, M. D. JF - Paper(s) presented at the annual meeting(s) of the American Educational Research Association CY - 1986 (San Francisco CA) and 1987 (Washington DC) ER - TY - ABST T1 - Determining the sensitivity of CAT-ASVAB scores to changes in item response curves with the medium of administration (Report No. 86-189) Y1 - 1986 A1 - Divgi, D. R. CY - Alexandria VA: Center for Naval Analyses N1 - #DI86-189 ER - TY - ABST T1 - Development of a microcomputer-based adaptive testing system: Phase II Implementation (Research Report ONR 85-5) Y1 - 1985 A1 - Vale, C. D. CY - St. Paul MN: Assessment Systems Corporation ER - TY - CONF T1 - The design of a computerized adaptive testing system for administering the ASVAB T2 - Presentation at the Annual Meeting of the American Educational Research Association Y1 - 1984 A1 - J. R. McBride JF - Presentation at the Annual Meeting of the American Educational Research Association CY - New Orleans, LA ER - TY - CHAP T1 - Design of a Microcomputer-Based Adaptive Testing System Y1 - 1982 A1 - Vale, C. D. CY - D. J. Weiss (Ed.), Proceedings of the 1979 Item Response Theory and Computerized Adaptive Testing Conference (pp. 360-371).
Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. N1 - {PDF file, 697 KB} ER - TY - CONF T1 - Development of a computerized adaptive testing system for enlisted personnel selection T2 - Presented at the Annual Convention of the American Psychological Association Y1 - 1982 A1 - J. R. McBride JF - Presented at the Annual Convention of the American Psychological Association CY - Washington, DC ER - TY - CHAP T1 - Discussion: Adaptive and sequential testing Y1 - 1982 A1 - Reckase, M. D. CY - D. J. Weiss (Ed.). Proceedings of the 1982 Computerized Adaptive Testing Conference (pp. 290-294). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 288 KB} ER - TY - JOUR T1 - Design and implementation of a microcomputer-based adaptive testing system JF - Behavior Research Methods and Instrumentation Y1 - 1981 A1 - Vale, C. D. VL - 13 ER - TY - BOOK T1 - Development and evaluation of an adaptive testing strategy for use in multidimensional interest assessment Y1 - 1980 A1 - Vale, C. D. CY - Unpublished doctoral dissertation, University of Minnesota. Dissertation Abstracts International, 42(11-B), 4248-4249 ER - TY - CHAP T1 - Discussion: Session 1 Y1 - 1980 A1 - B. K. Waters CY - D. J. Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 51-55). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. N1 - #WA80-01 {PDF file, 283 KB} ER - TY - CHAP T1 - Discussion: Session 3 Y1 - 1980 A1 - Novick, M. R. CY - D. J. Weiss (Ed.), Proceedings of the 1979 Item Response Theory and Computerized Adaptive Testing Conference (pp. 140-143).
Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. N1 - {PDF file, 286 KB} ER - TY - ABST T1 - The danger of relying solely on diagnostic adaptive testing when prior and subsequent instructional methods are different (CERL Report E-5) Y1 - 1979 A1 - Tatsuoka, K. A1 - Birenbaum, M. CY - Urbana IL: University of Illinois, Computer-Based Education Research Laboratory. N1 - #TA79-01 ER - TY - JOUR T1 - Description of components in tailored testing JF - Behavior Research Methods and Instrumentation Y1 - 1977 A1 - Patience, W. M. VL - 9 ER - TY - CHAP T1 - Discussion Y1 - 1976 A1 - Lord, F. M. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 113-117). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 318 KB} ER - TY - CHAP T1 - Discussion Y1 - 1976 A1 - Green, B. F. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 118-119). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 347 KB} ER - TY - CHAP T1 - Discussion Y1 - 1975 A1 - Linn, R. L. CY - D. J. Weiss (Ed.), Computerized adaptive trait measurement: Problems and Prospects (Research Report 75-5), pp. 44-46. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 414 KB} ER - TY - CHAP T1 - Discussion Y1 - 1975 A1 - Bock, R. D. CY - D. J. Weiss (Ed.), Computerized adaptive trait measurement: Problems and Prospects (Research Report 75-5), pp. 46-49. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. N1 - {PDF file, 414 KB} ER - TY - ABST T1 - Development of a programmed testing system (Technical Paper 259) Y1 - 1974 A1 - Bayroff, A. G. A1 - Ross, R. M. A1 - Fischl, M. A. CY - Arlington VA: US Army Research Institute for the Behavioral and Social Sciences. NTIS No.
AD A001534. ER - TY - JOUR T1 - The development and evaluation of several programmed testing methods JF - Educational and Psychological Measurement Y1 - 1969 A1 - Linn, R. L. A1 - Cleary, T. A. VL - 29 ER - TY - BOOK T1 - The development, implementation, and evaluation of a computer-assisted branched test for a program of individually prescribed instruction Y1 - 1969 A1 - Ferguson, R. L. CY - Doctoral dissertation, University of Pittsburgh. Dissertation Abstracts International, 30-09A, 3856. (University Microfilms No. 70-4530). ER - TY - ABST T1 - The development and evaluation of several programmed testing methods (Research Bulletin 68-5) Y1 - 1968 A1 - Linn, R. L. A1 - Rock, D. A. A1 - Cleary, T. A. CY - Princeton NJ: Educational Testing Service N1 - #LI68-05 ER -