TY - JOUR T1 - Development of a Computerized Adaptive Test for Anxiety Based on the Dutch–Flemish Version of the PROMIS Item Bank JF - Assessment Y1 - In Press A1 - Gerard Flens A1 - Niels Smits A1 - Caroline B. Terwee A1 - Joost Dekker A1 - Irma Huijbrechts A1 - Philip Spinhoven A1 - Edwin de Beurs AB - We used the Dutch–Flemish version of the USA PROMIS adult V1.0 item bank for Anxiety as input for developing a computerized adaptive test (CAT) to measure the entire latent anxiety continuum. First, psychometric analysis of a combined clinical and general population sample (N = 2,010) showed that the 29-item bank has the psychometric properties required for CAT administration. Second, a post hoc CAT simulation showed efficient and highly precise measurement, with an average of 8.64 items for the clinical sample and 9.48 items for the general population sample. Furthermore, the accuracy of our CAT version was highly similar to that of the full item bank administration, both in final score estimates and in distinguishing clinical subjects from persons without a mental health disorder. We discuss the future directions and limitations of CAT development with the Dutch–Flemish version of the PROMIS Anxiety item bank. UR - https://doi.org/10.1177/1073191117746742 ER - TY - JOUR T1 - How Do Trait Change Patterns Affect the Performance of Adaptive Measurement of Change? JF - Journal of Computerized Adaptive Testing Y1 - 2023 A1 - Ming Him Tai A1 - Allison W. Cooperman A1 - Joseph N. DeWeese A1 - David J. Weiss KW - adaptive measurement of change KW - computerized adaptive testing KW - longitudinal measurement KW - trait change patterns VL - 10 IS - 3 ER - TY - JOUR T1 - Computerized adaptive testing to screen children for emotional and behavioral problems by preventive child healthcare JF - BMC Pediatrics Y1 - 2020 A1 - Theunissen, Meninou H.C. A1 - de Wolff, Marianne S. A1 - Deurloo, Jacqueline A. A1 - Vogels, Anton G. C. AB -
Questionnaires to detect emotional and behavioral problems (EBP) in Preventive Child Healthcare (PCH) should be short, which potentially affects their validity and reliability. Simulation studies have shown that Computerized Adaptive Testing (CAT) could overcome these weaknesses. We studied the applicability (in terms of participation rate, satisfaction, and efficiency) and the validity of CAT in routine PCH practice.
We analyzed data on 461 children aged 10–11 years (response 41%), who were assessed during routine well-child examinations by PCH professionals. Before the visit, parents completed the CAT and the Child Behavior Checklist (CBCL). Satisfaction was measured by parent- and PCH-professional report. Efficiency of the CAT procedure was measured as the number of items needed to assess whether a child has serious problems. Its validity was assessed using the CBCL as the criterion.
Parents and PCH professionals rated the CAT on average as good. The procedure required on average 16 items to assess whether a child has serious problems. Agreement of scores on the CAT scales with corresponding CBCL scales was high (range of Spearman correlations 0.59–0.72). Areas under the curve (AUCs) were high (range: 0.95–0.97) for the Psycat total, externalizing, and hyperactivity scales, using corresponding CBCL scale scores as the criterion. For the Psycat internalizing scale, the AUC was somewhat lower but still high (0.86).
CAT is a valid procedure for the identification of emotional and behavioral problems in children aged 10–11 years. It may support the efficient and accurate identification of children with overall, and potentially also specific, emotional and behavioral problems in routine PCH.
VL - 20 UR - https://bmcpediatr.biomedcentral.com/articles/10.1186/s12887-020-2018-1 IS - Article number: 119 ER - TY - JOUR T1 - Stratified Item Selection Methods in Cognitive Diagnosis Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2020 A1 - Jing Yang A1 - Hua-Hua Chang A1 - Jian Tao A1 - Ningzhong Shi AB - Cognitive diagnostic computerized adaptive testing (CD-CAT) aims to obtain more useful diagnostic information by taking advantage of computerized adaptive testing (CAT). Cognitive diagnosis models (CDMs) have been developed to classify examinees into the correct proficiency classes so as to enable more efficient remediation, whereas CAT tailors optimal items to the examinee’s mastery profile. The item selection method is the key factor of the CD-CAT procedure. In recent years, a large number of parametric/nonparametric item selection methods have been proposed. In this article, the authors propose a series of stratified item selection methods in CD-CAT, which are combined with the posterior-weighted Kullback–Leibler (PWKL), nonparametric item selection (NPS), and weighted nonparametric item selection (WNPS) methods, and named S-PWKL, S-NPS, and S-WNPS, respectively. Two different types of stratification indices were used: original versus novel. The performance of the proposed item selection methods was evaluated via simulation studies and compared with the PWKL, NPS, and WNPS methods without stratification. Manipulated conditions included calibration sample size, item quality, number of attributes, number of strata, and data generation models. Results indicated that the S-WNPS and S-NPS methods performed similarly, and both outperformed the S-PWKL method. Item selection methods with novel stratification indices performed slightly better than those with original stratification indices, and those without stratification performed the worst.
VL - 44 UR - https://doi.org/10.1177/0146621619893783 ER - TY - JOUR T1 - Nonparametric CAT for CD in Educational Settings With Small Samples JF - Applied Psychological Measurement Y1 - 2019 A1 - Yuan-Pei Chang A1 - Chia-Yi Chiu A1 - Rung-Ching Tsai AB - Cognitive diagnostic computerized adaptive testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation. Although model-based CD-CAT is relatively well researched in the context of large-scale assessment systems, this type of system has not received the same degree of research and development in small-scale settings, such as at the course-based level, where this system would be the most useful. The main obstacle is that the statistical estimation techniques that are successfully applied within the context of a large-scale assessment require large samples to guarantee reliable calibration of the item parameters and an accurate estimation of the examinees’ proficiency class membership. Such samples are simply not obtainable in course-based settings. Therefore, the nonparametric item selection (NPS) method, which does not require any parameter calibration and thus can be used in small educational programs, is proposed in this study. The proposed nonparametric CD-CAT uses the nonparametric classification (NPC) method to estimate an examinee’s attribute profile; based on the examinee’s item responses, the item that best discriminates between the estimated attribute profile and the other attribute profiles is then selected. The simulation results show that the NPS method outperformed the compared parametric CD-CAT algorithms and the differences were substantial when the calibration samples were small.
VL - 43 UR - https://doi.org/10.1177/0146621618813113 ER - TY - JOUR T1 - Item Selection Methods in Multidimensional Computerized Adaptive Testing With Polytomously Scored Items JF - Applied Psychological Measurement Y1 - 2018 A1 - Dongbo Tu A1 - Yuting Han A1 - Yan Cai A1 - Xuliang Gao AB - Multidimensional computerized adaptive testing (MCAT) has been developed over the past decades, but most MCAT procedures can only deal with dichotomously scored items. However, polytomously scored items have been broadly used in a variety of tests for their advantages of providing more information and testing complicated abilities and skills. The purpose of this study is to discuss the item selection algorithms used in MCAT with polytomously scored items (PMCAT). Several promising item selection algorithms used in MCAT are extended to PMCAT, and two new item selection methods are proposed to improve the existing selection strategies. Two simulation studies are conducted to demonstrate the feasibility of the extended and proposed methods. The simulation results show that most of the extended item selection methods for PMCAT are feasible and the newly proposed item selection methods perform well. When pool security is also considered, with two dimensions (Study 1) the proposed modified continuous entropy method (MCEM) performs best overall in that it yields the lowest item exposure rate along with relatively high accuracy. For higher dimensions (Study 2), results show that mutual information (MUI) and MCEM maintain relatively high estimation accuracy, and the item exposure rates decrease as the correlation increases. VL - 42 UR - https://doi.org/10.1177/0146621618762748 ER - TY - JOUR T1 - Measuring patient-reported outcomes adaptively: Multidimensionality matters! JF - Applied Psychological Measurement Y1 - 2018 A1 - Paap, Muirne C. S. A1 - Kroeze, Karel A. A1 - Glas, C. A. W. A1 - Terwee, C. B. A1 - van der Palen, Job A1 - Veldkamp, Bernard P. 
ER - TY - JOUR T1 - On-the-Fly Constraint-Controlled Assembly Methods for Multistage Adaptive Testing for Cognitive Diagnosis JF - Journal of Educational Measurement Y1 - 2018 A1 - Liu, Shuchang A1 - Cai, Yan A1 - Tu, Dongbo AB - This study applied the mode of on-the-fly assembled multistage adaptive testing to cognitive diagnosis (CD-OMST). Several module assembly methods for CD-OMST were proposed and compared in terms of measurement precision, test security, and constraint management. The module assembly methods in the study included the maximum priority index method (MPI), the revised maximum priority index (RMPI), the weighted deviation model (WDM), and two revised Monte Carlo methods (R1-MC, R2-MC). Simulation results showed that on the whole the CD-OMST performs well in that it not only has acceptable attribute pattern correct classification rates but also satisfies both statistical and nonstatistical constraints; the RMPI method was generally better than the MPI method, the R2-MC method was generally better than the R1-MC method, and the two revised Monte Carlo methods performed best in terms of test security and constraint management, whereas the RMPI and WDM methods worked best in terms of measurement precision. The study is expected not only to provide information about how to combine MST and CD using an on-the-fly method and how these assembly methods in CD-OMST perform relative to each other, but also to offer guidance for practitioners assembling modules in CD-OMST with both statistical and nonstatistical constraints.
VL - 55 UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12194 ER - TY - CONF T1 - Computerized Adaptive Testing for Cognitive Diagnosis in Classroom: A Nonparametric Approach T2 - IACAT 2017 Conference Y1 - 2017 A1 - Yuan-Pei Chang A1 - Chia-Yi Chiu A1 - Rung-Ching Tsai KW - CD-CAT KW - non-parametric approach AB - In the past decade, CDMs of educational test performance have received increasing attention among educational researchers (for details, see Fu & Li, 2007, and Rupp, Templin, & Henson, 2010). These models decompose the ability domain of a given test into specific skills, called attributes, each of which an examinee may or may not have mastered. The resulting attribute profile documents the individual’s strengths and weaknesses within the ability domain. Cognitive Diagnostic Computerized Adaptive Testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation (e.g., Cheng & Chang, 2007; Cheng, 2009; Liu, You, Wang, Ding, & Chang, 2013; Tatsuoka & Tatsuoka, 1997). While model-based CD-CAT is relatively well-researched in the context of large-scale assessments, this type of system has not received the same degree of development in small-scale settings, where it would be most useful. The main challenge is that the statistical estimation techniques successfully applied to parametric CD-CAT require large samples to guarantee the reliable calibration of item parameters and accurate estimation of examinees’ attribute profiles. In response to this challenge, a nonparametric approach that does not require any parameter calibration, and thus can be used in small educational programs, is proposed. The proposed nonparametric CD-CAT relies on the same principle as the regular CAT algorithm, but uses the nonparametric classification method (Chiu & Douglas, 2013) to assess and update the student’s ability state as the test proceeds. 
Based on a student’s initial responses, a neighborhood of candidate proficiency classes is identified, and items not characteristic of the chosen proficiency classes are precluded from being chosen next. The response to the next item then allows for an update of the skill profile, and the set of possible proficiency classes is further narrowed. In this manner, the nonparametric CD-CAT cycles through item administration and update stages until the most likely proficiency class has been pinpointed. The simulation results show that the proposed method outperformed the compared parametric CD-CAT algorithms, and the differences were significant when the item parameter calibration was not optimal.
References
Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74, 619-632.
Cheng, Y., & Chang, H. (2007). The modified maximum global discrimination index method for cognitive diagnostic CAT. In D. Weiss (Ed.) Proceedings of the 2007 GMAC Computerized Adaptive Testing Conference.
Chiu, C.-Y., & Douglas, J. A. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225-250.
Fu, J., & Li, Y. (2007). An integrative review of cognitively diagnostic psychometric models. Paper presented at the Annual Meeting of the National Council on Measurement in Education. Chicago, Illinois.
Liu, H., You, X., Wang, W., Ding, S., & Chang, H. (2013). The development of computerized adaptive testing with cognitive diagnosis for an English achievement test in China. Journal of Classification, 30, 152-172.
Rupp, A. A., Templin, J. L., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford.
Tatsuoka, K. K., & Tatsuoka, M. M. (1997). Computerized cognitive diagnostic adaptive testing: Effect on remedial instruction as empirical validation. Journal of Educational Measurement, 34, 3–20.
JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Developing a CAT: An Integrated Perspective T2 - IACAT 2017 Conference Y1 - 2017 A1 - Nathan Thompson KW - CAT Development KW - integrated approach AB - Most resources on computerized adaptive testing (CAT) tend to focus on psychometric aspects such as mathematical formulae for item selection or ability estimation. However, development of a CAT assessment requires a holistic view of project management, financials, content development, product launch and branding, and more. This presentation will develop such a holistic view, which serves several purposes, including providing a framework for validity, estimating costs and ROI, and making better decisions regarding the psychometric aspects.
Thompson and Weiss (2011) presented a 5-step model for developing computerized adaptive tests (CATs). This model will be presented and discussed as the core of this holistic framework, then applied to real-life examples. While most CAT research focuses on developing new quantitative algorithms, this presentation is instead intended to help researchers evaluate and select algorithms that are most appropriate for their needs. It is therefore ideal for practitioners who are familiar with the basics of item response theory and CAT and wish to explore how they might apply these methodologies to improve their assessments.
Steps include:
1. Feasibility, applicability, and planning studies
2. Develop item bank content or utilize existing bank
3. Pretest and calibrate item bank
4. Determine specifications for final CAT
5. Publish live CAT.
For example, Step 1 includes simulation studies that estimate item bank requirements; these estimates can then be used to determine the cost of content development, which in turn can be integrated into an estimated project cost and timeline. Such information is vital in determining whether the CAT should even be developed in the first place.
References
Thompson, N. A., & Weiss, D. J. (2011). A Framework for the Development of Computerized Adaptive Tests. Practical Assessment, Research & Evaluation, 16(1). Retrieved from http://pareonline.net/getvn.asp?v=16&n=1.
JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1Jv8bpH2zkw5TqSMi03e5JJJ98QtXf-Cv ER - TY - JOUR T1 - Development of a Computer Adaptive Test for Depression Based on the Dutch-Flemish Version of the PROMIS Item Bank JF - Evaluation & the Health Professions Y1 - 2017 A1 - Gerard Flens A1 - Niels Smits A1 - Caroline B. Terwee A1 - Joost Dekker A1 - Irma Huijbrechts A1 - Edwin de Beurs AB - We developed a Dutch-Flemish version of the patient-reported outcomes measurement information system (PROMIS) adult V1.0 item bank for depression as input for computerized adaptive testing (CAT). As item bank, we used the Dutch-Flemish translation of the original PROMIS item bank (28 items) and additionally translated 28 U.S. depression items that failed to make the final U.S. item bank. Through psychometric analysis of a combined clinical and general population sample (N = 2,010), 8 added items were removed. With the final item bank, we performed several CAT simulations to assess the efficiency of the extended (48 items) and the original item bank (28 items), using various stopping rules. Both item banks resulted in highly efficient and precise measurement of depression and showed high similarity between the CAT simulation scores and the full item bank scores. We discuss the implications of using each item bank and stopping rule for further CAT development. VL - 40 UR - https://doi.org/10.1177/0163278716684168 ER - TY - CONF T1 - Evaluation of Parameter Recovery, Drift, and DIF with CAT Data T2 - IACAT 2017 Conference Y1 - 2017 A1 - Nathan Thompson A1 - Jordan Stoeger KW - CAT KW - DIF KW - Parameter Drift KW - Parameter Recovery AB -Parameter drift and differential item functioning (DIF) analyses are frequent components of a test maintenance plan. 
That is, after test forms are published, organizations will often calibrate post-publication data at a later date to evaluate whether the performance of the items or the test has changed over time. For example, if item content is leaked, the items might gradually become easier over time, and item statistics or parameters can reflect this.
When tests are published under a computerized adaptive testing (CAT) paradigm, they are nearly always calibrated with item response theory (IRT). IRT calibrations assume that range restriction is not an issue; that is, each item is administered to examinees across a wide range of ability. CAT data violate this assumption. However, some organizations still wish to evaluate the continuing performance of the items from a DIF or drift paradigm.
This presentation will evaluate just how inaccurate DIF and drift analyses might be on CAT data, using a Monte Carlo parameter recovery methodology. Known item parameters will be used to generate both linear and CAT data sets, which are then calibrated for DIF and drift. In addition, we will implement randomesque item exposure constraints in some CAT conditions; this randomization alleviates the range restriction problem somewhat, but whether it improves the parameter recovery calibrations is an empirical question.
JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1F7HCZWD28Q97sCKFIJB0Yps0H66NPeKq ER - TY - JOUR T1 - Item usage in a multidimensional computerized adaptive test (MCAT) measuring health-related quality of life JF - Quality of Life Research Y1 - 2017 A1 - Paap, Muirne C. S. A1 - Kroeze, Karel A. A1 - Terwee, Caroline B. A1 - van der Palen, Job A1 - Veldkamp, Bernard P. VL - 26 UR - https://doi.org/10.1007/s11136-017-1624-3 ER - TY - CONF T1 - New Challenges (With Solutions) and Innovative Applications of CAT T2 - IACAT 2017 Conference Y1 - 2017 A1 - Chun Wang A1 - David J. Weiss A1 - Xue Zhang A1 - Jian Tao A1 - Yinhong He A1 - Ping Chen A1 - Shiyu Wang A1 - Susu Zhang A1 - Haiyan Lin A1 - Xiaohong Gao A1 - Hua-Hua Chang A1 - Zhuoran Shang KW - CAT KW - challenges KW - innovative applications AB -Over the past several decades, computerized adaptive testing (CAT) has profoundly changed the administration of large-scale aptitude tests, state-wide achievement tests, professional licensure exams, and health outcome measures. While many challenges of CAT have been successfully addressed due to the continual efforts of researchers in the field, there are still many remaining, longstanding challenges that have yet to be resolved. This symposium will begin with three presentations, each of which provides a sound solution to one of the unresolved challenges. They are (1) item calibration when responses are “missing not at random” from CAT administration; (2) online calibration of new items when person traits have non-ignorable measurement error; (3) establishing consistency and asymptotic normality of latent trait estimation when allowing item response revision in CAT. In addition, this symposium also features innovative applications of CAT. In particular, there is emerging interest in using cognitive diagnostic CAT to monitor and detect learning progress (4th presentation). 
Last but not least, the 5th presentation illustrates the power of multidimensional polytomous CAT that permits rapid identification of hospitalized patients’ rehabilitative care needs in health outcomes measurement. We believe this symposium covers a wide range of interesting and important topics in CAT.
JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1Wvgxw7in_QCq_F7kzID6zCZuVXWcFDPa ER - TY - CONF T1 - A Simulation Study to Compare Classification Method in Cognitive Diagnosis Computerized Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Jing Yang A1 - Jian Tao A1 - Hua-Hua Chang A1 - Ning-Zhong Shi AB - Cognitive Diagnostic Computerized Adaptive Testing (CD-CAT) combines the strengths of both CAT and cognitive diagnosis. Cognitive diagnosis models, which can be viewed as restricted latent class models, have been developed to classify examinees into the correct profile of skills that have been mastered and those that have not, so as to enable more efficient remediation. Chiu and Douglas (2013) introduced a nonparametric procedure that requires only the specification of the Q-matrix and classifies examinees by proximity to ideal response patterns. In this article, we compare the nonparametric procedure with common profile estimation methods such as maximum a posteriori (MAP) estimation in CD-CAT. Simulation studies consider a variety of Q-matrix structures, numbers of attributes, ways to generate attribute profiles, and levels of item quality. Results indicate that the nonparametric procedure consistently achieves higher pattern and attribute recovery rates in nearly all conditions.
References
Chiu, C.-Y., & Douglas, J. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225-250. doi: 10.1007/s00357-013-9132-9
JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1jCL3fPZLgzIdwvEk20D-FliZ15OTUtpr ER - TY - JOUR T1 - The validation of a computer-adaptive test (CAT) for assessing health-related quality of life in children and adolescents in a clinical sample: study design, methods and first results of the Kids-CAT study JF - Quality of Life Research Y1 - 2017 A1 - Barthel, D. A1 - Otto, C. A1 - Nolte, S. A1 - Meyrose, A.-K. A1 - Fischer, F. A1 - Devine, J. A1 - Walter, O. A1 - Mierke, A. A1 - Fischer, K. I. A1 - Thyen, U. A1 - Klein, M. A1 - Ankermann, T. A1 - Rose, M. A1 - Ravens-Sieberer, U. AB - Recently, we developed a computer-adaptive test (CAT) for assessing health-related quality of life (HRQoL) in children and adolescents: the Kids-CAT. It measures five generic HRQoL dimensions. The aims of this article were (1) to present the study design and (2) to investigate its psychometric properties in a clinical setting. VL - 26 UR - https://doi.org/10.1007/s11136-016-1437-9 ER - TY - JOUR T1 - Stochastic Curtailment in Adaptive Mastery Testing: Improving the Efficiency of Confidence Interval–Based Stopping Rules JF - Applied Psychological Measurement Y1 - 2015 A1 - Sie, Haskell A1 - Finkelman, Matthew D. A1 - Bartroff, Jay A1 - Thompson, Nathan A. AB - A well-known stopping rule in adaptive mastery testing is to terminate the assessment once the examinee’s ability confidence interval lies entirely above or below the cut-off score. This article proposes new procedures that seek to improve such a variable-length stopping rule by coupling it with curtailment and stochastic curtailment. Under the new procedures, test termination can occur earlier if the probability is high enough that the current classification decision remains the same should the test continue. Computation of this probability utilizes normality of an asymptotically equivalent version of the maximum likelihood ability estimate. 
In two simulation sets, the new procedures showed a substantial reduction in average test length while maintaining similar classification accuracy to the original method. VL - 39 UR - http://apm.sagepub.com/content/39/4/278.abstract ER - TY - JOUR T1 - The Philosophical Aspects of IRT Equating: Modeling Drift to Evaluate Cohort Growth in Large-Scale Assessments JF - Educational Measurement: Issues and Practice Y1 - 2013 A1 - Taherbhai, Husein A1 - Seo, Daeryong KW - cohort growth KW - construct-relevant drift KW - evaluation of scale drift KW - philosophical aspects of IRT equating VL - 32 UR - http://dx.doi.org/10.1111/emip.12000 ER - TY - JOUR T1 - Multistage Computerized Adaptive Testing With Uniform Item Exposure JF - Applied Measurement in Education Y1 - 2012 A1 - Edwards, Michael C. A1 - Flora, David B. A1 - Thissen, David VL - 25 UR - http://www.tandfonline.com/doi/abs/10.1080/08957347.2012.660363 ER - TY - JOUR T1 - Content range and precision of a computer adaptive test of upper extremity function for children with cerebral palsy JF - Physical & Occupational Therapy in Pediatrics Y1 - 2011 A1 - Montpetit, K. A1 - Haley, S. A1 - Bilodeau, N. A1 - Ni, P. A1 - Tian, F. A1 - Gorton, G., 3rd A1 - Mulcahey, M. J. AB - This article reports on the content range and measurement precision of an upper extremity (UE) computer adaptive testing (CAT) platform of physical function in children with cerebral palsy. Upper extremity items representing skills of all abilities were administered to 305 parents. These responses were compared with two traditional standardized measures: Pediatric Outcomes Data Collection Instrument and Functional Independence Measure for Children. The UE CAT correlated strongly with the upper extremity component of these measures and had greater precision when describing individual functional ability. The UE item bank has wider range with items populating the lower end of the ability spectrum. 
This new UE item bank and CAT have the capability to quickly assess children of all ages and abilities with good precision and, most importantly, with items that are meaningful and appropriate for their age and level of physical function. VL - 31 SN - 1541-3144 (Electronic); 0194-2638 (Linking) N1 - Montpetit, Kathleen; Haley, Stephen; Bilodeau, Nathalie; Ni, Pengsheng; Tian, Feng; Gorton, George 3rd; Mulcahey, M. J. England. Phys Occup Ther Pediatr. 2011 Feb;31(1):90-102. Epub 2010 Oct 13. JO - Phys Occup Ther Pediatr ER - TY - JOUR T1 - Design of a Computer-Adaptive Test to Measure English Literacy and Numeracy in the Singapore Workforce: Considerations, Benefits, and Implications JF - Journal of Applied Testing Technology Y1 - 2011 A1 - Jacobsen, J. A1 - Ackermann, R. A1 - Egüez, J. A1 - Ganguli, D. A1 - Rickard, P. A1 - Taylor, L. AB - A computer adaptive test (CAT) is a delivery methodology that serves the larger goals of the assessment system in which it is embedded. A thorough analysis of the assessment system for which a CAT is being designed is critical to ensure that the delivery platform is appropriate and addresses all relevant complexities. As such, a CAT engine must be designed to conform to the
validity and reliability of the overall system. This design takes the form of adherence to the assessment goals and objectives of the adaptive assessment system. When the assessment is adapted for use in another country, consideration must be given to any necessary revisions including content differences. This article addresses these considerations while drawing, in part, on the process followed in the development of the CAT delivery system designed to test English language workplace skills for the Singapore Workforce Development Agency. Topics include item creation and selection, calibration of the item pool, analysis and testing of the psychometric properties, and reporting and interpretation of scores. The characteristics and benefits of the CAT delivery system are detailed as well as implications for testing programs considering the use of a
CAT delivery system.
The present article describes the potential utility of item response theory (IRT) and adaptive testing for scale evaluation and for web-based career assessment. The article describes the principles of both IRT and adaptive testing and then illustrates these with reference to data analyses and simulation studies of the Career Confidence Inventory (CCI). The kinds of information provided by IRT are shown to give a more precise look at scale quality across the trait continuum and also to permit the use of adaptive testing, where the items administered are tailored to the individual being tested. Such tailoring can significantly reduce testing time while maintaining high quality of measurement. This efficiency is especially useful when multiscale inventories and/or a large number of scales are to be administered. Readers are encouraged to consider using these advances in career assessment.
VL - 19 UR - http://jca.sagepub.com/cgi/content/abstract/19/3/274 ER - TY - CHAP T1 - Computerized adaptive testing by mutual information and multiple imputations Y1 - 2009 A1 - Thissen-Roe, A. AB - Over the years, most computerized adaptive testing (CAT) systems have used score estimation procedures from item response theory (IRT). IRT models have salutary properties for score estimation, error reporting, and next-item selection. However, some testing purposes favor scoring approaches outside IRT. Where a criterion metric is readily available and more relevant than the assessed construct, for example in the selection of job applicants, a predictive model might be appropriate (Scarborough & Somers, 2006). In these cases, neither IRT scoring nor a unidimensional assessment structure can be assumed. Yet, the primary benefit of CAT remains desirable: shorter assessments with minimal loss of accuracy due to unasked items. In such a case, it remains possible to create a CAT system that produces an estimated score from a subset of available items, recognizes differential item information given the emerging item response pattern, and optimizes the accuracy of the score estimated at every successive item. The method of multiple imputations (Rubin, 1987) can be used to simulate plausible scores given plausible response patterns to unasked items (Thissen-Roe, 2005). Mutual information can then be calculated in order to select an optimally informative next item (or set of items). Previously observed response patterns to two complete neural network-scored assessments were resampled according to MIMI CAT item selection. The reproduced CAT scores were compared to full-length assessment scores. Approximately 95% accurate assignment of examinees to one of three score categories was achieved with a 70%-80% reduction in median test length. Several algorithmic factors influencing accuracy and computational performance were examined. CY - D. J. 
Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 179 KB} ER - TY - CHAP T1 - Guess what? Score differences with rapid replies versus omissions on a computerized adaptive test Y1 - 2009 A1 - Talento-Miller, E. A1 - Guo, F. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 215 KB} ER - TY - JOUR T1 - Item selection in computerized classification testing JF - Educational and Psychological Measurement Y1 - 2009 A1 - Thompson, N. A. AB - Several alternatives for item selection algorithms based on item response theory in computerized classification testing (CCT) have been suggested, with no conclusive evidence on the substantial superiority of a single method. It is argued that the lack of sizable effect is because some of the methods actually assess items very similarly through different calculations and will usually select the same item. Consideration of methods that assess information across a wider range is often unnecessary under realistic conditions, although it might be advantageous to utilize them only early in a test. In addition, the efficiency of item selection approaches depends on the termination criteria that are used, which is demonstrated through didactic example and Monte Carlo simulation. Item selection at the cut score, which seems conceptually appropriate for CCT, is not always the most efficient option. A broad framework for item selection in CCT is presented that incorporates these points. VL - 69 SN - 0013-1644 UR - http://epm.sagepub.com/content/69/5/778.abstract ER -
TY - JOUR T1 - Measuring global physical health in children with cerebral palsy: Illustration of a multidimensional bi-factor model and computerized adaptive testing JF - Quality of Life Research Y1 - 2009 A1 - Haley, S. M. A1 - Ni, P. A1 - Dumas, H. M. A1 - Fragala-Pinkham, M. A. A1 - Hambleton, R. K. A1 - Montpetit, K. A1 - Bilodeau, N. A1 - Gorton, G. E. A1 - Watson, K. A1 - Tucker, C. A. KW - *Computer Simulation KW - *Health Status KW - *Models, Statistical KW - Adaptation, Psychological KW - Adolescent KW - Cerebral Palsy/*physiopathology KW - Child KW - Child, Preschool KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Massachusetts KW - Pennsylvania KW - Questionnaires KW - Young Adult AB - PURPOSE: The purposes of this study were to apply a bi-factor model for the determination of test dimensionality and a multidimensional CAT using computer simulations of real data for the assessment of a new global physical health measure for children with cerebral palsy (CP). METHODS: Parent respondents of 306 children with cerebral palsy were recruited from four pediatric rehabilitation hospitals and outpatient clinics. We compared confirmatory factor analysis results across four models: (1) one-factor unidimensional; (2) two-factor multidimensional (MIRT); (3) bi-factor MIRT with fixed slopes; and (4) bi-factor MIRT with varied slopes. We tested whether the general and content (fatigue and pain) person score estimates could discriminate across severity and types of CP, whether score estimates from a simulated CAT were similar to estimates based on the total item bank, and whether they correlated as expected with external measures. RESULTS: Confirmatory factor analysis suggested separate pain and fatigue sub-factors; all 37 items were retained in the analyses.
From the bi-factor MIRT model with fixed slopes, the full item bank scores discriminated across levels of severity and types of CP, and compared favorably to external instruments. CAT scores based on 10- and 15-item versions accurately captured the global physical health scores. CONCLUSIONS: The bi-factor MIRT CAT application, especially the 10- and 15-item versions, yielded accurate global physical health scores that discriminated across known severity groups and types of CP, and correlated as expected with concurrent measures. The CATs have potential for collecting complex data on the physical health of children with CP in an efficient manner. VL - 18 SN - 0962-9343 (Print); 0962-9343 (Linking) N1 - Haley, Stephen M; Ni, Pengsheng; Dumas, Helene M; Fragala-Pinkham, Maria A; Hambleton, Ronald K; Montpetit, Kathleen; Bilodeau, Nathalie; Gorton, George E; Watson, Kyle; Tucker, Carole A. K02 HD045354-01A1/HD/NICHD NIH HHS/United States; K02 HD45354-01A1/HD/NICHD NIH HHS/United States. Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't. Netherlands. Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation. Qual Life Res. 2009 Apr;18(3):359-70. Epub 2009 Feb 17. U2 - 2692519 ER - TY - CHAP T1 - The MEDPRO project: An SBIR project for a comprehensive IRT and CAT software system: CAT software Y1 - 2009 A1 - Thompson, N. A. AB - Development of computerized adaptive tests (CAT) requires a number of appropriate software tools. This paper describes the development of two new CAT software programs. CATSIM has been designed specifically to conduct several different kinds of simulation studies, which are necessary for planning purposes as well as for properly designing live CATs. FastCAT is a software system for banking items and publishing CAT tests as standalone files, to be administered anywhere. Both are available for public use. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing.
N1 - {PDF File, 283 KB} ER - TY - CHAP T1 - The MEDPRO project: An SBIR project for a comprehensive IRT and CAT software system: IRT software Y1 - 2009 A1 - Thissen, D. AB - IRTPRO (Item Response Theory for Patient-Reported Outcomes) is an entirely new application for item calibration and test scoring using IRT. IRTPRO implements algorithms for maximum likelihood estimation of item parameters (item calibration) for several unidimensional and multidimensional item response theory (IRT) models for dichotomous and polytomous item responses. In addition, the software provides computation of goodness-of-fit indices, statistics for the diagnosis of local dependence and for the detection of differential item functioning (DIF), and IRT scaled scores. This paper illustrates the use, and some capabilities, of the software. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 817 KB} ER - TY - JOUR T1 - Replenishing a computerized adaptive test of patient-reported daily activity functioning JF - Quality of Life Research Y1 - 2009 A1 - Haley, S. M. A1 - Ni, P. A1 - Jette, A. M. A1 - Tao, W. A1 - Moed, R. A1 - Meyers, D. A1 - Ludlow, L. H. KW - *Activities of Daily Living KW - *Disability Evaluation KW - *Questionnaires KW - *User-Computer Interface KW - Adult KW - Aged KW - Cohort Studies KW - Computer-Assisted Instruction KW - Female KW - Humans KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods AB - PURPOSE: Computerized adaptive testing (CAT) item banks may need to be updated, but before new items can be added, they must be linked to the previous CAT. The purpose of this study was to evaluate 41 pretest items prior to including them in an operational CAT. METHODS: We recruited 6,882 patients with spine, lower extremity, upper extremity, and nonorthopedic impairments who received outpatient rehabilitation in one of 147 clinics across 13 states of the USA.
Forty-one new Daily Activity (DA) items were administered along with the Activity Measure for Post-Acute Care Daily Activity CAT (DA-CAT-1) in five separate waves. We compared the scoring consistency with the full item bank, test information function (TIF), person standard errors (SEs), and content range of the DA-CAT-1 to the new CAT (DA-CAT-2) with the pretest items by real data simulations. RESULTS: We retained 29 of the 41 pretest items. Scores from the DA-CAT-2 were more consistent (ICC = 0.90 versus 0.96) than DA-CAT-1 when compared with the full item bank. TIF and person SEs were improved for persons with higher levels of DA functioning, and ceiling effects were reduced from 16.1% to 6.1%. CONCLUSIONS: Item response theory and online calibration methods were valuable in improving the DA-CAT. VL - 18 SN - 0962-9343 (Print); 0962-9343 (Linking) N1 - Haley, Stephen M; Ni, Pengsheng; Jette, Alan M; Tao, Wei; Moed, Richard; Meyers, Doug; Ludlow, Larry H. K02 HD45354-01/HD/NICHD NIH HHS/United States. Research Support, N.I.H., Extramural. Netherlands. Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation. Qual Life Res. 2009 May;18(4):461-71. Epub 2009 Mar 14. ER - TY - CHAP T1 - Utilizing the generalized likelihood ratio as a termination criterion Y1 - 2009 A1 - Thompson, N. A. AB - Computer-based testing can be used to classify examinees into mutually exclusive groups. Currently, the predominant psychometric algorithm for designing computerized classification tests (CCTs) is the sequential probability ratio test (SPRT; Reckase, 1983) based on item response theory (IRT). The SPRT has been shown to be more efficient than confidence intervals around θ estimates as a method for CCT delivery (Spray & Reckase, 1996; Rudner, 2002).
More recently, it was demonstrated that the SPRT, which only uses fixed values, is less efficient than a generalized form which tests whether a given examinee’s θ is below θ1 or above θ2 (Thompson, 2007). This formulation allows the indifference region to vary based on observed data. Moreover, this composite hypothesis formulation better represents the conceptual purpose of the test, which is to test whether θ is above or below the cutscore. The purpose of this study was to explore the specifications of the new generalized likelihood ratio (GLR; Huang, 2004). As with the SPRT, the efficiency of the procedure depends on the nominal error rates and the distance between θ1 and θ2 (Eggen, 1999). This study utilized a Monte Carlo approach, with 10,000 examinees simulated under each condition, to evaluate differences in efficiency and accuracy due to hypothesis structure, nominal error rate, and indifference region size. The GLR was always at least as efficient as the fixed-point SPRT while maintaining equivalent levels of accuracy. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 194 KB} ER - TY - JOUR T1 - CAT-MD: Computerized adaptive testing on mobile devices JF - International Journal of Web-Based Learning and Teaching Technologies Y1 - 2008 A1 - Triantafillou, E. A1 - Georgiadou, E. A1 - Economides, A. A. VL - 3 ER - TY - JOUR T1 - Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes JF - Archives of Physical Medicine and Rehabilitation Y1 - 2008 A1 - Haley, S. M. A1 - Gandek, B. A1 - Siebens, H. A1 - Black-Schaffer, R. M. A1 - Sinclair, S. J. A1 - Tao, W. A1 - Coster, W. J. A1 - Ni, P. A1 - Jette, A. M.
KW - *Activities of Daily Living KW - *Adaptation, Physiological KW - *Computer Systems KW - *Questionnaires KW - Adult KW - Aged KW - Aged, 80 and over KW - Chi-Square Distribution KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Longitudinal Studies KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Patient Discharge KW - Prospective Studies KW - Rehabilitation/*standards KW - Subacute Care/*standards AB - OBJECTIVES: To measure participation outcomes with a computerized adaptive test (CAT) and compare CAT and traditional fixed-length surveys in terms of score agreement, respondent burden, discriminant validity, and responsiveness. DESIGN: Longitudinal, prospective cohort study of patients interviewed approximately 2 weeks after discharge from inpatient rehabilitation and 3 months later. SETTING: Follow-up interviews conducted in patient's home setting. PARTICIPANTS: Adults (N=94) with diagnoses of neurologic, orthopedic, or medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Participation domains of mobility, domestic life, and community, social, & civic life, measured using a CAT version of the Participation Measure for Postacute Care (PM-PAC-CAT) and a 53-item fixed-length survey (PM-PAC-53). RESULTS: The PM-PAC-CAT showed substantial agreement with PM-PAC-53 scores (intraclass correlation coefficient, model 3,1, .71-.81). On average, the PM-PAC-CAT was completed in 42% of the time and with only 48% of the items as compared with the PM-PAC-53. Both formats discriminated across functional severity groups. The PM-PAC-CAT had modest reductions in sensitivity and responsiveness to patient-reported change over a 3-month interval as compared with the PM-PAC-53. 
CONCLUSIONS: Although continued evaluation is warranted, accurate estimates of participation status and responsiveness to change for group-level analyses can be obtained from CAT administrations, with a sizeable reduction in respondent burden. VL - 89 SN - 1532-821X (Electronic); 0003-9993 (Linking) N1 - Haley, Stephen M; Gandek, Barbara; Siebens, Hilary; Black-Schaffer, Randie M; Sinclair, Samuel J; Tao, Wei; Coster, Wendy J; Ni, Pengsheng; Jette, Alan M. K02 HD045354-01A1/HD/NICHD NIH HHS/United States; K02 HD45354-01/HD/NICHD NIH HHS/United States; R01 HD043568/HD/NICHD NIH HHS/United States; R01 HD043568-01/HD/NICHD NIH HHS/United States. Research Support, N.I.H., Extramural. United States. Archives of physical medicine and rehabilitation. Arch Phys Med Rehabil. 2008 Feb;89(2):275-83. U2 - 2666330 ER - TY - CONF T1 - Developing a progressive approach to using the GAIN in order to reduce the duration and cost of assessment with the GAIN short screener, Quick and computer adaptive testing T2 - Joint Meeting on Adolescent Treatment Effectiveness Y1 - 2008 A1 - Dennis, M. L. A1 - Funk, R. A1 - Titus, J. A1 - Riley, B. B. A1 - Hosman, S. A1 - Kinne, S. JF - Joint Meeting on Adolescent Treatment Effectiveness CY - Washington D.C., USA N1 - ProCite field[6]: Paper presented at the ER - TY - JOUR T1 - The D-optimality item selection criterion in the early stage of CAT: A study with the graded response model JF - Journal of Educational and Behavioral Statistics Y1 - 2008 A1 - Passos, V. L. A1 - Berger, M. P. F. A1 - Tan, F. E. S. KW - computerized adaptive testing KW - D optimality KW - item selection AB - During the early stage of computerized adaptive testing (CAT), item selection criteria based on Fisher’s information often produce less stable latent trait estimates than the Kullback-Leibler global information criterion.
Robustness against early stage instability has been reported for the D-optimality criterion in a polytomous CAT with the Nominal Response Model and is shown herein to be reproducible for the Graded Response Model. For comparative purposes, the A-optimality and the global information criteria are also applied. Their item selection is investigated as a function of test progression and item bank composition. The results indicate how the selection of specific item parameters underlies the criteria performances evaluated via accuracy and precision of estimation. In addition, the item exposure rates of the criteria are compared, without the use of any exposure-controlling measure. On account of stability, precision, accuracy, numerical simplicity, and, less evidently, item exposure rate, the D-optimality criterion can be recommended for CAT. VL - 33 ER - TY - JOUR T1 - An initial application of computerized adaptive testing (CAT) for measuring disability in patients with low back pain JF - BMC Musculoskeletal Disorders Y1 - 2008 A1 - Elhan, A. H. A1 - Oztuna, D. A1 - Kutlay, S. A1 - Kucukdeveci, A. A. A1 - Tennant, A. AB - BACKGROUND: Recent approaches to outcome measurement involving Computerized Adaptive Testing (CAT) offer a way of measuring disability in low back pain (LBP) that can reduce the burden upon patient and professional. The aim of this study was to explore the potential of CAT in LBP for measuring disability as defined in the International Classification of Functioning, Disability and Health (ICF), which includes impairments, activity limitation, and participation restriction. METHODS: 266 patients with low back pain answered questions from a range of widely used questionnaires. An exploratory factor analysis (EFA) was used to identify disability dimensions, which were then subjected to Rasch analysis. Reliability was tested by internal consistency and person separation index (PSI).
Discriminant validity of disability levels was evaluated by the Spearman correlation coefficient (r), intraclass correlation coefficient [ICC(2,1)], and the Bland-Altman approach. A CAT was developed for each dimension, and the results were checked against simulated and real applications from a further 133 patients. RESULTS: Factor analytic techniques identified two dimensions, named "body functions" and "activity-participation". After deletion of some items for failure to fit the Rasch model, the remaining items were mostly free of Differential Item Functioning (DIF) for age and gender. Reliability exceeded 0.90 for both dimensions. The disability levels generated using all items and those obtained from the real CAT application were highly correlated (i.e., >0.97 for both dimensions). On average, 19 and 14 items were needed to estimate precise disability levels using the initial CAT for the first and second dimensions, respectively. However, a marginal increase in the standard error of the estimate across successive iterations substantially reduced the number of items required to make an estimate. CONCLUSIONS: Using a combined EFA and Rasch approach, this study has shown that it is possible to calibrate items onto a single metric in a way that can be used to provide the basis of a CAT application. Thus there is an opportunity to obtain a wide variety of information to evaluate the biopsychosocial model in its more complex forms, without necessarily increasing the burden of information collection for patients. VL - 9 SN - 1471-2474 (Electronic) N1 - Journal article. BMC musculoskeletal disorders. BMC Musculoskelet Disord. 2008 Dec 18;9(1):166. ER - TY - BOOK T1 - A comparison of two methods of polytomous computerized classification testing for multiple cutscores Y1 - 2007 A1 - Thompson, N. A.
CY - Unpublished doctoral dissertation, University of Minnesota N1 - {PDF file, 363 KB} ER - TY - Generic T1 - Computerized classification testing with composite hypotheses T2 - GMAC Conference on Computerized Adaptive Testing Y1 - 2007 A1 - Thompson, N. A. A1 - Ro, S. KW - computerized adaptive testing JF - GMAC Conference on Computerized Adaptive Testing PB - Graduate Management Admissions Council CY - St. Paul, MN N1 - Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. Retrieved [date] from www.psych.umn.edu/psylabs/CATCentral ER - TY - CHAP T1 - Computerized classification testing with composite hypotheses Y1 - 2007 A1 - Thompson, N. A. A1 - Ro, S. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 96 KB} ER - TY - CONF T1 - Cutscore location and classification accuracy in computerized classification testing T2 - Paper presented at the international meeting of the Psychometric Society Y1 - 2007 A1 - Ro, S. A1 - Thompson, N. A. JF - Paper presented at the international meeting of the Psychometric Society CY - Tokyo, Japan N1 - {PDF file, 94 KB} ER - TY - JOUR T1 - The design and evaluation of a computerized adaptive test on mobile devices JF - Computers & Education Y1 - 2007 A1 - Triantafillou, E. A1 - Georgiadou, E. A1 - Economides, A. A. VL - 49 ER - TY - JOUR T1 - Developing tailored instruments: item banking and computerized adaptive assessment JF - Quality of Life Research Y1 - 2007 A1 - Bjorner, J. B. A1 - Chang, C-H. A1 - Thissen, D. A1 - Reeve, B. B. KW - *Health Status KW - *Health Status Indicators KW - *Mental Health KW - *Outcome Assessment (Health Care) KW - *Quality of Life KW - *Questionnaires KW - *Software KW - Algorithms KW - Factor Analysis, Statistical KW - Humans KW - Models, Statistical KW - Psychometrics AB - Item banks and Computerized Adaptive Testing (CAT) have the potential to greatly improve the assessment of health outcomes.
This review describes the unique features of item banks and CAT and discusses how to develop item banks. In CAT, a computer selects the items from an item bank that are most relevant for and informative about the particular respondent, thus optimizing test relevance and precision. Item response theory (IRT) provides the foundation for selecting the items that are most informative for the particular respondent and for scoring responses on a common metric. The development of an item bank is a multi-stage process that requires a clear definition of the construct to be measured, good items, a careful psychometric analysis of the items, and a clear specification of the final CAT. The psychometric analysis needs to evaluate the assumptions of the IRT model, such as unidimensionality and local independence; that the items function the same way in different subgroups of the population; and that there is an adequate fit between the data and the chosen item response models. Also, interpretation guidelines need to be established to help the clinical application of the assessment. Although medical research can draw upon expertise from educational testing in the development of item banks and CAT, the medical field also encounters unique opportunities and challenges. VL - 16 SN - 0962-9343 (Print) N1 - Bjorner, Jakob Bue; Chang, Chih-Hung; Thissen, David; Reeve, Bryce B. 1R43NS047763-01/NS/United States NINDS; AG015815/AG/United States NIA. Research Support, N.I.H., Extramural. Netherlands. Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation. Qual Life Res. 2007;16 Suppl 1:95-108. Epub 2007 Feb 15. ER - TY - CHAP T1 - Exploring potential designs for multi-form structure computerized adaptive tests with uniform item exposure Y1 - 2007 A1 - Edwards, M. C. A1 - Thissen, D. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing.
N1 - {PDF file, 295 KB} ER - TY - CHAP T1 - Investigating CAT designs to achieve comparability with a paper test Y1 - 2007 A1 - Thompson, T. A1 - Way, W. D. CY - In D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 141 KB} ER - TY - JOUR T1 - IRT health outcomes data analysis project: an overview and summary JF - Quality of Life Research Y1 - 2007 A1 - Cook, K. F. A1 - Teal, C. R. A1 - Bjorner, J. B. A1 - Cella, D. A1 - Chang, C-H. A1 - Crane, P. K. A1 - Gibbons, L. E. A1 - Hays, R. D. A1 - McHorney, C. A. A1 - Ocepek-Welikson, K. A1 - Raczek, A. E. A1 - Teresi, J. A. A1 - Reeve, B. B. KW - *Data Interpretation, Statistical KW - *Health Status KW - *Quality of Life KW - *Questionnaires KW - *Software KW - Female KW - HIV Infections/psychology KW - Humans KW - Male KW - Neoplasms/psychology KW - Outcome Assessment (Health Care)/*methods KW - Psychometrics KW - Stress, Psychological AB - BACKGROUND: In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, "Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment." A component of the conference was presentation of a psychometric and content analysis of a secondary dataset. OBJECTIVES: A thorough psychometric and content analysis was conducted of two primary domains within a cancer health-related quality of life (HRQOL) dataset. RESEARCH DESIGN: HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared. SUBJECTS: The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. 
A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites. MEASURES: Items from 4 HRQOL instruments were evaluated: the Cancer Rehabilitation Evaluation System-Short Form, the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, the Functional Assessment of Cancer Therapy, and the Medical Outcomes Study Short-Form Health Survey. RESULTS AND CONCLUSIONS: Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance of construct definition in the measurement of HRQOL. With respect to these lessons, areas for future research are suggested. The feasibility of developing item banks for broad definitions of health is discussed. VL - 16 SN - 0962-9343 (Print) N1 - Cook, Karon F; Teal, Cayla R; Bjorner, Jakob B; Cella, David; Chang, Chih-Hung; Crane, Paul K; Gibbons, Laura E; Hays, Ron D; McHorney, Colleen A; Ocepek-Welikson, Katja; Raczek, Anastasia E; Teresi, Jeanne A; Reeve, Bryce B. 1U01AR52171-01/AR/United States NIAMS; R01 (CA60068)/CA/United States NCI; Y1-PC-3028-01/PC/United States NCI. Research Support, N.I.H., Extramural. Netherlands. Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation. Qual Life Res. 2007;16 Suppl 1:121-32. Epub 2007 Mar 10. ER - TY - CONF T1 - Item selection in computerized classification testing T2 - Paper presented at the Conference on High Stakes Testing Y1 - 2007 A1 - Thompson, N. A. JF - Paper presented at the Conference on High Stakes Testing CY - University of Nebraska N1 - {PDF file, 87 KB} ER - TY - JOUR T1 - Methodological issues for building item banks and computerized adaptive scales JF - Quality of Life Research Y1 - 2007 A1 - Thissen, D. A1 - Reeve, B. B. A1 - Bjorner, J. B. A1 - Chang, C-H.
AB - This paper reviews important methodological considerations for developing item banks and computerized adaptive scales (commonly called computerized adaptive tests in the educational measurement literature, yielding the acronym CAT), including issues of the reference population, dimensionality, dichotomous versus polytomous response scales, differential item functioning (DIF) and conditional scoring, mode effects, the impact of local dependence, and innovative approaches to assessment using CATs in health outcomes research. VL - 16 SN - 0962-9343; 1573-2649 ER - TY - JOUR T1 - A Practitioner’s Guide for Variable-length Computerized Classification Testing JF - Practical Assessment Research and Evaluation Y1 - 2007 A1 - Thompson, N. A. VL - 12 IS - 1 ER - TY - Generic T1 - A practitioner's guide to variable-length computerized classification testing Y1 - 2007 A1 - Thompson, N. A. KW - CAT KW - classification KW - computer adaptive testing KW - computerized adaptive testing KW - Computerized classification testing AB - Variable-length computerized classification tests (CCTs; Lin & Spray, 2000; Thompson, 2006) are a powerful and efficient approach to testing for the purpose of classifying examinees into groups. CCTs are designed by the specification of at least five technical components: psychometric model, calibrated item bank, starting point, item selection algorithm, and termination criterion. Several options exist for each of these CCT components, creating a myriad of possible designs. Confusion among designs is exacerbated by the lack of a standardized nomenclature. This article outlines the components of a CCT, common options for each component, and the interaction of options for different components, so that practitioners may more efficiently design CCTs. It also offers a suggestion of nomenclature.
JF - Practical Assessment, Research and Evaluation VL - 12 ER - TY - JOUR T1 - Prospective evaluation of the AM-PAC-CAT in outpatient rehabilitation settings JF - Physical Therapy Y1 - 2007 A1 - Jette, A. A1 - Haley, S. A1 - Tao, W. A1 - Ni, P. A1 - Moed, R. A1 - Meyers, D. A1 - Zurek, M. VL - 87 ER - TY - JOUR T1 - Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS) JF - Medical Care Y1 - 2007 A1 - Reeve, B. B. A1 - Hays, R. D. A1 - Bjorner, J. B. A1 - Cook, K. F. A1 - Crane, P. K. A1 - Teresi, J. A. A1 - Thissen, D. A1 - Revicki, D. A. A1 - Weiss, D. J. A1 - Hambleton, R. K. A1 - Liu, H. A1 - Gershon, R. C. A1 - Reise, S. P. A1 - Lai, J. S. A1 - Cella, D. KW - *Health Status KW - *Information Systems KW - *Quality of Life KW - *Self Disclosure KW - Adolescent KW - Adult KW - Aged KW - Calibration KW - Databases as Topic KW - Evaluation Studies as Topic KW - Female KW - Humans KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Psychometrics KW - Questionnaires/standards KW - United States AB - BACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks.
ANALYSES: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Key analytic issues are summarized, and recommendations are provided for future evaluations of item banks in HRQOL assessment. VL - 45 SN - 0025-7079 (Print) N1 - Reeve, Bryce B; Hays, Ron D; Bjorner, Jakob B; Cook, Karon F; Crane, Paul K; Teresi, Jeanne A; Thissen, David; Revicki, Dennis A; Weiss, David J; Hambleton, Ronald K; Liu, Honghu; Gershon, Richard; Reise, Steven P; Lai, Jin-shei; Cella, David; PROMIS Cooperative Group. AG015815/AG/United States NIA. Research Support, N.I.H., Extramural. United States. Medical care. Med Care. 2007 May;45(5 Suppl 1):S22-31. ER - TY - JOUR T1 - A review of item exposure control strategies for computerized adaptive testing developed from 1983 to 2005 JF - Journal of Technology, Learning, and Assessment Y1 - 2007 A1 - Georgiadou, E. A1 - Triantafillou, E. A1 - Economides, A. A. AB - Since researchers acknowledged the several advantages of computerized adaptive testing (CAT) over traditional linear test administration, the issue of item exposure control has received increased attention. Due to CAT’s underlying philosophy, particular items in the item pool may be presented too often and become overexposed, while other items are rarely selected by the CAT algorithm and thus become underexposed. Several item exposure control strategies have been presented in the literature aiming to prevent overexposure of some items and to increase the use rate of rarely or never selected items. This paper reviews such strategies that appeared in the relevant literature from 1983 to 2005.
The focus of this paper is on studies that have been conducted in order to evaluate the effectiveness of item exposure control strategies for dichotomous scoring, polytomous scoring, and testlet-based CAT systems. In addition, the paper discusses the strengths and weaknesses of each strategy group using examples from simulation studies. No new research is presented; rather, a compendium of models is reviewed, with the overall objective of providing researchers in this field, especially newcomers, a wide view of item exposure control strategies. VL - 5 IS - 8 N1 - http://www.jtla.org ER - TY - JOUR T1 - Test design optimization in CAT early stage with the nominal response model JF - Applied Psychological Measurement Y1 - 2007 A1 - Passos, V. L. A1 - Berger, M. P. F. A1 - Tan, F. E. KW - computerized adaptive testing KW - nominal response model KW - robust performance KW - test design optimization AB - The early stage of computerized adaptive testing (CAT) refers to the phase of trait estimation during the administration of only a few items. This phase can be characterized by bias and instability of estimation. In this study, an item selection criterion is introduced in an attempt to lessen this instability: the D-optimality criterion. A polytomous unconstrained CAT simulation is carried out to evaluate this criterion's performance under different test premises. The simulation shows that the extent of early-stage instability depends primarily on the quality of the item pool information and its size, and secondarily on the item selection criteria. The efficiency of the D-optimality criterion is similar to the efficiency of other known item selection criteria. Yet, it often yields estimates that, at the beginning of CAT, display a more robust performance against instability.
(PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Sage Publications: US VL - 31 SN - 0146-6216 (Print) ER - TY - JOUR T1 - The comparison among item selection strategies of CAT with multiple-choice items JF - Acta Psychologica Sinica Y1 - 2006 A1 - Hai-qi, D. A1 - De-zhi, C. A1 - Shuliang, D. A1 - Taiping, D. KW - CAT KW - computerized adaptive testing KW - graded response model KW - item selection strategies KW - multiple choice items AB - The initial purpose of comparing item selection strategies for CAT was to increase the efficiency of tests. As studies continued, however, it was found that increasing the efficiency of item bank usage was also an important goal of such comparisons. These two goals often conflict; the key is to find a strategy with which both goals can be accomplished. The item selection strategies for the graded response model in this study included: matching the average of the difficulty orders to the ability; matching the median of the difficulty orders to the ability; maximum information; a-stratified (average); and a-stratified (median). The evaluation indices used for comparison included: the bias of the ability estimates relative to the true values; the standard error of the ability estimates; the average number of items administered per examinee; the standard deviation of the frequency with which items were selected; and a weighted sum of these indices. Using the Monte Carlo simulation method, we ran 20 replications each under conditions in which the item difficulty parameters followed a normal distribution or a uniform distribution. The results indicated that, whether the difficulty parameters followed a normal or a uniform distribution, every item selection strategy designed in this research had its strengths and weaknesses.
In the overall evaluation, when items were stratified appropriately, a-stratified (median) (ASM) performed best. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Science Press: China VL - 38 SN - 0439-755X (Print) ER - TY - JOUR T1 - Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes JF - Archives of Physical Medicine and Rehabilitation Y1 - 2006 A1 - Haley, S. M. A1 - Siebens, H. A1 - Coster, W. J. A1 - Tao, W. A1 - Black-Schaffer, R. M. A1 - Gandek, B. A1 - Sinclair, S. J. A1 - Ni, P. KW - *Activities of Daily Living KW - *Adaptation, Physiological KW - *Computer Systems KW - *Questionnaires KW - Adult KW - Aged KW - Aged, 80 and over KW - Chi-Square Distribution KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Longitudinal Studies KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Patient Discharge KW - Prospective Studies KW - Rehabilitation/*standards KW - Subacute Care/*standards AB - OBJECTIVE: To examine score agreement, precision, validity, efficiency, and responsiveness of a computerized adaptive testing (CAT) version of the Activity Measure for Post-Acute Care (AM-PAC-CAT) in a prospective, 3-month follow-up sample of inpatient rehabilitation patients recently discharged home. DESIGN: Longitudinal, prospective 1-group cohort study of patients followed approximately 2 weeks after hospital discharge and then 3 months after the initial home visit. SETTING: Follow-up visits conducted in patients' home setting. PARTICIPANTS: Ninety-four adults who were recently discharged from inpatient rehabilitation, with diagnoses of neurologic, orthopedic, and medically complex conditions. INTERVENTIONS: Not applicable.
MAIN OUTCOME MEASURES: Summary scores from AM-PAC-CAT, including the 3 activity domains of movement and physical, personal care and instrumental, and applied cognition, were compared with scores from a traditional fixed-length version of the AM-PAC with 66 items (AM-PAC-66). RESULTS: AM-PAC-CAT scores were in good agreement (intraclass correlation coefficient model 3,1 range, .77-.86) with scores from the AM-PAC-66. On average, the CAT programs required 43% of the time and 33% of the items compared with the AM-PAC-66. Both formats discriminated across functional severity groups. The standardized response mean (SRM) was greater for the movement and physical fixed form than the CAT; the effect size and SRM of the 2 other AM-PAC domains showed similar sensitivity between CAT and fixed formats. Using patients' own report as an anchor-based measure of change, the CAT and fixed-length formats were comparable in responsiveness to patient-reported change over a 3-month interval. CONCLUSIONS: Accurate estimates of group-level changes in functional activity can be obtained from CAT administrations, with a considerable reduction in administration time. VL - 87 SN - 0003-9993 (Print) N1 - Haley, Stephen M; Siebens, Hilary; Coster, Wendy J; Tao, Wei; Black-Schaffer, Randie M; Gandek, Barbara; Sinclair, Samuel J; Ni, Pengsheng; K0245354-01/phs; R01 hd043568/hd/nichd; Research Support, N.I.H., Extramural; United States; Archives of physical medicine and rehabilitation; Arch Phys Med Rehabil. 2006 Aug;87(8):1033-42. ER - TY - JOUR T1 - Evaluation parameters for computer adaptive testing JF - British Journal of Educational Technology Y1 - 2006 A1 - Georgiadou, E. A1 - Triantafillou, E. A1 - Economides, A. A. VL - 37 IS - 2 ER - TY - JOUR T1 - Overview of quantitative measurement methods. Equivalence, invariance, and differential item functioning in health applications JF - Medical Care Y1 - 2006 A1 - Teresi, J. A.
KW - *Cross-Cultural Comparison KW - Data Interpretation, Statistical KW - Factor Analysis, Statistical KW - Guidelines as Topic KW - Humans KW - Models, Statistical KW - Psychometrics/*methods KW - Statistics as Topic/*methods KW - Statistics, Nonparametric AB - BACKGROUND: Reviewed in this article are issues relating to the study of invariance and differential item functioning (DIF). The aim of factor analyses and DIF, in the context of invariance testing, is the examination of group differences in item response conditional on an estimate of disability. Discussed are parameters and statistics that are not invariant and cannot be compared validly in cross-cultural studies with varying distributions of disability, in contrast to those that can be compared (if the model assumptions are met) because they are produced by models such as linear and nonlinear regression. OBJECTIVES: The purpose of this overview is to provide an integrated approach to the quantitative methods used in this special issue to examine measurement equivalence. The methods include classical test theory (CTT), factor analytic, and parametric and nonparametric approaches to DIF detection. Also included in the quantitative section is a discussion of item banking and computerized adaptive testing (CAT). METHODS: Factorial invariance and the articles discussing this topic are introduced. A brief overview of the DIF methods presented in the quantitative section of the special issue is provided, together with a discussion of ways in which DIF analyses and examination of invariance using factor models may be complementary. CONCLUSIONS: Although factor analytic and DIF detection methods share features, they provide unique information and can be viewed as complementary in informing about measurement equivalence. VL - 44 SN - 0025-7079 (Print) 0025-7079 (Linking) N1 - Teresi, Jeanne A; AG15294/AG/NIA NIH HHS/United States; Research Support, N.I.H., Extramural; Research Support, Non-U.S.
Gov't; Review; United States; Medical care; Med Care. 2006 Nov;44(11 Suppl 3):S39-49. ER - TY - JOUR T1 - Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function JF - Journal of Clinical Epidemiology Y1 - 2006 A1 - Hart, D. L. A1 - Cook, K. F. A1 - Mioduski, J. E. A1 - Teal, C. R. A1 - Crane, P. K. KW - computerized adaptive testing KW - Flexilevel Scale of Shoulder Function KW - Item Response Theory KW - Rehabilitation AB - Background and Objective: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items,
develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (θIRT) and measures generated using the simulated CAT (θCAT).
Study Design and Setting: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients
with shoulder impairments who completed 60 SFS items.
Results: Factor analytic techniques supported that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The θIRT and θCAT measures were highly correlated (r = .96) and resulted in similar classifications of patients.
Conclusion: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good
discriminating ability.