TY - JOUR T1 - How Do Trait Change Patterns Affect the Performance of Adaptive Measurement of Change? JF - Journal of Computerized Adaptive Testing Y1 - 2023 A1 - Ming Him Tai A1 - Allison W. Cooperman A1 - Joseph N. DeWeese A1 - David J. Weiss KW - adaptive measurement of change KW - computerized adaptive testing KW - longitudinal measurement KW - trait change patterns VL - 10 IS - 3 ER - TY - JOUR T1 - A Dynamic Stratification Method for Improving Trait Estimation in Computerized Adaptive Testing Under Item Exposure Control JF - Applied Psychological Measurement Y1 - 2020 A1 - Jyun-Hong Chen A1 - Hsiu-Yi Chao A1 - Shu-Ying Chen AB - When computerized adaptive testing (CAT) is under stringent item exposure control, the precision of trait estimation will substantially decrease. A new item selection method, the dynamic Stratification method based on Dominance Curves (SDC), which is aimed at improving trait estimation, is proposed to mitigate this problem. The objective function of the SDC in item selection is to maximize the sum of test information for all examinees rather than maximizing item information for individual examinees at a single-item administration, as in conventional CAT. To achieve this objective, the SDC uses dominance curves to stratify an item pool into strata with the number being equal to the test length to precisely and accurately increase the quality of the administered items as the test progresses, reducing the likelihood that a high-discrimination item will be administered to an examinee whose ability is not close to the item difficulty. Furthermore, the SDC incorporates a dynamic process for on-the-fly item–stratum adjustment to optimize the use of quality items. Simulation studies were conducted to investigate the performance of the SDC in CAT under item exposure control at different levels of severity. According to the results, the SDC can efficiently improve trait estimation in CAT through greater precision and more accurate trait estimation than those generated by other methods (e.g., the maximum Fisher information method) in most conditions. VL - 44 UR - https://doi.org/10.1177/0146621619843820 ER - TY - JOUR T1 - Item Calibration Methods With Multiple Subscale Multistage Testing JF - Journal of Educational Measurement Y1 - 2020 A1 - Wang, Chun A1 - Chen, Ping A1 - Jiang, Shengyu KW - EM KW - marginal maximum likelihood KW - missing data KW - multistage testing AB - Abstract Many large-scale educational surveys have moved from linear form design to multistage testing (MST) design. One advantage of MST is that it can provide more accurate latent trait (θ) estimates using fewer items than required by linear tests. However, MST generates incomplete response data by design; hence, questions remain as to how to calibrate items using the incomplete data from MST design. Further complication arises when there are multiple correlated subscales per test, and when items from different subscales need to be calibrated according to their respective score reporting metric. The current calibration-per-subscale method produced biased item parameters, and there is no available method for resolving the challenge. Deriving from the missing data principle, we showed when calibrating all items together the Rubin's ignorability assumption is satisfied such that the traditional single-group calibration is sufficient. 
When calibrating items per subscale, we proposed a simple modification to the current calibration-per-subscale method that helps reinstate the missing-at-random assumption and therefore corrects for the estimation bias that is otherwise existent. Three mainstream calibration methods are discussed in the context of MST, they are the marginal maximum likelihood estimation, the expectation maximization method, and the fixed parameter calibration. An extensive simulation study is conducted and a real data example from NAEP is analyzed to provide convincing empirical evidence. VL - 57 UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12241 ER - TY - JOUR T1 - Item Selection and Exposure Control Methods for Computerized Adaptive Testing with Multidimensional Ranking Items JF - Journal of Educational Measurement Y1 - 2020 A1 - Chen, Chia-Wen A1 - Wang, Wen-Chung A1 - Chiu, Ming Ming A1 - Ro, Sage AB - Abstract The use of computerized adaptive testing algorithms for ranking items (e.g., college preferences, career choices) involves two major challenges: unacceptably high computation times (selecting from a large item pool with many dimensions) and biased results (enhanced preferences or intensified examinee responses because of repeated statements across items). To address these issues, we introduce subpool partition strategies for item selection and within-person statement exposure control procedures. Simulations showed that the multinomial method reduces computation time while maintaining measurement precision. Both the freeze and revised Sympson-Hetter online (RSHO) methods controlled the statement exposure rate; RSHO sacrificed some measurement precision but increased pool use. Furthermore, preventing a statement's repetition on consecutive items neither hindered the effectiveness of the freeze or RSHO method nor reduced measurement precision. VL - 57 UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12252 ER - TY - JOUR T1 - New Efficient and Practicable Adaptive Designs for Calibrating Items Online JF - Applied Psychological Measurement Y1 - 2020 A1 - Yinhong He A1 - Ping Chen A1 - Yong Li AB - When calibrating new items online, it is practicable to first compare all new items according to some criterion and then assign the most suitable one to the current examinee who reaches a seeding location. The modified D-optimal design proposed by van der Linden and Ren (denoted as D-VR design) works within this practicable framework with the aim of directly optimizing the estimation of item parameters. However, the optimal design point for a given new item should be obtained by comparing all examinees in a static examinee pool. Thus, D-VR design still has room for improvement in calibration efficiency from the view of traditional optimal design. To this end, this article incorporates the idea of traditional optimal design into D-VR design and proposes a new online calibration design criterion, namely, excellence degree (ED) criterion. Four different schemes are developed to measure the information provided by the current examinee when implementing this new criterion, and four new ED designs equipped with them are put forward accordingly. Simulation studies were conducted under a variety of conditions to compare the D-VR design and the four proposed ED designs in terms of calibration efficiency. Results showed that the four ED designs outperformed D-VR design in almost all simulation conditions. 
VL - 44 UR - https://doi.org/10.1177/0146621618824854 ER - TY - JOUR T1 - Stratified Item Selection Methods in Cognitive Diagnosis Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2020 A1 - Jing Yang A1 - Hua-Hua Chang A1 - Jian Tao A1 - Ningzhong Shi AB - Cognitive diagnostic computerized adaptive testing (CD-CAT) aims to obtain more useful diagnostic information by taking advantages of computerized adaptive testing (CAT). Cognitive diagnosis models (CDMs) have been developed to classify examinees into the correct proficiency classes so as to get more efficient remediation, whereas CAT tailors optimal items to the examinee’s mastery profile. The item selection method is the key factor of the CD-CAT procedure. In recent years, a large number of parametric/nonparametric item selection methods have been proposed. In this article, the authors proposed a series of stratified item selection methods in CD-CAT, which are combined with posterior-weighted Kullback–Leibler (PWKL), nonparametric item selection (NPS), and weighted nonparametric item selection (WNPS) methods, and named S-PWKL, S-NPS, and S-WNPS, respectively. Two different types of stratification indices were used: original versus novel. The performances of the proposed item selection methods were evaluated via simulation studies and compared with the PWKL, NPS, and WNPS methods without stratification. Manipulated conditions included calibration sample size, item quality, number of attributes, number of strata, and data generation models. Results indicated that the S-WNPS and S-NPS methods performed similarly, and both outperformed the S-PWKL method. And item selection methods with novel stratification indices performed slightly better than the ones with original stratification indices, and those without stratification performed the worst. VL - 44 UR - https://doi.org/10.1177/0146621619893783 ER - TY - JOUR T1 - Computerized Adaptive Testing in Early Education: Exploring the Impact of Item Position Effects on Ability Estimation JF - Journal of Educational Measurement Y1 - 2019 A1 - Albano, Anthony D. A1 - Cai, Liuhan A1 - Lease, Erin M. A1 - McConnell, Scott R. AB - Abstract Studies have shown that item difficulty can vary significantly based on the context of an item within a test form. In particular, item position may be associated with practice and fatigue effects that influence item parameter estimation. The purpose of this research was to examine the relevance of item position specifically for assessments used in early education, an area of testing that has received relatively limited psychometric attention. In an initial study, multilevel item response models fit to data from an early literacy measure revealed statistically significant increases in difficulty for items appearing later in a 20-item form. The estimated linear change in logits for an increase of 1 in position was .024, resulting in a predicted change of .46 logits for a shift from the beginning to the end of the form. A subsequent simulation study examined impacts of item position effects on person ability estimation within computerized adaptive testing. Implications and recommendations for practice are discussed. 
VL - 56 UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12215 ER - TY - JOUR T1 - Imputation Methods to Deal With Missing Responses in Computerized Adaptive Multistage Testing JF - Educational and Psychological Measurement Y1 - 2019 A1 - Dee Duygu Cetin-Berber A1 - Halil Ibrahim Sari A1 - Anne Corinne Huggins-Manley AB - Routing examinees to modules based on their ability level is a very important aspect in computerized adaptive multistage testing. However, the presence of missing responses may complicate estimation of examinee ability, which may result in misrouting of individuals. Therefore, missing responses should be handled carefully. This study investigated multiple missing data methods in computerized adaptive multistage testing, including two imputation techniques, the use of full information maximum likelihood and the use of scoring missing data as incorrect. These methods were examined under the missing completely at random, missing at random, and missing not at random frameworks, as well as other testing conditions. Comparisons were made to baseline conditions where no missing data were present. The results showed that imputation and the full information maximum likelihood methods outperformed incorrect scoring methods in terms of average bias, average root mean square error, and correlation between estimated and true thetas. VL - 79 UR - https://doi.org/10.1177/0013164418805532 ER - TY - JOUR T1 - Item Selection Criteria With Practical Constraints in Cognitive Diagnostic Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2019 A1 - Chuan-Ju Lin A1 - Hua-Hua Chang AB - For item selection in cognitive diagnostic computerized adaptive testing (CD-CAT), ideally, a single item selection index should be created to simultaneously regulate precision, exposure status, and attribute balancing. For this purpose, in this study, we first proposed an attribute-balanced item selection criterion, namely, the standardized weighted deviation global discrimination index (SWDGDI), and subsequently formulated the constrained progressive index (CP\_SWDGDI) by casting the SWDGDI in a progressive algorithm. A simulation study revealed that the SWDGDI method was effective in balancing attribute coverage and the CP\_SWDGDI method was able to simultaneously balance attribute coverage and item pool usage while maintaining acceptable estimation precision. This research also demonstrates the advantage of a relatively low number of attributes in CD-CAT applications. VL - 79 UR - https://doi.org/10.1177/0013164418790634 ER - TY - JOUR T1 - Nonparametric CAT for CD in Educational Settings With Small Samples JF - Applied Psychological Measurement Y1 - 2019 A1 - Yuan-Pei Chang A1 - Chia-Yi Chiu A1 - Rung-Ching Tsai AB - Cognitive diagnostic computerized adaptive testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation. Although model-based CD-CAT is relatively well researched in the context of large-scale assessment systems, this type of system has not received the same degree of research and development in small-scale settings, such as at the course-based level, where this system would be the most useful. The main obstacle is that the statistical estimation techniques that are successfully applied within the context of a large-scale assessment require large samples to guarantee reliable calibration of the item parameters and an accurate estimation of the examinees’ proficiency class membership. 
Such samples are simply not obtainable in course-based settings. Therefore, the nonparametric item selection (NPS) method that does not require any parameter calibration, and thus, can be used in small educational programs is proposed in the study. The proposed nonparametric CD-CAT uses the nonparametric classification (NPC) method to estimate an examinee’s attribute profile and based on the examinee’s item responses, the item that can best discriminate the estimated attribute profile and the other attribute profiles is then selected. The simulation results show that the NPS method outperformed the compared parametric CD-CAT algorithms and the differences were substantial when the calibration samples were small. VL - 43 UR - https://doi.org/10.1177/0146621618813113 ER - TY - JOUR T1 - Evaluation of a New Method for Providing Full Review Opportunities in Computerized Adaptive Testing—Computerized Adaptive Testing With Salt JF - Journal of Educational Measurement Y1 - 2018 A1 - Cui, Zhongmin A1 - Liu, Chunyan A1 - He, Yong A1 - Chen, Hanwei AB - Abstract Allowing item review in computerized adaptive testing (CAT) is getting more attention in the educational measurement field as more and more testing programs adopt CAT. The research literature has shown that allowing item review in an educational test could result in more accurate estimates of examinees’ abilities. The practice of item review in CAT, however, is hindered by the potential danger of test-manipulation strategies. To provide review opportunities to examinees while minimizing the effect of test-manipulation strategies, researchers have proposed different algorithms to implement CAT with restricted revision options. In this article, we propose and evaluate a new method that implements CAT without any restriction on item review. In particular, we evaluate the new method in terms of the accuracy on ability estimates and the robustness against test-manipulation strategies. This study shows that the newly proposed method is promising in a win-win situation: examinees have full freedom to review and change answers, and the impacts of test-manipulation strategies are undermined. VL - 55 UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12193 ER - TY - JOUR T1 - Item Selection Criteria With Practical Constraints in Cognitive Diagnostic Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2018 A1 - Chuan-Ju Lin A1 - Hua-Hua Chang AB - For item selection in cognitive diagnostic computerized adaptive testing (CD-CAT), ideally, a single item selection index should be created to simultaneously regulate precision, exposure status, and attribute balancing. For this purpose, in this study, we first proposed an attribute-balanced item selection criterion, namely, the standardized weighted deviation global discrimination index (SWDGDI), and subsequently formulated the constrained progressive index (CP\_SWDGDI) by casting the SWDGDI in a progressive algorithm. A simulation study revealed that the SWDGDI method was effective in balancing attribute coverage and the CP\_SWDGDI method was able to simultaneously balance attribute coverage and item pool usage while maintaining acceptable estimation precision. This research also demonstrates the advantage of a relatively low number of attributes in CD-CAT applications. 
UR - https://doi.org/10.1177/0013164418790634 ER - TY - JOUR T1 - Item Selection Methods in Multidimensional Computerized Adaptive Testing With Polytomously Scored Items JF - Applied Psychological Measurement Y1 - 2018 A1 - Dongbo Tu A1 - Yuting Han A1 - Yan Cai A1 - Xuliang Gao AB - Multidimensional computerized adaptive testing (MCAT) has been developed over the past decades, and most of them can only deal with dichotomously scored items. However, polytomously scored items have been broadly used in a variety of tests for their advantages of providing more information and testing complicated abilities and skills. The purpose of this study is to discuss the item selection algorithms used in MCAT with polytomously scored items (PMCAT). Several promising item selection algorithms used in MCAT are extended to PMCAT, and two new item selection methods are proposed to improve the existing selection strategies. Two simulation studies are conducted to demonstrate the feasibility of the extended and proposed methods. The simulation results show that most of the extended item selection methods for PMCAT are feasible and the new proposed item selection methods perform well. Combined with the security of the pool, when two dimensions are considered (Study 1), the proposed modified continuous entropy method (MCEM) is the ideal of all in that it gains the lowest item exposure rate and has a relatively high accuracy. As for high dimensions (Study 2), results show that mutual information (MUI) and MCEM keep relatively high estimation accuracy, and the item exposure rates decrease as the correlation increases. VL - 42 UR - https://doi.org/10.1177/0146621618762748 ER - TY - JOUR T1 - Latent Class Analysis of Recurrent Events in Problem-Solving Items JF - Applied Psychological Measurement Y1 - 2018 A1 - Haochen Xu A1 - Guanhua Fang A1 - Yunxiao Chen A1 - Jingchen Liu A1 - Zhiliang Ying AB - Computer-based assessment of complex problem-solving abilities is becoming more and more popular. In such an assessment, the entire problem-solving process of an examinee is recorded, providing detailed information about the individual, such as behavioral patterns, speed, and learning trajectory. The problem-solving processes are recorded in a computer log file which is a time-stamped documentation of events related to task completion. As opposed to cross-sectional response data from traditional tests, process data in log files are massive and irregularly structured, calling for effective exploratory data analysis methods. Motivated by a specific complex problem-solving item “Climate Control” in the 2012 Programme for International Student Assessment, the authors propose a latent class analysis approach to analyzing the events occurred in the problem-solving processes. The exploratory latent class analysis yields meaningful latent classes. Simulation studies are conducted to evaluate the proposed approach. VL - 42 UR - https://doi.org/10.1177/0146621617748325 ER - TY - JOUR T1 - On-the-Fly Constraint-Controlled Assembly Methods for Multistage Adaptive Testing for Cognitive Diagnosis JF - Journal of Educational Measurement Y1 - 2018 A1 - Liu, Shuchang A1 - Cai, Yan A1 - Tu, Dongbo AB - Abstract This study applied the mode of on-the-fly assembled multistage adaptive testing to cognitive diagnosis (CD-OMST). Several and several module assembly methods for CD-OMST were proposed and compared in terms of measurement precision, test security, and constrain management. 
The module assembly methods in the study included the maximum priority index method (MPI), the revised maximum priority index (RMPI), the weighted deviation model (WDM), and the two revised Monte Carlo methods (R1-MC, R2-MC). Simulation results showed that on the whole the CD-OMST performs well in that it not only has acceptable attribute pattern correct classification rates but also satisfies both statistical and nonstatistical constraints; the RMPI method was generally better than the MPI method, the R2-MC method was generally better than the R1-MC method, and the two revised Monte Carlo methods performed best in terms of test security and constraint management, whereas the RMPI and WDM methods worked best in terms of measurement precision. The study is expected not only to provide information about how to combine MST and CD using an on-the-fly method and how these assembly methods in CD-OMST perform relative to each other, but also to offer guidance for practitioners who assemble modules in CD-OMST with both statistical and nonstatistical constraints. VL - 55 UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12194 ER - TY - CONF T1 - Bayesian Perspectives on Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Wim J. van der Linden A1 - Bingnan Jiang A1 - Hao Ren A1 - Seung W. Choi A1 - Qi Diao KW - Bayesian Perspective KW - CAT AB -

Although adaptive testing is usually treated from the perspective of maximum-likelihood parameter estimation and maximum-information item selection, a Bayesian perspective is more natural, statistically efficient, and computationally tractable. This observation holds not only for the core process of ability estimation but also for such processes as item calibration and real-time monitoring of item security. Key elements of the approach are parametric modeling of each relevant process, updating of the parameter estimates after the arrival of each new response, and optimal design of the next step.

The purpose of the symposium is to illustrate the role of Bayesian statistics in this approach. The first presentation discusses a basic Bayesian algorithm for the sequential update of any parameter in adaptive testing and illustrates the idea of Bayesian optimal design for the two processes of ability estimation and online item calibration. The second presentation generalizes the ideas to the case of adaptive testing with polytomous items. The third presentation uses the fundamental Bayesian idea of sampling from updated posterior predictive distributions (“multiple imputations”) to deal with the problem of scoring incomplete adaptive tests.
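
As a concrete illustration of the sequential updating described in this abstract, the short Python sketch below updates a discretized posterior over ability after each response and picks the next item by posterior-weighted Fisher information. It is a minimal sketch under assumed conditions (a 2PL model, a toy item pool, illustrative function names), not code from any of the presentations.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def update_posterior(prior, theta_grid, a, b, response):
    """Multiply the prior by the likelihood of the observed response and renormalize."""
    p = p_correct(theta_grid, a, b)
    likelihood = p if response == 1 else 1.0 - p
    posterior = prior * likelihood
    return posterior / posterior.sum()

def next_item(posterior, theta_grid, a_params, b_params, administered):
    """Select the unused item with maximum posterior-weighted Fisher information."""
    best, best_info = None, -np.inf
    for j in range(len(a_params)):
        if j in administered:
            continue
        p = p_correct(theta_grid, a_params[j], b_params[j])
        info = np.sum(posterior * a_params[j] ** 2 * p * (1.0 - p))
        if info > best_info:
            best, best_info = j, info
    return best

# Toy run: five items, standard-normal prior on a theta grid, made-up responses.
theta_grid = np.linspace(-4, 4, 81)
prior = np.exp(-0.5 * theta_grid ** 2)
prior /= prior.sum()
a_params = np.array([1.2, 0.8, 1.5, 1.0, 1.3])
b_params = np.array([-1.0, 0.0, 0.5, 1.0, -0.5])
administered, posterior = set(), prior.copy()
for response in [1, 0, 1]:
    j = next_item(posterior, theta_grid, a_params, b_params, administered)
    administered.add(j)
    posterior = update_posterior(posterior, theta_grid, a_params[j], b_params[j], response)
print("EAP estimate:", np.sum(theta_grid * posterior))
```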

Session Video 1

Session Video 2

 

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Computerized Adaptive Testing for Cognitive Diagnosis in Classroom: A Nonparametric Approach T2 - IACAT 2017 Conference Y1 - 2017 A1 - Yuan-Pei Chang A1 - Chia-Yi Chiu A1 - Rung-Ching Tsai KW - CD-CAT KW - non-parametric approach AB -

In the past decade, CDMs of educational test performance have received increasing attention among educational researchers (for details, see Fu & Li, 2007, and Rupp, Templin, & Henson, 2010). CDMs of educational test performance decompose the ability domain of a given test into specific skills, called attributes, each of which an examinee may or may not have mastered. The resulting attribute profile documents the individual’s strengths and weaknesses within the ability domain. Cognitive Diagnostic Computerized Adaptive Testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation (e.g., Cheng & Chang, 2007; Cheng, 2009; Liu, You, Wang, Ding, & Chang, 2013; Tatsuoka & Tatsuoka, 1997). While model-based CD-CAT is relatively well-researched in the context of large-scale assessments, this type of system has not received the same degree of development in small-scale settings, where it would be most useful. The main challenge is that the statistical estimation techniques successfully applied to the parametric CD-CAT require large samples to guarantee the reliable calibration of item parameters and accurate estimation of examinees’ attribute profiles. In response to the challenge, a nonparametric approach that does not require any parameter calibration, and thus can be used in small educational programs, is proposed. The proposed nonparametric CD-CAT relies on the same principle as the regular CAT algorithm, but uses the nonparametric classification method (Chiu & Douglas, 2013) to assess and update the student’s ability state while the test proceeds. Based on a student’s initial responses, a neighborhood of candidate proficiency classes is identified, and items not characteristic of the chosen proficiency classes are precluded from being chosen next. The response to the next item then allows for an update of the skill profile, and the set of possible proficiency classes is further narrowed. In this manner, the nonparametric CD-CAT cycles through item administration and update stages until the most likely proficiency class has been pinpointed. The simulation results show that the proposed method outperformed the compared parametric CD-CAT algorithms and the differences were significant when the item parameter calibration was not optimal.
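
To make the classification step concrete, the sketch below computes ideal response patterns from a Q-matrix and classifies an examinee by Hamming distance to the observed responses, returning the tied profiles as the neighborhood of candidate proficiency classes. It is a minimal illustration in the spirit of the nonparametric classification method of Chiu and Douglas (2013), assuming a conjunctive ideal-response rule and made-up data; it is not the authors' implementation.

```python
import itertools
import numpy as np

def ideal_response(alpha, Q):
    """Conjunctive ideal response: 1 only if all attributes required by the item are mastered."""
    return np.all(Q <= alpha, axis=1).astype(int)

def npc_classify(responses, Q, answered):
    """Return the profile(s) whose ideal responses are closest (Hamming distance) to the observed ones."""
    K = Q.shape[1]
    best_d, best_profiles = None, []
    for alpha in itertools.product([0, 1], repeat=K):
        eta = ideal_response(np.array(alpha), Q)
        d = int(np.sum(np.abs(responses[answered] - eta[answered])))
        if best_d is None or d < best_d:
            best_d, best_profiles = d, [alpha]
        elif d == best_d:
            best_profiles.append(alpha)
    return best_profiles, best_d

# Toy example: five items measuring three attributes.
Q = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])
responses = np.array([1, 1, 0, 1, 0])    # hypothetical item scores
answered = np.arange(5)                  # indices of the items administered so far
neighborhood, distance = npc_classify(responses, Q, answered)
print(neighborhood, distance)            # candidate proficiency classes and their distance
```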

References

Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74, 619-632.

Cheng, Y., & Chang, H. (2007). The modified maximum global discrimination index method for cognitive diagnostic CAT. In D. Weiss (Ed.) Proceedings of the 2007 GMAC Computerized Adaptive Testing Conference.

Chiu, C.-Y., & Douglas, J. A. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225-250.

Fu, J., & Li, Y. (2007). An integrative review of cognitively diagnostic psychometric models. Paper presented at the Annual Meeting of the National Council on Measurement in Education. Chicago, Illinois.

Liu, H., You, X., Wang, W., Ding, S., & Chang, H. (2013). The development of computerized adaptive testing with cognitive diagnosis for an English achievement test in China. Journal of Classification, 30, 152-172.

Rupp, A. A., Templin, J. L., & Henson, R. A. (2010). Diagnostic Measurement: Theory, Methods, and Applications. New York: Guilford.

Tatsuoka, K. K., & Tatsuoka, M. M. (1997). Computerized cognitive diagnostic adaptive testing: Effect on remedial instruction as empirical validation. Journal of Educational Measurement, 34, 3–20.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Considerations in Performance Evaluations of Computerized Formative Assessments T2 - IACAT 2017 Conference Y1 - 2017 A1 - Michael Chajewski A1 - John Harnisher KW - algebra KW - Formative Assessment KW - Performance Evaluations AB -

Computerized adaptive instruments have been widely established and used in the context of summative assessments for purposes including licensure, admissions, and proficiency testing. The benefits of examinee-tailored examinations, which can provide estimates of performance that are more reliable and valid, have in recent years attracted a wider audience (e.g., patient-oriented outcomes, test preparation). Formative assessments, which are most widely understood in their implementation as diagnostic tools, have recently started to expand into lesser-known areas of computerized testing, such as implementations of instructional designs that aim to maximize examinee learning through targeted practice.

Using a CAT instrument within the framework of evaluating repeated examinee performances (in settings such as Quiz Bank practice, for example) poses unique challenges not germane to summative assessments. The scale on which item parameters (and subsequently examinee performance estimates, such as maximum likelihood estimates) are determined usually does not take change over time into consideration. While vertical scaling features resolve the learning-acquisition problem, most content practice engines do not make use of explicit practice windows that could be vertically aligned. Alternatively, multidimensional (MIRT) and hierarchical item response theory (HIRT) models allow for the specification of random effects associated with change over time in examinees’ skills, but they are often complex and require content and usage resources that are seldom observed in practice.

The research submitted for consideration simulated examinees’ repeated, variable-length Quiz Bank practice in algebra using an operational 1-PL item pool of 500 items. The stability simulations sought to determine which rolling item interval size would provide the most informative insight into the examinees’ learning progression over time. Estimates were evaluated in terms of reduction in estimate uncertainty, bias, and RMSD relative to the true and total-item-based ability estimates. It was found that rolling item intervals of 20-25 items provided the best reduction of uncertainty around the estimate without compromising the ability to provide informed performance estimates to students. However, while intervals of 20-25 items asymptotically tended to provide adequate estimates of performance, changes over shorter periods of time assessed with shorter quizzes could not be detected, as those changes were suppressed by the performance based on the full interval considered. Implications for infrastructure (such as recommendation engines), product, and scale development are discussed.
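
A minimal sketch of the rolling-interval idea, assuming a Rasch (1-PL) model, a fixed window of the most recent items, and made-up data (none of this is the operational engine described above):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rasch_neg_loglik(theta, b, x):
    """Negative log-likelihood of item scores x for Rasch items with difficulties b."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

def rolling_theta(difficulties, scores, window=20):
    """ML estimate of theta based only on the most recent `window` administered items."""
    b = np.asarray(difficulties[-window:], dtype=float)
    x = np.asarray(scores[-window:], dtype=float)
    result = minimize_scalar(rasch_neg_loglik, bounds=(-4, 4), args=(b, x), method="bounded")
    return result.x

# Toy example: 30 administered items; the estimate uses only the last 20.
rng = np.random.default_rng(1)
difficulties = rng.normal(0, 1, 30)     # hypothetical item difficulties
scores = rng.binomial(1, 0.6, 30)       # hypothetical item scores
print(round(rolling_theta(difficulties, scores, window=20), 3))
```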

Session video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - JOUR T1 - The Development of MST Test Information for the Prediction of Test Performances JF - Educational and Psychological Measurement Y1 - 2017 A1 - Ryoungsun Park A1 - Jiseon Kim A1 - Hyewon Chung A1 - Barbara G. Dodd AB - The current study proposes novel methods to predict multistage testing (MST) performance without conducting simulations. This method, called MST test information, is based on analytic derivation of standard errors of ability estimates across theta levels. We compared standard errors derived analytically to the simulation results to demonstrate the validity of the proposed method in both measurement precision and classification accuracy. The results indicate that the MST test information effectively predicted the performance of MST. In addition, the results of the current study highlighted the relationship among the test construction, MST design factors, and MST performance. VL - 77 UR - http://dx.doi.org/10.1177/0013164416662960 ER - TY - JOUR T1 - Dual-Objective Item Selection Criteria in Cognitive Diagnostic Computerized Adaptive Testing JF - Journal of Educational Measurement Y1 - 2017 A1 - Kang, Hyeon-Ah A1 - Zhang, Susu A1 - Chang, Hua-Hua AB - The development of cognitive diagnostic-computerized adaptive testing (CD-CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual-objective CD-CAT that simultaneously addresses examinees' attribute mastery status and overall test performance. The new procedure is based on the Jensen-Shannon (JS) divergence, a symmetrized version of the Kullback-Leibler divergence. We show that the JS divergence resolves the noncomparability problem of the dual information index and has close relationships with Shannon entropy, mutual information, and Fisher information. The performance of the JS divergence is evaluated in simulation studies in comparison with the methods available in the literature. Results suggest that the JS divergence achieves parallel or more precise recovery of latent trait variables compared to the existing methods and maintains practical advantages in computation and item pool usage. VL - 54 UR - http://dx.doi.org/10.1111/jedm.12139 ER - TY - CONF T1 - An Imputation Approach to Handling Incomplete Computerized Tests T2 - IACAT 2017 Conference Y1 - 2017 A1 - Troy Chen A1 - Chi-Yu Huang A1 - Chunyan Liu KW - CAT KW - imputation approach KW - incomplete computerized test AB -

As technology advances, computerized adaptive testing (CAT) is becoming increasingly popular as it allows tests to be tailored to an examinee’s ability.  Nevertheless, examinees might devise testing strategies to use CAT to their advantage.  For instance, if only the items that examinees answer count towards their score, then a higher theta score might be obtained by spending more time on items at the beginning of the test and skipping items at the end if time runs out. This type of gaming can be discouraged if examinees’ scores are lowered or “penalized” based on the amount of non-response.

The goal of this study was to devise a penalty function that would meet two criteria: 1) the greater the omit rate, the greater the penalty, and 2) examinees with the same ability and the same omit rate should receive the same penalty. To create the penalty, theta was calculated based on only the items the examinee responded to. Next, the expected number-correct score (EXR) was obtained using this theta estimate and the test characteristic curve. A penalized expected number-correct score was obtained by multiplying EXR by the proportion of items the examinee responded to. Finally, the penalized theta was identified from the penalized expected score using the test characteristic curve. Based on the penalized theta and the item parameters of an unanswered item, the likelihood of a correct response is computed and employed to estimate the imputed score for the unanswered item.
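
The following sketch shows one way such a penalty could be computed under a 2PL model; because the abstract's symbols did not reproduce, all notation, parameter values, and helper functions here are illustrative assumptions rather than the authors' actual procedure.

```python
import numpy as np
from scipy.optimize import brentq

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def tcc(theta, a, b):
    """Test characteristic curve: expected number-correct score on the full test."""
    return np.sum(p_correct(theta, a, b))

def penalized_theta(theta_resp, a, b, n_answered):
    """Shrink the expected number-correct score by the completion rate, then invert the TCC."""
    exr = tcc(theta_resp, a, b)              # expected number-correct at the unpenalized theta
    exr_pen = exr * n_answered / len(a)      # penalty: proportion of items answered
    return brentq(lambda t: tcc(t, a, b) - exr_pen, -6, 6)

# Toy example: a 21-item test on which the examinee answered the first 15 items.
rng = np.random.default_rng(0)
a = rng.uniform(0.8, 1.6, 21)
b = rng.normal(0, 1, 21)
theta_resp = 0.4                             # theta estimated from the answered items
theta_pen = penalized_theta(theta_resp, a, b, n_answered=15)
imputed = p_correct(theta_pen, a[15:], b[15:])   # probabilities used as imputed scores for unanswered items
print(round(theta_pen, 3), imputed.round(2))
```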

Two datasets were used to generate tests with completion rates of 50%, 80%, and 90%. The first dataset included real data where approximately 4,500 examinees responded to a 21-item test, which provided a baseline/truth. Sampling was done to achieve the three completion rate conditions. The second dataset consisted of simulated item scores for 50,000 simulees under a 1-2-4 multi-stage CAT design where each stage contained seven items. Imputed item scores for unanswered items were computed using a variety of values for G (and therefore T). Three other approaches to handling unanswered items were also considered: all correct (i.e., T = 0), all incorrect (i.e., T = 1), and random scoring (i.e., T = 0.5).

The current study investigated the impact on theta estimates resulting from the proposed approach to handling unanswered items in a fixed-length CAT. In real testing situations, when examinees do not finish a test, it is hard to tell whether they tried diligently but ran out of time or whether they attempted to manipulate the scoring engine.  To handle unfinished tests with penalties, the proposed approach considers examinees’ abilities and incompletion rates. The results of this study provide direction for psychometric practitioners when considering penalties for omitted responses.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1vznZeO3nsZZK0k6_oyw5c9ZTP8uyGnXh ER - TY - CONF T1 - Item Parameter Drifting and Online Calibration T2 - IACAT 2017 Conference Y1 - 2017 A1 - Hua-Hua Chang A1 - Rui Guo KW - online calibration KW - Parameter Drift AB -

Item calibration is one of the most important topics in item response theory (IRT). Since many large-scale testing programs have switched from paper-and-pencil (P&P) testing mode to computerized adaptive testing (CAT) mode, developing methods for efficiently calibrating new items has become vital. Among the many proposed item calibration processes in CAT, online calibration is the most cost-effective. This presentation introduces an online (re)calibration design to detect item parameter drift for CAT in both unidimensional and multidimensional environments. Specifically, for optimal online calibration design in the unidimensional CAT model, a two-stage design is proposed that implements a proportional density index algorithm. For the multidimensional CAT model, a four-quadrant online calibration pretest item selection design with the proportional density index algorithm is proposed. Comparisons were made between different online calibration item selection strategies. Results showed that, under unidimensional CAT, the proposed modified two-stage item selection criterion with the proportional density algorithm outperformed the other existing methods in terms of item parameter calibration and item parameter drift detection; under multidimensional CAT, the online (re)calibration technique with the proposed four-quadrant item selection design and proportional density index outperformed the other methods.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Item Selection Strategies for Developing CAT in Indonesia T2 - IACAT 2017 Conference Y1 - 2017 A1 - Istiani Chandra KW - CAT KW - Indonesia KW - item selection strategies AB -

Recently, the development of computerized testing in Indonesia has become quite promising. Many government institutions have used the technology for recruitment. Since the Indonesian Army acknowledged the benefits of computerized adaptive testing (CAT) over conventional test administration, the issue of selecting the first item has attracted attention. Given CAT’s basic philosophy, several methods can be used to select the first item, such as using educational level, an ability estimate from item simulation, or other information. The question remains how to apply these methods most effectively in the context of constrained adaptive testing. This paper reviews such strategies as they appear in the relevant literature. The focus of this paper is on studies that have been conducted to evaluate the effectiveness of item selection strategies for dichotomous scoring. The paper also discusses the strengths and weaknesses of each group of strategies using examples from simulation studies. No new research is presented; rather, a compendium of models is reviewed in terms of learning in the newcomer context, giving a wide view of first-item selection strategies.

 

JF - IACAT 2017 Conference PB - Niiagata Seiryo University CY - Niigata Japan UR - https://www.youtube.com/watch?v=2KuFrRATq9Q ER - TY - CONF T1 - A Large-Scale Progress Monitoring Application with Computerized Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Okan Bulut A1 - Damien Cormier KW - CAT KW - Large-Scale tests KW - Process monitoring AB -

Many conventional assessment tools are available to teachers in schools for monitoring student progress in a formative manner. The outcomes of these assessment tools are essential to teachers’ instructional modifications and schools’ data-driven educational strategies, such as using remedial activities and planning instructional interventions for students with learning difficulties. When measuring student progress toward instructional goals or outcomes, assessments should be not only considerably precise but also sensitive to individual change in learning. Unlike conventional paper-pencil assessments that are usually not appropriate for every student, computerized adaptive tests (CATs) are highly capable of estimating growth consistently with minimum and consistent error. Therefore, CATs can be used as a progress monitoring tool in measuring student growth.

This study focuses on an operational CAT assessment that has been used for measuring student growth in reading during the academic school year. The sample of this study consists of nearly 7 million students from the 1st grade to the 12th grade in the US. The students received a CAT-based reading assessment periodically during the school year. The purpose of these periodical assessments is to measure the growth in students’ reading achievement and identify the students who may need additional instructional support (e.g., academic interventions). Using real data, this study aims to address the following research questions: (1) How many CAT administrations are necessary to make psychometrically sound decisions about the need for instructional changes in the classroom or when to provide academic interventions?; (2) What is the ideal amount of time between CAT administrations to capture student growth for the purpose of producing meaningful decisions from assessment results?

To address these research questions, we first used the Theil-Sen estimator for robustly fitting a regression line to each student’s test scores obtained from a series of CAT administrations. Next, we used the conditional standard error of measurement (cSEM) from the CAT administrations to create an error band around the Theil-Sen slope (i.e., student growth rate). This process resulted in the normative slope values across all the grade levels. The optimal number of CAT administrations was established from grade-level regression results. The amount of time needed for progress monitoring was determined by calculating the amount of time required for a student to show growth beyond the median cSEM value for each grade level. The results showed that the normative slope values were the highest for lower grades and declined steadily as grade level increased. The results also suggested that the CAT-based reading assessment is most useful for grades 1 through 4, since most struggling readers requiring an intervention appear to be within this grade range. Because CAT yielded very similar cSEM values across administrations, the amount of error in the progress monitoring decisions did not seem to depend on the number of CAT administrations.
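
For readers unfamiliar with the Theil-Sen estimator used in this step, the brief sketch below computes the median of pairwise slopes across one student's administrations and compares the projected growth against a band based on the median cSEM. The score and cSEM values are made up for illustration; this is not the study's code.

```python
import itertools
import numpy as np

def theil_sen_slope(times, scores):
    """Median of all pairwise slopes -- a robust estimate of the student's growth rate."""
    slopes = [(s2 - s1) / (t2 - t1)
              for (t1, s1), (t2, s2) in itertools.combinations(zip(times, scores), 2)
              if t2 != t1]
    return np.median(slopes)

# Toy example: five CAT administrations (in weeks) for one student.
weeks = np.array([0, 6, 12, 18, 24])
scores = np.array([190, 196, 199, 207, 210])   # hypothetical scale scores
csem = np.array([8, 7, 8, 7, 7])               # hypothetical conditional SEMs

slope = theil_sen_slope(weeks, scores)
median_csem = np.median(csem)
# Treat growth as detectable once the projected change exceeds the median cSEM band.
weeks_needed = median_csem / slope if slope > 0 else float("inf")
print(f"growth rate: {slope:.2f} points/week; weeks to exceed cSEM band: {weeks_needed:.1f}")
```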

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1uGbCKenRLnqTxImX1fZicR2c7GRV6Udc ER - TY - CONF T1 - MHK-MST Design and the Related Simulation Study T2 - IACAT 2017 Conference Y1 - 2017 A1 - Ling Yuyu A1 - Zhou Chenglin A1 - Ren Jie KW - language testing KW - MHK KW - multistage testing AB -

The MHK is a national standardized exam that tests and rates Chinese language proficiency. It assesses non-native Chinese minorities’ abilities in using the Chinese language in their daily, academic, and professional lives. Computerized multistage adaptive testing (MST) combines features of conventional paper-and-pencil (P&P) testing and item-level computerized adaptive testing (CAT); it is a computer-based test form that takes the item set as the scoring unit. MST estimates extreme ability values more accurately than conventional P&P testing, and it uses the adaptive character of CAT to reduce test length and the time needed to report scores. At present, MST is used in several large-scale tests, such as the Uniform CPA Examination and the Graduate Record Examination (GRE). It is therefore worthwhile to develop applications of MST in China.

Based on consideration of the MHK’s characteristics and its future development, the researchers start with the design of the MHK-MST. This simulation study is conducted to validate the performance of the MHK-MST system. Real difficulty parameters of MHK items and the simulated ability parameters of the candidates are used to generate the original score matrix, and the item modules are delivered to the candidates following the adaptive procedures set according to the path rules. This simulation study provides a sound basis for the implementation of the MHK-MST.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - New Challenges (With Solutions) and Innovative Applications of CAT T2 - IACAT 2017 Conference Y1 - 2017 A1 - Chun Wang A1 - David J. Weiss A1 - Xue Zhang A1 - Jian Tao A1 - Yinhong He A1 - Ping Chen A1 - Shiyu Wang A1 - Susu Zhang A1 - Haiyan Lin A1 - Xiaohong Gao A1 - Hua-Hua Chang A1 - Zhuoran Shang KW - CAT KW - challenges KW - innovative applications AB -

Over the past several decades, computerized adaptive testing (CAT) has profoundly changed the administration of large-scale aptitude tests, state-wide achievement tests, professional licensure exams, and health outcome measures. While many challenges of CAT have been successfully addressed due to the continual efforts of researchers in the field, there are still many remaining, longstanding challenges that have yet to be resolved. This symposium will begin with three presentations, each of which provides a sound solution to one of the unresolved challenges. They are (1) item calibration when responses are “missing not at random” from CAT administration; (2) online calibration of new items when person traits have non-ignorable measurement error; (3) establishing consistency and asymptotic normality of latent trait estimation when allowing item response revision in CAT. In addition, this symposium also features innovative applications of CAT. In particular, there is emerging interest in using cognitive diagnostic CAT to monitor and detect learning progress (4th presentation). Last but not least, the 5th presentation illustrates the power of multidimensional polytomous CAT that permits rapid identification of hospitalized patients’ rehabilitative care needs in health outcomes measurement. We believe this symposium covers a wide range of interesting and important topics in CAT.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1Wvgxw7in_QCq_F7kzID6zCZuVXWcFDPa ER - TY - CONF T1 - A New Cognitive Diagnostic Computerized Adaptive Testing for Simultaneously Diagnosing Skills and Misconceptions T2 - IACAT 2017 Conference Y1 - 2017 A1 - Bor-Chen Kuo A1 - Chun-Hua Chen KW - CD-CAT KW - Misconceptions KW - Simultaneous diagnosis AB -

In educational diagnosis, diagnosing misconceptions is as important as diagnosing skills. However, traditional cognitive diagnostic computerized adaptive testing (CD-CAT) is usually developed to diagnose skills only. This study proposes a new CD-CAT that can simultaneously diagnose skills and misconceptions. The proposed CD-CAT is based on a recently published CDM, the simultaneously identifying skills and misconceptions (SISM) model (Kuo, Chen, & de la Torre, in press). A new item selection algorithm is also proposed for the CD-CAT to achieve high adaptive testing performance. In simulation studies, we compare our new item selection algorithm with three existing item selection methods: the Kullback–Leibler (KL) and posterior-weighted KL (PWKL) methods proposed by Cheng (2009) and the modified PWKL (MPWKL) proposed by Kaplan, de la Torre, and Barrada (2015). The results show that the proposed CD-CAT can efficiently diagnose skills and misconceptions; the accuracy of the new item selection algorithm is close to that of the MPWKL but with less computational burden; and the new algorithm outperforms the KL and PWKL methods in diagnosing skills and misconceptions.

References

Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74(4), 619–632. doi: 10.1007/s11336-009-9123-2

Kaplan, M., de la Torre, J., & Barrada, J. R. (2015). New item selection methods for cognitive diagnosis computerized adaptive testing. Applied Psychological Measurement, 39(3), 167–188. doi:10.1177/0146621614554650

Kuo, B.-C., Chen, C.-H., & de la Torre, J. (in press). A cognitive diagnosis model for identifying coexisting skills and misconceptions. Applied Psychological Measurement.

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - New Results on Bias in Estimates due to Discontinue Rules in Intelligence Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Matthias von Davier A1 - Youngmi Cho A1 - Tianshu Pan KW - Bias KW - CAT KW - Intelligence Testing AB -

The presentation provides new results on a form of adaptive testing that is used frequently in intelligence testing. In these tests, items are presented in order of increasing difficulty, and the presentation of items is adaptive in the sense that each subtest session is discontinued once a test taker produces a certain number of incorrect responses in sequence. The subsequent (not observed) responses are commonly scored as wrong for that subtest, even though the test taker has not seen them. Discontinuation rules allow a certain form of adaptiveness in both paper-based and computer-based testing, and they help reduce testing time.
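
As a simple illustration of this discontinuation mechanism, the function below stops a subtest after a fixed run of consecutive incorrect responses and scores all remaining, unseen items as wrong. It is a hypothetical sketch of the rule as described, not code from the presentation.

```python
from typing import List

def apply_discontinue_rule(responses: List[int], run_length: int = 3) -> List[int]:
    """Administer items in order of difficulty; stop after `run_length` consecutive
    incorrect responses and score every remaining (unseen) item as 0."""
    scored, consecutive_wrong = [], 0
    for r in responses:
        scored.append(r)
        consecutive_wrong = consecutive_wrong + 1 if r == 0 else 0
        if consecutive_wrong >= run_length:
            break
    # Unseen items are scored as incorrect, as in common operational practice.
    scored.extend([0] * (len(responses) - len(scored)))
    return scored

# Toy example: the examinee would have answered later items, but the rule stops the subtest early.
print(apply_discontinue_rule([1, 1, 0, 0, 0, 1, 1, 0, 1], run_length=3))
# -> [1, 1, 0, 0, 0, 0, 0, 0, 0]
```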

Two lines of research that are relevant are studies that directly assess the impact of discontinuation rules, and studies that more broadly look at the impact of scoring rules on test results with a large number of not-administered or not-reached items. He & Wolfe (2012) compared different ability estimation methods for this type of discontinuation-rule adaptation of test length in a simulation study. However, to our knowledge there has been no rigorous analytical study of the underlying distributional changes of the response variables under discontinuation rules. It is important to point out that the results obtained by He & Wolfe (2012) agree with results presented by, for example, DeAyala, Plake & Impara (2001) as well as Rose, von Davier & Xu (2010) and Rose, von Davier & Nagengast (2016) in that ability estimates are biased most when the not-observed responses are scored as wrong. Discontinuation rules combined with scoring the non-administered items as wrong are used operationally in several major intelligence tests, so more research is needed in order to improve this particular type of adaptiveness in testing practice.

The presentation extends existing research on adaptiveness by discontinue-rules in intelligence tests in multiple ways: First, a rigorous analytical study of the distributional properties of discontinue-rule scored items is presented. Second, an extended simulation is presented that includes additional alternative scoring rules as well as bias-corrected ability estimators that may be suitable to improve results for discontinue-rule scored intelligence tests.

References: DeAyala, R. J., Plake, B. S., & Impara, J. C. (2001). The impact of omitted responses on the accuracy of ability estimation in item response theory. Journal of Educational Measurement, 38, 213-234.

He, W., & Wolfe, E. W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72(5), 808–826. doi:10.1177/0013164412441937

Rose, N., von Davier, M., & Xu, X. (2010). Modeling non-ignorable missing data with item response theory (IRT; ETS RR-10-11). Princeton, NJ: Educational Testing Service.

Rose, N., von Davier, M., & Nagengast, B. (2016). Modeling omitted and not-reached items in IRT models. Psychometrika. doi:10.1007/s11336-016-9544-7

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Scripted On-the-fly Multistage Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Edison Choe A1 - Bruce Williams A1 - Sung-Hyuck Lee KW - CAT KW - multistage testing KW - On-the-fly testing AB -

On-the-fly multistage testing (OMST) was introduced recently as a promising alternative to preassembled MST. A decidedly appealing feature of both is the reviewability of items within the current stage. However, the fundamental difference is that, instead of routing to a preassembled module, OMST adaptively assembles a module at each stage according to an interim ability estimate. This produces more individualized forms with finer measurement precision, but imposing nonstatistical constraints and controlling item exposure become more cumbersome. One recommendation is to use the maximum priority index followed by a remediation step to satisfy content constraints, and the Sympson-Hetter method with a stratified item bank for exposure control.

However, these methods can be computationally expensive, thereby impeding practical implementation. Therefore, this study investigated the script method as a simpler solution to the challenge of strict content balancing and effective item exposure control in OMST. The script method was originally devised as an item selection algorithm for CAT and generally proceeds as follows: For a test with m items, there are m slots to be filled, and an item is selected according to pre-defined rules for each slot. For the first slot, randomly select an item from a designated content area (collection). For each subsequent slot, 1) Discard any enemies of items already administered in previous slots; 2) Draw a designated number of candidate items (selection length) from the designated collection according to the current ability estimate; 3) Randomly select one item from the set of candidates. There are two distinct features of the script method. First, a predetermined sequence of collections guarantees meeting content specifications. The specific ordering may be determined either randomly or deliberately by content experts. Second, steps 2 and 3 depict a method of exposure control, in which selection length balances item usage at the possible expense of ability estimation accuracy. The adaptation of the script method to OMST is straightforward. For the first module, randomly select each item from a designated collection. For each subsequent module, the process is the same as in scripted CAT (SCAT) except the same ability estimate is used for the selection of all items within the module. A series of simulations was conducted to evaluate the performance of scripted OMST (SOMST, with 3 or 4 evenly divided stages) relative to SCAT under various item exposure restrictions. In all conditions, reliability was maximized by programming an optimization algorithm that searches for the smallest possible selection length for each slot within the constraints. Preliminary results indicated that SOMST is certainly a capable design with performance comparable to that of SCAT. The encouraging findings and ease of implementation highly motivate the prospect of operational use for large-scale assessments.
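
A compact sketch of the slot-based selection logic described above, using an assumed 2PL information function, a toy item pool, and illustrative names (not the authors' implementation):

```python
import random
import numpy as np

def item_info(theta, a, b):
    """2PL Fisher information at theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def select_for_slot(theta, pool, collection_id, used, enemies, selection_length):
    """Script-method slot: drop enemies of administered items, rank the designated collection
    by information at the current theta, draw `selection_length` candidates, pick one at random."""
    blocked = used | {e for i in used for e in enemies.get(i, ())}
    eligible = [j for j, item in pool.items()
                if item["collection"] == collection_id and j not in blocked]
    eligible.sort(key=lambda j: item_info(theta, pool[j]["a"], pool[j]["b"]), reverse=True)
    candidates = eligible[:selection_length]
    return random.choice(candidates) if candidates else None

# Toy pool: 30 items tagged with a content collection and a few enemy relations.
pool = {j: {"a": 1.0 + 0.1 * (j % 5), "b": (j % 7) - 3, "collection": j % 3} for j in range(30)}
enemies = {0: {3}, 3: {0}}
script = [0, 1, 2, 0, 1, 2]            # pre-defined sequence of collections, one per slot
used, theta = set(), 0.0
for stage in (script[:3], script[3:]): # SOMST: the same theta is used for every slot in a stage
    for collection_id in stage:
        chosen = select_for_slot(theta, pool, collection_id, used, enemies, selection_length=4)
        used.add(chosen)
    theta += 0.3                       # placeholder for the interim ability update between stages
print(sorted(used))
```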

Presentation Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1wKuAstITLXo6BM4APf2mPsth1BymNl-y ER - TY - CONF T1 - A Simulation Study to Compare Classification Method in Cognitive Diagnosis Computerized Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Jing Yang A1 - Jian Tao A1 - Hua-Hua Chang A1 - Ning-Zhong Shi AB -

Cognitive Diagnostic Computerized Adaptive Testing (CD-CAT) combines the strengths of both CAT and cognitive diagnosis. Cognitive diagnosis models, which can be viewed as restricted latent class models, have been developed to classify examinees into the correct profile of skills that have been mastered and those that have not, so as to enable more efficient remediation. Chiu and Douglas (2013) introduced a nonparametric procedure that requires only the specification of a Q-matrix and classifies examinees by proximity to ideal response patterns. In this article, we compare the nonparametric procedure with common profile estimation methods, such as maximum a posteriori (MAP) estimation, in CD-CAT. The simulation studies consider a variety of Q-matrix structures, numbers of attributes, ways of generating attribute profiles, and levels of item quality. Results indicate that the nonparametric procedure consistently achieves higher pattern and attribute recovery rates in nearly all conditions.

References

Chiu, C.-Y., & Douglas, J. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225-250. doi: 10.1007/s00357-013-9132-9

Session Video

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1jCL3fPZLgzIdwvEk20D-FliZ15OTUtpr ER - TY - CONF T1 - Using Bayesian Decision Theory in Cognitive Diagnosis Computerized Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Chia-Ling Hsu A1 - Wen-Chung Wang A1 - ShuYing Chen KW - Bayesian Decision Theory KW - CD-CAT AB -

Cognitive diagnosis computerized adaptive testing (CD-CAT) purports to provide each individual with a profile of strengths and weaknesses on a set of attributes or skills through computerized adaptive testing. In the CD-CAT literature, researchers have devoted themselves to developing item selection algorithms to improve measurement efficiency, and most algorithms have been developed based on information theory. Given the discontinuous nature of the latent variables in CD-CAT, this study introduced an alternative for item selection, called the minimum expected cost (MEC) method, which was derived from Bayesian decision theory. Using simulations, the MEC method was evaluated against the posterior-weighted Kullback-Leibler (PWKL) information, the modified PWKL (MPWKL), and the mutual information (MI) methods by manipulating item bank quality, item selection algorithm, and termination rule. Results indicated that, regardless of item quality and termination criterion, the MEC, MPWKL, and MI methods performed very similarly, and they all outperformed the PWKL method in classification accuracy and test efficiency, especially in short tests; the MEC method had more efficient item bank usage than the MPWKL and MI methods. Moreover, the MEC method could consider the costs of incorrect decisions and improve classification accuracy and test efficiency when a particular profile was of concern. All the results suggest the practicability of the MEC method in CD-CAT.
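
To convey the flavor of an expected-cost criterion, the sketch below scores each candidate item by the predictive expected cost of the Bayes decision that would follow its administration, under an assumed DINA-like model and 0-1 loss. It is a generic illustration of Bayesian decision-theoretic item selection, not the authors' MEC algorithm.

```python
import itertools
import numpy as np

def dina_p(alpha, q, guess, slip):
    """DINA-type probability of a correct response for attribute profile alpha on an item with q-vector q."""
    eta = int(np.all(q <= alpha))
    return (1 - slip) if eta else guess

def expected_cost(posterior, profiles, q, guess, slip, cost):
    """Predictive expected cost of the Bayes decision made after observing the item's response."""
    total = 0.0
    for r in (0, 1):
        like = np.array([dina_p(a, q, guess, slip) if r == 1 else 1 - dina_p(a, q, guess, slip)
                         for a in profiles])
        pred = float(np.sum(posterior * like))   # predictive probability of response r
        post_r = posterior * like / pred         # posterior after observing r
        decision_costs = cost @ post_r           # expected cost of deciding each profile
        total += pred * decision_costs.min()     # Bayes decision: pick the cheapest profile
    return total

K = 2
profiles = [np.array(a) for a in itertools.product([0, 1], repeat=K)]
cost = 1 - np.eye(len(profiles))                 # 0-1 loss: every misclassification costs 1
posterior = np.full(len(profiles), 0.25)         # current posterior over the four profiles
item_bank = [{"q": np.array([1, 0]), "guess": 0.2, "slip": 0.1},
             {"q": np.array([0, 1]), "guess": 0.2, "slip": 0.1},
             {"q": np.array([1, 1]), "guess": 0.2, "slip": 0.1}]
scores = [expected_cost(posterior, profiles, it["q"], it["guess"], it["slip"], cost)
          for it in item_bank]
print([round(s, 3) for s in scores], "-> select item", int(np.argmin(scores)))
```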

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata Japan ER - TY - JOUR T1 - Bayesian Networks in Educational Assessment: The State of the Field JF - Applied Psychological Measurement Y1 - 2016 A1 - Culbertson, Michael J. AB - Bayesian networks (BN) provide a convenient and intuitive framework for specifying complex joint probability distributions and are thus well suited for modeling content domains of educational assessments at a diagnostic level. BN have been used extensively in the artificial intelligence community as student models for intelligent tutoring systems (ITS) but have received less attention among psychometricians. This critical review outlines the existing research on BN in educational assessment, providing an introduction to the ITS literature for the psychometric community, and points out several promising research paths. The online appendix lists 40 assessment systems that serve as empirical examples of the use of BN for educational assessment in a variety of domains. VL - 40 UR - http://apm.sagepub.com/content/40/1/3.abstract ER - TY - JOUR T1 - Effect of Imprecise Parameter Estimation on Ability Estimation in a Multistage Test in an Automatic Item Generation Context JF - Journal of Computerized Adaptive Testing Y1 - 2016 A1 - Colvin, Kimberly A1 - Keller, Lisa A A1 - Robin, Frederic KW - Adaptive Testing KW - automatic item generation KW - errors in item parameters KW - item clones KW - multistage testing VL - 4 UR - http://iacat.org/jcat/index.php/jcat/article/view/59/27 IS - 1 ER - TY - JOUR T1 - High-Efficiency Response Distribution–Based Item Selection Algorithms for Short-Length Cognitive Diagnostic Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2016 A1 - Zheng, Chanjin A1 - Chang, Hua-Hua AB - Cognitive diagnostic computerized adaptive testing (CD-CAT) purports to obtain useful diagnostic information with great efficiency brought by CAT technology. Most of the existing CD-CAT item selection algorithms are evaluated when test length is fixed and relatively long, but some applications of CD-CAT, such as in interim assessment, require to obtain the cognitive pattern with a short test. The mutual information (MI) algorithm proposed by Wang is the first endeavor to accommodate this need. To reduce the computational burden, Wang provided a simplified scheme, but at the price of scale/sign change in the original index. As a result, it is very difficult to combine it with some popular constraint management methods. The current study proposes two high-efficiency algorithms, posterior-weighted cognitive diagnostic model (CDM) discrimination index (PWCDI) and posterior-weighted attribute-level CDM discrimination index (PWACDI), by modifying the CDM discrimination index. They can be considered as an extension of the Kullback–Leibler (KL) and posterior-weighted KL (PWKL) methods. A pre-calculation strategy has also been developed to address the computational issue. Simulation studies indicate that the newly developed methods can produce results comparable with or better than the MI and PWKL in both short and long tests. The other major advantage is that the computational issue has been addressed more elegantly than MI. PWCDI and PWACDI can run as fast as PWKL. More importantly, they do not suffer from the problem of scale/sign change as MI and, thus, can be used with constraint management methods together in a straightforward manner. 
VL - 40 UR - http://apm.sagepub.com/content/40/8/608.abstract ER - TY - JOUR T1 - Hybrid Computerized Adaptive Testing: From Group Sequential Design to Fully Sequential Design JF - Journal of Educational Measurement Y1 - 2016 A1 - Wang, Shiyu A1 - Lin, Haiyan A1 - Chang, Hua-Hua A1 - Douglas, Jeff AB - Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing.  Though most designs of CAT and MST exhibit strength and weakness in recent large-scale implementations, there is no simple answer to the question of which design is better because different modes may fit different practical situations. This article proposes a hybrid adaptive framework to combine both CAT and MST, inspired by an analysis of the history of CAT and MST. The proposed procedure is a design which transitions from a group sequential design to a fully sequential design. This allows for the robustness of MST in early stages, but also shares the advantages of CAT in later stages with fine tuning of the ability estimator once its neighborhood has been identified. Simulation results showed that hybrid designs following our proposed principles provided comparable or even better estimation accuracy and efficiency than standard CAT and MST designs, especially for examinees at the two ends of the ability range. VL - 53 UR - http://dx.doi.org/10.1111/jedm.12100 ER - TY - JOUR T1 - Optimal Reassembly of Shadow Tests in CAT JF - Applied Psychological Measurement Y1 - 2016 A1 - Choi, Seung W. A1 - Moellering, Karin T. A1 - Li, Jie A1 - van der Linden, Wim J. AB - Even in the age of abundant and fast computing resources, concurrency requirements for large-scale online testing programs still put an uninterrupted delivery of computer-adaptive tests at risk. In this study, to increase the concurrency for operational programs that use the shadow-test approach to adaptive testing, we explored various strategies aiming for reducing the number of reassembled shadow tests without compromising the measurement quality. Strategies requiring fixed intervals between reassemblies, a certain minimal change in the interim ability estimate since the last assembly before triggering a reassembly, and a hybrid of the two strategies yielded substantial reductions in the number of reassemblies without degradation in the measurement accuracy. The strategies effectively prevented unnecessary reassemblies due to adapting to the noise in the early test stages. They also highlighted the practicality of the shadow-test approach by minimizing the computational load involved in its use of mixed-integer programming. VL - 40 UR - http://apm.sagepub.com/content/40/7/469.abstract ER - TY - JOUR T1 - Parameter Drift Detection in Multidimensional Computerized Adaptive Testing Based on Informational Distance/Divergence Measures JF - Applied Psychological Measurement Y1 - 2016 A1 - Kang, Hyeon-Ah A1 - Chang, Hua-Hua AB - An informational distance/divergence-based approach is proposed to detect the presence of parameter drift in multidimensional computerized adaptive testing (MCAT). The study presents significance testing procedures for identifying changes in multidimensional item response functions (MIRFs) over time based on informational distance/divergence measures that capture the discrepancy between two probability functions. To approximate the MIRFs from the observed response data, the k-nearest neighbors algorithm is used with the random search method. 
A simulation study suggests that the distance/divergence-based drift measures perform effectively in identifying the instances of parameter drift in MCAT. They showed moderate power with small samples of 500 examinees and excellent power when the sample size was as large as 1,000. The proposed drift measures also adequately controlled for Type I error at the nominal level under the null hypothesis. VL - 40 UR - http://apm.sagepub.com/content/40/7/534.abstract ER - TY - JOUR T1 - Assessing Individual-Level Impact of Interruptions During Online Testing JF - Journal of Educational Measurement Y1 - 2015 A1 - Sinharay, Sandip A1 - Wan, Ping A1 - Choi, Seung W. A1 - Kim, Dong-In AB - With an increase in the number of online tests, the number of interruptions during testing due to unexpected technical issues seems to be on the rise. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. Researchers such as Hill and Sinharay et al. examined the impact of interruptions at an aggregate level. However, there is a lack of research on the assessment of impact of interruptions at an individual level. We attempt to fill that void. We suggest four methodological approaches, primarily based on statistical hypothesis testing, linear regression, and item response theory, which can provide evidence on the individual-level impact of interruptions. We perform a realistic simulation study to compare the Type I error rate and power of the suggested approaches. We then apply the approaches to data from the 2013 Indiana Statewide Testing for Educational Progress-Plus (ISTEP+) test that experienced interruptions. VL - 52 UR - http://dx.doi.org/10.1111/jedm.12064 ER - TY - JOUR T1 - a-Stratified Computerized Adaptive Testing in the Presence of Calibration Error JF - Educational and Psychological Measurement Y1 - 2015 A1 - Cheng, Ying A1 - Patton, Jeffrey M. A1 - Shao, Can AB - a-Stratified computerized adaptive testing with b-blocking (AST), as an alternative to the widely used maximum Fisher information (MFI) item selection method, can effectively balance item pool usage while providing accurate latent trait estimates in computerized adaptive testing (CAT). However, previous comparisons of these methods have treated item parameter estimates as if they are the true population parameter values. Consequently, capitalization on chance may occur. In this article, we examined the performance of the AST method under more realistic conditions where item parameter estimates instead of true parameter values are used in the CAT. Its performance was compared against that of the MFI method when the latter is used in conjunction with Sympson–Hetter or randomesque exposure control. Results indicate that the MFI method, even when combined with exposure control, is susceptible to capitalization on chance. This is particularly true when the calibration sample size is small. On the other hand, AST is more robust to capitalization on chance. Consistent with previous investigations using true item parameter values, AST yields much more balanced item pool usage, with a small loss in the precision of latent trait estimates. The loss is negligible when the test is as long as 40 items. 
VL - 75 UR - http://epm.sagepub.com/content/75/2/260.abstract ER - TY - JOUR T1 - The Effect of Upper and Lower Asymptotes of IRT Models on Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2015 A1 - Cheng, Ying A1 - Liu, Cheng AB - In this article, the effect of the upper and lower asymptotes in item response theory models on computerized adaptive testing is shown analytically. This is done by deriving the step size between adjacent latent trait estimates under the four-parameter logistic model (4PLM) and two models it subsumes, the usual three-parameter logistic model (3PLM) and the 3PLM with upper asymptote (3PLMU). The authors show analytically that the large effect of the discrimination parameter on the step size holds true for the 4PLM and the two models it subsumes under both the maximum information method and the b-matching method for item selection. Furthermore, the lower asymptote helps reduce the positive bias of ability estimates associated with early guessing, and the upper asymptote helps reduce the negative bias induced by early slipping. Relative step size between modeling versus not modeling the upper or lower asymptote under the maximum Fisher information method (MI) and the b-matching method is also derived. It is also shown analytically why the gain from early guessing is smaller than the loss from early slipping when the lower asymptote is modeled, and vice versa when the upper asymptote is modeled. The benefit to loss ratio is quantified under both the MI and the b-matching method. Implications of the analytical results are discussed. VL - 39 UR - http://apm.sagepub.com/content/39/7/551.abstract ER - TY - JOUR T1 - Online Item Calibration for Q-Matrix in CD-CAT JF - Applied Psychological Measurement Y1 - 2015 A1 - Chen, Yunxiao A1 - Liu, Jingchen A1 - Ying, Zhiliang AB -

Item replenishment is important for maintaining a large-scale item bank. In this article, the authors consider calibrating new items based on precalibrated operational items under the deterministic inputs, noisy-and-gate (DINA) model, whose specification includes the so-called Q-matrix as well as the slipping and guessing parameters. Making use of the maximum likelihood and Bayesian estimators for the latent knowledge states, the authors propose two methods for the calibration. These methods are applicable both to traditional paper-and-pencil-based tests, for which the selection of operational items is fixed in advance, and to computerized adaptive tests, for which the selection of operational items is sequential and random. Extensive simulations are conducted to assess and compare the performance of these approaches. Extensions to other diagnostic classification models are also discussed.
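
The authors' calibration methods are likelihood- and Bayes-based; as a rough illustration of the underlying idea of conditioning on estimated knowledge states, the following assumed moment-type sketch recovers guessing and slipping parameters for one new DINA item.

import numpy as np

def calibrate_dina_item(responses, alphas, q_row):
    # responses: 0/1 responses to the new item, shape (N,)
    # alphas: estimated attribute profiles of the same examinees, shape (N, K)
    # q_row: attribute requirements of the new item, shape (K,)
    responses = np.asarray(responses)
    eta = np.all(np.asarray(alphas) >= np.asarray(q_row), axis=1)  # masters all required attributes
    guess = responses[~eta].mean() if (~eta).any() else np.nan     # correct despite lacking attributes
    slip = 1.0 - responses[eta].mean() if eta.any() else np.nan    # incorrect despite mastery
    return guess, slip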

VL - 39 UR - http://apm.sagepub.com/content/39/1/5.abstract ER - TY - JOUR T1 - On-the-Fly Assembled Multistage Adaptive Testing JF - Applied Psychological Measurement Y1 - 2015 A1 - Zheng, Yi A1 - Chang, Hua-Hua AB -

Recently, multistage testing (MST) has been adopted by several important large-scale testing programs and become popular among practitioners and researchers. Stemming from the decades of history of computerized adaptive testing (CAT), the rapidly growing MST alleviates several major problems of earlier CAT applications. Nevertheless, MST is only one among all possible solutions to these problems. This article presents a new adaptive testing design, “on-the-fly assembled multistage adaptive testing” (OMST), which combines the benefits of CAT and MST and offsets their limitations. Moreover, OMST also provides some unique advantages over both CAT and MST. A simulation study was conducted to compare OMST with MST and CAT, and the results demonstrated the promising features of OMST. Finally, the “Discussion” section provides suggestions on possible future adaptive testing designs based on the OMST framework, which could provide great flexibility for adaptive tests in the digital future and open an avenue for all types of hybrid designs based on the different needs of specific tests.

VL - 39 UR - http://apm.sagepub.com/content/39/2/104.abstract ER - TY - JOUR T1 - Cognitive Diagnostic Models and Computerized Adaptive Testing: Two New Item-Selection Methods That Incorporate Response Times JF - Journal of Computerized Adaptive Testing Y1 - 2014 A1 - Finkelman, M. D. A1 - Kim, W. A1 - Weissman, A. A1 - Cook, R.J. VL - 2 UR - http://www.iacat.org/jcat/index.php/jcat/article/view/43/21 IS - 4 ER - TY - JOUR T1 - Computerized Adaptive Testing for the Random Weights Linear Logistic Test Model JF - Applied Psychological Measurement Y1 - 2014 A1 - Crabbe, Marjolein A1 - Vandebroek, Martina AB -

This article discusses four item selection rules for designing efficient individualized tests under the random weights linear logistic test model (RWLLTM): minimum posterior-weighted error, minimum expected posterior-weighted error, maximum expected Kullback–Leibler divergence between subsequent posteriors (KLP), and maximum mutual information (MUI). The RWLLTM decomposes test items into a set of subtasks or cognitive features and assumes individual-specific effects of the features on the difficulty of the items. The model extends and improves the well-known linear logistic test model, in which feature effects are estimated only at the aggregate level. Simulations show that the efficiencies of the designs obtained with the different criteria are essentially equivalent. However, KLP and MUI are given preference over the two posterior-weighted error criteria because of their lesser complexity, which significantly reduces the computational burden.

VL - 38 UR - http://apm.sagepub.com/content/38/6/415.abstract ER - TY - JOUR T1 - Determining the Overall Impact of Interruptions During Online Testing JF - Journal of Educational Measurement Y1 - 2014 A1 - Sinharay, Sandip A1 - Wan, Ping A1 - Whitaker, Mike A1 - Kim, Dong-In A1 - Zhang, Litong A1 - Choi, Seung W. AB -

With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees’ scores. There is a lack of research on this topic due to the novelty of the problem. This article is an attempt to fill that void. Several methods, primarily based on propensity score matching, linear regression, and item response theory, were suggested to determine the overall impact of the interruptions on the examinees’ scores. A realistic simulation study shows that the suggested methods have satisfactory Type I error rate and power. Then the methods were applied to data from the Indiana Statewide Testing for Educational Progress-Plus (ISTEP+) test that experienced interruptions in 2013. The results indicate that the interruptions did not have a significant overall impact on the student scores for the ISTEP+ test.

VL - 51 UR - http://dx.doi.org/10.1111/jedm.12052 ER - TY - JOUR T1 - An Enhanced Approach to Combine Item Response Theory With Cognitive Diagnosis in Adaptive Testing JF - Journal of Educational Measurement Y1 - 2014 A1 - Wang, Chun A1 - Zheng, Chanjin A1 - Chang, Hua-Hua AB -

Computerized adaptive testing offers the possibility of gaining information on both the overall ability and cognitive profile in a single assessment administration. Some algorithms aiming for these dual purposes have been proposed, including the shadow test approach, the dual information method (DIM), and the constraint weighted method. The current study proposed two new methods, aggregate ranked information index (ARI) and aggregate standardized information index (ASI), which appropriately addressed the noncompatibility issue inherent in the original DIM method. More flexible weighting schemes that put different emphasis on information about general ability (i.e., θ in item response theory) and information about cognitive profile (i.e., α in cognitive diagnostic modeling) were also explored. Two simulation studies were carried out to investigate the effectiveness of the new methods and weighting schemes. Results showed that the new methods with the flexible weighting schemes could produce more accurate estimation of both overall ability and cognitive profile than the original DIM. Among them, the ASI with both empirical and theoretical weights is recommended, and attribute-level weighting scheme is preferred if some attributes are considered more important from a substantive perspective.

VL - 51 UR - http://dx.doi.org/10.1111/jedm.12057 ER - TY - JOUR T1 - Enhancing Pool Utilization in Constructing the Multistage Test Using Mixed-Format Tests JF - Applied Psychological Measurement Y1 - 2014 A1 - Park, Ryoungsun A1 - Kim, Jiseon A1 - Chung, Hyewon A1 - Dodd, Barbara G. AB -

This study investigated a new pool utilization method of constructing multistage tests (MST) using the mixed-format test based on the generalized partial credit model (GPCM). MST simulations of a classification test were performed to evaluate the MST design. A linear programming (LP) model was applied to perform MST reassemblies based on the initial MST construction. Three subsequent MST reassemblies were performed. For each reassembly, three test unit replacement ratios (TRRs; 0.22, 0.44, and 0.66) were investigated. The conditions of the three passing rates (30%, 50%, and 70%) were also considered in the classification testing. The results demonstrated that various MST reassembly conditions increased the overall pool utilization rates, while maintaining the desired MST construction. All MST testing conditions performed equally well in terms of the precision of the classification decision.

VL - 38 UR - http://apm.sagepub.com/content/38/4/268.abstract ER - TY - JOUR T1 - General Test Overlap Control: Improved Algorithm for CAT and CCT JF - Applied Psychological Measurement Y1 - 2014 A1 - Chen, Shu-Ying A1 - Lei, Pui-Wa A1 - Chen, Jyun-Hong A1 - Liu, Tzu-Chen AB -

This article proposes a new online test overlap control algorithm that improves on Chen's algorithm for controlling the general test overlap rate for item pooling among a group of examinees. Chen's algorithm is not very efficient because it controls not only item pooling between the current examinee and prior examinees but also item pooling among previous examinees, which would already have been controlled when those examinees were themselves current. The proposed improvement increases efficiency by considering only item pooling between the current and previous examinees, and its improved performance over Chen's algorithm is demonstrated in a simulated computerized adaptive testing (CAT) environment. Moreover, the proposed algorithm is adapted for computerized classification testing (CCT) using the sequential probability ratio test procedure and is evaluated against several existing exposure control procedures. Among the exposure control procedures examined, the proposed algorithm appears to work best in controlling the general test overlap rate without sacrificing much classification precision, though longer tests might be required for more stringent control of item pooling among larger groups. Given its capability to control item pooling among a group of examinees of any size and its ease of implementation, the proposed algorithm appears to be a good test overlap control method.

VL - 38 UR - http://apm.sagepub.com/content/38/3/229.abstract ER - TY - JOUR T1 - A Numerical Investigation of the Recovery of Point Patterns With Minimal Information JF - Applied Psychological Measurement Y1 - 2014 A1 - Cox, M. A. A. AB -

A method has been proposed (Tsogo et al., 2001) to reconstruct the geometrical configuration of a large point set using minimal information. This paper employs numerical examples to investigate the proposed procedure. The suggested method has two great advantages: it reduces the volume of the data collection exercise, and it eases the computational effort involved in analyzing the data. It is suggested, however, that while the method may provide a useful starting point for a solution, it is not a universal remedy.

VL - 38 UR - http://apm.sagepub.com/content/38/4/329.abstract ER - TY - JOUR T1 - Deriving Stopping Rules for Multidimensional Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2013 A1 - Wang, Chun A1 - Chang, Hua-Hua A1 - Boughton, Keith A. AB -

Multidimensional computerized adaptive testing (MCAT) can provide a vector of ability estimates for each examinee, which can be used to build a more informative profile of an examinee's performance. The current literature on MCAT focuses on fixed-length tests, which can produce less accurate results for examinees whose abilities differ markedly from the average difficulty level of the item bank when the bank contains only a limited number of items. Therefore, instead of stopping the test at a predetermined fixed length, the authors use a more informative stopping criterion that is directly related to measurement accuracy. Specifically, this research derives four stopping rules that either quantify the measurement precision of the ability vector (the minimum determinant rule [D-rule], minimum eigenvalue rule [E-rule], and maximum trace rule [T-rule]) or quantify the amount of available information carried by each item (the maximum Kullback–Leibler divergence rule [K-rule]). The simulation results showed that all four stopping rules successfully terminated the test when the mean squared error of ability estimation was within a desired range, regardless of examinees' true abilities. When using the D-, E-, or T-rule, examinees with extreme abilities tended to receive tests twice as long as those received by examinees with moderate abilities. The test length difference under the K-rule, however, was not very dramatic, indicating that the K-rule may not be very sensitive to measurement precision. In all cases, the cutoff value for each stopping rule needs to be adjusted on a case-by-case basis to find an optimal solution.
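
The three precision-based rules lend themselves to a direct sketch: accumulate the Fisher information matrix of the ability vector and stop once a scalar summary reaches a cutoff. The cutoff values below are arbitrary placeholders; as the abstract notes, they must be tuned case by case.

import numpy as np

def should_stop(info_matrix, rule, threshold):
    # rule: 'D' -> determinant, 'E' -> smallest eigenvalue, 'T' -> trace
    if rule == "D":
        value = np.linalg.det(info_matrix)
    elif rule == "E":
        value = np.linalg.eigvalsh(info_matrix).min()
    elif rule == "T":
        value = np.trace(info_matrix)
    else:
        raise ValueError("rule must be 'D', 'E', or 'T'")
    return value >= threshold

I = np.array([[30.0, 5.0], [5.0, 28.0]])     # accumulated test information, 2 dimensions
print(should_stop(I, "E", 20.0))             # True: smallest eigenvalue (about 23.9) exceeds 20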

VL - 37 UR - http://apm.sagepub.com/content/37/2/99.abstract ER - TY - JOUR T1 - Estimating Measurement Precision in Reduced-Length Multi-Stage Adaptive Testing JF - Journal of Computerized Adaptive Testing Y1 - 2013 A1 - Crotts, K.M. A1 - Zenisky, A. L. A1 - Sireci, S.G. A1 - Li, X. VL - 1 IS - 4 ER - TY - JOUR T1 - A Semiparametric Model for Jointly Analyzing Response Times and Accuracy in Computerized Testing JF - Journal of Educational and Behavioral Statistics Y1 - 2013 A1 - Wang, Chun A1 - Fan, Zhewen A1 - Chang, Hua-Hua A1 - Douglas, Jeffrey A. AB -

The item response times (RTs) collected from computerized testing represent an underutilized type of information about items and examinees. In addition to knowing the examinees’ responses to each item, we can investigate the amount of time examinees spend on each item. Current models for RTs mainly focus on parametric models, which have the advantage of conciseness, but may suffer from reduced flexibility to fit real data. We propose a semiparametric approach, specifically, the Cox proportional hazards model with a latent speed covariate to model the RTs, embedded within the hierarchical framework proposed by van der Linden to model the RTs and response accuracy simultaneously. This semiparametric approach combines the flexibility of nonparametric modeling and the brevity and interpretability of the parametric modeling. A Markov chain Monte Carlo method for parameter estimation is given and may be used with sparse data obtained by computerized adaptive testing. Both simulation studies and real data analysis are carried out to demonstrate the applicability of the new model.

VL - 38 UR - http://jeb.sagepub.com/cgi/content/abstract/38/4/381 ER - TY - JOUR T1 - Variable-Length Computerized Adaptive Testing Based on Cognitive Diagnosis Models JF - Applied Psychological Measurement Y1 - 2013 A1 - Hsu, Chia-Ling A1 - Wang, Wen-Chung A1 - Chen, Shu-Ying AB -

Interest in developing computerized adaptive testing (CAT) under cognitive diagnosis models (CDMs) has increased recently. CAT algorithms that use a fixed-length termination rule frequently lead to different degrees of measurement precision for different examinees. Fixed precision, in which the examinees receive the same degree of measurement precision, is a major advantage of CAT over nonadaptive testing. In addition to the precision issue, test security is another important issue in practical CAT programs. In this study, the authors implemented two termination criteria for the fixed-precision rule and evaluated their performance under two popular CDMs using simulations. The results showed that using the two criteria with the posterior-weighted Kullback–Leibler information procedure for selecting items could achieve the prespecified measurement precision. A control procedure was developed to control item exposure and test overlap simultaneously among examinees. The simulation results indicated that in contrast to no method of controlling exposure, the control procedure developed in this study could maintain item exposure and test overlap at the prespecified level at the expense of only a few more items.

VL - 37 UR - http://apm.sagepub.com/content/37/7/563.abstract ER - TY - JOUR T1 - Comparison of two Bayesian methods to detect mode effects between paper-based and computerized adaptive assessments: a preliminary Monte Carlo study. JF - BMC Med Res Methodol Y1 - 2012 A1 - Riley, Barth B A1 - Carle, Adam C KW - Bayes Theorem KW - Data Interpretation, Statistical KW - Humans KW - Mathematical Computing KW - Monte Carlo Method KW - Outcome Assessment (Health Care) AB -

BACKGROUND: Computerized adaptive testing (CAT) is being applied to health outcome measures developed as paper-and-pencil (P&P) instruments. Differences in how respondents answer items administered by CAT vs. P&P can increase error in CAT-estimated measures if not identified and corrected.

METHOD: Two methods for detecting item-level mode effects are proposed using Bayesian estimation of posterior distributions of item parameters: (1) a modified robust Z (RZ) test, and (2) 95% credible intervals (CrI) for the CAT-P&P difference in item difficulty. A simulation study was conducted under the following conditions: (1) data-generating model (one- vs. two-parameter IRT model); (2) moderate vs. large DIF sizes; (3) percentage of DIF items (10% vs. 30%), and (4) mean difference in θ estimates across modes of 0 vs. 1 logits. This resulted in a total of 16 conditions with 10 generated datasets per condition.

RESULTS: Both methods evidenced good to excellent false positive control, with RZ providing better control of false positives and CrI providing slightly higher power, irrespective of measurement model. False positives increased when items were very easy to endorse and when there were mode differences in mean trait level. True positives were predicted by CAT item usage, absolute item difficulty, and item discrimination. RZ outperformed CrI, due to better control of false positive DIF.

CONCLUSIONS: Whereas false positives were well controlled, particularly for RZ, power to detect DIF was suboptimal. Research is needed to examine the robustness of these methods under varying prior assumptions concerning the distribution of item and person parameters and when data fail to conform to prior assumptions. False identification of DIF when items were very easy to endorse is a problem warranting additional investigation.
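
For reference, the classical robust Z statistic on which the modified RZ test builds divides each item's centered difficulty difference by a robust spread estimate; the sketch below works on plain difficulty differences, whereas the study applies the idea to Bayesian posterior estimates of the item parameters.

import numpy as np

def robust_z(diffs):
    # (d_i - median(d)) / (0.74 * IQR(d)); 0.74 * IQR approximates the SD under normality.
    diffs = np.asarray(diffs, dtype=float)
    med = np.median(diffs)
    iqr = np.percentile(diffs, 75) - np.percentile(diffs, 25)
    return (diffs - med) / (0.74 * iqr)

d = [0.05, -0.04, 0.02, 0.90, -0.03, 0.04]   # CAT minus P&P difficulty differences
print(np.abs(robust_z(d)) > 1.96)            # only the 0.90 item is flagged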

VL - 12 ER - TY - JOUR T1 - Computerized Adaptive Testing Using a Class of High-Order Item Response Theory Models JF - Applied Psychological Measurement Y1 - 2012 A1 - Huang, Hung-Yu A1 - Chen, Po-Hsi A1 - Wang, Wen-Chung AB -

In the human sciences, a common assumption is that latent traits have a hierarchical structure. Higher order item response theory models have been developed to account for this hierarchy. In this study, computerized adaptive testing (CAT) algorithms based on these kinds of models were implemented, and their performance under a variety of situations was examined using simulations. The results showed that the CAT algorithms were very effective. The progressive method for item selection, the Sympson and Hetter method with online and freeze procedure for item exposure control, and the multinomial model for content balancing can simultaneously maintain good measurement precision, item exposure control, content balance, test security, and pool usage.

VL - 36 UR - http://apm.sagepub.com/content/36/8/689.abstract ER - TY - JOUR T1 - An Empirical Evaluation of the Slip Correction in the Four Parameter Logistic Models With Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2012 A1 - Yen, Yung-Chin A1 - Ho, Rong-Guey A1 - Laio, Wen-Wei A1 - Chen, Li-Ju A1 - Kuo, Ching-Chin AB -

In a selected response test, aberrant responses such as careless errors and lucky guesses might cause error in ability estimation because these responses do not actually reflect the knowledge that examinees possess. In a computerized adaptive test (CAT), these aberrant responses could further cause serious estimation error due to dynamic item administration. To enhance the robust performance of CAT against aberrant responses, Barton and Lord proposed the four-parameter logistic (4PL) item response theory (IRT) model. However, most studies relevant to the 4PL IRT model were conducted based on simulation experiments. This study attempts to investigate the performance of the 4PL IRT model as a slip-correction mechanism with an empirical experiment. The results showed that the 4PL IRT model could not only reduce the problematic underestimation of the examinees’ ability introduced by careless mistakes in practical situations but also improve measurement efficiency.
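
The slip-correction mechanism rests on the four-parameter logistic item response function, sketched below with an illustrative item; the parameter values are arbitrary.

import math

def p_4pl(theta, a, b, c, d):
    # c is the lower asymptote (guessing); d < 1 is the upper asymptote (slipping);
    # the model reduces to the 3PL when d = 1.
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

print(round(p_4pl(theta=2.0, a=1.2, b=0.0, c=0.2, d=0.96), 3))   # about 0.897

Because d caps the success probability below 1, an incorrect answer from a high-ability examinee is treated as a plausible slip rather than strong evidence of low ability, which limits the damage of careless errors to the ability estimate.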

VL - 36 UR - http://apm.sagepub.com/content/36/2/75.abstract ER - TY - JOUR T1 - Investigating the Effect of Item Position in Computer-Based Tests JF - Journal of Educational Measurement Y1 - 2012 A1 - Li, Feiming A1 - Cohen, Allan A1 - Shen, Linjun AB -

Computer-based tests (CBTs) often use random ordering of items in order to minimize item exposure and reduce the potential for answer copying. Little research has been done, however, to examine item position effects for these tests. In this study, different versions of a Rasch model and different response time models were examined and applied to data from a CBT administration of a medical licensure examination. The models specifically were used to investigate whether item position affected item difficulty and item intensity estimates. Results indicated that the position effect was negligible.

VL - 49 UR - http://dx.doi.org/10.1111/j.1745-3984.2012.00181.x ER - TY - JOUR T1 - A Mixture Rasch Model–Based Computerized Adaptive Test for Latent Class Identification JF - Applied Psychological Measurement Y1 - 2012 A1 - Hong Jiao, A1 - Macready, George A1 - Liu, Junhui A1 - Cho, Youngmi AB -

This study explored a computerized adaptive test delivery algorithm for latent class identification based on the mixture Rasch model. Four item selection methods based on the Kullback–Leibler (KL) information were proposed and compared with the reversed and the adaptive KL information under simulated testing conditions. When item separation was large, all item selection methods did not differ evidently in terms of accuracy in classifying examinees into different latent classes and estimating latent ability. However, when item separation was small, two methods with class-specific ability estimates performed better than the other two methods based on a single latent ability estimate across all latent classes. The three types of KL information distributions were compared. The KL and the reversed KL information could be the same or different depending on the ability level and the item difficulty difference between latent classes. Although the KL information and the reversed KL information were different at some ability levels and item difficulty difference levels, the use of the KL, the reversed KL, or the adaptive KL information did not affect the results substantially due to the symmetric distribution of item difficulty differences between latent classes in the simulated item pools. Item pool usage and classification convergence points were examined as well.

VL - 36 UR - http://apm.sagepub.com/content/36/6/469.abstract ER - TY - JOUR T1 - Panel Design Variations in the Multistage Test Using the Mixed-Format Tests JF - Educational and Psychological Measurement Y1 - 2012 A1 - Kim, Jiseon A1 - Chung, Hyewon A1 - Dodd, Barbara G. A1 - Park, Ryoungsun AB -

This study compared various panel designs of the multistage test (MST) using mixed-format tests in the context of classification testing. Simulations varied the design of the first-stage module. The first stage was constructed according to three levels of test information functions (TIFs) with three different TIF centers. Additional computerized adaptive test (CAT) conditions provided baseline comparisons. Three passing rate conditions were also included. The various MST conditions using mixed-format tests were constructed properly and performed well. When the levels of TIFs at the first stage were higher, the simulations produced a greater number of correct classifications. CAT with the randomesque-10 procedure yielded comparable results to the MST with increased levels of TIFs. Finally, all MST conditions achieved better test security results compared with CAT’s maximum information conditions.

VL - 72 UR - http://epm.sagepub.com/content/72/4/574.abstract ER - TY - JOUR T1 - On the Reliability and Validity of a Numerical Reasoning Speed Dimension Derived From Response Times Collected in Computerized Testing JF - Educational and Psychological Measurement Y1 - 2012 A1 - Davison, Mark L. A1 - Semmes, Robert A1 - Huang, Lan A1 - Close, Catherine N. AB -

Data from 181 college students were used to assess whether math reasoning item response times in computerized testing can provide valid and reliable measures of a speed dimension. The alternate forms reliability of the speed dimension was .85. A two-dimensional structural equation model suggests that the speed dimension is related to the accuracy of speeded responses. Speed factor scores were significantly correlated with performance on the ACT math scale. Results suggest that the speed dimension underlying response times can be reliably measured and that the dimension is related to the accuracy of performance under the pressure of time limits.

VL - 72 UR - http://epm.sagepub.com/content/72/2/245.abstract ER - TY - JOUR T1 - Applying computerized adaptive testing to the CES-D scale: A simulation study JF - Psychiatry Research Y1 - 2011 A1 - Smits, N. A1 - Cuijpers, P. A1 - van Straten, A. VL - 188 IS - 1 ER - TY - JOUR T1 - Applying computerized adaptive testing to the CES-D scale: A simulation study JF - Psychiatry Research Y1 - 2011 A1 - Smits, N. A1 - Cuijpers, P. A1 - van Straten, A. AB - In this paper we studied the appropriateness of developing an adaptive version of the Center of Epidemiological Studies-Depression (CES-D, Radloff, 1977) scale. Computerized Adaptive Testing (CAT) involves the computerized administration of a test in which each item is dynamically selected from a pool of items until a pre-specified measurement precision is reached. Two types of analyses were performed using the CES-D responses of a large sample of adolescents (N=1392). First, it was shown that the items met the psychometric requirements needed for CAT. Second, CATs were simulated by using the existing item responses as if they had been collected adaptively. CATs selecting only a small number of items gave results which, in terms of depression measurement and criterion validity, were only marginally different from the results of full CES-D assessment. It was concluded that CAT is a very fruitful way of improving the efficiency of the CES-D questionnaire. The discussion addresses the strengths and limitations of the application of CAT in mental health research. SN - 0165-1781 (Print)0165-1781 (Linking) N1 - Psychiatry Res. 2011 Jan 3. ER - TY - CONF T1 - Building Affordable CD-CAT Systems for Schools To Address Today's Challenges In Assessment T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Chang, Hua-Hua KW - affordability KW - CAT KW - cost JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - Computerized Adaptive Testing with the Zinnes and Griggs Pairwise Preference Ideal Point Model JF - International Journal of Testing Y1 - 2011 A1 - Stark, Stephen A1 - Chernyshenko, Oleksandr S. VL - 11 UR - http://www.tandfonline.com/doi/abs/10.1080/15305058.2011.561459 ER - TY - ABST T1 - Cross-cultural development of an item list for computer-adaptive testing of fatigue in oncological patients Y1 - 2011 A1 - Giesinger, J. M. A1 - Petersen, M. A. A1 - Groenvold, M. A1 - Aaronson, N. K. A1 - Arraras, J. I. A1 - Conroy, T. A1 - Gamper, E. M. A1 - Kemmler, G. A1 - King, M. T. A1 - Oberguggenberger, A. S. A1 - Velikova, G. A1 - Young, T. A1 - Holzner, B. A1 - Eortc-Qlg, E. O. AB - ABSTRACT: INTRODUCTION: Within an ongoing project of the EORTC Quality of Life Group, we are developing computerized adaptive test (CAT) measures for the QLQ-C30 scales. These new CAT measures are conceptualised to reflect the same constructs as the QLQ-C30 scales. Accordingly, the Fatigue-CAT is intended to capture physical and general fatigue. METHODS: The EORTC approach to CAT development comprises four phases (literature search, operationalisation, pre-testing, and field testing). Phases I-III are described in detail in this paper. A literature search for fatigue items was performed in major medical databases. After refinement through several expert panels, the remaining items were used as the basis for adapting items and/or formulating new items fitting the EORTC item style. 
To obtain feedback from patients with cancer, these English items were translated into Danish, French, German, and Spanish and tested in the respective countries. RESULTS: Based on the literature search a list containing 588 items was generated. After a comprehensive item selection procedure focusing on content, redundancy, item clarity and item difficulty a list of 44 fatigue items was generated. Patient interviews (n=52) resulted in 12 revisions of wording and translations. DISCUSSION: The item list developed in phases I-III will be further investigated within a field-testing phase (IV) to examine psychometric characteristics and to fit an item response theory model. The Fatigue CAT based on this item bank will provide scores that are backward-compatible to the original QLQ-C30 fatigue scale. JF - Health and Quality of Life Outcomes VL - 9 SN - 1477-7525 (Electronic)1477-7525 (Linking) N1 - Health Qual Life Outcomes. 2011 Mar 29;9(1):19. ER - TY - CONF T1 - Detecting DIF between Conventional and Computerized Adaptive Testing: A Monte Carlo Study T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Barth B. Riley A1 - Adam C. Carle KW - 95% Credible Interval KW - CAT KW - DIF KW - differential item function KW - modified robust Z statistic KW - Monte Carlo methodologies AB -

Two procedures, the modified robust Z statistic and the 95% credible interval, were compared in a Monte Carlo study. Both procedures evidenced adequate control of false positive DIF results.

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - CONF T1 - A Heuristic Of CAT Item Selection Procedure For Testlets T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Yuehmei Chien A1 - David Shin A1 - Walter Denny Way KW - CAT KW - shadow test KW - testlets JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - ABST T1 - Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®): depression, anxiety, and anger Y1 - 2011 A1 - Pilkonis, P. A. A1 - Choi, S. W. A1 - Reise, S. P. A1 - Stover, A. M. A1 - Riley, W. T. A1 - Cella, D. JF - Assessment SN - 1073-1911 ER - TY - JOUR T1 - A new adaptive testing algorithm for shortening health literacy assessments JF - BMC Medical Informatics and Decision Making Y1 - 2011 A1 - Kandula, S. A1 - Ancker, J.S. A1 - Kaufman, D.R. A1 - Currie, L.M. A1 - Qing, Z.-T. AB -

 

UR - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3178473/?tool=pmcentrez
VL - 11 IS - 52 ER - TY - JOUR T1 - A New Stopping Rule for Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2011 A1 - Choi, Seung W. A1 - Grady, Matthew W. A1 - Dodd, Barbara G. AB -

The goal of the current study was to introduce a new stopping rule for computerized adaptive testing (CAT). The predicted standard error reduction (PSER) stopping rule uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared with that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule in a series of simulated adaptive tests, drawn from a number of item pools. Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant.
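
A schematic of a predicted-standard-error-reduction style check is given below; it assumes a caller that can compute the posterior SE that would result from administering the most informative remaining item, and the default cutoffs are placeholders rather than the published values.

def pser_stop(current_se, predicted_se_best_item, min_reduction=0.01, se_target=0.3):
    # Stop when the SE target is met, or when even the best remaining item is
    # predicted to reduce the standard error by less than min_reduction.
    if current_se <= se_target:
        return True
    return (current_se - predicted_se_best_item) < min_reduction

print(pser_stop(0.42, 0.415))   # True: the predicted gain (0.005) is below the cutoff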

VL - 71 UR - http://epm.sagepub.com/content/71/1/37.abstract ER - TY - JOUR T1 - Restrictive Stochastic Item Selection Methods in Cognitive Diagnostic Computerized Adaptive Testing JF - Journal of Educational Measurement Y1 - 2011 A1 - Wang, Chun A1 - Chang, Hua-Hua A1 - Huebner, Alan AB -

This paper proposes two new item selection methods for cognitive diagnostic computerized adaptive testing: the restrictive progressive method and the restrictive threshold method. They are built upon the posterior weighted Kullback-Leibler (KL) information index but include additional stochastic components either in the item selection index or in the item selection procedure. Simulation studies show that both methods are successful at simultaneously suppressing overexposed items and increasing the usage of underexposed items. Compared to item selection based upon (1) pure KL information and (2) the Sympson-Hetter method, the two new methods strike a better balance between item exposure control and measurement accuracy. The two new methods are also compared with Barrada et al.'s (2008) progressive method and proportional method.
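
In the spirit of the progressive component described above, the sketch below blends a random term with the posterior-weighted KL index while excluding over-exposed items; the weighting scheme, the r_max cap, and all names are assumptions for illustration, not the authors' exact formulation.

import random

def restrictive_progressive_pick(pwkl, exposure, items_given, test_length, r_max=0.2):
    # pwkl: dict item_id -> current posterior-weighted KL index
    # exposure: dict item_id -> current exposure rate
    w = items_given / float(test_length)            # weight on PWKL grows as the test progresses
    eligible = [i for i in pwkl if exposure.get(i, 0.0) <= r_max]
    if not eligible:
        eligible = list(pwkl)                       # fall back rather than stall
    max_pwkl = max(pwkl[i] for i in eligible) or 1.0
    def index(i):
        return (1.0 - w) * random.uniform(0.0, max_pwkl) + w * pwkl[i]
    return max(eligible, key=index)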

VL - 48 UR - http://dx.doi.org/10.1111/j.1745-3984.2011.00145.x ER - TY - JOUR T1 - A Comparison of Content-Balancing Procedures for Estimating Multiple Clinical Domains in Computerized Adaptive Testing: Relative Precision, Validity, and Detection of Persons With Misfitting Responses JF - Applied Psychological Measurement Y1 - 2010 A1 - Barth B. Riley A1 - Michael L. Dennis A1 - Conrad, Kendon J. AB -

This simulation study sought to compare four different computerized adaptive testing (CAT) content-balancing procedures designed for use in a multidimensional assessment with respect to measurement precision, symptom severity classification, validity of clinical diagnostic recommendations, and sensitivity to atypical responding. The four content-balancing procedures were (a) no content balancing, (b) screener-based, (c) mixed (screener plus content balancing), and (d) full content balancing. In full content balancing and in mixed content balancing following administration of the screener items, item selection was based on (a) whether the target number of items for the item’s subscale was reached and (b) the item’s information function. Mixed and full content balancing provided the best representation of items from each of the main subscales of the Internal Mental Distress Scale. These procedures also resulted in higher CAT to full-scale correlations for the Trauma and Homicidal/Suicidal Thought subscales and improved detection of atypical responding.

VL - 34 UR - http://apm.sagepub.com/content/34/6/410.abstract ER - TY - JOUR T1 - A comparison of content-balancing procedures for estimating multiple clinical domains in computerized adaptive testing: Relative precision, validity, and detection of persons with misfitting responses JF - Applied Psychological Measurement Y1 - 2010 A1 - Riley, B. B. A1 - Dennis, M. L. A1 - Conrad, K. J. AB - This simulation study sought to compare four different computerized adaptive testing (CAT) content-balancing procedures designed for use in a multidimensional assessment with respect to measurement precision, symptom severity classification, validity of clinical diagnostic recommendations, and sensitivity to atypical responding. The four content-balancing procedures were (a) no content balancing, (b) screener-based, (c) mixed (screener plus content balancing), and (d) full content balancing. In full content balancing and in mixed content balancing following administration of the screener items, item selection was based on (a) whether the target numberof items for the item’s subscale was reached and (b) the item’s information function. Mixed and full content balancing provided the best representation of items from each of the main subscales of the Internal Mental Distress Scale. These procedures also resulted in higher CAT to full-scale correlations for the Trauma and Homicidal/Suicidal Thought subscales and improved detection of atypical responding.Keywords VL - 34 SN - 0146-62161552-3497 ER - TY - JOUR T1 - Development and evaluation of a confidence-weighting computerized adaptive testing JF - Educational Technology & Society Y1 - 2010 A1 - Yen, Y. C. A1 - Ho, R. G. A1 - Chen, L. J. A1 - Chou, K. Y. A1 - Chen, Y. L. VL - 13(3) ER - TY - JOUR T1 - Development of computerized adaptive testing (CAT) for the EORTC QLQ-C30 physical functioning dimension JF - Quality of Life Research Y1 - 2010 A1 - Petersen, M. A. A1 - Groenvold, M. A1 - Aaronson, N. K. A1 - Chie, W. C. A1 - Conroy, T. A1 - Costantini, A. A1 - Fayers, P. A1 - Helbostad, J. A1 - Holzner, B. A1 - Kaasa, S. A1 - Singer, S. A1 - Velikova, G. A1 - Young, T. AB - PURPOSE: Computerized adaptive test (CAT) methods, based on item response theory (IRT), enable a patient-reported outcome instrument to be adapted to the individual patient while maintaining direct comparability of scores. The EORTC Quality of Life Group is developing a CAT version of the widely used EORTC QLQ-C30. We present the development and psychometric validation of the item pool for the first of the scales, physical functioning (PF). METHODS: Initial developments (including literature search and patient and expert evaluations) resulted in 56 candidate items. Responses to these items were collected from 1,176 patients with cancer from Denmark, France, Germany, Italy, Taiwan, and the United Kingdom. The items were evaluated with regard to psychometric properties. RESULTS: Evaluations showed that 31 of the items could be included in a unidimensional IRT model with acceptable fit and good content coverage, although the pool may lack items at the upper extreme (good PF). There were several findings of significant differential item functioning (DIF). However, the DIF findings appeared to have little impact on the PF estimation. CONCLUSIONS: We have established an item pool for CAT measurement of PF and believe that this CAT instrument will clearly improve the EORTC measurement of PF. VL - 20 SN - 1573-2649 (Electronic)0962-9343 (Linking) N1 - Qual Life Res. 2010 Oct 23. 
ER - TY - JOUR T1 - Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms JF - Quality of Life Research Y1 - 2010 A1 - Choi, S. A1 - Reise, S. P. A1 - Pilkonis, P. A. A1 - Hays, R. D. A1 - Cella, D. VL - 19(1) ER - TY - JOUR T1 - A new stopping rule for computerized adaptive testing JF - Educational and Psychological Measurement Y1 - 2010 A1 - Choi, S. W. A1 - Grady, M. W. A1 - Dodd, B. G. AB - The goal of the current study was to introduce a new stopping rule for computerized adaptive testing. The predicted standard error reduction stopping rule (PSER) uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared to that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule in a series of simulated adaptive tests, drawn from a number of item pools. Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant. VL - 70 SN - 0013-1644 (Print)0013-1644 (Linking) N1 - U01 AR052177-04/NIAMS NIH HHS/Educ Psychol Meas. 2010 Dec 1;70(6):1-17. U2 - 3028267 ER - TY - JOUR T1 - Online calibration via variable length computerized adaptive testing JF - Psychometrika Y1 - 2010 A1 - Chang, Y. I. A1 - Lu, H. Y. AB - Item calibration is an essential issue in modern item response theory based psychological or educational testing. Due to the popularity of computerized adaptive testing, methods to efficiently calibrate new items have become more important than that in the time when paper and pencil test administration is the norm. There are many calibration processes being proposed and discussed from both theoretical and practical perspectives. Among them, the online calibration may be one of the most cost effective processes. In this paper, under a variable length computerized adaptive testing scenario, we integrate the methods of adaptive design, sequential estimation, and measurement error models to solve online item calibration problems. The proposed sequential estimate of item parameters is shown to be strongly consistent and asymptotically normally distributed with a prechosen accuracy. Numerical results show that the proposed method is very promising in terms of both estimation accuracy and efficiency. The results of using calibrated items to estimate the latent trait levels are also reported. VL - 75 SN - 0033-3123 ER - TY - JOUR T1 - A Procedure for Controlling General Test Overlap in Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2010 A1 - Chen, Shu-Ying AB -

To date, exposure control procedures that are designed to control test overlap in computerized adaptive tests (CATs) are based on the assumption of item sharing between pairs of examinees. However, in practice, examinees may obtain test information from more than one previous test taker. This larger scope of information sharing needs to be considered in conducting test overlap control. The purpose of this study is to propose a test overlap control method such that the proportion of overlapping items encountered by an examinee with a group of previous examinees (described as general test overlap rate) can be controlled. Results indicated that item exposure rate and general test overlap rate could be simultaneously controlled by implementing the procedure. In addition, these two indices were controlled on the fly without any iterative simulations conducted prior to operational CATs. Thus, the proposed procedure would be an efficient method for controlling both the item exposure and general test overlap in CATs.
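
One simple way to quantify the general test overlap rate the procedure controls is the average proportion of the current examinee's items also seen by each previous examinee, as in this small assumed sketch; the on-the-fly adjustment of item selection that keeps the rate below target is not shown.

def general_test_overlap(new_test, previous_tests):
    if not previous_tests:
        return 0.0
    new = set(new_test)
    overlaps = [len(new & set(t)) / len(new) for t in previous_tests]
    return sum(overlaps) / len(overlaps)

print(general_test_overlap([1, 2, 3, 4], [[2, 3, 9, 10], [1, 5, 6, 7]]))   # 0.375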

VL - 34 UR - http://apm.sagepub.com/content/34/6/393.abstract ER - TY - JOUR T1 - Stratified and maximum information item selection procedures in computer adaptive testing JF - Journal of Educational Measurement Y1 - 2010 A1 - Deng, H. A1 - Ansley, T. A1 - Chang, H.-H. VL - 47 ER - TY - JOUR T1 - Stratified and Maximum Information Item Selection Procedures in Computer Adaptive Testing JF - Journal of Educational Measurement Y1 - 2010 A1 - Deng, Hui A1 - Ansley, Timothy A1 - Chang, Hua-Hua AB -

In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computer adaptive testing (CAT) (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with completely random item selection (RAN). The comparisons were with respect to error variances, reliability of ability estimates and item usage through CATs simulated under nine test conditions of various practical constraints and item selection space. The results showed that F had an apparent precision advantage over STR and USTR under unconstrained item selection, but with very poor item usage. USTR reduced error variances for STR under various conditions, with small compromises in item usage. Compared to F, USTR enhanced item usage while achieving comparable precision in ability estimates; it achieved a precision level similar to F with improved item usage when items were selected under exposure control and with limited item selection space. The results provide implications for choosing an appropriate item selection procedure in applied settings.
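
The contrast between maximum-information and a-stratified selection can be sketched briefly; the 2PL information function is standard, while the item representation and stratum handling below are assumptions for illustration.

import math

def info_2pl(theta, a, b):
    # Fisher information of a 2PL item at theta: a^2 * P * (1 - P).
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_stratified(items, theta, stratum):
    # a-stratified selection: within the current a-stratum, pick the item whose
    # difficulty is closest to the interim theta (b-matching). The maximum
    # Fisher information rule would instead maximize info_2pl over all items.
    candidates = [it for it in items if it["stratum"] == stratum]
    return min(candidates, key=lambda it: abs(it["b"] - theta)) if candidates else None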

VL - 47 UR - http://dx.doi.org/10.1111/j.1745-3984.2010.00109.x ER - TY - JOUR T1 - The use of PROMIS and assessment center to deliver patient-reported outcome measures in clinical research JF - Journal of Applied Measurement Y1 - 2010 A1 - Gershon, R. C. A1 - Rothrock, N. A1 - Hanrahan, R. A1 - Bass, M. A1 - Cella, D. AB - The Patient-Reported Outcomes Measurement Information System (PROMIS) was developed as one of the first projects funded by the NIH Roadmap for Medical Research Initiative to re-engineer the clinical research enterprise. The primary goal of PROMIS is to build item banks and short forms that measure key health outcome domains that are manifested in a variety of chronic diseases which could be used as a "common currency" across research projects. To date, item banks, short forms and computerized adaptive tests (CAT) have been developed for 13 domains with relevance to pediatric and adult subjects. To enable easy delivery of these new instruments, PROMIS built a web-based resource (Assessment Center) for administering CATs and other self-report data, tracking item and instrument development, monitoring accrual, managing data, and storing statistical analysis results. Assessment Center can also be used to deliver custom researcher developed content, and has numerous features that support both simple and complicated accrual designs (branching, multiple arms, multiple time points, etc.). This paper provides an overview of the development of the PROMIS item banks and details Assessment Center functionality. VL - 11 SN - 1529-7713 ER - TY - ABST T1 - Validation of a computer-adaptive test to evaluate generic health-related quality of life Y1 - 2010 A1 - Rebollo, P. A1 - Castejon, I. A1 - Cuervo, J. A1 - Villa, G. A1 - Garcia-Cueto, E. A1 - Diaz-Cuervo, H. A1 - Zardain, P. C. A1 - Muniz, J. A1 - Alonso, J. AB - BACKGROUND: Health Related Quality of Life (HRQoL) is a relevant variable in the evaluation of health outcomes. Questionnaires based on Classical Test Theory typically require a large number of items to evaluate HRQoL. Computer Adaptive Testing (CAT) can be used to reduce tests length while maintaining and, in some cases, improving accuracy. This study aimed at validating a CAT based on Item Response Theory (IRT) for evaluation of generic HRQoL: the CAT-Health instrument. METHODS: Cross-sectional study of subjects aged over 18 attending Primary Care Centres for any reason. CAT-Health was administered along with the SF-12 Health Survey. Age, gender and a checklist of chronic conditions were also collected. CAT-Health was evaluated considering: 1) feasibility: completion time and test length; 2) content range coverage, Item Exposure Rate (IER) and test precision; and 3) construct validity: differences in the CAT-Health scores according to clinical variables and correlations between both questionnaires. RESULTS: 396 subjects answered CAT-Health and SF-12, 67.2% females, mean age (SD) 48.6 (17.7) years. 36.9% did not report any chronic condition. Median completion time for CAT-Health was 81 seconds (IQ range = 59-118) and it increased with age (p < 0.001). The median number of items administered was 8 (IQ range = 6-10). Neither ceiling nor floor effects were found for the score. None of the items in the pool had an IER of 100% and it was over 5% for 27.1% of the items. Test Information Function (TIF) peaked between levels -1 and 0 of HRQoL. Statistically significant differences were observed in the CAT-Health scores according to the number and type of conditions. 
CONCLUSIONS: Although domain-specific CATs exist for various areas of HRQoL, CAT-Health is one of the first IRT-based CATs designed to evaluate generic HRQoL and it has proven feasible, valid and efficient, when administered to a broad sample of individuals attending primary care settings. JF - Health and Quality of Life Outcomes VL - 8 SN - 1477-7525 (Electronic)1477-7525 (Linking) N1 - Rebollo, PabloCastejon, IgnacioCuervo, JesusVilla, GuillermoGarcia-Cueto, EduardoDiaz-Cuervo, HelenaZardain, Pilar CMuniz, JoseAlonso, JordiSpanish CAT-Health Research GroupEnglandHealth Qual Life Outcomes. 2010 Dec 3;8:147. U2 - 3022567 ER - TY - JOUR T1 - An adaptive testing system for supporting versatile educational assessment JF - Computers and Education Y1 - 2009 A1 - Huang, Y-M. A1 - Lin, Y-T. A1 - Cheng, S-C. KW - Architectures for educational technology system KW - Distance education and telelearning AB - With the rapid growth of computer and mobile technology, it is a challenge to integrate computer based test (CBT) with mobile learning (m-learning) especially for formative assessment and self-assessment. In terms of self-assessment, computer adaptive test (CAT) is a proper way to enable students to evaluate themselves. In CAT, students are assessed through a process that uses item response theory (IRT), a well-founded psychometric theory. Furthermore, a large item bank is indispensable to a test, but when a CAT system has a large item bank, the test item selection of IRT becomes more tedious. Besides the large item bank, item exposure mechanism is also essential to a testing system. However, IRT all lack the above-mentioned points. These reasons have motivated the authors to carry out this study. This paper describes a design issue aimed at the development and implementation of an adaptive testing system. The system can support several assessment functions and different devices. Moreover, the researchers apply a novel approach, particle swarm optimization (PSO) to alleviate the computational complexity and resolve the problem of item exposure. Throughout the development of the system, a formative evaluation was embedded into an integral part of the design methodology that was used for improving the system. After the system was formally released onto the web, some questionnaires and experiments were conducted to evaluate the usability, precision, and efficiency of the system. The results of these evaluations indicated that the system provides an adaptive testing for different devices and supports versatile assessment functions. Moreover, the system can estimate students' ability reliably and validly and conduct an adaptive test efficiently. Furthermore, the computational complexity of the system was alleviated by the PSO approach. By the approach, the test item selection procedure becomes efficient and the average best fitness values are very close to the optimal solutions. VL - 52 SN - 0360-1315 N1 - doi: DOI: 10.1016/j.compedu.2008.06.007 ER - TY - CHAP T1 - Adequacy of an item pool measuring proficiency in English language to implement a CAT procedure Y1 - 2009 A1 - Karino, C. A. A1 - Costa, D. R. A1 - Laros, J. A. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 160 KB} ER - TY - CHAP T1 - Applications of CAT in admissions to higher education in Israel: Twenty-two years of experience Y1 - 2009 A1 - Gafni, N. A1 - Cohen, Y. A1 - Roded, K A1 - Baumer, M A1 - Moshinsky, A. CY - D. J. 
Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 326 KB} ER - TY - CHAP T1 - A burdened CAT: Incorporating response burden with maximum Fisher's information for item selection Y1 - 2009 A1 - Swartz, R. J. A1 - Choi, S. W. AB - Widely used in various educational and vocational assessment applications, computerized adaptive testing (CAT) has recently begun to be used to measure patient-reported outcomes. Although successful in reducing respondent burden, most current CAT algorithms do not formally consider respondent burden as part of the item selection process. This study used a loss function approach motivated by decision theory to develop an item selection method that incorporates respondent burden into the item selection process based on maximum Fisher information (MFI) item selection. Several different loss functions placing varying degrees of importance on respondent burden were compared, using an item bank of 62 polytomous items measuring depressive symptoms. One dataset consisted of the real responses from the 730 subjects who responded to all the items. A second dataset consisted of simulated responses to all the items based on a grid of latent trait scores with replicates at each grid point. The algorithm enables a CAT administrator to control respondent burden more efficiently than when using MFI alone, without severely affecting measurement precision. In particular, the loss function incorporating respondent burden protected respondents from receiving longer tests when their estimated trait score fell in a region where there were few informative items. CY - In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 374 KB} ER - TY - CHAP T1 - Comparison of adaptive Bayesian estimation and weighted Bayesian estimation in multidimensional computerized adaptive testing Y1 - 2009 A1 - Chen, P. H. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 308 KB} ER - TY - JOUR T1 - Comparison of CAT Item Selection Criteria for Polytomous Items JF - Applied Psychological Measurement Y1 - 2009 A1 - Choi, Seung W. A1 - Swartz, Richard J. AB -

Item selection is a core component in computerized adaptive testing (CAT). Several studies have evaluated new and classical selection methods; however, the few that have applied such methods to the use of polytomous items have reported conflicting results. To clarify these discrepancies and further investigate selection method properties, six different selection methods are compared systematically. The results showed no clear benefit from more sophisticated selection criteria and showed one method previously believed to be superior—the maximum expected posterior weighted information (MEPWI)—to be mathematically equivalent to a simpler method, the maximum posterior weighted information (MPWI).

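For readers unfamiliar with the two criteria named in this abstract, they are usually written as posterior-weighted integrals of item information; the forms below follow that common usage and are not quoted from the article. With \(\pi(\theta \mid \mathbf{u}_{k-1})\) the posterior after the first k-1 responses and \(I_j(\theta)\) the information of candidate item j,

\[
\mathrm{MPWI}_j = \int I_j(\theta)\, \pi(\theta \mid \mathbf{u}_{k-1})\, d\theta,
\qquad
\mathrm{MEPWI}_j = \sum_{x} P_j(x \mid \mathbf{u}_{k-1}) \int I_j(\theta)\, \pi(\theta \mid \mathbf{u}_{k-1}, x)\, d\theta,
\]

where the sum runs over the possible responses x to item j. The equivalence result reported in the abstract says that the second expression reduces to the first, so the extra averaging over predicted responses buys nothing.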
VL - 33 UR - http://apm.sagepub.com/content/33/6/419.abstract ER - TY - CHAP T1 - A comparison of three methods of item selection for computerized adaptive testing Y1 - 2009 A1 - Costa, D. R. A1 - Karino, C. A. A1 - Moura, F. A. S. A1 - Andrade, D. F. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 531 KB} ER - TY - CHAP T1 - Computerized adaptive testing for cognitive diagnosis Y1 - 2009 A1 - Cheng, Y CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 308 KB} ER - TY - JOUR T1 - Constraint-Weighted a-Stratification for Computerized Adaptive Testing With Nonstatistical Constraints: Balancing Measurement Efficiency and Exposure Control JF - Educational and Psychological Measurement Y1 - 2009 A1 - Cheng, Ying A1 - Chang, Hua-Hua A1 - Douglas, Jeffrey A1 - Guo, Fanmin AB -

a-stratification is a method that utilizes items with small discrimination (a) parameters early in an exam and those with higher a values when more is learned about the ability parameter. It can achieve much better item usage than the maximum information criterion (MIC). To make a-stratification more practical and more widely applicable, a method for weighting the item selection process in a-stratification as a means of satisfying multiple test constraints is proposed. This method is studied in simulation against an analogous method without stratification, as well as against a-stratification using descending- rather than ascending-a procedures. In addition, a variation of a-stratification that allows for unbalanced usage of a parameters is included in the study to examine the trade-off between efficiency and exposure control. Finally, MIC and randomized item selection are included as baseline measures. Results indicate that the weighting mechanism successfully addresses the constraints, that stratification helps to a great extent in balancing exposure rates, and that the ascending-a design improves measurement precision.

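As a rough illustration of the weighting idea described in this abstract, the sketch below down-weights items whose content areas have already filled their quotas before applying a within-stratum b-matching rule. The weighting function, names, and data structures are assumptions for illustration, not the authors' specification.

    def constraint_weight(item_areas, quotas, counts):
        # shrink toward 0 as each of the item's content areas fills its quota
        w = 1.0
        for area in item_areas:
            remaining = max(quotas[area] - counts.get(area, 0), 0)
            w *= remaining / quotas[area]
        return w

    def select_weighted(theta, b, areas, stratum, administered, quotas, counts):
        # pick the unused item in the current a-stratum with the best
        # constraint-weighted closeness to the provisional ability estimate
        best, best_score = None, -1.0
        for i in stratum:
            if i in administered:
                continue
            closeness = 1.0 / (1.0 + abs(b[i] - theta))
            score = closeness * constraint_weight(areas[i], quotas, counts)
            if score > best_score:
                best, best_score = i, score
        return best
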
VL - 69 UR - http://epm.sagepub.com/content/69/1/35.abstract ER - TY - JOUR T1 - Diagnostic classification models and multidimensional adaptive testing: A commentary on Rupp and Templin. JF - Measurement: Interdisciplinary Research and Perspectives Y1 - 2009 A1 - Frey, A. A1 - Carstensen, C. H. VL - 7 ER - TY - JOUR T1 - Firestar: Computerized adaptive testing simulation program for polytomous IRT models JF - Applied Psychological Measurement Y1 - 2009 A1 - Choi, S. W. VL - 33 SN - 1552-3497 (Electronic)0146-6216 (Linking) N1 - U01 AR052177-04/NIAMS NIH HHS/United StatesJournal articleApplied psychological measurementAppl Psychol Meas. 2009 Nov 1;33(8):644-645. U2 - 2790213 ER - TY - CHAP T1 - Kullback-Leibler information in multidimensional adaptive testing: theory and application Y1 - 2009 A1 - Wang, C. A1 - Chang, Hua-Hua AB - Built on multidimensional item response theory (MIRT), multidimensional adaptive testing (MAT) can, in principle, provide a promising approach to ensuring efficient estimation of each ability dimension in a multidimensional vector. Currently, two item selection procedures have been developed for MAT, one based on Fisher information embedded within a Bayesian framework, and the other powered by Kullback-Leibler (KL) information. It is well known that in unidimensional IRT the second derivative of KL information (also termed “global information”) is Fisher information evaluated at θ0. This paper first generalizes the relationship between these two types of information in two ways: the analytical result is given as well as a graphical representation, to enhance interpretation and understanding. Second, a KL information index is constructed for MAT, which represents the integration of KL information over all of the ability dimensions. This paper further discusses how this index correlates with the item discrimination parameters. The analytical results lay a foundation for the future development of item selection methods in MAT that can help equalize item exposure rates. Finally, a simulation study is conducted to verify the above results. The connection between the item parameters, item KL information, and item exposure rate is demonstrated for empirical MAT delivered by an item bank calibrated under two-dimensional IRT. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 316 KB} ER - TY - JOUR T1 - The maximum priority index method for severely constrained item selection in computerized adaptive testing JF - British Journal of Mathematical and Statistical Psychology Y1 - 2009 A1 - Cheng, Y A1 - Chang, Hua-Hua KW - Aptitude Tests/*statistics & numerical data KW - Diagnosis, Computer-Assisted/*statistics & numerical data KW - Educational Measurement/*statistics & numerical data KW - Humans KW - Mathematical Computing KW - Models, Statistical KW - Personality Tests/*statistics & numerical data KW - Psychometrics/*statistics & numerical data KW - Reproducibility of Results KW - Software AB - This paper introduces a new heuristic approach, the maximum priority index (MPI) method, for severely constrained item selection in computerized adaptive testing.
Our simulation study shows that it is able to accommodate various non-statistical constraints simultaneously, such as content balancing, exposure control, answer key balancing, and so on. Compared with the weighted deviation modelling method, it leads to fewer constraint violations and better exposure control while maintaining the same level of measurement precision. VL - 62 SN - 0007-1102 (Print)0007-1102 (Linking) N1 - Cheng, YingChang, Hua-HuaResearch Support, Non-U.S. Gov'tEnglandThe British journal of mathematical and statistical psychologyBr J Math Stat Psychol. 2009 May;62(Pt 2):369-83. Epub 2008 Jun 2. ER - TY - CHAP T1 - Obtaining reliable diagnostic information through constrained CAT Y1 - 2009 A1 - Wang, C. A1 - Chang, Hua-Hua A1 - Douglas, J. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 252 KB} ER - TY - CHAP T1 - Optimizing item exposure control algorithms for polytomous computerized adaptive tests with restricted item banks Y1 - 2009 A1 - Chajewski, M. A1 - Lewis, C. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 923 KB} ER - TY - JOUR T1 - Progress in assessing physical function in arthritis: PROMIS short forms and computerized adaptive testing JF - Journal of Rheumatology Y1 - 2009 A1 - Fries, J.F. A1 - Cella, D. A1 - Rose, M. A1 - Krishnan, E. A1 - Bruce, B. KW - *Disability Evaluation KW - *Outcome Assessment (Health Care) KW - Arthritis/diagnosis/*physiopathology KW - Health Surveys KW - Humans KW - Prognosis KW - Reproducibility of Results AB - OBJECTIVE: Assessing self-reported physical function/disability with the Health Assessment Questionnaire Disability Index (HAQ) and other instruments has become central in arthritis research. Item response theory (IRT) and computerized adaptive testing (CAT) techniques can increase reliability and statistical power. IRT-based instruments can improve measurement precision substantially over a wider range of disease severity. These modern methods were applied and the magnitude of improvement was estimated. METHODS: A 199-item physical function/disability item bank was developed by distilling 1865 items to 124, including Legacy Health Assessment Questionnaire (HAQ) and Physical Function-10 items, and improving precision through qualitative and quantitative evaluation in over 21,000 subjects, which included about 1500 patients with rheumatoid arthritis and osteoarthritis. Four new instruments, (A) Patient-Reported Outcomes Measurement Information (PROMIS) HAQ, which evolved from the original (Legacy) HAQ; (B) "best" PROMIS 10; (C) 20-item static (short) forms; and (D) simulated PROMIS CAT, which sequentially selected the most informative item, were compared with the HAQ. RESULTS: Online and mailed administration modes yielded similar item and domain scores. The HAQ and PROMIS HAQ 20-item scales yielded greater information content versus other scales in patients with more severe disease. The "best" PROMIS 20-item scale outperformed the other 20-item static forms over a broad range of 4 standard deviations. The 10-item simulated PROMIS CAT outperformed all other forms. CONCLUSION: Improved items and instruments yielded better information. The PROMIS HAQ is currently available and considered validated. The new PROMIS short forms, after validation, are likely to represent further improvement. CAT-based physical function/disability assessment offers superior performance over static forms of equal length. 
VL - 36 SN - 0315-162X (Print)0315-162X (Linking) N1 - Fries, James FCella, DavidRose, MatthiasKrishnan, EswarBruce, BonnieU01 AR052158/AR/NIAMS NIH HHS/United StatesU01 AR52177/AR/NIAMS NIH HHS/United StatesConsensus Development ConferenceResearch Support, N.I.H., ExtramuralCanadaThe Journal of rheumatologyJ Rheumatol. 2009 Sep;36(9):2061-6. ER - TY - JOUR T1 - Reduction in patient burdens with graphical computerized adaptive testing on the ADL scale: tool development and simulation JF - Health and Quality of Life Outcomes Y1 - 2009 A1 - Chien, T. W. A1 - Wu, H. M. A1 - Wang, W-C. A1 - Castillo, R. V. A1 - Chou, W. KW - *Activities of Daily Living KW - *Computer Graphics KW - *Computer Simulation KW - *Diagnosis, Computer-Assisted KW - Female KW - Humans KW - Male KW - Point-of-Care Systems KW - Reproducibility of Results KW - Stroke/*rehabilitation KW - Taiwan KW - United States AB - BACKGROUND: The aim of this study was to verify the effectiveness and efficacy of saving time and reducing burden for patients, nurses, and even occupational therapists through computer adaptive testing (CAT). METHODS: Based on an item bank of the Barthel Index (BI) and the Frenchay Activities Index (FAI) for assessing comprehensive activities of daily living (ADL) function in stroke patients, we developed a visual basic application (VBA)-Excel CAT module, and (1) investigated whether the averaged test length via CAT is shorter than that of the traditional all-item-answered non-adaptive testing (NAT) approach through simulation, (2) illustrated the CAT multimedia on a tablet PC showing data collection and response errors of ADL clinical functional measures in stroke patients, and (3) demonstrated the quality control of endorsing scale with fit statistics to detect responding errors, which will be further immediately reconfirmed by technicians once patient ends the CAT assessment. RESULTS: The results show that endorsed items could be shorter on CAT (M = 13.42) than on NAT (M = 23) at 41.64% efficiency in test length. However, averaged ability estimations reveal insignificant differences between CAT and NAT. CONCLUSION: This study found that mobile nursing services, placed at the bedsides of patients could, through the programmed VBA-Excel CAT module, reduce the burden to patients and save time, more so than the traditional NAT paper-and-pencil testing appraisals. VL - 7 SN - 1477-7525 (Electronic)1477-7525 (Linking) N1 - Chien, Tsair-WeiWu, Hing-ManWang, Weng-ChungCastillo, Roberto VasquezChou, WillyComparative StudyValidation StudiesEnglandHealth and quality of life outcomesHealth Qual Life Outcomes. 2009 May 5;7:39. U2 - 2688502 ER - TY - JOUR T1 - When cognitive diagnosis meets computerized adaptive testing: CD-CAT JF - Psychometrika Y1 - 2009 A1 - Cheng, Y VL - 74 ER - TY - JOUR T1 - Assessing self-care and social function using a computer adaptive testing version of the pediatric evaluation of disability inventory JF - Archives of Physical Medicine and Rehabilitation Y1 - 2008 A1 - Coster, W. J. A1 - Haley, S. M. A1 - Ni, P. A1 - Dumas, H. M. A1 - Fragala-Pinkham, M. A. 
KW - *Disability Evaluation KW - *Social Adjustment KW - Activities of Daily Living KW - Adolescent KW - Age Factors KW - Child KW - Child, Preschool KW - Computer Simulation KW - Cross-Over Studies KW - Disabled Children/*rehabilitation KW - Female KW - Follow-Up Studies KW - Humans KW - Infant KW - Male KW - Outcome Assessment (Health Care) KW - Reference Values KW - Reproducibility of Results KW - Retrospective Studies KW - Risk Factors KW - Self Care/*standards/trends KW - Sex Factors KW - Sickness Impact Profile AB - OBJECTIVE: To examine score agreement, validity, precision, and response burden of a prototype computer adaptive testing (CAT) version of the self-care and social function scales of the Pediatric Evaluation of Disability Inventory compared with the full-length version of these scales. DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics; community-based day care, preschool, and children's homes. PARTICIPANTS: Children with disabilities (n=469) and 412 children with no disabilities (analytic sample); 38 children with disabilities and 35 children without disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from prototype CAT applications of each scale using 15-, 10-, and 5-item stopping rules; scores from the full-length self-care and social function scales; time (in seconds) to complete assessments and respondent ratings of burden. RESULTS: Scores from both computer simulations and field administration of the prototype CATs were highly consistent with scores from full-length administration (r range, .94-.99). Using computer simulation of retrospective data, discriminant validity, and sensitivity to change of the CATs closely approximated that of the full-length scales, especially when the 15- and 10-item stopping rules were applied. In the cross-validation study the time to administer both CATs was 4 minutes, compared with over 16 minutes to complete the full-length scales. CONCLUSIONS: Self-care and social function score estimates from CAT administration are highly comparable with those obtained from full-length scale administration, with small losses in validity and precision and substantial decreases in administration time. VL - 89 SN - 1532-821X (Electronic)0003-9993 (Linking) N1 - Coster, Wendy JHaley, Stephen MNi, PengshengDumas, Helene MFragala-Pinkham, Maria AK02 HD45354-01A1/HD/NICHD NIH HHS/United StatesR41 HD052318-01A1/HD/NICHD NIH HHS/United StatesR43 HD42388-01/HD/NICHD NIH HHS/United StatesComparative StudyResearch Support, N.I.H., ExtramuralUnited StatesArchives of physical medicine and rehabilitationArch Phys Med Rehabil. 2008 Apr;89(4):622-9. U2 - 2666276 ER - TY - JOUR T1 - Combining computer adaptive testing technology with cognitively diagnostic assessment JF - Behavioral Research Methods Y1 - 2008 A1 - McGlohen, M. A1 - Chang, Hua-Hua KW - *Cognition KW - *Computers KW - *Models, Statistical KW - *User-Computer Interface KW - Diagnosis, Computer-Assisted/*instrumentation KW - Humans AB - A major advantage of computerized adaptive testing (CAT) is that it allows the test to home in on an examinee's ability level in an interactive manner. The aim of the new area of cognitive diagnosis is to provide information about specific content areas in which an examinee needs help. 
The goal of this study was to combine the benefit of specific feedback from cognitively diagnostic assessment with the advantages of CAT. In this study, three approaches to combining these were investigated: (1) item selection based on the traditional ability level estimate (theta), (2) item selection based on the attribute mastery feedback provided by cognitively diagnostic assessment (alpha), and (3) item selection based on both the traditional ability level estimate (theta) and the attribute mastery feedback provided by cognitively diagnostic assessment (alpha). The results from these three approaches were compared for theta estimation accuracy, attribute mastery estimation accuracy, and item exposure control. The theta- and alpha-based condition outperformed the alpha-based condition regarding theta estimation, attribute mastery pattern estimation, and item exposure control. Both the theta-based condition and the theta- and alpha-based condition performed similarly with regard to theta estimation, attribute mastery estimation, and item exposure control, but the theta- and alpha-based condition has an additional advantage in that it uses the shadow test method, which allows the administrator to incorporate additional constraints in the item selection process, such as content balancing, item type constraints, and so forth, and also to select items on the basis of both the current theta and alpha estimates, which can be built on top of existing 3PL testing programs. VL - 40 SN - 1554-351X (Print) N1 - McGlohen, MeghanChang, Hua-HuaUnited StatesBehavior research methodsBehav Res Methods. 2008 Aug;40(3):808-21. ER - TY - JOUR T1 - Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes JF - Archives of Physical Medicine and Rehabilitation Y1 - 2008 A1 - Haley, S. M. A1 - Gandek, B. A1 - Siebens, H. A1 - Black-Schaffer, R. M. A1 - Sinclair, S. J. A1 - Tao, W. A1 - Coster, W. J. A1 - Ni, P. A1 - Jette, A. M. KW - *Activities of Daily Living KW - *Adaptation, Physiological KW - *Computer Systems KW - *Questionnaires KW - Adult KW - Aged KW - Aged, 80 and over KW - Chi-Square Distribution KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Longitudinal Studies KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Patient Discharge KW - Prospective Studies KW - Rehabilitation/*standards KW - Subacute Care/*standards AB - OBJECTIVES: To measure participation outcomes with a computerized adaptive test (CAT) and compare CAT and traditional fixed-length surveys in terms of score agreement, respondent burden, discriminant validity, and responsiveness. DESIGN: Longitudinal, prospective cohort study of patients interviewed approximately 2 weeks after discharge from inpatient rehabilitation and 3 months later. SETTING: Follow-up interviews conducted in patient's home setting. PARTICIPANTS: Adults (N=94) with diagnoses of neurologic, orthopedic, or medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Participation domains of mobility, domestic life, and community, social, & civic life, measured using a CAT version of the Participation Measure for Postacute Care (PM-PAC-CAT) and a 53-item fixed-length survey (PM-PAC-53). RESULTS: The PM-PAC-CAT showed substantial agreement with PM-PAC-53 scores (intraclass correlation coefficient, model 3,1, .71-.81). On average, the PM-PAC-CAT was completed in 42% of the time and with only 48% of the items as compared with the PM-PAC-53. 
Both formats discriminated across functional severity groups. The PM-PAC-CAT had modest reductions in sensitivity and responsiveness to patient-reported change over a 3-month interval as compared with the PM-PAC-53. CONCLUSIONS: Although continued evaluation is warranted, accurate estimates of participation status and responsiveness to change for group-level analyses can be obtained from CAT administrations, with a sizeable reduction in respondent burden. VL - 89 SN - 1532-821X (Electronic)0003-9993 (Linking) N1 - Haley, Stephen MGandek, BarbaraSiebens, HilaryBlack-Schaffer, Randie MSinclair, Samuel JTao, WeiCoster, Wendy JNi, PengshengJette, Alan MK02 HD045354-01A1/HD/NICHD NIH HHS/United StatesK02 HD45354-01/HD/NICHD NIH HHS/United StatesR01 HD043568/HD/NICHD NIH HHS/United StatesR01 HD043568-01/HD/NICHD NIH HHS/United StatesResearch Support, N.I.H., ExtramuralUnited StatesArchives of physical medicine and rehabilitationArch Phys Med Rehabil. 2008 Feb;89(2):275-83. U2 - 2666330 ER - TY - JOUR T1 - Controlling item exposure and test overlap on the fly in computerized adaptive testing JF - British Journal of Mathematical and Statistical Psychology Y1 - 2008 A1 - Chen, S-Y. A1 - Lei, P. W. A1 - Liao, W. H. KW - *Decision Making, Computer-Assisted KW - *Models, Psychological KW - Humans AB - This paper proposes an on-line version of the Sympson and Hetter procedure with test overlap control (SHT) that can provide item exposure control at both the item and test levels on the fly without iterative simulations. The on-line procedure is similar to the SHT procedure in that exposure parameters are used for simultaneous control of item exposure rates and test overlap rate. The exposure parameters for the on-line procedure, however, are updated sequentially on the fly, rather than through iterative simulations conducted prior to operational computerized adaptive tests (CATs). Unlike the SHT procedure, the on-line version can control item exposure rate and test overlap rate without time-consuming iterative simulations even when item pools or examinee populations have been changed. Moreover, the on-line procedure was found to perform better than the SHT procedure in controlling item exposure and test overlap for examinees who take tests earlier. Compared with two other on-line alternatives, this proposed on-line method provided the best all-around test security control. Thus, it would be an efficient procedure for controlling item exposure and test overlap in CATs. VL - 61 SN - 0007-1102 (Print)0007-1102 (Linking) N1 - Chen, Shu-YingLei, Pui-WaLiao, Wen-HanResearch Support, Non-U.S. Gov'tEnglandThe British journal of mathematical and statistical psychologyBr J Math Stat Psychol. 2008 Nov;61(Pt 2):471-92. Epub 2007 Jul 23. ER - TY - JOUR T1 - Investigating item exposure control on the fly in computerized adaptive testing JF - Psychological Testing Y1 - 2008 A1 - Wu, M.-L. A1 - Chen, S-Y. VL - 55 ER - TY - JOUR T1 - Item exposure control in a-stratified computerized adaptive testing JF - Psychological Testing Y1 - 2008 A1 - Jhu, Y.-J., A1 - Chen, S-Y. VL - 55 ER - TY - JOUR T1 - Letting the CAT out of the bag: Comparing computer adaptive tests and an 11-item short form of the Roland-Morris Disability Questionnaire JF - Spine Y1 - 2008 A1 - Cook, K. F. A1 - Choi, S. W. A1 - Crane, P. K. A1 - Deyo, R. A. A1 - Johnson, K. L. A1 - Amtmann, D. 
KW - *Disability Evaluation KW - *Health Status Indicators KW - Adult KW - Aged KW - Aged, 80 and over KW - Back Pain/*diagnosis/psychology KW - Calibration KW - Computer Simulation KW - Diagnosis, Computer-Assisted/*standards KW - Humans KW - Middle Aged KW - Models, Psychological KW - Predictive Value of Tests KW - Questionnaires/*standards KW - Reproducibility of Results AB - STUDY DESIGN: A post hoc simulation of a computer adaptive administration of the items of a modified version of the Roland-Morris Disability Questionnaire. OBJECTIVE: To evaluate the effectiveness of adaptive administration of back pain-related disability items compared with a fixed 11-item short form. SUMMARY OF BACKGROUND DATA: Short form versions of the Roland-Morris Disability Questionnaire have been developed. An alternative to paper-and-pencil short forms is to administer items adaptively so that items are presented based on a person's responses to previous items. Theoretically, this allows precise estimation of back pain disability with administration of only a few items. MATERIALS AND METHODS: Data were gathered from 2 previously conducted studies of persons with back pain. An item response theory model was used to calibrate scores based on all items, items of a paper-and-pencil short form, and several computer adaptive tests (CATs). RESULTS: Correlations between each CAT condition and scores based on a 23-item version of the Roland-Morris Disability Questionnaire ranged from 0.93 to 0.98. Compared with an 11-item short form, an 11-item CAT produced scores that were significantly more highly correlated with scores based on the 23-item scale. CATs with even fewer items also produced scores that were highly correlated with scores based on all items. For example, scores from a 5-item CAT had a correlation of 0.93 with full scale scores. Seven- and 9-item CATs correlated at 0.95 and 0.97, respectively. A CAT with a standard-error-based stopping rule produced scores that correlated at 0.95 with full scale scores. CONCLUSION: A CAT-based back pain-related disability measure may be a valuable tool for use in clinical and research contexts. Use of CAT for other common measures in back pain research, such as other functional scales or measures of psychological distress, may offer similar advantages. VL - 33 SN - 1528-1159 (Electronic) N1 - Cook, Karon FChoi, Seung WCrane, Paul KDeyo, Richard AJohnson, Kurt LAmtmann, Dagmar5 P60-AR48093/AR/United States NIAMS5U01AR052171-03/AR/United States NIAMSComparative StudyResearch Support, N.I.H., ExtramuralUnited StatesSpineSpine. 2008 May 20;33(12):1378-83. ER - TY - JOUR T1 - The NAPLEX: evolution, purpose, scope, and educational implications JF - American Journal of Pharmaceutical Education Y1 - 2008 A1 - Newton, D. W. A1 - Boyle, M. A1 - Catizone, C. A. KW - *Educational Measurement KW - Education, Pharmacy/*standards KW - History, 20th Century KW - History, 21st Century KW - Humans KW - Licensure, Pharmacy/history/*legislation & jurisprudence KW - North America KW - Pharmacists/*legislation & jurisprudence KW - Software AB - Since 2004, passing the North American Pharmacist Licensure Examination (NAPLEX) has been a requirement for earning initial pharmacy licensure in all 50 United States. 
The creation and evolution from 1952-2005 of the particular pharmacy competency testing areas and quantities of questions are described for the former paper-and-pencil National Association of Boards of Pharmacy Licensure Examination (NABPLEX) and the current candidate-specific computer adaptive NAPLEX pharmacy licensure examinations. A 40% increase in the weighting of NAPLEX Blueprint Area 2 in May 2005, compared to that in the preceding 1997-2005 Blueprint, has implications for candidates' NAPLEX performance and associated curricular content and instruction. New pharmacy graduates' scores on the NAPLEX are neither intended nor validated to serve as a criterion for assessing or judging the quality or effectiveness of pharmacy curricula and instruction. The newest cycle of NAPLEX Blueprint revision, a continual process to ensure representation of nationwide contemporary practice, began in early 2008. It may take up to 2 years, including surveying several thousand national pharmacists, to complete. VL - 72 SN - 1553-6467 (Electronic)0002-9459 (Linking) N1 - Newton, David WBoyle, MariaCatizone, Carmen AHistorical ArticleUnited StatesAmerican journal of pharmaceutical educationAm J Pharm Educ. 2008 Apr 15;72(2):33. U2 - 2384208 ER - TY - JOUR T1 - Predicting item exposure parameters in computerized adaptive testing JF - British Journal of Mathematical and Statistical Psychology Y1 - 2008 A1 - Chen, S-Y. A1 - Doong, S. H. KW - *Algorithms KW - *Artificial Intelligence KW - Aptitude Tests/*statistics & numerical data KW - Diagnosis, Computer-Assisted/*statistics & numerical data KW - Humans KW - Models, Statistical KW - Psychometrics/statistics & numerical data KW - Reproducibility of Results KW - Software AB - The purpose of this study is to find a formula that describes the relationship between item exposure parameters and item parameters in computerized adaptive tests by using genetic programming (GP) - a biologically inspired artificial intelligence technique. Based on the formula, item exposure parameters for new parallel item pools can be predicted without conducting additional iterative simulations. Results show that an interesting formula between item exposure parameters and item parameters in a pool can be found by using GP. The item exposure parameters predicted based on the found formula were close to those observed from the Sympson and Hetter (1985) procedure and performed well in controlling item exposure rates. Similar results were observed for the Stocking and Lewis (1998) multinomial model for item selection and the Sympson and Hetter procedure with content balancing. The proposed GP approach has provided a knowledge-based solution for finding item exposure parameters. VL - 61 SN - 0007-1102 (Print)0007-1102 (Linking) N1 - Chen, Shu-YingDoong, Shing-HwangResearch Support, Non-U.S. Gov'tEnglandThe British journal of mathematical and statistical psychologyBr J Math Stat Psychol. 2008 May;61(Pt 1):75-91. ER - TY - JOUR T1 - Severity of Organized Item Theft in Computerized Adaptive Testing: A Simulation Study JF - Applied Psychological Measurement Y1 - 2008 A1 - Qing Yi, A1 - Jinming Zhang, A1 - Chang, Hua-Hua AB -

Criteria had been proposed for assessing the severity of possible test security violations for computerized tests with high-stakes outcomes. However, these criteria resulted from theoretical derivations that assumed uniformly randomized item selection. This study investigated potential damage caused by organized item theft in computerized adaptive testing (CAT) for two realistic item selection methods, maximum item information and a-stratified with content blocking, using the randomized method as a baseline for comparison. Damage caused by organized item theft was evaluated by the number of compromised items each examinee could encounter and the impact of the compromised items on examinees' ability estimates. Severity of test security violation was assessed under self-organized and organized item theft simulation scenarios. Results indicated that though item theft could cause severe damage to CAT with either item selection method, the maximum item information method was more vulnerable to the organized item theft simulation than was the a-stratified method.

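The severity measures used in this abstract, such as the number of compromised items an examinee encounters, can be tallied straightforwardly once a simulated compromised set is fixed. The helper below is a generic illustration under that assumption, not the authors' code; the names are hypothetical.

    def theft_exposure(administered_by_examinee, compromised_items):
        # per-examinee count of administered items that were compromised,
        # plus the overall proportion of administrations that were compromised
        compromised = set(compromised_items)
        counts = [len(set(items) & compromised) for items in administered_by_examinee]
        total = sum(len(items) for items in administered_by_examinee)
        return counts, sum(counts) / total

The downstream effect on ability estimates would then be assessed by comparing estimates obtained with and without correct responses forced on the compromised items.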
VL - 32 UR - http://apm.sagepub.com/content/32/7/543.abstract ER - TY - JOUR T1 - To Weight or Not to Weight? Balancing Influence of Initial Items in Adaptive Testing JF - Psychometrika Y1 - 2008 A1 - Chang, H.-H. A1 - Ying, Z. AB -

It has been widely reported that in computerized adaptive testing some examinees may get much lower scores than they would have obtained if an alternative paper-and-pencil version had been given. The main purpose of this investigation is to quantitatively reveal the cause of this underestimation phenomenon. The logistic models, including the 1PL, 2PL, and 3PL models, are used to demonstrate our assertions. Our analytical derivation shows that, under the maximum information item selection strategy, if an examinee fails a few items at the beginning of the test, easy but highly discriminating items are likely to be administered. Such items are ineffective at moving the estimate close to the true theta unless the test is sufficiently long or a variable-length test is used. Our results also indicate that a certain weighting mechanism is necessary to make the algorithm rely less on the items administered at the beginning of the test.

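For reference, the logistic models named in this abstract are nested in the three-parameter logistic (3PL) response function

\[
P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp\{-a_i(\theta - b_i)\}},
\]

with the 2PL obtained by fixing \(c_i = 0\) and the 1PL by additionally fixing \(a_i\) to a common value. The underestimation argument in the abstract concerns how the discrimination parameters \(a_i\) of the early items enter the maximum-information selection rule.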
VL - 73 IS - 3 ER - TY - CHAP T1 - Adaptive testing with the multi-unidimensional pairwise preference model Y1 - 2007 A1 - Stark, S. A1 - Chernyshenko, O. S. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 145 KB} ER - TY - JOUR T1 - The comparison of maximum likelihood estimation and expected a posteriori in CAT using the graded response model JF - Journal of Elementary Education Y1 - 2007 A1 - Chen, S-K. VL - 19 ER - TY - CHAP T1 - Computerized attribute-adaptive testing: A new computerized adaptive testing approach incorporating cognitive psychology Y1 - 2007 A1 - Zhou, J. A1 - Gierl, M. J. A1 - Cui, Y. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 296 KB} ER - TY - JOUR T1 - Detecting Differential Speededness in Multistage Testing JF - Journal of Educational Measurement Y1 - 2007 A1 - van der Linden, Wim J. A1 - Breithaupt, Krista A1 - Chuah, Siang Chee A1 - Zhang, Yanwei AB -

A potentially undesirable effect of multistage testing is differential speededness, which happens if some of the test takers run out of time because they receive subtests with items that are more time intensive than others. This article shows how a probabilistic response-time model can be used to estimate differences in time intensity between subtests and in speed between test takers, and to detect differential speededness. An empirical data set for a multistage test in the computerized CPA Exam was used to demonstrate the procedures. Although the more difficult subtests appeared to have items that were more time intensive than the easier subtests, an analysis of the residual response times did not reveal any significant differential speededness because the time limit appeared to be appropriate. In a separate analysis within each of the subtests, we found minor but consistent patterns of residual times that are believed to be due to a warm-up effect, that is, test takers spending more time on the initial items than they actually need.

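The abstract does not reproduce the response-time model, but the lognormal model commonly used for this purpose in this literature takes roughly the form

\[
\ln T_{ij} \sim N\bigl(\beta_j - \tau_i,\ \alpha_j^{-2}\bigr),
\]

where \(T_{ij}\) is the time test taker i spends on item j, \(\beta_j\) is the item's time intensity, \(\tau_i\) the test taker's speed, and \(\alpha_j\) a precision parameter. Under a model of this kind, differential speededness shows up as systematic patterns in the residual times after the estimated intensity and speed effects are removed.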
VL - 44 UR - http://dx.doi.org/10.1111/j.1745-3984.2007.00030.x ER - TY - JOUR T1 - Developing tailored instruments: item banking and computerized adaptive assessment JF - Quality of Life Research Y1 - 2007 A1 - Bjorner, J. B. A1 - Chang, C-H. A1 - Thissen, D. A1 - Reeve, B. B. KW - *Health Status KW - *Health Status Indicators KW - *Mental Health KW - *Outcome Assessment (Health Care) KW - *Quality of Life KW - *Questionnaires KW - *Software KW - Algorithms KW - Factor Analysis, Statistical KW - Humans KW - Models, Statistical KW - Psychometrics AB - Item banks and Computerized Adaptive Testing (CAT) have the potential to greatly improve the assessment of health outcomes. This review describes the unique features of item banks and CAT and discusses how to develop item banks. In CAT, a computer selects the items from an item bank that are most relevant for and informative about the particular respondent; thus optimizing test relevance and precision. Item response theory (IRT) provides the foundation for selecting the items that are most informative for the particular respondent and for scoring responses on a common metric. The development of an item bank is a multi-stage process that requires a clear definition of the construct to be measured, good items, a careful psychometric analysis of the items, and a clear specification of the final CAT. The psychometric analysis needs to evaluate the assumptions of the IRT model such as unidimensionality and local independence; that the items function the same way in different subgroups of the population; and that there is an adequate fit between the data and the chosen item response models. Also, interpretation guidelines need to be established to help the clinical application of the assessment. Although medical research can draw upon expertise from educational testing in the development of item banks and CAT, the medical field also encounters unique opportunities and challenges. VL - 16 SN - 0962-9343 (Print) N1 - Bjorner, Jakob BueChang, Chih-HungThissen, DavidReeve, Bryce B1R43NS047763-01/NS/United States NINDSAG015815/AG/United States NIAResearch Support, N.I.H., ExtramuralNetherlandsQuality of life research : an international journal of quality of life aspects of treatment, care and rehabilitationQual Life Res. 2007;16 Suppl 1:95-108. Epub 2007 Feb 15. ER - TY - JOUR T1 - The effect of including pretest items in an operational computerized adaptive test: Do different ability examinees spend different amounts of time on embedded pretest items? JF - Educational Assessment Y1 - 2007 A1 - Ferdous, A. A. A1 - Plake, B. S. A1 - Chang, S-R. KW - ability KW - operational computerized adaptive test KW - pretest items KW - time AB - The purpose of this study was to examine the effect of pretest items on response time in an operational, fixed-length, time-limited computerized adaptive test (CAT). These pretest items are embedded within the CAT, but unlike the operational items, are not tailored to the examinee's ability level. If examinees with higher ability levels need less time to complete these items than do their counterparts with lower ability levels, they will have more time to devote to the operational test questions. Data were from a graduate admissions test that was administered worldwide. Data from both quantitative and verbal sections of the test were considered. 
For the verbal section, examinees in the lower ability groups spent systematically more time on their pretest items than did those in the higher ability groups, though for the quantitative section the differences were less clear. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Lawrence Erlbaum: US VL - 12 SN - 1062-7197 (Print); 1532-6977 (Electronic) ER - TY - JOUR T1 - The future of outcomes measurement: item banking, tailored short-forms, and computerized adaptive assessment JF - Quality of Life Research Y1 - 2007 A1 - Cella, D. A1 - Gershon, R. C. A1 - Lai, J-S. A1 - Choi, S. W. AB - The use of item banks and computerized adaptive testing (CAT) begins with clear definitions of important outcomes, and references those definitions to specific questions gathered into large and well-studied pools, or “banks” of items. Items can be selected from the bank to form customized short scales, or can be administered in a sequence and length determined by a computer programmed for precision and clinical relevance. Although far from perfect, such item banks can form a common definition and understanding of human symptoms and functional problems such as fatigue, pain, depression, mobility, social function, sensory function, and many other health concepts that we can only measure by asking people directly. The support of the National Institutes of Health (NIH), as witnessed by its cooperative agreement with measurement experts through the NIH Roadmap Initiative known as PROMIS (www.nihpromis.org), is a big step in that direction. Our approach to item banking and CAT is practical; as focused on application as it is on science or theory. From a practical perspective, we frequently must decide whether to re-write and retest an item, add more items to fill gaps (often at the ceiling of the measure), re-test a bank after some modifications, or split up a bank into units that are more unidimensional, yet less clinically relevant or complete. These decisions are not easy, and yet they are rarely unforgiving. We encourage people to build practical tools that are capable of producing multiple short form measures and CAT administrations from common banks, and to further our understanding of these banks with various clinical populations and ages, so that with time the scores that emerge from these many activities begin to have not only a common metric and range, but a shared meaning and understanding across users. In this paper, we provide an overview of item banking and CAT, discuss our approach to item banking and its byproducts, describe testing options, discuss an example of CAT for fatigue, and discuss models for long term sustainability of an entity such as PROMIS. Some barriers to success include limitations in the methods themselves, controversies and disagreements across approaches, and end-user reluctance to move away from the familiar. VL - 16 SN - 0962-9343 ER - TY - JOUR T1 - Improving patient reported outcomes using item response theory and computerized adaptive testing JF - Journal of Rheumatology Y1 - 2007 A1 - Chakravarty, E. F. A1 - Bjorner, J. B. A1 - Fries, J.F. 
KW - *Rheumatic Diseases/physiopathology/psychology KW - Clinical Trials KW - Data Interpretation, Statistical KW - Disability Evaluation KW - Health Surveys KW - Humans KW - International Cooperation KW - Outcome Assessment (Health Care)/*methods KW - Patient Participation/*methods KW - Research Design/*trends KW - Software AB - OBJECTIVE: Patient reported outcomes (PRO) are considered central outcome measures for both clinical trials and observational studies in rheumatology. More sophisticated statistical models, including item response theory (IRT) and computerized adaptive testing (CAT), will enable critical evaluation and reconstruction of currently utilized PRO instruments to improve measurement precision while reducing item burden on the individual patient. METHODS: We developed a domain hierarchy encompassing the latent trait of physical function/disability from the more general to most specific. Items collected from 165 English-language instruments were evaluated by a structured process including trained raters, modified Delphi expert consensus, and then patient evaluation. Each item in the refined data bank will undergo extensive analysis using IRT to evaluate response functions and measurement precision. CAT will allow for real-time questionnaires of potentially smaller numbers of questions tailored directly to each individual's level of physical function. RESULTS: Physical function/disability domain comprises 4 subdomains: upper extremity, trunk, lower extremity, and complex activities. Expert and patient review led to consensus favoring use of present-tense "capability" questions using a 4- or 5-item Likert response construct over past-tense "performance"items. Floor and ceiling effects, attribution of disability, and standardization of response categories were also addressed. CONCLUSION: By applying statistical techniques of IRT through use of CAT, existing PRO instruments may be improved to reduce questionnaire burden on the individual patients while increasing measurement precision that may ultimately lead to reduced sample size requirements for costly clinical trials. VL - 34 SN - 0315-162X (Print) N1 - Chakravarty, Eliza FBjorner, Jakob BFries, James FAr052158/ar/niamsConsensus Development ConferenceResearch Support, N.I.H., ExtramuralCanadaThe Journal of rheumatologyJ Rheumatol. 2007 Jun;34(6):1426-31. ER - TY - JOUR T1 - IRT health outcomes data analysis project: an overview and summary JF - Quality of Life Research Y1 - 2007 A1 - Cook, K. F. A1 - Teal, C. R. A1 - Bjorner, J. B. A1 - Cella, D. A1 - Chang, C-H. A1 - Crane, P. K. A1 - Gibbons, L. E. A1 - Hays, R. D. A1 - McHorney, C. A. A1 - Ocepek-Welikson, K. A1 - Raczek, A. E. A1 - Teresi, J. A. A1 - Reeve, B. B. KW - *Data Interpretation, Statistical KW - *Health Status KW - *Quality of Life KW - *Questionnaires KW - *Software KW - Female KW - HIV Infections/psychology KW - Humans KW - Male KW - Neoplasms/psychology KW - Outcome Assessment (Health Care)/*methods KW - Psychometrics KW - Stress, Psychological AB - BACKGROUND: In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, "Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment." A component of the conference was presentation of a psychometric and content analysis of a secondary dataset. 
OBJECTIVES: A thorough psychometric and content analysis was conducted of two primary domains within a cancer health-related quality of life (HRQOL) dataset. RESEARCH DESIGN: HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared. SUBJECTS: The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites. MEASURES: Items from 4 HRQOL instruments were evaluated: Cancer Rehabilitation Evaluation System-Short Form, European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, Functional Assessment of Cancer Therapy and Medical Outcomes Study Short-Form Health Survey. RESULTS AND CONCLUSIONS: Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance in the measurement of HRQOL of construct definition. With respect to these lessons, areas for future research are suggested. The feasibility of developing item banks for broad definitions of health is discussed. VL - 16 SN - 0962-9343 (Print) N1 - Cook, Karon FTeal, Cayla RBjorner, Jakob BCella, DavidChang, Chih-HungCrane, Paul KGibbons, Laura EHays, Ron DMcHorney, Colleen AOcepek-Welikson, KatjaRaczek, Anastasia ETeresi, Jeanne AReeve, Bryce B1U01AR52171-01/AR/United States NIAMSR01 (CA60068)/CA/United States NCIY1-PC-3028-01/PC/United States NCIResearch Support, N.I.H., ExtramuralNetherlandsQuality of life research : an international journal of quality of life aspects of treatment, care and rehabilitationQual Life Res. 2007;16 Suppl 1:121-32. Epub 2007 Mar 10. ER - TY - JOUR T1 - Methodological issues for building item banks and computerized adaptive scales JF - Quality of Life Research Y1 - 2007 A1 - Thissen, D. A1 - Reeve, B. B. A1 - Bjorner, J. B. A1 - Chang, C-H. AB - Abstract This paper reviews important methodological considerations for developing item banks and computerized adaptive scales (commonly called computerized adaptive tests in the educational measurement literature, yielding the acronym CAT), including issues of the reference population, dimensionality, dichotomous versus polytomous response scales, differential item functioning (DIF) and conditional scoring, mode effects, the impact of local dependence, and innovative approaches to assessment using CATs in health outcomes research. VL - 16 SN - 0962-93431573-2649 ER - TY - CHAP T1 - The modified maximum global discrimination index method for cognitive diagnostic computerized adaptive testing Y1 - 2007 A1 - Cheng, Y A1 - Chang, Hua-Hua CY -   D. J. Weiss (Ed.).  Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 172 KB} ER - TY - JOUR T1 - Patient-reported outcomes measurement and management with innovative methodologies and technologies JF - Quality of Life Research Y1 - 2007 A1 - Chang, C-H. 
KW - *Health Status KW - *Outcome Assessment (Health Care) KW - *Quality of Life KW - *Software KW - Computer Systems/*trends KW - Health Insurance Portability and Accountability Act KW - Humans KW - Patient Satisfaction KW - Questionnaires KW - United States AB - Successful integration of modern psychometrics and advanced informatics in patient-reported outcomes (PRO) measurement and management can potentially maximize the value of health outcomes research and optimize the delivery of quality patient care. Unlike the traditional labor-intensive paper-and-pencil data collection method, item response theory-based computerized adaptive testing methodologies coupled with novel technologies provide an integrated environment to collect, analyze and present ready-to-use PRO data for informed and shared decision-making. This article describes the needs, challenges and solutions for accurate, efficient and cost-effective PRO data acquisition and dissemination means in order to provide critical and timely PRO information necessary to actively support and enhance routine patient care in busy clinical settings. VL - 16 Suppl 1 SN - 0962-9343 (Print)0962-9343 (Linking) N1 - Chang, Chih-HungR21CA113191/CA/NCI NIH HHS/United StatesResearch Support, N.I.H., ExtramuralNetherlandsQuality of life research : an international journal of quality of life aspects of treatment, care and rehabilitationQual Life Res. 2007;16 Suppl 1:157-66. Epub 2007 May 26. ER - TY - JOUR T1 - The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years JF - Medical Care Y1 - 2007 A1 - Cella, D. A1 - Yount, S. A1 - Rothrock, N. A1 - Gershon, R. C. A1 - Cook, K. F. A1 - Reeve, B. A1 - Ader, D. A1 - Fries, J.F. A1 - Bruce, B. A1 - Rose, M. AB - The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) Roadmap initiative (www.nihpromis.org) is a 5-year cooperative group program of research designed to develop, validate, and standardize item banks to measure patient-reported outcomes (PROs) relevant across common medical conditions. In this article, we will summarize the organization and scientific activity of the PROMIS network during its first 2 years. VL - 45 ER - TY - JOUR T1 - Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS) JF - Medical Care Y1 - 2007 A1 - Reeve, B. B. A1 - Hays, R. D. A1 - Bjorner, J. B. A1 - Cook, K. F. A1 - Crane, P. K. A1 - Teresi, J. A. A1 - Thissen, D. A1 - Revicki, D. A. A1 - Weiss, D. J. A1 - Hambleton, R. K. A1 - Liu, H. A1 - Gershon, R. C. A1 - Reise, S. P. A1 - Lai, J. S. A1 - Cella, D. KW - *Health Status KW - *Information Systems KW - *Quality of Life KW - *Self Disclosure KW - Adolescent KW - Adult KW - Aged KW - Calibration KW - Databases as Topic KW - Evaluation Studies as Topic KW - Female KW - Humans KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Psychometrics KW - Questionnaires/standards KW - United States AB - BACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. 
The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. ANALYSES: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment. VL - 45 SN - 0025-7079 (Print) N1 - Reeve, Bryce BHays, Ron DBjorner, Jakob BCook, Karon FCrane, Paul KTeresi, Jeanne AThissen, DavidRevicki, Dennis AWeiss, David JHambleton, Ronald KLiu, HonghuGershon, RichardReise, Steven PLai, Jin-sheiCella, DavidPROMIS Cooperative GroupAG015815/AG/United States NIAResearch Support, N.I.H., ExtramuralUnited StatesMedical careMed Care. 2007 May;45(5 Suppl 1):S22-31. ER - TY - JOUR T1 - Relative precision, efficiency and construct validity of different starting and stopping rules for a computerized adaptive test: The GAIN Substance Problem Scale JF - Journal of Applied Measurement Y1 - 2007 A1 - Riley, B. B. A1 - Conrad, K. J. A1 - Bezruczko, N. A1 - Dennis, M. L. KW - My article AB - Substance abuse treatment programs are being pressed to measure and make clinical decisions more efficiently about an increasing array of problems. This computerized adaptive testing (CAT) simulation examined the relative efficiency, precision and construct validity of different starting and stopping rules used to shorten the Global Appraisal of Individual Needs’ (GAIN) Substance Problem Scale (SPS) and facilitate diagnosis based on it. Data came from 1,048 adolescents and adults referred to substance abuse treatment centers in 5 sites. CAT performance was evaluated using: (1) average standard errors, (2) average number of items, (3) bias in personmeasures, (4) root mean squared error of person measures, (5) Cohen’s kappa to evaluate CAT classification compared to clinical classification, (6) correlation between CAT and full-scale measures, and (7) construct validity of CAT classification vs. clinical classification using correlations with five theoretically associated instruments. Results supported both CAT efficiency and validity. VL - 8 ER - TY - JOUR T1 - A system for interactive assessment and management in palliative care JF - Journal of Pain Symptom Management Y1 - 2007 A1 - Chang, C-H. A1 - Boni-Saenz, A. A. A1 - Durazo-Arvizu, R. A. A1 - DesHarnais, S. A1 - Lau, D. T. A1 - Emanuel, L. L. KW - *Needs Assessment KW - Humans KW - Medical Informatics/*organization & administration KW - Palliative Care/*organization & administration AB - The availability of psychometrically sound and clinically relevant screening, diagnosis, and outcome evaluation tools is essential to high-quality palliative care assessment and management. Such data will enable us to improve patient evaluations, prognoses, and treatment selections, and to increase patient satisfaction and quality of life. To accomplish these goals, medical care needs more precise, efficient, and comprehensive tools for data acquisition, analysis, interpretation, and management. 
We describe a system for interactive assessment and management in palliative care (SIAM-PC), which is patient centered, model driven, database derived, evidence based, and technology assisted. The SIAM-PC is designed to reliably measure the multiple dimensions of patients' needs for palliative care, and then to provide information to clinicians, patients, and the patients' families to achieve optimal patient care, while improving our capacity for doing palliative care research. This system is innovative in its application of the state-of-the-science approaches, such as item response theory and computerized adaptive testing, to many of the significant clinical problems related to palliative care. VL - 33 SN - 0885-3924 (Print) N1 - Chang, Chih-HungBoni-Saenz, Alexander ADurazo-Arvizu, Ramon ADesHarnais, SusanLau, Denys TEmanuel, Linda LR21CA113191/CA/United States NCIResearch Support, N.I.H., ExtramuralReviewUnited StatesJournal of pain and symptom managementJ Pain Symptom Manage. 2007 Jun;33(6):745-55. Epub 2007 Mar 23. ER - TY - JOUR T1 - Two-phase item selection procedure for flexible content balancing in CAT JF - Applied Psychological. Measurement Y1 - 2007 A1 - Cheng, Y A1 - Chang, Hua-Hua A1 - Yi, Q. VL - 3 ER - TY - JOUR T1 - Two-Phase Item Selection Procedure for Flexible Content Balancing in CAT JF - Applied Psychological Measurement Y1 - 2007 A1 - Ying Cheng, A1 - Chang, Hua-Hua A1 - Qing Yi, AB -

Content balancing is an important issue in the design and implementation of computerized adaptive testing (CAT). Content-balancing techniques that have been applied in fixed content balancing, where the number of items from each content area is fixed, include constrained CAT (CCAT), the modified multinomial model (MMM), modified constrained CAT (MCCAT), and others. In this article, four methods are proposed to address the flexible content-balancing issue with the a-stratification design, named STR_C. The four methods are MMM+, an extension of MMM; MCCAT+, an extension of MCCAT; the TPM method, a two-phase content-balancing method using MMM in both phases; and the TPF method, a two-phase content-balancing method using MMM in the first phase and MCCAT in the second. Simulation results show that all of the methods work well in content balancing, and TPF performs the best in item exposure control and item pool utilization while maintaining measurement precision.

VL - 31 UR - http://apm.sagepub.com/content/31/6/467.abstract ER - TY - Generic T1 - The use of computerized adaptive testing to assess psychopathology using the Global Appraisal of Individual Needs T2 - American Evaluation Association Y1 - 2007 A1 - Conrad, K. J. A1 - Riley, B. B. A1 - Dennis, M. L. JF - American Evaluation Association PB - American Evaluation Association CY - Portland, OR USA ER - TY - JOUR T1 - Assessing CAT Test Security Severity JF - Applied Psychological Measurement Y1 - 2006 A1 - Yi, Q. A1 - Zhang, J. A1 - Chang, Hua-Hua VL - 30(1) ER - TY - JOUR T1 - Comparing methods of assessing differential item functioning in a computerized adaptive testing environment JF - Journal of Educational Measurement Y1 - 2006 A1 - Lei, P-W. A1 - Chen, S-Y. A1 - Yu, L. KW - computerized adaptive testing KW - educational testing KW - item response theory likelihood ratio test KW - logistic regression KW - trait estimation KW - unidirectional & non-unidirectional differential item functioning AB - Mantel-Haenszel and SIBTEST, which have known difficulty in detecting non-unidirectional differential item functioning (DIF), have been adapted with some success for computerized adaptive testing (CAT). This study adapts logistic regression (LR) and the item-response-theory-likelihood-ratio test (IRT-LRT), capable of detecting both unidirectional and non-unidirectional DIF, to the CAT environment in which pretest items are assumed to be seeded in CATs but not used for trait estimation. The proposed adaptation methods were evaluated with simulated data under different sample size ratios and impact conditions in terms of Type I error, power, and specificity in identifying the form of DIF. The adapted LR and IRT-LRT procedures are more powerful than the CAT version of SIBTEST for non-unidirectional DIF detection. The good Type I error control provided by IRT-LRT under extremely unequal sample sizes and large impact is encouraging. Implications of these and other findings are discussed. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Blackwell Publishing: United Kingdom VL - 43 SN - 0022-0655 (Print) ER - TY - JOUR T1 - Comparing Methods of Assessing Differential Item Functioning in a Computerized Adaptive Testing Environment JF - Journal of Educational Measurement Y1 - 2006 A1 - Lei, Pui-Wa A1 - Chen, Shu-Ying A1 - Yu, Lan AB -

Mantel-Haenszel and SIBTEST, which have known difficulty in detecting non-unidirectional differential item functioning (DIF), have been adapted with some success for computerized adaptive testing (CAT). This study adapts logistic regression (LR) and the item-response-theory-likelihood-ratio test (IRT-LRT), capable of detecting both unidirectional and non-unidirectional DIF, to the CAT environment in which pretest items are assumed to be seeded in CATs but not used for trait estimation. The proposed adaptation methods were evaluated with simulated data under different sample size ratios and impact conditions in terms of Type I error, power, and specificity in identifying the form of DIF. The adapted LR and IRT-LRT procedures are more powerful than the CAT version of SIBTEST for non-unidirectional DIF detection. The good Type I error control provided by IRT-LRT under extremely unequal sample sizes and large impact is encouraging. Implications of these and other findings are discussed.

VL - 43 UR - http://dx.doi.org/10.1111/j.1745-3984.2006.00015.x ER - TY - CHAP T1 - Computer-based testing T2 - Handbook of multimethod measurement in psychology Y1 - 2006 A1 - F Drasgow A1 - Chuah, S. C. KW - Adaptive Testing computerized adaptive testing KW - Computer Assisted Testing KW - Experimentation KW - Psychometrics KW - Theories AB - (From the chapter) There has been a proliferation of research designed to explore and exploit opportunities provided by computer-based assessment. This chapter provides an overview of the diverse efforts by researchers in this area. It begins by describing how paper-and-pencil tests can be adapted for administration by computers. Computerization provides the important advantage that items can be selected so they are of appropriate difficulty for each examinee. Some of the psychometric theory needed for computerized adaptive testing is reviewed. Then research on innovative computerized assessments is summarized. These assessments go beyond multiple-choice items by using formats made possible by computerization. Then some hardware and software issues are described, and finally, directions for future work are outlined. (PsycINFO Database Record (c) 2006 APA ) JF - Handbook of multimethod measurement in psychology PB - American Psychological Association CY - Washington D.C. USA VL - xiv N1 - Using Smart Source ParsingHandbook of multimethod measurement in psychology. (pp. 87-100). Washington, DC : American Psychological Association, [URL:http://www.apa.org/books]. xiv, 553 pp ER - TY - JOUR T1 - Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes JF - Archives of Physical Medicine and Rehabilitation Y1 - 2006 A1 - Haley, S. M. A1 - Siebens, H. A1 - Coster, W. J. A1 - Tao, W. A1 - Black-Schaffer, R. M. A1 - Gandek, B. A1 - Sinclair, S. J. A1 - Ni, P. KW - *Activities of Daily Living KW - *Adaptation, Physiological KW - *Computer Systems KW - *Questionnaires KW - Adult KW - Aged KW - Aged, 80 and over KW - Chi-Square Distribution KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Longitudinal Studies KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Patient Discharge KW - Prospective Studies KW - Rehabilitation/*standards KW - Subacute Care/*standards AB - OBJECTIVE: To examine score agreement, precision, validity, efficiency, and responsiveness of a computerized adaptive testing (CAT) version of the Activity Measure for Post-Acute Care (AM-PAC-CAT) in a prospective, 3-month follow-up sample of inpatient rehabilitation patients recently discharged home. DESIGN: Longitudinal, prospective 1-group cohort study of patients followed approximately 2 weeks after hospital discharge and then 3 months after the initial home visit. SETTING: Follow-up visits conducted in patients' home setting. PARTICIPANTS: Ninety-four adults who were recently discharged from inpatient rehabilitation, with diagnoses of neurologic, orthopedic, and medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from AM-PAC-CAT, including 3 activity domains of movement and physical, personal care and instrumental, and applied cognition were compared with scores from a traditional fixed-length version of the AM-PAC with 66 items (AM-PAC-66). RESULTS: AM-PAC-CAT scores were in good agreement (intraclass correlation coefficient model 3,1 range, .77-.86) with scores from the AM-PAC-66. 
On average, the CAT programs required 43% of the time and 33% of the items compared with the AM-PAC-66. Both formats discriminated across functional severity groups. The standardized response mean (SRM) was greater for the movement and physical fixed form than the CAT; the effect size and SRM of the 2 other AM-PAC domains showed similar sensitivity between CAT and fixed formats. Using patients' own report as an anchor-based measure of change, the CAT and fixed length formats were comparable in responsiveness to patient-reported change over a 3-month interval. CONCLUSIONS: Accurate estimates for functional activity group-level changes can be obtained from CAT administrations, with a considerable reduction in administration time. VL - 87 SN - 0003-9993 (Print) N1 - Haley, Stephen MSiebens, HilaryCoster, Wendy JTao, WeiBlack-Schaffer, Randie MGandek, BarbaraSinclair, Samuel JNi, PengshengK0245354-01/phsR01 hd043568/hd/nichdResearch Support, N.I.H., ExtramuralUnited StatesArchives of physical medicine and rehabilitationArch Phys Med Rehabil. 2006 Aug;87(8):1033-42. ER - TY - CONF T1 - Constraints-weighted information method for item selection of severely constrained computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2006 A1 - Cheng, Y A1 - Chang, Hua-Hua A1 - Wang, X. B. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco ER - TY - JOUR T1 - Expansion of a physical function item bank and development of an abbreviated form for clinical research JF - Journal of Applied Measurement Y1 - 2006 A1 - Bode, R. K. A1 - Lai, J-S. A1 - Dineen, K. A1 - Heinemann, A. W. A1 - Shevrin, D. A1 - Von Roenn, J. A1 - Cella, D. KW - clinical research KW - computerized adaptive testing KW - performance levels KW - physical function item bank KW - Psychometrics KW - test reliability KW - Test Validity AB - We expanded an existing 33-item physical function (PF) item bank with a sufficient number of items to enable computerized adaptive testing (CAT). Ten items were written to expand the bank and the new item pool was administered to 295 people with cancer. For this analysis of the new pool, seven poorly performing items were identified for further examination. This resulted in a bank with items that define an essentially unidimensional PF construct, cover a wide range of that construct, reliably measure the PF of persons with cancer, and distinguish differences in self-reported functional performance levels. We also developed a 5-item (static) assessment form ("BriefPF") that can be used in clinical research to express scores on the same metric as the overall bank. The BriefPF was compared to the PF-10 from the Medical Outcomes Study SF-36. Both short forms significantly differentiated persons across functional performance levels. While the entire bank was more precise across the PF continuum than either short form, there were differences in the area of the continuum in which each short form was more precise: the BriefPF was more precise than the PF-10 at the lower functional levels and the PF-10 was more precise than the BriefPF at the higher levels. Future research on this bank will include the development of a CAT version, the PF-CAT. 
(PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Richard M Smith: US VL - 7 SN - 1529-7713 (Print) ER - TY - JOUR T1 - Factor analysis techniques for assessing sufficient unidimensionality of cancer related fatigue JF - Quality of Life Research Y1 - 2006 A1 - Lai, J-S. A1 - Crane, P. K. A1 - Cella, D. KW - *Factor Analysis, Statistical KW - *Quality of Life KW - Aged KW - Chicago KW - Fatigue/*etiology KW - Female KW - Humans KW - Male KW - Middle Aged KW - Neoplasms/*complications KW - Questionnaires AB - BACKGROUND: Fatigue is the most common unrelieved symptom experienced by people with cancer. The purpose of this study was to examine whether cancer-related fatigue (CRF) can be summarized using a single score, that is, whether CRF is sufficiently unidimensional for measurement approaches that require or assume unidimensionality. We evaluated this question using factor analysis techniques including the theory-driven bi-factor model. METHODS: Five hundred and fifty five cancer patients from the Chicago metropolitan area completed a 72-item fatigue item bank, covering a range of fatigue-related concerns including intensity, frequency and interference with physical, mental, and social activities. Dimensionality was assessed using exploratory and confirmatory factor analysis (CFA) techniques. RESULTS: Exploratory factor analysis (EFA) techniques identified from 1 to 17 factors. The bi-factor model suggested that CRF was sufficiently unidimensional. CONCLUSIONS: CRF can be considered sufficiently unidimensional for applications that require unidimensionality. One such application, item response theory (IRT), will facilitate the development of short-form and computer-adaptive testing. This may further enable practical and accurate clinical assessment of CRF. VL - 15 N1 - 0962-9343 (Print)Journal ArticleResearch Support, N.I.H., Extramural ER - TY - JOUR T1 - How Big Is Big Enough? Sample Size Requirements for CAST Item Parameter Estimation JF - Applied Measurement in Education Y1 - 2006 A1 - Chuah, Siang Chee A1 - F Drasgow A1 - Luecht, Richard VL - 19 UR - http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1903_5 ER - TY - JOUR T1 - Item banks and their potential applications to health status assessment in diverse populations JF - Medical Care Y1 - 2006 A1 - Hahn, E. A. A1 - Cella, D. A1 - Bode, R. K. A1 - Gershon, R. C. A1 - Lai, J. S. AB - In the context of an ethnically diverse, aging society, attention is increasingly turning to health-related quality of life measurement to evaluate healthcare and treatment options for chronic diseases. When evaluating and treating symptoms and concerns such as fatigue, pain, or physical function, reliable and accurate assessment is a priority. Modern psychometric methods have enabled us to move from long, static tests that provide inefficient and often inaccurate assessment of individual patients, to computerized adaptive tests (CATs) that can precisely measure individuals on health domains of interest. These modern methods, collectively referred to as item response theory (IRT), can produce calibrated "item banks" from larger pools of questions. From these banks, CATs can be conducted on individuals to produce their scores on selected domains. Item banks allow for comparison of patients across different question sets because the patient's score is expressed on a common scale. 
Other advantages of using item banks include flexibility in terms of the degree of precision desired; interval measurement properties under most circumstances; realistic capability for accurate individual assessment over time (using CAT); and measurement equivalence across different patient populations. This work summarizes the process used in the creation and evaluation of item banks and reviews their potential contributions and limitations regarding outcome assessment and patient care, particularly when they are applied across people of different cultural backgrounds. VL - 44 N1 - 0025-7079 (Print)Journal ArticleResearch Support, N.I.H., ExtramuralResearch Support, Non-U.S. Gov't ER - TY - JOUR T1 - Multistage Testing: Widely or Narrowly Applicable? JF - Applied Measurement in Education Y1 - 2006 A1 - Stark, Stephen A1 - Chernyshenko, Oleksandr S. VL - 19 UR - http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1903_6 ER - TY - JOUR T1 - Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function JF - Journal of Clinical Epidemiology Y1 - 2006 A1 - Hart, D. L. A1 - Cook, K. F. A1 - Mioduski, J. E. A1 - Teal, C. R. A1 - Crane, P. K. KW - computerized adaptive testing KW - Flexilevel Scale of Shoulder Function KW - Item Response Theory KW - Rehabilitation AB -

Background and Objective: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items, develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (theta(IRT)) and measures generated using the simulated CAT (theta(CAT)). Study Design and Setting: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients with shoulder impairments who completed 60 SFS items. Results: Factor analytic techniques supported that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The theta(IRT) and theta(CAT) measures were highly correlated (r = .96) and resulted in similar classifications of patients. Conclusion: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability.

VL - 59 IS - 3 ER - TY - JOUR T1 - Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function JF - Journal of Clinical Epidemiology Y1 - 2006 A1 - Hart, D. L. A1 - Cook, K. F. A1 - Mioduski, J. E. A1 - Teal, C. R. A1 - Crane, P. K. KW - *Computer Simulation KW - *Range of Motion, Articular KW - Activities of Daily Living KW - Adult KW - Aged KW - Aged, 80 and over KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Middle Aged KW - Prospective Studies KW - Reproducibility of Results KW - Research Support, N.I.H., Extramural KW - Research Support, U.S. Gov't, Non-P.H.S. KW - Shoulder Dislocation/*physiopathology/psychology/rehabilitation KW - Shoulder Pain/*physiopathology/psychology/rehabilitation KW - Shoulder/*physiopathology KW - Sickness Impact Profile KW - Treatment Outcome AB - BACKGROUND AND OBJECTIVE: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items, develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (theta(IRT)) and measures generated using the simulated CAT (theta(CAT)). STUDY DESIGN AND SETTING: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients with shoulder impairments who completed 60 SFS items. RESULTS: Factor analytic techniques supported that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The theta(IRT) and theta(CAT) measures were highly correlated (r = .96) and resulted in similar classifications of patients. CONCLUSION: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability. VL - 59 N1 - 0895-4356 (Print)Journal ArticleValidation Studies ER - TY - JOUR T1 - Assessing mobility in children using a computer adaptive testing version of the pediatric evaluation of disability inventory JF - Archives of Physical Medicine and Rehabilitation Y1 - 2005 A1 - Haley, S. M. A1 - Raczek, A. E. A1 - Coster, W. J. A1 - Dumas, H. M. A1 - Fragala-Pinkham, M. A. KW - *Computer Simulation KW - *Disability Evaluation KW - Adolescent KW - Child KW - Child, Preschool KW - Cross-Sectional Studies KW - Disabled Children/*rehabilitation KW - Female KW - Humans KW - Infant KW - Male KW - Outcome Assessment (Health Care)/*methods KW - Rehabilitation Centers KW - Rehabilitation/*standards KW - Sensitivity and Specificity AB - OBJECTIVE: To assess score agreement, validity, precision, and response burden of a prototype computerized adaptive testing (CAT) version of the Mobility Functional Skills Scale (Mob-CAT) of the Pediatric Evaluation of Disability Inventory (PEDI) as compared with the full 59-item version (Mob-59). DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; and cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics, community-based day care, preschool, and children's homes. 
PARTICIPANTS: Four hundred sixty-nine children with disabilities and 412 children with no disabilities (analytic sample); 41 children without disabilities and 39 with disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from a prototype Mob-CAT application and versions using 15-, 10-, and 5-item stopping rules; scores from the Mob-59; and number of items and time (in seconds) to administer assessments. RESULTS: Mob-CAT scores from both computer simulations (intraclass correlation coefficient [ICC] range, .94-.99) and field administrations (ICC=.98) were in high agreement with scores from the Mob-59. Using computer simulations of retrospective data, discriminant validity, and sensitivity to change of the Mob-CAT closely approximated that of the Mob-59, especially when using the 15- and 10-item stopping rule versions of the Mob-CAT. The Mob-CAT used no more than 15% of the items for any single administration, and required 20% of the time needed to administer the Mob-59. CONCLUSIONS: Comparable score estimates for the PEDI mobility scale can be obtained from CAT administrations, with losses in validity and precision for shorter forms, but with a considerable reduction in administration time. VL - 86 SN - 0003-9993 (Print) N1 - Haley, Stephen MRaczek, Anastasia ECoster, Wendy JDumas, Helene MFragala-Pinkham, Maria AK02 hd45354-01a1/hd/nichdR43 hd42388-01/hd/nichdResearch Support, N.I.H., ExtramuralResearch Support, U.S. Gov't, P.H.S.United StatesArchives of physical medicine and rehabilitationArch Phys Med Rehabil. 2005 May;86(5):932-9. ER - TY - JOUR T1 - Assessing Mobility in Children Using a Computer Adaptive Testing Version of the Pediatric Evaluation of Disability Inventory JF - Archives of Physical Medicine and Rehabilitation Y1 - 2005 A1 - Haley, S. A1 - Raczek, A. A1 - Coster, W. A1 - Dumas, H. A1 - Fragalapinkham, M. VL - 86 SN - 00039993 ER - TY - JOUR T1 - An Authoring Environment for Adaptive Testing JF - Educational Technology & Society Y1 - 2005 A1 - Guzmán, E A1 - Conejo, R A1 - García-Hervás, E KW - Adaptability KW - Adaptive Testing KW - Authoring environment KW - Item Response Theory AB -

SIETTE is a web-based adaptive testing system. It implements Computerized Adaptive Tests. These tests are tailor-made, theory-based tests, in which the questions shown to students, the finalization of the test, and the estimation of student knowledge are all accomplished adaptively. To construct these tests, SIETTE has an authoring environment comprising a suite of tools that helps teachers create questions and tests properly, and analyze students’ performance after taking a test. In this paper, we present this authoring environment in the framework of adaptive testing. As will be shown, this set of visual tools, which contains some adaptable features, can be useful for teachers lacking skills in this kind of testing. Additionally, other systems that implement adaptive testing will be studied.

VL - 8 IS - 3 ER - TY - JOUR T1 - A closer look at using judgments of item difficulty to change answers on computerized adaptive tests JF - Journal of Educational Measurement Y1 - 2005 A1 - Vispoel, W. P. A1 - Clough, S. J. A1 - Bleiler, T. VL - 42 ER - TY - JOUR T1 - A computer adaptive testing approach for assessing physical functioning in children and adolescents JF - Developmental Medicine and Child Neuropsychology Y1 - 2005 A1 - Haley, S. M. A1 - Ni, P. A1 - Fragala-Pinkham, M. A. A1 - Skrinar, A. M. A1 - Corzo, D. KW - *Computer Systems KW - Activities of Daily Living KW - Adolescent KW - Age Factors KW - Child KW - Child Development/*physiology KW - Child, Preschool KW - Computer Simulation KW - Confidence Intervals KW - Demography KW - Female KW - Glycogen Storage Disease Type II/physiopathology KW - Health Status Indicators KW - Humans KW - Infant KW - Infant, Newborn KW - Male KW - Motor Activity/*physiology KW - Outcome Assessment (Health Care)/*methods KW - Reproducibility of Results KW - Self Care KW - Sensitivity and Specificity AB - The purpose of this article is to demonstrate: (1) the accuracy and (2) the reduction in amount of time and effort in assessing physical functioning (self-care and mobility domains) of children and adolescents using computer-adaptive testing (CAT). A CAT algorithm selects questions directly tailored to the child's ability level, based on previous responses. Using a CAT algorithm, a simulation study was used to determine the number of items necessary to approximate the score of a full-length assessment. We built simulated CAT (5-, 10-, 15-, and 20-item versions) for self-care and mobility domains and tested their accuracy in a normative sample (n=373; 190 males, 183 females; mean age 6y 11mo [SD 4y 2m], range 4mo to 14y 11mo) and a sample of children and adolescents with Pompe disease (n=26; 21 males, 5 females; mean age 6y 1mo [SD 3y 10mo], range 5mo to 14y 10mo). Results indicated that comparable score estimates (based on computer simulations) to the full-length tests can be achieved in a 20-item CAT version for all age ranges and for normative and clinical samples. No more than 13 to 16% of the items in the full-length tests were needed for any one administration. These results support further consideration of using CAT programs for accurate and efficient clinical assessments of physical functioning. VL - 47 SN - 0012-1622 (Print) N1 - Haley, Stephen MNi, PengshengFragala-Pinkham, Maria ASkrinar, Alison MCorzo, DeyaniraComparative StudyResearch Support, Non-U.S. Gov'tEnglandDevelopmental medicine and child neurologyDev Med Child Neurol. 2005 Feb;47(2):113-20. ER - TY - JOUR T1 - Computerized adaptive testing: a mixture item selection approach for constrained situations JF - British Journal of Mathematical and Statistical Psychology Y1 - 2005 A1 - Leung, C. K. A1 - Chang, Hua-Hua A1 - Hau, K. T. KW - *Computer-Aided Design KW - *Educational Measurement/methods KW - *Models, Psychological KW - Humans KW - Psychometrics/methods AB - In computerized adaptive testing (CAT), traditionally the most discriminating items are selected to provide the maximum information so as to attain the highest efficiency in trait (theta) estimation. The maximum information (MI) approach typically results in unbalanced item exposure and hence high item-overlap rates across examinees. Recently, Yi and Chang (2003) proposed the multiple stratification (MS) method to remedy the shortcomings of MI. 
In MS, items are first sorted according to content, then difficulty and finally discrimination parameters. As discriminating items are used strategically, MS offers a better utilization of the entire item pool. However, for testing with imposed non-statistical constraints, this new stratification approach may not maintain its high efficiency. Through a series of simulation studies, this research explored the possible benefits of a mixture item selection approach (MS-MI), integrating the MS and MI approaches, in testing with non-statistical constraints. In all simulation conditions, MS consistently outperformed the other two competing approaches in item pool utilization, while the MS-MI and the MI approaches yielded higher measurement efficiency and offered better conformity to the constraints. Furthermore, the MS-MI approach was shown to perform better than MI on all evaluation criteria when control of item exposure was imposed. VL - 58 SN - 0007-1102 (Print)0007-1102 (Linking) N1 - Leung, Chi-KeungChang, Hua-HuaHau, Kit-TaiEnglandBr J Math Stat Psychol. 2005 Nov;58(Pt 2):239-57. ER - TY - JOUR T1 - Controlling item exposure and test overlap in computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2005 A1 - Chen, S-Y. A1 - Lei, P-W. KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Content (Test) computerized adaptive testing AB - This article proposes an item exposure control method, which is the extension of the Sympson and Hetter procedure and can provide item exposure control at both the item and test levels. Item exposure rate and test overlap rate are two indices commonly used to track item exposure in computerized adaptive tests. By considering both indices, item exposure can be monitored at both the item and test levels. To control the item exposure rate and test overlap rate simultaneously, the modified procedure attempted to control not only the maximum value but also the variance of item exposure rates. Results indicated that the item exposure rate and test overlap rate could be controlled simultaneously by implementing the modified procedure. Item exposure control was improved and precision of trait estimation decreased when a prespecified maximum test overlap rate was stringent. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 29 ER - TY - JOUR T1 - Controlling Item Exposure and Test Overlap in Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2005 A1 - Chen, Shu-Ying A1 - Lei, Pui-Wa AB -

This article proposes an item exposure control method, which is the extension of the Sympson and Hetter procedure and can provide item exposure control at both the item and test levels. Item exposure rate and test overlap rate are two indices commonly used to track item exposure in computerized adaptive tests. By considering both indices, item exposure can be monitored at both the item and test levels. To control the item exposure rate and test overlap rate simultaneously, the modified procedure attempted to control not only the maximum value but also the variance of item exposure rates. Results indicated that the item exposure rate and test overlap rate could be controlled simultaneously by implementing the modified procedure. Item exposure control was improved and precision of trait estimation decreased when a prespecified maximum test overlap rate was stringent.

VL - 29 UR - http://apm.sagepub.com/content/29/3/204.abstract ER - TY - JOUR T1 - Controlling item exposure and test overlap in computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2005 A1 - Chen, S.Y. A1 - Lei, P. W. VL - 29(2) ER - TY - JOUR T1 - Data pooling and analysis to build a preliminary item bank: an example using bowel function in prostate cancer JF - Evaluation and the Health Professions Y1 - 2005 A1 - Eton, D. T. A1 - Lai, J. S. A1 - Cella, D. A1 - Reeve, B. B. A1 - Talcott, J. A. A1 - Clark, J. A. A1 - McPherson, C. P. A1 - Litwin, M. S. A1 - Moinpour, C. M. KW - *Quality of Life KW - *Questionnaires KW - Adult KW - Aged KW - Data Collection/methods KW - Humans KW - Intestine, Large/*physiopathology KW - Male KW - Middle Aged KW - Prostatic Neoplasms/*physiopathology KW - Psychometrics KW - Research Support, Non-U.S. Gov't KW - Statistics, Nonparametric AB - Assessing bowel function (BF) in prostate cancer can help determine therapeutic trade-offs. We determined the components of BF commonly assessed in prostate cancer studies as an initial step in creating an item bank for clinical and research application. We analyzed six archived data sets representing 4,246 men with prostate cancer. Thirty-one items from validated instruments were available for analysis. Items were classified into domains (diarrhea, rectal urgency, pain, bleeding, bother/distress, and other) then subjected to conventional psychometric and item response theory (IRT) analyses. Items fit the IRT model if the ratio between observed and expected item variance was between 0.60 and 1.40. Four of 31 items had inadequate fit in at least one analysis. Poorly fitting items included bleeding (2), rectal urgency (1), and bother/distress (1). A fifth item assessing hemorrhoids was poorly correlated with other items. Our analyses supported four related components of BF: diarrhea, rectal urgency, pain, and bother/distress. VL - 28 N1 - 0163-2787 (Print)Journal Article ER - TY - JOUR T1 - Dynamic assessment of health outcomes: Time to let the CAT out of the bag? JF - Health Services Research Y1 - 2005 A1 - Cook, K. F. A1 - O'Malley, K. J. A1 - Roddey, T. S. KW - computer adaptive testing KW - Item Response Theory KW - self reported health outcomes AB - Background: The use of item response theory (IRT) to measure self-reported outcomes has burgeoned in recent years. Perhaps the most important application of IRT is computer-adaptive testing (CAT), a measurement approach in which the selection of items is tailored for each respondent. Objective. To provide an introduction to the use of CAT in the measurement of health outcomes, describe several IRT models that can be used as the basis of CAT, and discuss practical issues associated with the use of adaptive scaling in research settings. Principal Points: The development of a CAT requires several steps that are not required in the development of a traditional measure including identification of "starting" and "stopping" rules. CAT's most attractive advantage is its efficiency. Greater measurement precision can be achieved with fewer items. Disadvantages of CAT include the high cost and level of technical expertise required to develop a CAT. Conclusions: Researchers, clinicians, and patients benefit from the availability of psychometrically rigorous measures that are not burdensome. CAT outcome measures hold substantial promise in this regard, but their development is not without challenges. 
(PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Blackwell Publishing: United Kingdom VL - 40 SN - 0017-9124 (Print); 1475-6773 (Electronic) ER - TY - CONF T1 - The effectiveness of using multiple item pools in computerized adaptive testing T2 - Annual meeting of the National Council on Measurement in Education Y1 - 2005 A1 - Zhang, J. A1 - Chang, H. JF - Annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CONF T1 - Identifying practical indices for enhancing item pool security T2 - Paper presented at the annual meeting of the National Council on Measurement in Education (NCME) Y1 - 2005 A1 - Yi, Q. A1 - Zhang, J. A1 - Chang, Hua-Hua JF - Paper presented at the annual meeting of the National Council on Measurement in Education (NCME) CY - Montreal, Canada ER - TY - ABST T1 - Implementing content constraints in alpha-stratified adaptive testing using a shadow test approach Y1 - 2005 A1 - van der Linden, W. J. A1 - Chang, Hua-Hua CY - Law School Admission Council, Computerized Testing Report 01-09 ER - TY - JOUR T1 - An item bank was created to improve the measurement of cancer-related fatigue JF - Journal of Clinical Epidemiology Y1 - 2005 A1 - Lai, J-S. A1 - Cella, D. A1 - Dineen, K. A1 - Bode, R. A1 - Von Roenn, J. A1 - Gershon, R. C. A1 - Shevrin, D. KW - Adult KW - Aged KW - Aged, 80 and over KW - Factor Analysis, Statistical KW - Fatigue/*etiology/psychology KW - Female KW - Humans KW - Male KW - Middle Aged KW - Neoplasms/*complications/psychology KW - Psychometrics KW - Questionnaires AB - OBJECTIVE: Cancer-related fatigue (CRF) is one of the most common unrelieved symptoms experienced by patients. CRF is underrecognized and undertreated due to a lack of clinically sensitive instruments that integrate easily into clinics. Modern computerized adaptive testing (CAT) can overcome these obstacles by enabling precise assessment of fatigue without requiring the administration of a large number of questions. A working item bank is essential for development of a CAT platform. The present report describes the building of an operational item bank for use in clinical settings with the ultimate goal of improving CRF identification and treatment. STUDY DESIGN AND SETTING: The sample included 301 cancer patients. Psychometric properties of items were examined by using Rasch analysis, an Item Response Theory (IRT) model. RESULTS AND CONCLUSION: The final bank includes 72 items. These 72 unidimensional items explained 57.5% of the variance, based on factor analysis results. Excellent internal consistency (alpha=0.99) and acceptable item-total correlation were found (range: 0.51-0.85). The 72 items covered a reasonable range of the fatigue continuum. No significant ceiling effects, floor effects, or gaps were found. A sample short form was created for demonstration purposes. The resulting bank is amenable to the development of a CAT platform. VL - 58 SN - 0895-4356 (Print)0895-4356 (Linking) N1 - Lai, Jin-SheiCella, DavidDineen, KellyBode, RitaVon Roenn, JamieGershon, Richard CShevrin, DanielEnglandJ Clin Epidemiol. 2005 Feb;58(2):190-7. ER - TY - JOUR T1 - An item response theory-based pain item bank can enhance measurement precision JF - Journal of Pain and Symptom Management Y1 - 2005 A1 - Lai, J-S. A1 - Dineen, K. A1 - Reeve, B. B. A1 - Von Roenn, J. A1 - Shervin, D. A1 - McGuire, M. A1 - Bode, R. K. A1 - Paice, J. A1 - Cella, D. KW - computerized adaptive testing AB - Cancer-related pain is often under-recognized and undertreated. 
This is partly due to the lack of appropriate assessments, which need to be comprehensive and precise yet easily integrated into clinics. Computerized adaptive testing (CAT) can enable precise-yet-brief assessments by only selecting the most informative items from a calibrated item bank. The purpose of this study was to create such a bank. The sample included 400 cancer patients who were asked to complete 61 pain-related items. Data were analyzed using factor analysis and the Rasch model. The final bank consisted of 43 items which satisfied the measurement requirement of factor analysis and the Rasch model, demonstrated high internal consistency and reasonable item-total correlations, and discriminated patients with differing degrees of pain. We conclude that this bank demonstrates good psychometric properties, is sensitive to pain reported by patients, and can be used as the foundation for a CAT pain-testing platform for use in clinical practice. VL - 30 N1 - 0885-3924Journal Article ER - TY - JOUR T1 - Measuring physical function in patients with complex medical and postsurgical conditions: a computer adaptive approach JF - American Journal of Physical Medicine and Rehabilitation Y1 - 2005 A1 - Siebens, H. A1 - Andres, P. L. A1 - Pengsheng, N. A1 - Coster, W. J. A1 - Haley, S. M. KW - Activities of Daily Living/*classification KW - Adult KW - Aged KW - Cohort Studies KW - Continuity of Patient Care KW - Disability Evaluation KW - Female KW - Health Services Research KW - Humans KW - Male KW - Middle Aged KW - Postoperative Care/*rehabilitation KW - Prognosis KW - Recovery of Function KW - Rehabilitation Centers KW - Rehabilitation/*standards KW - Sensitivity and Specificity KW - Sickness Impact Profile KW - Treatment Outcome AB - OBJECTIVE: To examine whether the range of disability in the medically complex and postsurgical populations receiving rehabilitation is adequately sampled by the new Activity Measure--Post-Acute Care (AM-PAC), and to assess whether computer adaptive testing (CAT) can derive valid patient scores using fewer questions. DESIGN: Observational study of 158 subjects (mean age 67.2 yrs) receiving skilled rehabilitation services in inpatient (acute rehabilitation hospitals, skilled nursing facility units) and community (home health services, outpatient departments) settings for recent-onset or worsening disability from medical (excluding neurological) and surgical (excluding orthopedic) conditions. Measures were interviewer-administered activity questions (all patients) and physical functioning portion of the SF-36 (outpatients) and standardized chart items (11 Functional Independence Measure (FIM), 19 Standardized Outcome and Assessment Information Set (OASIS) items, and 22 Minimum Data Set (MDS) items). Rasch modeling analyzed all data and the relationship between person ability estimates and average item difficulty. CAT assessed the ability to derive accurate patient scores using a sample of questions. RESULTS: The 163-item activity item pool covered the range of physical movement and personal and instrumental activities. CAT analysis showed comparable scores between estimates using 10 items or the total item pool. CONCLUSION: The AM-PAC can assess a broad range of function in patients with complex medical illness. CAT achieves valid patient scores using fewer questions. VL - 84 N1 - 0894-9115 (Print)Comparative StudyJournal ArticleResearch Support, N.I.H., ExtramuralResearch Support, U.S. Gov't, P.H.S. 
ER - TY - JOUR T1 - The promise of PROMIS: using item response theory to improve assessment of patient-reported outcomes JF - Clinical and Experimental Rheumatology Y1 - 2005 A1 - Fries, J.F. A1 - Bruce, B. A1 - Cella, D. KW - computerized adaptive testing AB - PROMIS (Patient-Reported-Outcomes Measurement Information System) is an NIH Roadmap network project intended to improve the reliability, validity, and precision of PROs and to provide definitive new instruments that will exceed the capabilities of classic instruments and enable improved outcome measurement for clinical research across all NIH institutes. Item response theory (IRT) measurement models now permit us to transition conventional health status assessment into an era of item banking and computerized adaptive testing (CAT). Item banking uses IRT measurement models and methods to develop item banks from large pools of items from many available questionnaires. IRT allows the reduction and improvement of items and assembles domains of items which are unidimensional and not excessively redundant. CAT provides a model-driven algorithm and software to iteratively select the most informative remaining item in a domain until a desired degree of precision is obtained. Through these approaches the number of patients required for a clinical trial may be reduced while holding statistical power constant. PROMIS tools, expected to improve precision and enable assessment at the individual patient level which should broaden the appeal of PROs, will begin to be available to the general medical community in 2008. VL - 23 ER - TY - CONF T1 - Rescuing CAT by fixing the problems T2 - National Council on Measurement in Education Y1 - 2005 A1 - Chang, S-H. A1 - Zhang, J. JF - National Council on Measurement in Education CY - Montreal, Canada ER - TY - JOUR T1 - Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire JF - Alcoholism: Clinical & Experimental Research Y1 - 2005 A1 - Kahler, C. W. A1 - Strong, D. R. A1 - Read, J. P. A1 - De Boeck, P. A1 - Wilson, M. A1 - Acton, G. S. A1 - Palfai, T. P. A1 - Wood, M. D. A1 - Mehta, P. D. A1 - Neale, M. C. A1 - Flay, B. R. A1 - Conklin, C. A. A1 - Clayton, R. R. A1 - Tiffany, S. T. A1 - Shiffman, S. A1 - Krueger, R. F. A1 - Nichol, P. E. A1 - Hicks, B. M. A1 - Markon, K. E. A1 - Patrick, C. J. A1 - Iacono, William G. A1 - McGue, Matt A1 - Langenbucher, J. W. A1 - Labouvie, E. A1 - Martin, C. S. A1 - Sanjuan, P. M. A1 - Bavly, L. A1 - Kirisci, L. A1 - Chung, T. A1 - Vanyukov, M. A1 - Dunn, M. A1 - Tarter, R. A1 - Handel, R. W. A1 - Ben-Porath, Y. S. A1 - Watt, M. KW - Psychometrics KW - Substance-Related Disorders AB - Background: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias., Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis. 
An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample., Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items., Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students. To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided., (C)2005Research Society on AlcoholismAn important, sometimes controversial feature of all psychological phenomena is whether they are categorical or dimensional. A conceptual and psychometric framework is described for distinguishing whether the latent structure behind manifest categories (e.g., psychiatric diagnoses, attitude groups, or stages of development) is category-like or dimension-like. Being dimension-like requires (a) within-category heterogeneity and (b) between-category quantitative differences. Being category-like requires (a) within-category homogeneity and (b) between-category qualitative differences. The relation between this classification and abrupt versus smooth differences is discussed. Hybrid structures are possible. Being category-like is itself a matter of degree; the authors offer a formalized framework to determine this degree. Empirical applications to personality disorders, attitudes toward capital punishment, and stages of cognitive development illustrate the approach., (C) 2005 by the American Psychological AssociationThe authors conducted Rasch model ( G. Rasch, 1960) analyses of items from the Young Adult Alcohol Problems Screening Test (YAAPST; S. C. Hurlbut & K. J. Sher, 1992) to examine the relative severity and ordering of alcohol problems in 806 college students. Items appeared to measure a single dimension of alcohol problem severity, covering a broad range of the latent continuum. Items fit the Rasch model well, with less severe symptoms reliably preceding more severe symptoms in a potential progression toward increasing levels of problem severity. However, certain items did not index problem severity consistently across demographic subgroups. A shortened, alternative version of the YAAPST is proposed, and a norm table is provided that allows for a linking of total YAAPST scores to expected symptom expression., (C) 2004 by the American Psychological AssociationA didactic on latent growth curve modeling for ordinal outcomes is presented. The conceptual aspects of modeling growth with ordinal variables and the notion of threshold invariance are illustrated graphically using a hypothetical example. 
The ordinal growth model is described in terms of 3 nested models: (a) multivariate normality of the underlying continuous latent variables (yt) and its relationship with the observed ordinal response pattern (Yt), (b) threshold invariance over time, and (c) growth model for the continuous latent variable on a common scale. Algebraic implications of the model restrictions are derived, and practical aspects of fitting ordinal growth models are discussed with the help of an empirical example and Mx script ( M. C. Neale, S. M. Boker, G. Xie, & H. H. Maes, 1999). The necessary conditions for the identification of growth models with ordinal data and the methodological implications of the model of threshold invariance are discussed., (C) 2004 by the American Psychological AssociationRecent research points toward the viability of conceptualizing alcohol problems as arrayed along a continuum. Nevertheless, modern statistical techniques designed to scale multiple problems along a continuum (latent trait modeling; LTM) have rarely been applied to alcohol problems. This study applies LTM methods to data on 110 problems reported during in-person interviews of 1,348 middle-aged men (mean age = 43) from the general population. The results revealed a continuum of severity linking the 110 problems, ranging from heavy and abusive drinking, through tolerance and withdrawal, to serious complications of alcoholism. These results indicate that alcohol problems can be arrayed along a dimension of severity and emphasize the relevance of LTM to informing the conceptualization and assessment of alcohol problems., (C) 2004 by the American Psychological AssociationItem response theory (IRT) is supplanting classical test theory as the basis for measures development. This study demonstrated the utility of IRT for evaluating DSM-IV diagnostic criteria. Data on alcohol, cannabis, and cocaine symptoms from 372 adult clinical participants interviewed with the Composite International Diagnostic Interview-Expanded Substance Abuse Module (CIDI-SAM) were analyzed with Mplus ( B. Muthen & L. Muthen, 1998) and MULTILOG ( D. Thissen, 1991) software. Tolerance and legal problems criteria were dropped because of poor fit with a unidimensional model. Item response curves, test information curves, and testing of variously constrained models suggested that DSM-IV criteria in the CIDI-SAM discriminate between only impaired and less impaired cases and may not be useful to scale case severity. IRT can be used to study the construct validity of DSM-IV diagnoses and to identify diagnostic criteria with poor performance., (C) 2004 by the American Psychological AssociationThis study examined the psychometric characteristics of an index of substance use involvement using item response theory. The sample consisted of 292 men and 140 women who qualified for a Diagnostic and Statistical Manual of Mental Disorders (3rd ed., rev.; American Psychiatric Association, 1987) substance use disorder (SUD) diagnosis and 293 men and 445 women who did not qualify for a SUD diagnosis. The results indicated that men had a higher probability of endorsing substance use compared with women. The index significantly predicted health, psychiatric, and psychosocial disturbances as well as level of substance use behavior and severity of SUD after a 2-year follow-up. 
Finally, this index is a reliable and useful prognostic indicator of the risk for SUD and the medical and psychosocial sequelae of drug consumption., (C) 2002 by the American Psychological AssociationComparability, validity, and impact of loss of information of a computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 140 Veterans Affairs hospital patients. The countdown method ( Butcher, Keller, & Bacon, 1985) was used to adaptively administer Scales L (Lie) and F (Frequency), the 10 clinical scales, and the 15 content scales. Participants completed the MMPI-2 twice, in 1 of 2 conditions: computerized conventional test-retest, or computerized conventional-computerized adaptive. Mean profiles and test-retest correlations across modalities were comparable. Correlations between MMPI-2 scales and criterion measures supported the validity of the countdown method, although some attenuation of validity was suggested for certain health-related items. Loss of information incurred with this mode of adaptive testing has minimal impact on test validity. Item and time savings were substantial., (C) 1999 by the American Psychological Association VL - 29 N1 - MiscellaneousArticleMiscellaneous Article ER - TY - JOUR T1 - Validation of a computerized adaptive testing version of the Schedule for Nonadaptive and Adaptive Personality (SNAP) JF - Psychological Assessment Y1 - 2005 A1 - Simms, L. J., A1 - Clark, L. A. AB - This is a validation study of a computerized adaptive (CAT) version of the Schedule for Nonadaptive and Adaptive Personality (SNAP) conducted with 413 undergraduates who completed the SNAP twice, 1 week apart. Participants were assigned randomly to 1 of 4 retest groups: (a) paper-and-pencil (P&P) SNAP, (b) CAT, (c) P&P/CAT, and (d) CAT/P&P. With number of items held constant, computerized administration had little effect on descriptive statistics, rank ordering of scores, reliability, and concurrent validity, but was preferred over P&P administration by most participants. CAT administration yielded somewhat lower precision and validity than P&P administration, but required 36% to 37% fewer items and 58% to 60% less time to complete. These results confirm not only key findings from previous CAT simulation studies of personality measures but extend them for the 1st time to a live assessment setting. VL - 17(1) ER - TY - JOUR T1 - Validation of a computerized adaptive version of the Schedule of Non-Adaptive and Adaptive Personality (SNAP) JF - Psychological Assessment Y1 - 2005 A1 - Simms, L. J. A1 - Clark, L.J. AB - This is a validation study of a computerized adaptive (CAT) version of the Schedule for Nonadaptive and Adaptive Personality (SNAP) conducted with 413 undergraduates who completed the SNAP twice, 1 week apart. Participants were assigned randomly to 1 of 4 retest groups: (a) paper-and-pencil (P&P) SNAP, (b) CAT, (c) P&P/CAT, and (d) CAT/P&P. With number of items held constant, computerized administration had little effect on descriptive statistics, rank ordering of scores, reliability, and concurrent validity, but was preferred over P&P administration by most participants. CAT administration yielded somewhat lower precision and validity than P&P administration, but required 36% to 37% fewer items and 58% to 60% less time to complete. These results confirm not only key findings from previous CAT simulation studies of personality measures but extend them for the 1st time to a live assessment setting. 
VL - 17 ER - TY - JOUR T1 - Activity outcome measurement for postacute care JF - Medical Care Y1 - 2004 A1 - Haley, S. M. A1 - Coster, W. J. A1 - Andres, P. L. A1 - Ludlow, L. H. A1 - Ni, P. A1 - Bond, T. L. A1 - Sinclair, S. J. A1 - Jette, A. M. KW - *Self Efficacy KW - *Sickness Impact Profile KW - Activities of Daily Living/*classification/psychology KW - Adult KW - Aftercare/*standards/statistics & numerical data KW - Aged KW - Boston KW - Cognition/physiology KW - Disability Evaluation KW - Factor Analysis, Statistical KW - Female KW - Human KW - Male KW - Middle Aged KW - Movement/physiology KW - Outcome Assessment (Health Care)/*methods/statistics & numerical data KW - Psychometrics KW - Questionnaires/standards KW - Rehabilitation/*standards/statistics & numerical data KW - Reproducibility of Results KW - Sensitivity and Specificity KW - Support, U.S. Gov't, Non-P.H.S. KW - Support, U.S. Gov't, P.H.S. AB - BACKGROUND: Efforts to evaluate the effectiveness of a broad range of postacute care services have been hindered by the lack of conceptually sound and comprehensive measures of outcomes. It is critical to determine a common underlying structure before employing current methods of item equating across outcome instruments for future item banking and computer-adaptive testing applications. OBJECTIVE: To investigate the factor structure, reliability, and scale properties of items underlying the Activity domains of the International Classification of Functioning, Disability and Health (ICF) for use in postacute care outcome measurement. METHODS: We developed a 41-item Activity Measure for Postacute Care (AM-PAC) that assessed an individual's execution of discrete daily tasks in his or her own environment across major content domains as defined by the ICF. We evaluated the reliability and discriminant validity of the prototype AM-PAC in 477 individuals in active rehabilitation programs across 4 rehabilitation settings using factor analyses, tests of item scaling, internal consistency reliability analyses, Rasch item response theory modeling, residual component analysis, and modified parallel analysis. RESULTS: Results from an initial exploratory factor analysis produced 3 distinct, interpretable factors that accounted for 72% of the variance: Applied Cognition (44%), Personal Care & Instrumental Activities (19%), and Physical & Movement Activities (9%); these 3 activity factors were verified by a confirmatory factor analysis. Scaling assumptions were met for each factor in the total sample and across diagnostic groups. Internal consistency reliability was high for the total sample (Cronbach alpha = 0.92 to 0.94), and for specific diagnostic groups (Cronbach alpha = 0.90 to 0.95). Rasch scaling, residual factor, differential item functioning, and modified parallel analyses supported the unidimensionality and goodness of fit of each unique activity domain. CONCLUSIONS: This 3-factor model of the AM-PAC can form the conceptual basis for common-item equating and computer-adaptive applications, leading to a comprehensive system of outcome instruments for postacute care settings. VL - 42 N1 - 0025-7079Journal ArticleMulticenter Study ER - TY - CONF T1 - Combining computer adaptive testing technology with cognitively diagnostic assessment T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2004 A1 - McGlohen, MK A1 - Chang, Hua-Hua A1 - Wills, J. T. 
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA N1 - {PDF file, 782 KB} ER - TY - CONF T1 - Developing tailored instruments: Item banking and computerized adaptive assessment T2 - Paper presented at the conference “Advances in Health Outcomes Measurement: Exploring the Current State and the Future of Item Response Theory Y1 - 2004 A1 - Chang, C-H. JF - Paper presented at the conference “Advances in Health Outcomes Measurement: Exploring the Current State and the Future of Item Response Theory CY - Item Banks, and Computer-Adaptive Testing,” Bethesda MD N1 - {PDF file, 181 KB} ER - TY - JOUR T1 - Effects of practical constraints on item selection rules at the early stages of computerized adaptive testing JF - Journal of Educational Measurement Y1 - 2004 A1 - Chen, S-Y. A1 - Ankenmann, R. D. KW - computerized adaptive testing KW - item selection rules KW - practical constraints AB - The purpose of this study was to compare the effects of four item selection rules--(1) Fisher information (F), (2) Fisher information with a posterior distribution (FP), (3) Kullback-Leibler information with a posterior distribution (KP), and (4) completely randomized item selection (RN)--with respect to the precision of trait estimation and the extent of item usage at the early stages of computerized adaptive testing. The comparison of the four item selection rules was carried out under three conditions: (1) using only the item information function as the item selection criterion; (2) using both the item information function and content balancing; and (3) using the item information function, content balancing, and item exposure control. When test length was less than 10 items, FP and KP tended to outperform F at extreme trait levels in Condition 1. However, in more realistic settings, it could not be concluded that FP and KP outperformed F, especially when item exposure control was imposed. When test length was greater than 10 items, the three nonrandom item selection procedures performed similarly no matter what the condition was, while F had slightly higher item usage. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Blackwell Publishing: United Kingdom VL - 41 SN - 0022-0655 (Print) ER - TY - JOUR T1 - ffects of practical constraints on item selection rules at the early stages of computerized adaptive testing JF - Journal of Educational Measurement Y1 - 2004 A1 - Chen, Y.-Y. A1 - Ankenmann, R. D. VL - 41 ER - TY - JOUR T1 - Implementation and Measurement Efficiency of Multidimensional Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2004 A1 - Wang, Wen-Chung A1 - Chen, Po-Hsi AB -

Multidimensional adaptive testing (MAT) procedures are proposed for the measurement of several latent traits by a single examination. Bayesian latent trait estimation and adaptive item selection are derived. Simulations were conducted to compare the measurement efficiency of MAT with those of unidimensional adaptive testing and random administration. The results showed that the higher the correlation between latent traits, the more latent traits there were, and the more scoring levels there were in the items, the more efficient MAT was than the other two procedures. For tests containing multidimensional items, only MAT is applicable, whereas unidimensional adaptive testing is not. Issues in implementing MAT are discussed.
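As a rough illustration of the Bayesian latent trait estimation that the Wang and Chen abstract above refers to, the sketch below works through the unidimensional special case only: an expected a posteriori (EAP) estimate for 2PL items under a standard normal prior. It is not the authors' multidimensional implementation; the item parameters, grid width, and function names are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): expected a posteriori (EAP)
# trait estimation for a unidimensional 2PL model, a one-dimensional analogue
# of the Bayesian latent trait estimation mentioned in the abstract above.
import numpy as np

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap_estimate(responses, a, b, n_nodes=61):
    """EAP estimate of theta given scored responses (0/1) and item parameters."""
    nodes = np.linspace(-4, 4, n_nodes)      # quadrature grid for theta
    prior = np.exp(-0.5 * nodes**2)          # standard normal prior (unnormalized)
    like = np.ones_like(nodes)
    for u, ai, bi in zip(responses, a, b):
        p = p_2pl(nodes, ai, bi)
        like *= p**u * (1 - p)**(1 - u)      # likelihood of the observed response
    post = prior * like
    post /= post.sum()                       # normalize the posterior on the grid
    return float(np.sum(nodes * post))       # posterior mean of theta

# Example: three items answered 1, 0, 1 with made-up parameters
print(eap_estimate([1, 0, 1], a=[1.2, 0.8, 1.5], b=[-0.5, 0.0, 0.7]))
```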

VL - 28 UR - http://apm.sagepub.com/content/28/5/295.abstract ER - TY - CONF T1 - Item parameter recovery with adaptive tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2004 A1 - Do, B.-R. A1 - Chuah, S. C. A1 - F Drasgow JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA N1 - #DO04-01 {PDF file, 379 KB} ER - TY - CONF T1 - Protecting the integrity of computer-adaptive licensure tests: Results of a legal challenge T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2004 A1 - Cizek, G. J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Diego CA N1 - {PDF file, 191 KB} ER - TY - JOUR T1 - Refining the conceptual basis for rehabilitation outcome measurement: personal care and instrumental activities domain JF - Medical Care Y1 - 2004 A1 - Coster, W. J. A1 - Haley, S. M. A1 - Andres, P. L. A1 - Ludlow, L. H. A1 - Bond, T. L. A1 - Ni, P. S. KW - *Self Efficacy KW - *Sickness Impact Profile KW - Activities of Daily Living/*classification/psychology KW - Adult KW - Aged KW - Aged, 80 and over KW - Disability Evaluation KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods/statistics & numerical data KW - Questionnaires/*standards KW - Recovery of Function/physiology KW - Rehabilitation/*standards/statistics & numerical data KW - Reproducibility of Results KW - Research Support, U.S. Gov't, Non-P.H.S. KW - Research Support, U.S. Gov't, P.H.S. KW - Sensitivity and Specificity AB - BACKGROUND: Rehabilitation outcome measures routinely include content on performance of daily activities; however, the conceptual basis for item selection is rarely specified. These instruments differ significantly in format, number, and specificity of daily activity items and in the measurement dimensions and type of scale used to specify levels of performance. We propose that a requirement for upper limb and hand skills underlies many activities of daily living (ADL) and instrumental activities of daily living (IADL) items in current instruments, and that items selected based on this definition can be placed along a single functional continuum. OBJECTIVE: To examine the dimensional structure and content coverage of a Personal Care and Instrumental Activities item set and to examine the comparability of items from existing instruments and a set of new items as measures of this domain. METHODS: Participants (N = 477) from 3 different disability groups and 4 settings representing the continuum of postacute rehabilitation care were administered the newly developed Activity Measure for Post-Acute Care (AM-PAC), the SF-8, and an additional setting-specific measure: FIM (in-patient rehabilitation); MDS (skilled nursing facility); MDS-PAC (postacute settings); OASIS (home care); or PF-10 (outpatient clinic). Rasch (partial-credit model) analyses were conducted on a set of 62 items covering the Personal Care and Instrumental domain to examine item fit, item functioning, and category difficulty estimates and unidimensionality. RESULTS: After removing 6 misfitting items, the remaining 56 items fit acceptably along the hypothesized continuum. Analyses yielded different difficulty estimates for the maximum score (eg, "Independent performance") for items with comparable content from different instruments. 
Items showed little differential item functioning across age, diagnosis, or severity groups, and 92% of the participants fit the model. CONCLUSIONS: ADL and IADL items from existing rehabilitation outcomes instruments that depend on skilled upper limb and hand use can be located along a single continuum, along with the new personal care and instrumental items of the AM-PAC addressing gaps in content. Results support the validity of the proposed definition of the Personal Care and Instrumental Activities dimension of function as a guide for future development of rehabilitation outcome instruments, such as linked, setting-specific short forms and computerized adaptive testing approaches. VL - 42 N1 - 0025-7079Journal Article ER - TY - JOUR T1 - Score comparability of short forms and computerized adaptive testing: Simulation study with the activity measure for post-acute care JF - Archives of Physical Medicine and Rehabilitation Y1 - 2004 A1 - Haley, S. M. A1 - Coster, W. J. A1 - Andres, P. L. A1 - Kosinski, M. A1 - Ni, P. KW - Boston KW - Factor Analysis, Statistical KW - Humans KW - Outcome Assessment (Health Care)/*methods KW - Prospective Studies KW - Questionnaires/standards KW - Rehabilitation/*standards KW - Subacute Care/*standards AB - OBJECTIVE: To compare simulated short-form and computerized adaptive testing (CAT) scores to scores obtained from complete item sets for each of the 3 domains of the Activity Measure for Post-Acute Care (AM-PAC). DESIGN: Prospective study. SETTING: Six postacute health care networks in the greater Boston metropolitan area, including inpatient acute rehabilitation, transitional care units, home care, and outpatient services. PARTICIPANTS: A convenience sample of 485 adult volunteers who were receiving skilled rehabilitation services. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Inpatient and community-based short forms and CAT applications were developed for each of 3 activity domains (physical & mobility, personal care & instrumental, applied cognition) using item pools constructed from new items and items from existing postacute care instruments. RESULTS: Simulated CAT scores correlated highly with score estimates from the total item pool in each domain (4- and 6-item CAT r range,.90-.95; 10-item CAT r range,.96-.98). Scores on the 10-item short forms constructed for inpatient and community settings also provided good estimates of the AM-PAC item pool scores for the physical & movement and personal care & instrumental domains, but were less consistent in the applied cognition domain. Confidence intervals around individual scores were greater in the short forms than for the CATs. CONCLUSIONS: Accurate scoring estimates for AM-PAC domains can be obtained with either the setting-specific short forms or the CATs. The strong relationship between CAT and item pool scores can be attributed to the CAT's ability to select specific items to match individual responses. The CAT may have additional advantages over short forms in practicality, efficiency, and the potential for providing more precise scoring estimates for individuals. VL - 85 SN - 0003-9993 (Print) N1 - Haley, Stephen MCoster, Wendy JAndres, Patricia LKosinski, MarkNi, PengshengR01 hd43568/hd/nichdComparative StudyMulticenter StudyResearch Support, U.S. Gov't, Non-P.H.S.Research Support, U.S. Gov't, P.H.S.United StatesArchives of physical medicine and rehabilitationArch Phys Med Rehabil. 2004 Apr;85(4):661-6. 
ER - TY - JOUR T1 - Sequential estimation in variable length computerized adaptive testing JF - Journal of Statistical Planning and Inference Y1 - 2004 A1 - Chang, I. Y. AB - With the advent of modern computer technology, there have been growing efforts in recent years to computerize standardized tests, including the popular Graduate Record Examination (GRE), the Graduate Management Admission Test (GMAT) and the Test of English as a Foreign Language (TOEFL). Many of such computer-based tests are known as the computerized adaptive tests, a major feature of which is that, depending on their performance in the course of testing, different examinees may be given with different sets of items (questions). In doing so, items can be efficiently utilized to yield maximum accuracy for estimation of examinees' ability traits. We consider, in this article, one type of such tests where test lengths vary with examinees to yield approximately same predetermined accuracy for all ability traits. A comprehensive large sample theory is developed for the expected test length and the sequential point and interval estimates of the latent trait. Extensive simulations are conducted with results showing that the large sample approximations are adequate for realistic sample sizes. VL - 121 SN - 03783758 ER - TY - JOUR T1 - Siette: a web-based tool for adaptive testing JF - International Journal of Artificial Intelligence in Education Y1 - 2004 A1 - Conejo, R A1 - Guzmán, E A1 - Millán, E A1 - Trella, M A1 - Pérez-De-La-Cruz, JL A1 - Ríos, A KW - computerized adaptive testing VL - 14 ER - TY - CHAP T1 - Understanding computerized adaptive testing: From Robbins-Munro to Lord and beyond Y1 - 2004 A1 - Chang, Hua-Hua CY - D. Kaplan (Ed.), The Sage handbook of quantitative methodology for the social sciences (pp. 117-133). New York: Sage. ER - TY - JOUR T1 - Alpha-stratified adaptive testing with large numbers of content constraints JF - Applied Psychological Measurement Y1 - 2003 A1 - van der Linden, W. J. A1 - Chang, Hua-Hua VL - 27 ER - TY - CONF T1 - Assessing CAT security breaches by the item pooling index T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Chang, Hua-Hua A1 - Zhang, J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - JOUR T1 - a-Stratified multistage CAT design with content-blocking JF - British Journal of Mathematical and Statistical Psychology Y1 - 2003 A1 - Yi, Q. A1 - Chang, H.-H. VL - 56 ER - TY - JOUR T1 - Can an item response theory-based pain item bank enhance measurement precision? JF - Clinical Therapeutics Y1 - 2003 A1 - Lai, J-S. A1 - Dineen, K. A1 - Cella, D. A1 - Von Roenn, J. VL - 25 JO - Clin Ther ER - TY - JOUR T1 - A comparative study of item exposure control methods in computerized adaptive testing JF - Journal of Educational Measurement Y1 - 2003 A1 - Chang, S-W. A1 - Ansley, T. N. KW - Adaptive Testing KW - Computer Assisted Testing KW - Educational KW - Item Analysis (Statistical) KW - Measurement KW - Strategies computerized adaptive testing AB - This study compared the properties of five methods of item exposure control within the purview of estimating examinees' abilities in a computerized adaptive testing (CAT) context. Each exposure control algorithm was incorporated into the item selection procedure and the adaptive testing progressed based on the CAT design established for this study.
The merits and shortcomings of these strategies were considered under different item pool sizes and different desired maximum exposure rates and were evaluated in light of the observed maximum exposure rates, the test overlap rates, and the conditional standard errors of measurement. Each method had its advantages and disadvantages, but no one possessed all of the desired characteristics. There was a clear and logical trade-off between item exposure control and measurement precision. The M. L. Stocking and C. Lewis conditional multinomial procedure and, to a slightly lesser extent, the T. Davey and C. G. Parshall method seemed to be the most promising considering all of the factors that this study addressed. (PsycINFO Database Record (c) 2005 APA ) VL - 40 ER - TY - CONF T1 - Computerized adaptive testing: A comparison of three content balancing methods T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Leung, C-K.. A1 - Chang, Hua-Hua A1 - Hau, K-T. A1 - Wen. Z. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 227 KB} ER - TY - JOUR T1 - Computerized adaptive testing: A comparison of three content balancing methods JF - The Journal of Technology, Learning and Assessment Y1 - 2003 A1 - Leung, C-K.. A1 - Chang, Hua-Hua A1 - Hau, K-T. AB - Content balancing is often a practical consideration in the design of computerized adaptive testing (CAT). This study compared three content balancing methods, namely, the constrained CAT (CCAT), the modified constrained CAT (MCCAT), and the modified multinomial model (MMM), under various conditions of test length and target maximum exposure rate. Results of a series of simulation studies indicate that there is no systematic effect of content balancing method in measurement efficiency and pool utilization. However, among the three methods, the MMM appears to consistently over-expose fewer items. VL - 2 ER - TY - JOUR T1 - Computerized adaptive testing using the nearest-neighbors criterion JF - Applied Psychological Measurement Y1 - 2003 A1 - Cheng, P. E. A1 - Liou, M. KW - (Statistical) KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Analysis KW - Item Response Theory KW - Statistical Analysis KW - Statistical Estimation computerized adaptive testing KW - Statistical Tests AB - Item selection procedures designed for computerized adaptive testing need to accurately estimate every taker's trait level (θ) and, at the same time, effectively use all items in a bank. Empirical studies showed that classical item selection procedures based on maximizing Fisher or other related information yielded highly varied item exposure rates; with these procedures, some items were frequently used whereas others were rarely selected. In the literature, methods have been proposed for controlling exposure rates; they tend to affect the accuracy in θ estimates, however. A modified version of the maximum Fisher information (MFI) criterion, coined the nearest neighbors (NN) criterion, is proposed in this study. The NN procedure improves to a moderate extent the undesirable item exposure rates associated with the MFI criterion and keeps sufficient precision in estimates. The NN criterion will be compared with a few other existing methods in an empirical study using the mean squared errors in θ estimates and plots of item exposure rates associated with different distributions. 
(PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 27 ER - TY - JOUR T1 - Computerized adaptive testing using the nearest-neighbors criterion JF - Applied Psychological Measurement Y1 - 2003 A1 - Cheng, P. E. A1 - Liou, M. VL - 27 ER - TY - JOUR T1 - Developing an initial physical function item bank from existing sources JF - Journal of Applied Measurement Y1 - 2003 A1 - Bode, R. K. A1 - Cella, D. A1 - Lai, J. S. A1 - Heinemann, A. W. KW - *Databases KW - *Sickness Impact Profile KW - Adaptation, Psychological KW - Data Collection KW - Humans KW - Neoplasms/*physiopathology/psychology/therapy KW - Psychometrics KW - Quality of Life/*psychology KW - Research Support, U.S. Gov't, P.H.S. KW - United States AB - The objective of this article is to illustrate incremental item banking using health-related quality of life data collected from two samples of patients receiving cancer treatment. The kinds of decisions one faces in establishing an item bank for computerized adaptive testing are also illustrated. Pre-calibration procedures include: identifying common items across databases; creating a new database with data from each pool; reverse-scoring "negative" items; identifying rating scales used in items; identifying pivot points in each rating scale; pivot anchoring items at comparable rating scale categories; and identifying items in each instrument that measure the construct of interest. A series of calibrations were conducted in which a small proportion of new items were added to the common core and misfitting items were identified and deleted until an initial item bank has been developed. VL - 4 N1 - 1529-7713Journal Article ER - TY - JOUR T1 - Development and psychometric evaluation of the Flexilevel Scale of Shoulder Function (FLEX-SF) JF - Medical Care (in press) Y1 - 2003 A1 - Cook, K. F. A1 - Roddey, T. S. A1 - Gartsman, G M A1 - Olson, S L N1 - #CO03-01 ER - TY - ABST T1 - Effect of extra time on GRE® Quantitative and Verbal Scores (Research Report 03-13) Y1 - 2003 A1 - Bridgeman, B. A1 - Cline, F. A1 - Hessinger, J. CY - Princeton NJ: Educational Testing service N1 - {PDF file, 88 KB} ER - TY - JOUR T1 - An examination of exposure control and content balancing restrictions on item selection in CATs using the partial credit model JF - Journal of Applied Measurement Y1 - 2003 A1 - Davis, L. L. A1 - Pastor, D. A. A1 - Dodd, B. G. A1 - Chiang, C. A1 - Fitzpatrick, S. J. KW - *Computers KW - *Educational Measurement KW - *Models, Theoretical KW - Automation KW - Decision Making KW - Humans KW - Reproducibility of Results AB - The purpose of the present investigation was to systematically examine the effectiveness of the Sympson-Hetter technique and rotated content balancing relative to no exposure control and no content rotation conditions in a computerized adaptive testing system (CAT) based on the partial credit model. A series of simulated fixed and variable length CATs were run using two data sets generated to multiple content areas for three sizes of item pools. The 2 (exposure control) X 2 (content rotation) X 2 (test length) X 3 (item pool size) X 2 (data sets) yielded a total of 48 conditions. Results show that while both procedures can be used with no deleterious effect on measurement precision, the gains in exposure control, pool utilization, and item overlap appear quite modest. Difficulties involved with setting the exposure control parameters in small item pools make questionable the utility of the Sympson-Hetter technique with similar item pools. 
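Several of the abstracts in this section lean on the Sympson-Hetter exposure control technique. The sketch below is a textbook-style rendering of its administration step, not code from any of the cited studies: the most informative candidate is administered only with probability equal to its exposure control parameter, otherwise the next candidate is screened. The function name and the fallback rule used when every candidate fails the probability experiment are assumptions made for this example.

```python
# Minimal sketch (assumed names, not any paper's code) of the Sympson-Hetter
# probabilistic filter: the top-ranked item is administered with probability
# equal to its exposure control parameter k; otherwise the next-best candidate
# is screened. Ranking items by information is assumed to happen elsewhere.
import random

def sympson_hetter_select(ranked_items, k, rng=random.random):
    """ranked_items: item ids ordered from most to least informative.
    k: dict mapping item id to its exposure control parameter in (0, 1]."""
    for item in ranked_items:
        if rng() <= k.get(item, 1.0):   # passes the probability experiment: administer
            return item
    return ranked_items[-1]             # fallback if every candidate is screened out

# Example: item 7 is most informative but heavily restricted (k = 0.2)
print(sympson_hetter_select([7, 3, 11], {7: 0.2, 3: 0.9, 11: 1.0}))
```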
VL - 4 N1 - 1529-7713Journal Article ER - TY - JOUR T1 - Implementing content constraints in alpha-stratified adaptive testing using a shadow test approach JF - Applied Psychological Measurement Y1 - 2003 A1 - van der Linden, W. J. A1 - Chang, Hua-Hua VL - 27 ER - TY - JOUR T1 - Incorporation of Content Balancing Requirements in Stratification Designs for Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2003 A1 - Leung, C-K.. A1 - Chang, Hua-Hua A1 - Hau, K-T. KW - computerized adaptive testing AB - Studied three stratification designs for computerized adaptive testing in conjunction with three well-developed content balancing methods. Simulation study results show substantial differences in item overlap rate and pool utilization among different methods. Recommends an optimal combination of stratification design and content balancing method. (SLD) VL - 63 ER - TY - JOUR T1 - Incorporation Of Content Balancing Requirements In Stratification Designs For Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2003 A1 - Leung, Chi-Keung A1 - Chang, Hua-Hua A1 - Hau, Kit-Tai AB -

In computerized adaptive testing, the multistage a-stratified design advocates a new philosophy on pool management and item selection in which, contradictory to common practice, less discriminating items are used first. The method is effective in reducing item-overlap rate and enhancing pool utilization. This stratification method has been extended in different ways to deal with the practical issues of content constraints and the positive correlation between item difficulty and discrimination. Nevertheless, these modified designs on their own do not automatically satisfy content requirements. In this study, three stratification designs were examined in conjunction with three well developed content balancing methods. The performance of each of these nine combinational methods was evaluated in terms of their item security, measurement efficiency, and pool utilization. Results showed substantial differences in item-overlap rate and pool utilization among different methods. An optimal combination of stratification design and content balancing method is recommended.
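To make the stratified selection logic described above concrete, the following sketch stratifies a pool by ascending discrimination (a) and, within the stratum assigned to the current stage, picks the unused item whose difficulty (b) is closest to the provisional trait estimate. This is a simplified reading of the a-stratified design as summarized in these abstracts; content balancing, blocking refinements, and other implementation details are omitted, and the helper names are hypothetical.

```python
# Minimal sketch of a-stratified item selection as summarized in these abstracts:
# the pool is split into strata by ascending discrimination (a), early stages draw
# from low-a strata, and within a stratum the unused item whose difficulty (b) is
# closest to the current theta estimate is chosen. Content constraints are omitted.
import numpy as np

def build_strata(a_params, n_strata):
    """Return a list of index arrays, with stratum 0 holding the lowest-a items."""
    order = np.argsort(a_params)
    return np.array_split(order, n_strata)

def a_stratified_pick(strata, stage, b_params, theta_hat, administered):
    """From the stratum assigned to this stage, pick the unused item with b nearest theta_hat."""
    candidates = [i for i in strata[stage] if i not in administered]
    return min(candidates, key=lambda i: abs(b_params[i] - theta_hat))

# Example: 12-item pool, 3 strata, first stage, current theta estimate 0.3
a = np.random.uniform(0.4, 2.0, 12)
b = np.random.uniform(-2.0, 2.0, 12)
strata = build_strata(a, 3)
print(a_stratified_pick(strata, 0, b, 0.3, administered=set()))
```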

VL - 63 UR - http://epm.sagepub.com/content/63/2/257.abstract ER - TY - JOUR T1 - Item banking to improve, shorten and computerized self-reported fatigue: an illustration of steps to create a core item bank from the FACIT-Fatigue Scale JF - Quality of Life Research Y1 - 2003 A1 - Lai, J-S. A1 - Crane, P. K. A1 - Cella, D. A1 - Chang, C-H. A1 - Bode, R. K. A1 - Heinemann, A. W. KW - *Health Status Indicators KW - *Questionnaires KW - Adult KW - Fatigue/*diagnosis/etiology KW - Female KW - Humans KW - Male KW - Middle Aged KW - Neoplasms/complications KW - Psychometrics KW - Research Support, Non-U.S. Gov't KW - Research Support, U.S. Gov't, P.H.S. KW - Sickness Impact Profile AB - Fatigue is a common symptom among cancer patients and the general population. Due to its subjective nature, fatigue has been difficult to effectively and efficiently assess. Modern computerized adaptive testing (CAT) can enable precise assessment of fatigue using a small number of items from a fatigue item bank. CAT enables brief assessment by selecting questions from an item bank that provide the maximum amount of information given a person's previous responses. This article illustrates steps to prepare such an item bank, using 13 items from the Functional Assessment of Chronic Illness Therapy Fatigue Subscale (FACIT-F) as the basis. Samples included 1022 cancer patients and 1010 people from the general population. An Item Response Theory (IRT)-based rating scale model, a polytomous extension of the Rasch dichotomous model was utilized. Nine items demonstrating acceptable psychometric properties were selected and positioned on the fatigue continuum. The fatigue levels measured by these nine items along with their response categories covered 66.8% of the general population and 82.6% of the cancer patients. Although the operational CAT algorithms to handle polytomously scored items are still in progress, we illustrated how CAT may work by using nine core items to measure level of fatigue. Using this illustration, a fatigue measure comparable to its full-length 13-item scale administration was obtained using four items. The resulting item bank can serve as a core to which will be added a psychometrically sound and operational item bank covering the entire fatigue continuum. VL - 12 N1 - 0962-9343Journal Article ER - TY - JOUR T1 - Optimal stratification of item pools in α-stratified computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2003 A1 - Chang, Hua-Hua A1 - van der Linden, W. J. KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Content (Test) KW - Item Response Theory KW - Mathematical Modeling KW - Test Construction computerized adaptive testing AB - A method based on 0-1 linear programming (LP) is presented to stratify an item pool optimally for use in α-stratified adaptive testing. Because the 0-1 LP model belongs to the subclass of models with a network flow structure, efficient solutions are possible. The method is applied to a previous item pool from the computerized adaptive testing (CAT) version of the Graduate Record Exams (GRE) Quantitative Test. The results indicate that the new method performs well in practical situations. It improves item exposure control, reduces the mean squared error in the θ estimates, and increases test reliability. 
(PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 27 ER - TY - CONF T1 - Predicting item exposure parameters in computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2003 A1 - Chen, S-Y. A1 - Doong, H. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL N1 - {PDF file, 239 KB} ER - TY - JOUR T1 - The relationship between item exposure and test overlap in computerized adaptive testing JF - Journal of Educational Measurement Y1 - 2003 A1 - Chen, S. A1 - Ankenmann, R. D. A1 - Spray, J. A. VL - 40 ER - TY - JOUR T1 - The relationship between item exposure and test overlap in computerized adaptive testing JF - Journal of Educational Measurement Y1 - 2003 A1 - Chen, S-Y. A1 - Ankemann, R. D. A1 - Spray, J. A. KW - (Statistical) KW - Adaptive Testing KW - Computer Assisted Testing KW - Human Computer KW - Interaction computerized adaptive testing KW - Item Analysis KW - Item Analysis (Test) KW - Test Items AB - The purpose of this article is to present an analytical derivation for the mathematical form of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CATs). This algebraic relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. The results indicate that, in fixed-length CATs, control of the average between-test overlap is achieved via the mean and variance of the item exposure rates of the items that constitute the CAT item pool. The mean of the item exposure rates is easily manipulated. Control over the variance of the item exposure rates can be achieved via the maximum item exposure rate (r-sub(max)). Therefore, item exposure control methods which implement a specification of r-sub(max) (e.g., J. B. Sympson and R. D. Hetter, 1985) provide the most direct control at both the item and test levels. (PsycINFO Database Record (c) 2005 APA ) VL - 40 ER - TY - JOUR T1 - The relationship between item exposure and test overlap in computerized adaptive testing JF - Journal of Educational Measurement Y1 - 2003 A1 - Chen, S. A1 - Ankenmann, R. D. A1 - Spray, J. A. VL - 40 ER - TY - CONF T1 - A simulation study to compare CAT strategies for cognitive diagnosis T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Xu, X. A1 - Chang, Hua-Hua A1 - Douglas, J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 250 KB} ER - TY - CONF T1 - Test-score comparability, ability estimation, and item-exposure control in computerized adaptive testing T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Chang, Hua-Hua A1 - Ying, Z. JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - JOUR T1 - Advances in quality of life measurements in oncology patients JF - Seminars in Oncology Y1 - 2002 A1 - Cella, D. A1 - Chang, C-H. A1 - Lai, J. S. A1 - Webster, K. KW - *Quality of Life KW - *Sickness Impact Profile KW - Cross-Cultural Comparison KW - Culture KW - Humans KW - Language KW - Neoplasms/*physiopathology KW - Questionnaires AB - Accurate assessment of the quality of life (QOL) of patients can provide important clinical information to physicians, especially in the area of oncology. 
Changes in QOL are important indicators of the impact of a new cytotoxic therapy, can affect a patient's willingness to continue treatment, and may aid in defining response in the absence of quantifiable endpoints such as tumor regression. Because QOL is becoming an increasingly important aspect in the management of patients with malignant disease, it is vital that the instruments used to measure QOL are reliable and accurate. Assessment of QOL involves a multidimensional approach that includes physical, functional, social, and emotional well-being, and the most comprehensive instruments measure at least three of these domains. Instruments to measure QOL can be generic (eg, the Nottingham Health Profile), targeted toward specific illnesses (eg, Functional Assessment of Cancer Therapy - Lung), or be a combination of generic and targeted. Two of the most widely used examples of the combination, or hybrid, instruments are the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 Items and the Functional Assessment of Chronic Illness Therapy. A consequence of the increasing international collaboration in clinical trials has been the growing necessity for instruments that are valid across languages and cultures. To assure the continuing reliability and validity of QOL instruments in this regard, item response theory can be applied. Techniques such as item response theory may be used in the future to construct QOL item banks containing large sets of validated questions that represent various levels of QOL domains. As QOL becomes increasingly important in understanding and approaching the overall management of cancer patients, the tools available to clinicians and researchers to assess QOL will continue to evolve. While the instruments currently available provide reliable and valid measurement, further improvements in precision and application are anticipated. VL - 29 N1 - 0093-7754 (Print)Journal ArticleReview ER - TY - JOUR T1 - Applicable adaptive testing models for school teachers JF - Educational Media International Y1 - 2002 A1 - Chang-Hwa, W. A. A1 - Chuang, C-L. AB - The purpose of this study was to investigate the attitudinal effects on SPRT adaptive testing environment for junior high school students. Subjects were 39 eighth graders from a selected junior high school. Major instruments for the study were the Junior High School Natural Sciences Adaptive Testing System driven by the SPRT algorithm, and a self-developed attitudinal questionnaire, factors examined include: test anxiety, examinee preference, adaptability of the test, and acceptance of the test result. The major findings were that overall, junior high school students" attitudes towards computerized adaptive tests were positive, no significant correlations existed between test attitude and the test length. The results indicated that junior high school students generally have positive attitudes towards adaptive testing.Modèles de tests d"adaptation à l"usage des enseignants. L"objectif de cette étude était d"enquêter sur les effets causés par une passation de tests d"adaptation ( selon l"algorithme "Sequential Probability Radio Test " (SPRT) ) dans une classe de trente-neuf élèves de huitième année du secondaire inférieur. Les principaux instruments utilisés ont été ceux du système de tests d"adaptation (avec le SPRT) et destiné aux classes de sciences naturelles du degré secondaire inférieur. 
Un questionnaire d"attitude, développé par nos soins, a également été utilisé pour examiner les facteurs suivants: test d"anxiété, préférence des candidats, adaptabilité du test et acceptation des résultats. Les principales conclusions ont été que, dans l"ensemble, l"attitude des élèves du secondaire inférieur face aux tests d"adaptation informatisés a été positive, aucune corrélation significative existant entre cette attitude et la longueur des tests. Les résultats démontrent aussi que les élèves du secondaire ont une attitude généralement positive envers les tests d"adaptation.Test Modelle zur Anwendung durch Klassenlehrer Zweck dieser Untersuchung war, die Auswirkungen über die Einstellung von Jun. High School Schülern im Zusammenhang mit dem SPRT Testumfeld zu untersuchen. 39 Achtklässler einer Jun. High School nahmen an dem Test teil. Die Untersuchung stützte sich hauptsächlich auf das Jun. High School Natural. Sciences Adaptive Testing System, das auf dem SPRT Rechnungsverfahren basiert sowie einem selbst erstellten Fragebogen mit folgenden Faktoren: Testängste, Präferenzen der Testperson, Geeignetheit des Tests, Anerkennung des Testergebnisses. Es stellte sich heraus, dass die Einstellung der Studenten zu den Computer adaptierten Tests im allgemeinen positiv waren; es ergaben sich keine bedeutsamen Wechselwirkungen zwischen persönlicher Testeinstellung und Testlänge. Die Ergebnisse belegen, dass Jun. High School Schüler im allgemeinen eine positive Haltung zu adaptierten Tests haben. VL - 39 ER - TY - JOUR T1 - Can examinees use judgments of item difficulty to improve proficiency estimates on computerized adaptive vocabulary tests? JF - Journal of Educational Measurement Y1 - 2002 A1 - Vispoel, W. P. A1 - Clough, S. J. A1 - Bleiler, T. A1 - Hendrickson, A. B. A1 - Ihrig, D. VL - 39 ER - TY - CONF T1 - Comparing three item selection approaches for computerized adaptive testing with content balancing requirement T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Leung, C-K.. A1 - Chang, Hua-Hua A1 - Hau, K-T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA N1 - {PDF file, 226 KB} ER - TY - JOUR T1 - A comparison of item selection techniques and exposure control mechanisms in CATs using the generalized partial credit model JF - Applied Psychological Measurement Y1 - 2002 A1 - Pastor, D. A. A1 - Dodd, B. G. A1 - Chang, Hua-Hua KW - (Statistical) KW - Adaptive Testing KW - Algorithms computerized adaptive testing KW - Computer Assisted Testing KW - Item Analysis KW - Item Response Theory KW - Mathematical Modeling AB - The use of more performance items in large-scale testing has led to an increase in the research investigating the use of polytomously scored items in computer adaptive testing (CAT). Because this research has to be complemented with information pertaining to exposure control, the present research investigated the impact of using five different exposure control algorithms in two sized item pools calibrated using the generalized partial credit model. The results of the simulation study indicated that the a-stratified design, in comparison to a no-exposure control condition, could be used to reduce item exposure and overlap, increase pool utilization, and only minorly degrade measurement precision. 
Use of the more restrictive exposure control algorithms, such as the Sympson-Hetter and conditional Sympson-Hetter, controlled exposure to a greater extent but at the cost of measurement precision. Because convergence of the exposure control parameters was problematic for some of the more restrictive exposure control algorithms, use of the more simplistic exposure control mechanisms, particularly when the test length to item pool size ratio is large, is recommended. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 26 ER - TY - JOUR T1 - Computerised adaptive testing JF - British Journal of Educational Technology Y1 - 2002 A1 - Latu, E. A1 - Chapman, E. KW - computerized adaptive testing AB - Considers the potential of computer adaptive testing (CAT). Discusses the use of CAT instead of traditional paper and pencil tests, identifies decisions that impact the efficacy of CAT, and concludes that CAT is beneficial when used to its full potential on certain types of tests. (LRW) VL - 33 ER - TY - CONF T1 - Fairness issues in adaptive tests with strict time limits T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Bridgeman, B. A1 - Cline, F. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA N1 - {PDF file, 1.287 MB} ER - TY - JOUR T1 - Feasibility and acceptability of computerized adaptive testing (CAT) for fatigue monitoring in clinical practice JF - Quality of Life Research Y1 - 2002 A1 - Davis, K. M. A1 - Chang, C-H. A1 - Lai, J-S. A1 - Cella, D. VL - 11(7) ER - TY - JOUR T1 - Hypergeometric family and item overlap rates in computerized adaptive testing JF - Psychometrika Y1 - 2002 A1 - Chang, Hua-Hua A1 - Zhang, J. KW - Adaptive Testing KW - Algorithms KW - Computer Assisted Testing KW - Taking KW - Test KW - Time On Task computerized adaptive testing AB - A computerized adaptive test (CAT) is usually administered to small groups of examinees at frequent time intervals. It is often the case that examinees who take the test earlier share information with examinees who will take the test later, thus increasing the risk that many items may become known. Item overlap rate for a group of examinees refers to the number of overlapping items encountered by these examinees divided by the test length. For a specific item pool, different item selection algorithms may yield different item overlap rates. An important issue in designing a good CAT item selection algorithm is to keep item overlap rate below a preset level. In doing so, it is important to investigate what the lowest rate could be for all possible item selection algorithms. In this paper we rigorously prove that if every item had an equal possibility to be selected from the pool in a fixed-length CAT, the number of overlapping item among any α randomly sampled examinees follows the hypergeometric distribution family for α ≥ 1. Thus, the expected values of the number of overlapping items among any randomly sampled α examinee can be calculated precisely. These values may serve as benchmarks in controlling item overlap rates for fixed-length adaptive tests. (PsycINFO Database Record (c) 2005 APA ) VL - 67 ER - TY - CONF T1 - Identify the lower bounds for item sharing and item pooling in computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Chang, Hua-Hua A1 - Zhang, J. 
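The Chang and Zhang abstract above notes that when every item is equally likely to be administered in a fixed-length CAT, the number of overlapping items follows a hypergeometric distribution. For the pairwise case this yields a simple benchmark, sketched below under that equal-probability assumption: two tests of length L drawn from a pool of N items share L*L/N items on average, so the expected overlap rate is L/N. The generalization to more than two examinees discussed in the paper is not reproduced here.

```python
# Minimal sketch of the pairwise overlap benchmark implied by the abstract above:
# if each fixed-length test were an equally likely draw of L items from a pool of
# N items, the shared-item count between two examinees is Hypergeometric(N, L, L),
# so the expected overlap is L*L/N and the expected overlap rate is L/N.

def expected_pairwise_overlap(pool_size, test_length):
    """Expected number of common items between two randomly assembled tests."""
    return test_length * test_length / pool_size

def expected_overlap_rate(pool_size, test_length):
    """Expected overlap divided by test length, i.e., the benchmark overlap rate."""
    return test_length / pool_size

# Example: a 400-item pool and 30-item tests give an expected overlap rate of 0.075
print(expected_pairwise_overlap(400, 30), expected_overlap_rate(400, 30))
```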
JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA ER - TY - JOUR T1 - Item selection in computerized adaptive testing: Improving the a-stratified design with the Sympson-Hetter algorithm JF - Applied Psychological Measurement Y1 - 2002 A1 - Leung, C-K.. A1 - Chang, Hua-Hua A1 - Hau, K-T. VL - 26 ER - TY - JOUR T1 - Item selection in computerized adaptive testing: Improving the a-stratified design with the Sympson-Hetter algorithm JF - Applied Psychological Measurement Y1 - 2002 A1 - Leung, C. K. A1 - Chang, Hua-Hua A1 - Hau, K. T. VL - 26 SN - 0146-6216 ER - TY - JOUR T1 - Measuring quality of life in chronic illness: the functional assessment of chronic illness therapy measurement system JF - Archives of Physical Medicine and Rehabilitation Y1 - 2002 A1 - Cella, D. A1 - Nowinski, C. J. KW - *Chronic Disease KW - *Quality of Life KW - *Rehabilitation KW - Adult KW - Comparative Study KW - Health Status Indicators KW - Humans KW - Psychometrics KW - Questionnaires KW - Research Support, U.S. Gov't, P.H.S. KW - Sensitivity and Specificity AB - We focus on quality of life (QOL) measurement as applied to chronic illness. There are 2 major types of health-related quality of life (HRQOL) instruments-generic health status and targeted. Generic instruments offer the opportunity to compare results across patient and population cohorts, and some can provide normative or benchmark data from which to interpret results. Targeted instruments ask questions that focus more on the specific condition or treatment under study and, as a result, tend to be more responsive to clinically important changes than generic instruments. Each type of instrument has a place in the assessment of HRQOL in chronic illness, and consideration of the relative advantages and disadvantages of the 2 options best drives choice of instrument. The Functional Assessment of Chronic Illness Therapy (FACIT) system of HRQOL measurement is a hybrid of the 2 approaches. The FACIT system combines a core general measure with supplemental measures targeted toward specific diseases, conditions, or treatments. Thus, it capitalizes on the strengths of each type of measure. Recently, FACIT questionnaires were administered to a representative sample of the general population with results used to derive FACIT norms. These normative data can be used for benchmarking and to better understand changes in HRQOL that are often seen in clinical trials. Future directions in HRQOL assessment include test equating, item banking, and computerized adaptive testing. VL - 83 N1 - 0003-9993Journal Article ER - TY - CONF T1 - Optimum number of strata in the a-stratified adaptive testing design T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Wen, J.-B. A1 - Chang, Hua-Hua A1 - Hau, K-T. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA N1 - {PDF file, 114 KB} ER - TY - CONF T1 - Redeveloping the exposure control parameters of CAT items when a pool is modified T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Chang, S-W. A1 - Harris, D. J. 
JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA N1 - {PDF file, 1.113 MB} ER - TY - CONF T1 - The robustness of the unidimensional 3PL IRT model when applied to two-dimensional data in computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Zhao, J. C. A1 - McMorris, R. F. A1 - Pruzek, R. M. A1 - Chen, R. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA N1 - {PDF file, 1.356 MB} ER - TY - CHAP T1 - Test models for complex computer-based testing Y1 - 2002 A1 - Luecht, RM A1 - Clauser, B. E. CY - C. N. Mille,. M. T. Potenza, J. J. Fremer, and W. C. Ward (Eds.). Computer-based testing: Building the foundation for future assessments (pp. 67-88). Hillsdale NJ: Erlbaum. ER - TY - CONF T1 - To weight or not to weight – balancing influence of initial and later items in CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Chang, Hua-Hua A1 - Ying, Z. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA N1 - {PDF file, 252 KB} ER - TY - CONF T1 - Using judgments of item difficulty to change answers on computerized adaptive vocabulary tests T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Vispoel, W. P. A1 - Clough, S. J. A1 - Bleiler, T. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA N1 - #VI02-01 ER - TY - CONF T1 - Adaptation of a-stratified method in variable length computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Wen, J.-B. A1 - Chang, Hua-Hua A1 - Hau, K.-T.  JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA N1 - {PDF file, 384 KB} ER - TY - JOUR T1 - Assessment in the twenty-first century: A role of computerised adaptive testing in national curriculum subjects JF - Teacher Development Y1 - 2001 A1 - Cowan, P. A1 - Morrison, H. KW - computerized adaptive testing AB - With the investment of large sums of money in new technologies forschools and education authorities and the subsequent training of teachers to integrate Information and Communications Technology (ICT) into their teaching strategies, it is remarkable that the old outdated models of assessment still remain. This article highlights the current problems associated with pen-and paper-testing and offers suggestions for an innovative and new approach to assessment for the twenty-first century. Based on the principle of the 'wise examiner' a computerised adaptive testing system which measures pupils' ability against the levels of the United Kingdom National Curriculum has been developed for use in mathematics. Using constructed response items, pupils are administered a test tailored to their ability with a reliability index of 0.99. Since the software administers maximally informative questions matched to each pupil's current ability estimate, no two pupils will receive the same set of items in the same order therefore removing opportunities for plagarism and teaching to the test. All marking is automated and a journal recording the outcome of the test and highlighting the areas of difficulty for each pupil is available for printing by the teacher. 
The current prototype of the system can be used on a school's network; however, the authors envisage a day when Examination Boards or the Qualifications and Assessment Authority (QCA) will administer Government tests from a central server to all United Kingdom schools or testing centres. Results will be issued at the time of testing and opportunities for resits will become more widespread. VL - 5 ER - TY - CONF T1 - a-stratified CAT design with content-blocking T2 - Paper presented at the Annual Meeting of the Psychometric Society Y1 - 2001 A1 - Yi, Q. A1 - Chang, Hua-Hua JF - Paper presented at the Annual Meeting of the Psychometric Society CY - King of Prussia, PA N1 - {PDF file, 410 KB} ER - TY - CONF T1 - a-stratified computerized adaptive testing with unequal item exposure across strata T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Deng, H. A1 - Chang, Hua-Hua JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA N1 - #DE01-01 ER - TY - JOUR T1 - a-stratified multistage computerized adaptive testing with b blocking JF - Applied Psychological Measurement Y1 - 2001 A1 - Chang, Hua-Hua A1 - Qian, J. A1 - Yang, Z. KW - computerized adaptive testing AB - Proposed a refinement, based on the stratification of items developed by D. Weiss (1973), of the computerized adaptive testing item selection procedure of H. Chang and Z. Ying (1999). Simulation studies using an item bank from the Graduate Record Examination show the benefits of the new procedure. (SLD) VL - 25 ER - TY - JOUR T1 - a-Stratified multistage computerized adaptive testing with b blocking JF - Applied Psychological Measurement Y1 - 2001 A1 - Chang, Hua-Hua A1 - Qian, J. A1 - Ying, Z. AB - Chang & Ying’s (1999) computerized adaptive testing item-selection procedure stratifies the item bank according to a parameter values and requires b parameter values to be evenly distributed across all strata. Thus, a and b parameter values must be incorporated into how strata are formed. A refinement is proposed, based on Weiss’ (1973) stratification of items according to b values. Simulation studies using a retired item bank of a Graduate Record Examination test indicate that the new approach improved control of item exposure rates and reduced mean squared errors. VL - 25 SN - 0146-6216 ER - TY - CONF T1 - Can examinees use judgments of item difficulty to improve proficiency estimates on computerized adaptive vocabulary tests? T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Vispoel, W. P. A1 - Clough, S. J. A1 - Bleiler, T. A1 - Hendrickson, A. B. A1 - Ihrig, D. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA N1 - #VI01-01 ER - TY - CONF T1 - Deriving a stopping rule for sequential adaptive tests T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Grabovsky, I. A1 - Chang, Hua-Hua A1 - Ying, Z. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA N1 - {PDF file, 111 KB} ER - TY - JOUR T1 - Development of an adaptive multimedia program to collect patient health data JF - American Journal of Preventative Medicine Y1 - 2001 A1 - Sutherland, L. A. A1 - Campbell, M. A1 - Ornstein, K. A1 - Wildemuth, B. A1 - Lobach, D.
VL - 21 ER - TY - CONF T1 - Effects of changes in the examinees’ ability distribution on the exposure control methods in CAT T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Chang, S-W. A1 - Twu, B.-Y. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA N1 - #CH01-02 {PDF file, 695 KB} ER - TY - CONF T1 - An examination of item selection rules by stratified CAT designs integrated with content balancing methods T2 - Paper presented at the Annual Meeting of the American Educational Research Association Y1 - 2001 A1 - Leung, C-K.. A1 - Chang, Hua-Hua A1 - Hau, K-T. JF - Paper presented at the Annual Meeting of the American Educational Research Association CY - Seattle WA N1 - {PDF file, 296 KB} ER - TY - JOUR T1 - Final answer? JF - American School Board Journal Y1 - 2001 A1 - Coyle, J. KW - computerized adaptive testing AB - The Northwest Evaluation Association helped an Indiana school district develop a computerized adaptive testing system that was aligned with its curriculum and geared toward measuring individual student growth. Now the district can obtain such information from semester to semester and year to year, get immediate results, and test students on demand. (MLH) VL - 188 ER - TY - ABST T1 - Implementing content constraints in a-stratified adaptive testing using a shadow test approach (Research Report 01-001) Y1 - 2001 A1 - Chang, Hua-Hua A1 - van der Linden, W. J. CY - University of Twente, Department of Educational Measurement and Data Analysis ER - TY - CONF T1 - Integrating stratification and information approaches for multiple constrained CAT T2 - Paper presented at the Annual Meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Leung, C.-I. A1 - Chang, Hua-Hua A1 - Hau, K-T. JF - Paper presented at the Annual Meeting of the National Council on Measurement in Education CY - Seattle WA N1 - {PDF file, 322 KB} ER - TY - JOUR T1 - Item selection in computerized adaptive testing: Should more discriminating items be used first? JF - Journal of Educational Measurement Y1 - 2001 A1 - Hau, Kit-Tai A1 - Chang, Hua-Hua KW - ability KW - Adaptive Testing KW - Computer Assisted Testing KW - Estimation KW - Statistical KW - Test Items computerized adaptive testing AB - During computerized adaptive testing (CAT), items are selected continuously according to the test-taker's estimated ability. Test security has become a problem because high-discrimination items are more likely to be selected and become overexposed. So, there seems to be a tradeoff between high efficiency in ability estimations and balanced usage of items. This series of four studies addressed the dilemma by focusing on the notion of whether more or less discriminating items should be used first in CAT. The first study demonstrated that the common maximum information method with J. B. Sympson and R. D. Hetter (1985) control resulted in the use of more discriminating items first. The remaining studies showed that using items in the reverse order, as described in H. Chang and Z. Yings (1999) stratified method had potential advantages: (a) a more balanced item usage and (b) a relatively stable resultant item pool structure with easy and inexpensive management. This stratified method may have ability-estimation efficiency better than or close to that of other methods. It is argued that the judicious selection of items, as in the stratified method, is a more active control of item exposure. 
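Since the maximum information method recurs throughout these abstracts as the baseline item selection rule, the sketch below shows the basic idea for the three-parameter logistic model: evaluate each unused item's Fisher information at the current trait estimate and administer the maximizer. It deliberately ignores exposure control and content constraints, and the pool layout and function names are assumptions made for this example.

```python
# Minimal sketch of maximum-information item selection under the 3PL model, the
# "maximum information method" these abstracts treat as the conventional baseline
# (exposure control and content balancing are intentionally left out).
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at theta: a^2 * (Q/P) * ((P-c)/(1-c))^2."""
    p = p_3pl(theta, a, b, c)
    return a**2 * ((1 - p) / p) * ((p - c) / (1 - c))**2

def max_info_pick(theta_hat, pool, administered):
    """Return the unused item (id, a, b, c) with maximum information at theta_hat."""
    candidates = [it for it in pool if it[0] not in administered]
    return max(candidates, key=lambda it: item_information(theta_hat, *it[1:]))

# Example pool of (id, a, b, c) tuples with made-up parameters
pool = [(1, 0.8, -1.0, 0.20), (2, 1.5, 0.2, 0.20), (3, 1.1, 1.0, 0.25)]
print(max_info_pick(0.0, pool, administered=set()))
```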
(PsycINFO Database Record (c) 2005 APA ) VL - 38 ER - TY - JOUR T1 - On maximizing item information and matching difficulty with ability JF - Psychometrika Y1 - 2001 A1 - Bickel, P. A1 - Buyske, S. A1 - Chang, Hua-Hua A1 - Ying, Z. VL - 66 ER - TY - CONF T1 - Measurement efficiency of multidimensional computerized adaptive testing T2 - Paper presented at the annual meeting of the American Psychological Association Y1 - 2001 A1 - Wang, W-C. A1 - Chen, B.-H. JF - Paper presented at the annual meeting of the American Psychological Association CY - San Francisco CA ER - TY - CONF T1 - A new approach to simulation studies in computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Chen, S-Y. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA N1 - {PDF file, 251 KB} ER - TY - CONF T1 - On-line Calibration Using PARSCALE Item Specific Prior Method: Changing Test Population and Sample Size T2 - Paper presented at National Council on Measurement in Education Annual Meeting Y1 - 2001 A1 - Guo, F. A1 - Stone, E. A1 - Cruz, D. JF - Paper presented at National Council on Measurement in Education Annual Meeting CY - Seattle, Washington ER - TY - CHAP T1 - Practical issues in setting standards on computerized adaptive tests T2 - Setting performance standards: Concepts, methods, and perspectives Y1 - 2001 A1 - Sireci, S. G. A1 - Clauser, B. E. KW - Adaptive Testing KW - Computer Assisted Testing KW - Performance Tests KW - Testing Methods AB - (From the chapter) Examples of setting standards on computerized adaptive tests (CATs) are hard to find. Some examples of CATs involving performance standards include the registered nurse exam and the Novell systems engineer exam. Although CATs do not require separate standard setting-methods, there are special issues to be addressed by test specialist who set performance standards on CATs. Setting standards on a CAT will typical require modifications on the procedures used with more traditional, fixed-form, paper-and -pencil examinations. The purpose of this chapter is to illustrate why CATs pose special challenges to the standard setter. (PsycINFO Database Record (c) 2005 APA ) JF - Setting performance standards: Concepts, methods, and perspectives PB - Lawrence Erlbaum Associates, Inc. CY - Mahwah, N.J. USA N1 - Using Smart Source ParsingSetting performance standards: Concepts, methods, and perspectives. (pp. 355-369). Mahwah, NJ : Lawrence Erlbaum Associates, Publishers. xiii, 510 pp ER - TY - JOUR T1 - A comparison of item selection rules at the early stages of computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2000 A1 - Chen, S-Y. A1 - Ankenmann, R. D. A1 - Chang, Hua-Hua KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Analysis (Test) KW - Statistical Estimation computerized adaptive testing AB - The effects of 5 item selection rules--Fisher information (FI), Fisher interval information (FII), Fisher information with a posterior distribution (FIP), Kullback-Leibler information (KL), and Kullback-Leibler information with a posterior distribution (KLP)--were compared with respect to the efficiency and precision of trait (θ) estimation at the early stages of computerized adaptive testing (CAT). FII, FIP, KL, and KLP performed marginally better than FI at the early stages of CAT for θ=-3 and -2. 
For tests longer than 10 items, there appeared to be no precision advantage for any of the selection rules. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 24 ER - TY - JOUR T1 - A comparison of item selection rules at the early stages of computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2000 A1 - Chen, S.Y. A1 - Ankenmann, R. D. A1 - Chang, Hua-Hua VL - 24 ER - TY - CONF T1 - Content balancing in stratified computerized adaptive testing designs T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2000 A1 - Leung, C-K. A1 - Chang, Hua-Hua A1 - Hau, K-T. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA N1 - {PDF file, 427 KB} ER - TY - JOUR T1 - Does adaptive testing violate local independence? JF - Psychometrika Y1 - 2000 A1 - Mislevy, R. J. A1 - Chang, Hua-Hua VL - 65 ER - TY - ABST T1 - Estimating item parameters from classical indices for item pool development with a computerized classification test (ACT Research 2000-4) Y1 - 2000 A1 - Chang, C.-Y. A1 - Kalohn, J.C. A1 - Lin, C.-J. A1 - Spray, J. CY - Iowa City IA: ACT, Inc. ER - TY - JOUR T1 - Estimation of trait level in computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2000 A1 - Cheng, P. E. A1 - Liou, M. KW - (Statistical) KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Analysis KW - Statistical Estimation computerized adaptive testing AB - Notes that in computerized adaptive testing (CAT), an examinee's trait level (θ) must be estimated with reasonable accuracy based on a small number of item responses. A successful implementation of CAT depends on (1) the accuracy of statistical methods used for estimating θ and (2) the efficiency of the item-selection criterion. Methods of estimating θ suitable for CAT are reviewed, and the differences between Fisher and Kullback-Leibler information criteria for selecting items are discussed. The accuracy of different CAT algorithms was examined in an empirical study. The results show that correcting θ estimates for bias was necessary at earlier stages of CAT, but most CAT algorithms performed equally well for tests of 10 or more items. (PsycINFO Database Record (c) 2005 APA ) VL - 24 ER - TY - JOUR T1 - ETS finds flaws in the way online GRE rates some students JF - Chronicle of Higher Education Y1 - 2000 A1 - Carlson, S. VL - 47 ER - TY - CONF T1 - An examination of exposure control and content balancing restrictions on item selection in CATs using the partial credit model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2000 A1 - Davis, L. L. A1 - Pastor, D. A. A1 - Dodd, B. G. A1 - Chiang, C. A1 - Fitzpatrick, S. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA ER - TY - ABST T1 - Multiple stratification CAT designs with content control Y1 - 2000 A1 - Yi, Q. A1 - Chang, Hua-Hua CY - Unpublished manuscript ER - TY - CONF T1 - Performance of item exposure control methods in computerized adaptive testing: Further explorations T2 - Paper presented at the Annual Meeting of the American Educational Research Association Y1 - 2000 A1 - Chang, Hua-Hua A1 - Chang, S.
A1 - Ansley JF - Paper presented at the Annual Meeting of the American Educational Research Association CY - New Orleans, LA ER - TY - CONF T1 - Solving complex constraints in a-stratified computerized adaptive testing designs T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Leung, C-K. A1 - Chang, Hua-Hua A1 - Hau, K-T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, USA N1 - {PDF file, 384 KB} ER - TY - CHAP T1 - Using Bayesian Networks in Computerized Adaptive Tests Y1 - 2000 A1 - Millan, E. A1 - Trella, M. A1 - Perez-de-la-Cruz, J.-L. A1 - Conejo, R. CY - M. Ortega and J. Bravo (Eds.), Computers and Education in the 21st Century. Kluwer, pp. 217-228. ER - TY - CONF T1 - Using constraints to develop and deliver adaptive tests T2 - Paper presented at the Computer-Assisted Testing Conference. Y1 - 2000 A1 - Abdullah, S. C. A1 - Cooley, R. E. JF - Paper presented at the Computer-Assisted Testing Conference. N1 - {PDF file, 46 KB} ER - TY - ABST T1 - Variations in mean response times for questions on the computer-adaptive GRE general test: Implications for fair assessment (GRE Board Professional Report No. 96-20P; Educational Testing Service Research Report 00-7) Y1 - 2000 A1 - Bridgeman, B. A1 - Cline, F. CY - Princeton NJ: Educational Testing Service N1 - #BR00-01 ER - TY - JOUR T1 - a-stratified multistage computerized adaptive testing JF - Applied Psychological Measurement Y1 - 1999 A1 - Chang, Hua-Hua A1 - Ying, Z. VL - 23 ER - TY - JOUR T1 - a-stratified multistage computerized adaptive testing JF - Applied Psychological Measurement Y1 - 1999 A1 - Chang, Hua-Hua A1 - Ying, Z. KW - computerized adaptive testing AB - For computerized adaptive tests (CAT) based on the three-parameter logistic model, it was found that administering items with low discrimination parameter (a) values early in the test and administering those with high a values later was advantageous; the skewness of item exposure distributions was reduced while efficiency was maintained in trait level estimation. Thus, a new multistage adaptive testing approach is proposed that factors a into the item selection process. In this approach, the items in the item bank are stratified into a number of levels based on their a values. The early stages of a test use items with lower a values and later stages use items with higher a values. At each stage, items are selected according to an optimization criterion from the corresponding level. Simulation studies were performed to compare a-stratified CATs with CATs based on the Sympson-Hetter method for controlling item exposure. Results indicated that this new strategy led to tests that were well-balanced, with respect to item exposure, and efficient. The a-stratified CATs achieved a lower average exposure rate than CATs based on Bayesian or information-based item selection and the Sympson-Hetter method. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 23 N1 - Sage Publications, US ER - TY - CONF T1 - An enhanced stratified computerized adaptive testing design T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1999 A1 - Leung, C-K. A1 - Chang, Hua-Hua A1 - Hau, K-T.
JF - Paper presented at the annual meeting of the American Educational Research Association CY - Montreal, Canada N1 - {PDF file, 478 KB} ER - TY - ABST T1 - Exploring the relationship between item exposure rate and test overlap rate in computerized adaptive testing Y1 - 1999 A1 - Chen, S. A1 - Ankenmann, R. D. A1 - Spray, J. A. CY - Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Canada N1 - (Also ACT Research Report 99-5). (Also presented at American Educational Research Association, 1999) ER - TY - ABST T1 - Exploring the relationship between item exposure rate and test overlap rate in computerized adaptive testing (ACT Research Report series 99-5) Y1 - 1999 A1 - Chen, S-Y. A1 - Ankenmann, R. D. A1 - Spray, J. A. CY - Iowa City IA: ACT, Inc N1 - (also National Council on Measurement in Education paper, 1999). ER - TY - CONF T1 - Fairness in computer-based testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Gallagher, A. A1 - Bridgeman, B. A1 - Calahan, C. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada N1 - Fairness in computer-based testing. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Canada. ER - TY - CONF T1 - Item selection in computerized adaptive testing: improving the a-stratified design with the Sympson-Hetter algorithm T2 - Paper presented at the Annual Meeting of the American Educational Research Association Y1 - 1999 A1 - Leung, C-K. A1 - Chang, Hua-Hua A1 - Hau, K-T. JF - Paper presented at the Annual Meeting of the American Educational Research Association CY - Montreal, CA ER - TY - CONF T1 - Performance of the Sympson-Hetter exposure control algorithm with a polytomous item bank T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1999 A1 - Pastor, D. A. A1 - Chiang, C. A1 - Dodd, B. G. A1 - Yockey, R. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Montreal, Canada ER - TY - CONF T1 - The use of linear-on-the-fly testing for TOEFL Reading T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Carey, P. A. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - ABST T1 - WISCAT: Een computergestuurd toetspakket voor rekenen en wiskunde [A computerized test package for arithmetic and mathematics] Y1 - 1999 A1 - Cito. CY - Cito: Arnhem, The Netherlands ER - TY - BOOK T1 - Applications of network flows to computerized adaptive testing Y1 - 1998 A1 - Cordova, M. J. CY - Dissertation, Rutgers Center for Operations Research (RUTCOR), Rutgers University, New Brunswick NJ ER - TY - JOUR T1 - Applications of network flows to computerized adaptive testing JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 1998 A1 - Claudio, M. J. C. KW - computerized adaptive testing AB - Recently, the concept of Computerized Adaptive Testing (CAT) has been receiving ever-growing attention from the academic community. This is so because of both practical and theoretical considerations. Its practical importance lies in the advantages of CAT over the traditional (perhaps outdated) paper-and-pencil test in terms of time, accuracy, and money.
The theoretical interest is sparked by its natural relationship to Item Response Theory (IRT). This dissertation offers a mathematical programming approach which creates a model that generates a CAT that takes care of many questions concerning the test, such as feasibility, accuracy and time of testing, as well as item pool security. The CAT generated is designed to obtain the most information about a single test taker. Several methods for estimating the examinee's ability, based on the (dichotomous) responses to the items in the test, are also offered here. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 59 ER - TY - JOUR T1 - Bayesian identification of outliers in computerized adaptive testing JF - Journal of the American Statistical Association Y1 - 1998 A1 - Bradlow, E. T. A1 - Weiss, R. E. A1 - Cho, M. AB - We consider the problem of identifying examinees with aberrant response patterns in a computerized adaptive test (CAT). The vector of responses yi of person i from the CAT comprises a multivariate response vector. Multivariate observations may be outlying in many different directions, and we characterize specific directions as corresponding to outliers with different interpretations. We develop a class of outlier statistics to identify different types of outliers based on a control chart type methodology. The outlier methodology is adaptable to general longitudinal discrete data structures. We consider several procedures to judge how extreme a particular outlier is. Data from the National Council Licensure Examination (NCLEX) motivates our development and is used to illustrate the results. VL - 93 ER - TY - CONF T1 - CAT item calibration T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Hsu, Y. A1 - Thompson, T.D. A1 - Chen, W-H. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego ER - TY - ABST T1 - A comparative study of item exposure control methods in computerized adaptive testing Y1 - 1998 A1 - Chang, S-W. A1 - Twu, B.-Y. CY - Research Report Series 98-3, Iowa City: American College Testing. N1 - #CH98-03 ER - TY - BOOK T1 - A comparative study of item exposure control methods in computerized adaptive testing Y1 - 1998 A1 - Chang, S-W. CY - Unpublished doctoral dissertation, University of Iowa, Iowa City IA ER - TY - JOUR T1 - A comparison of maximum likelihood estimation and expected a posteriori estimation in CAT using the partial credit model JF - Educational and Psychological Measurement Y1 - 1998 A1 - Chen, S. A1 - Hou, L. A1 - Dodd, B. G. VL - 58 ER - TY - CONF T1 - A comparison of two methods of controlling item exposure in computerized adaptive testing T2 - Paper presented at the meeting of the American Educational Research Association. San Diego CA. Y1 - 1998 A1 - Tang, L. A1 - Jiang, H. A1 - Chang, Hua-Hua JF - Paper presented at the meeting of the American Educational Research Association. San Diego CA. ER - TY - ABST T1 - Does adaptive testing violate local independence? (Research Report 98-33) Y1 - 1998 A1 - Mislevy, R. J. A1 - Chang, Hua-Hua CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - Item selection in computerized adaptive testing: Should more discriminating items be used first? Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA Y1 - 1998 A1 - Hau, K. T.
A1 - Chang, Hua-Hua ER - TY - JOUR T1 - Item selection in computerized adaptive testing: Should more discriminating items be used first? JF - Journal of Educational Measurement Y1 - 1998 A1 - Hau, K. T. A1 - Chang, Hua-Hua VL - 38 ER - TY - JOUR T1 - Maintaining content validity in computerized adaptive testing JF - Advances in Health Sciences Education Y1 - 1998 A1 - Luecht, RM A1 - de Champlain, A. A1 - Nungester, R. J. KW - computerized adaptive testing AB - The authors empirically demonstrate some of the trade-offs which can occur when content balancing is imposed in computerized adaptive testing (CAT) forms or, conversely, when it is ignored. The authors contend that the content validity of a CAT form can actually change across a score scale when content balancing is ignored. However, they caution that efficiency and score precision can be severely reduced by overspecifying content restrictions in a CAT form. The results from 2 simulation studies are presented as a means of highlighting some of the trade-offs that could occur between content and statistical considerations in CAT form assembly. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 3 N1 - Kluwer Academic Publishers, Netherlands ER - TY - JOUR T1 - Swedish Enlistment Battery: Construct validity and latent variable estimation of cognitive abilities by the CAT-SEB JF - International Journal of Selection and Assessment Y1 - 1998 A1 - Mardberg, B. A1 - Carlstedt, B. VL - 6 ER - TY - ABST T1 - CAST 5 for Windows users' guide Y1 - 1997 A1 - J. R. McBride A1 - Cooper, R. R CY - Contract No. MDA903-93-D-0032, DO 0054. Alexandria, VA: Human Resources Research Organization ER - TY - CHAP T1 - CAT-ASVAB cost and benefit analyses Y1 - 1997 A1 - Wise, L. L. A1 - Curran, L. T. A1 - J. R. McBride CY - W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computer adaptive testing: From inquiry to operation (pp. 227-236). Washington, DC: American Psychological Association. ER - TY - JOUR T1 - A comparison of maximum likelihood estimation and expected a posteriori estimation in computerized adaptive testing using the generalized partial credit model JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 1997 A1 - Chen, S-K. KW - computerized adaptive testing AB - A simulation study was conducted to investigate the application of expected a posteriori (EAP) trait estimation in computerized adaptive tests (CAT) based on the generalized partial credit model (Muraki, 1992), and to compare the performance of EAP with maximum likelihood trait estimation (MLE). The performance of EAP was evaluated under different conditions: the number of quadrature points (10, 20, and 30), and the type of prior distribution (normal, uniform, negatively skewed, and positively skewed). The relative performance of the MLE and EAP estimation methods was assessed under two distributional forms of the latent trait, one normal and the other negatively skewed. Also, both the known item parameters and estimated item parameters were employed in the simulation study. Descriptive statistics, correlations, scattergrams, accuracy indices, and audit trails were used to compare the different methods of trait estimation in CAT. The results showed that, regardless of the latent trait distribution, MLE and EAP with a normal prior, a uniform prior, or the prior that matches the latent trait distribution using either 20 or 30 quadrature points provided relatively accurate estimation in CAT based on the generalized partial credit model.
However, EAP using only 10 quadrature points did not work well in the generalized partial credit CAT. Also, the study found that increasing the number of quadrature points from 20 to 30 did not increase the accuracy of EAP estimation. Therefore, it appears 20 or more quadrature points are sufficient for accurate EAP estimation. The results also showed that EAP with a negatively skewed prior and positively skewed prior performed poorly for the normal data set, and EAP with positively skewed prior did not provide accurate estimates for the negatively skewed data set. Furthermore, trait estimation in CAT using estimated item parameters produced results similar to those obtained using known item parameters. In general, when at least 20 quadrature points are used, EAP estimation with a normal prior, a uniform prior or the prior that matches the latent trait distribution appears to be a good alternative to MLE in the application of polytomous CAT based on the generalized partial credit model. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 58 ER - TY - CONF T1 - Computer assembly of tests so that content reigns supreme T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Case, S. M. A1 - Luecht, RM JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - JOUR T1 - The effect of population distribution and method of theta estimation on computerized adaptive testing (CAT) using the rating scale model JF - Educational & Psychological Measurement Y1 - 1997 A1 - Chen, S-K. A1 - Hou, L. Y. A1 - Fitzpatrick, S. J. A1 - Dodd, B. G. KW - computerized adaptive testing AB - Investigated the effect of population distribution on maximum likelihood estimation (MLE) and expected a posteriori estimation (EAP) in a simulation study of computerized adaptive testing (CAT) based on D. Andrich's (1978) rating scale model. Comparisons were made among MLE and EAP with a normal prior distribution and EAP with a uniform prior distribution within 2 data sets: one generated using a normal trait distribution and the other using a negatively skewed trait distribution. Descriptive statistics, correlations, scattergrams, and accuracy indices were used to compare the different methods of trait estimation. The EAP estimation with a normal prior or uniform prior yielded results similar to those obtained with MLE, even though the prior did not match the underlying trait distribution. An additional simulation study based on real data suggested that more work is needed to determine the optimal number of quadrature points for EAP in CAT based on the rating scale model. The choice between MLE and EAP for particular measurement situations is discussed. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 57 N1 - Sage Publications, US ER - TY - JOUR T1 - The effect of population distribution and methods of theta estimation on computerized adaptive testing (CAT) using the rating scale model JF - Educational and Psychological Measurement Y1 - 1997 A1 - Chen, S. A1 - Hou, L. A1 - Fitzpatrick, S. J. A1 - Dodd, B. VL - 57 ER - TY - JOUR T1 - Health status assessment for the twenty-first century: item response theory, item banking and computer adaptive testing JF - Quality of Life Research Y1 - 1997 A1 - Revicki, D. A. A1 - Cella, D. F. 
KW - *Health Status KW - *HIV Infections/diagnosis KW - *Quality of Life KW - Diagnosis, Computer-Assisted KW - Disease Progression KW - Humans KW - Psychometrics/*methods AB - Health status assessment is frequently used to evaluate the combined impact of human immunodeficiency virus (HIV) disease and its treatment on functioning and well-being from the patient's perspective. No single health status measure can efficiently cover the range of problems in functioning and well-being experienced across HIV disease stages. Item response theory (IRT), item banking and computer adaptive testing (CAT) provide a solution to measuring health-related quality of life (HRQoL) across different stages of HIV disease. IRT allows us to examine the response characteristics of individual items and the relationship between responses to individual items and the responses to each other item in a domain. With information on the response characteristics of a large number of items covering an HRQoL domain (e.g., physical function and psychological well-being), and information on the interrelationships between all pairs of these items and the total scale, we can construct more efficient scales. Item banks consist of large sets of questions representing various levels of an HRQoL domain that can be used to develop brief, efficient scales for measuring the domain. CAT is the application of IRT and item banks to the tailored assessment of HRQoL domains specific to individual patients. Given the results of IRT analyses and computer-assisted test administration, more efficient and brief scales can be used to measure multiple domains of HRQoL for clinical trials and longitudinal observational studies. VL - 6 SN - 0962-9343 (Print) N1 - Qual Life Res. 1997 Aug;6(6):595-600. ER - TY - ABST T1 - Modification of the Computerized Adaptive Screening Test (CAST) for use by recruiters in all military services Y1 - 1997 A1 - J. R. McBride A1 - Cooper, R. R CY - Final Technical Report FR-WATSD-97-24, Contract No. MDA903-93-D-0032, DO 0054. Alexandria VA: Human Resources Research Organization. ER - TY - CONF T1 - Multi-stage CAT with stratified design T2 - Paper presented at the annual meeting of the Psychometric Society. Gatlinburg TN. Y1 - 1997 A1 - Chang, Hua-Hua A1 - Ying, Z. JF - Paper presented at the annual meeting of the Psychometric Society. Gatlinburg TN. ER - TY - JOUR T1 - Nonlinear sequential designs for logistic item response theory models with applications to computerized adaptive tests JF - The Annals of Statistics. Y1 - 1997 A1 - Chang, Hua-Hua A1 - Ying, Z. ER - TY - BOOK T1 - Optimization methods in computerized adaptive testing Y1 - 1997 A1 - Cordova, M. J. CY - Unpublished doctoral dissertation, Rutgers University, New Brunswick NJ ER - TY - CONF T1 - Relationship of response latency to test design, examinee ability, and item difficulty in computer-based test administration T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Swanson, D. B. A1 - Featherman, C. M. A1 - Case, A. M. A1 - Luecht, RM A1 - Nungester, R.
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CONF T1 - A simulation study of the use of the Mantel-Haenszel and logistic regression procedures for assessing DIF in a CAT environment T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Ross, L. P. A1 - Nandakumar, R. A1 - Clauser, B. E. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CHAP T1 - Adaptive assessment using granularity hierarchies and Bayesian nets Y1 - 1996 A1 - Collins, J. A. A1 - Greer, J. E. A1 - Huang, S. X. CY - Frasson, C. and Gauthier, G. and Lesgold, A. (Eds.) Intelligent Tutoring Systems, Third International Conference, ITS'96, Montréal, Canada, June 1996 Proceedings. Lecture Notes in Computer Science 1086. Berlin Heidelberg: Springer-Verlag, 569-577. ER - TY - BOOK T1 - Adaptive testing with granularity Y1 - 1996 A1 - Collins, J. A. CY - Master's thesis, University of Saskatchewan, Department of Computer Science ER - TY - CONF T1 - Building a statistical foundation for computerized adaptive testing T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1996 A1 - Chang, Hua-Hua A1 - Ying, Z. JF - Paper presented at the annual meeting of the Psychometric Society CY - Banff, Alberta, Canada ER - TY - CONF T1 - The effects of methods of theta estimation, prior distribution, and number of quadrature points on CAT using the graded response model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1996 A1 - Hou, L. A1 - Chen, S. A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York NY ER - TY - JOUR T1 - A global information approach to computerized adaptive testing JF - Applied Psychological Measurement Y1 - 1996 A1 - Chang, Hua-Hua A1 - Ying, Z. AB - based on Fisher information (or item information). At each stage, an item is selected to maximize the Fisher information at the currently estimated trait level (θ). However, this application of Fisher information could be much less efficient than assumed if the estimators are not close to the true θ, especially at early stages of an adaptive test when the test length (number of items) is too short to provide an accurate estimate for true θ. It is argued here that selection procedures based on global information should be used, at least at early stages of a test when θ estimates are not likely to be close to the true θ. For this purpose, an item selection procedure based on average global information is proposed. Results from pilot simulation studies comparing the usual maximum item information item selection with the proposed global information approach are reported, indicating that the new method leads to improvement in terms of bias and mean squared error reduction under many circumstances. Index terms: computerized adaptive testing, Fisher information, global information, information surface, item information, item response theory, Kullback-Leibler information, local information, test information. VL - 20 SN - 0146-6216 ER - TY - JOUR T1 - A Global Information Approach to Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 1996 A1 - Chang, H.-H. A1 - Ying, Z.
VL - 20 IS - 3 ER - TY - CONF T1 - A model for score maximization within a computerized adaptive testing environment T2 - Paper presented at the annual meeting of the NCME Y1 - 1996 A1 - Chang, Hua-Hua JF - Paper presented at the annual meeting of the NCME CY - New York NY ER - TY - ABST T1 - Recursive maximum likelihood estimation, sequential design, and computerized adaptive testing Y1 - 1996 A1 - Chang, Hua-Hua A1 - Ying, Z. CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - The effect of population distribution and methods of theta estimation on CAT using the rating scale model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1995 A1 - Chen, S. A1 - Hou, L. A1 - Fitzpatrick, S. J. A1 - Dodd, B. G. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco ER - TY - CONF T1 - Equating the CAT-ASVAB: Issues and approach T2 - Paper presented at the meeting of the National Council on Measurement in Education Y1 - 1995 A1 - Segall, D. O. A1 - Carter, G. JF - Paper presented at the meeting of the National Council on Measurement in Education CY - San Francisco ER - TY - ABST T1 - An evaluation of alternative concepts for administering the Armed Services Vocational Aptitude Battery to applicants for enlistment Y1 - 1995 A1 - Hogan, P.F. A1 - J. R. McBride A1 - Curran, L. T. CY - DMDC Technical Report 95-013. Monterey, CA: Personnel Testing Division, Defense Manpower Data Center ER - TY - CONF T1 - A global information approach to computerized adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1995 A1 - Chang, Hua-Hua JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco CA ER - TY - BOOK T1 - Item equivalence from paper-and-pencil to computer adaptive testing Y1 - 1995 A1 - Chae, S. CY - Unpublished doctoral dissertation, University of Chicago N1 - #CH95-01 ER - TY - CONF T1 - Recursive maximum likelihood estimation, sequential designs, and computerized adaptive testing T2 - Paper presented at the Eleventh Workshop on Item Response Theory Y1 - 1995 A1 - Ying, Z. A1 - Chang, Hua-Hua JF - Paper presented at the Eleventh Workshop on Item Response Theory CY - University of Twente, the Netherlands ER - TY - JOUR T1 - Computer adaptive testing: A shift in the evaluation paradigm JF - Educational Technology Systems Y1 - 1994 A1 - Carlson, R. VL - 22 (3) ER - TY - JOUR T1 - Computerized-adaptive and self-adapted music-listening tests: Features and motivational benefits JF - Applied Measurement in Education Y1 - 1994 A1 - Vispoel, W. P. A1 - Coffman, D. D. VL - 7 ER - TY - CONF T1 - Evaluation and implementation of CAT-ASVAB T2 - Paper presented at the annual meeting of the American Psychological Association Y1 - 1994 A1 - Curran, L. T. A1 - Wise, L. L. JF - Paper presented at the annual meeting of the American Psychological Association CY - Los Angeles ER - TY - JOUR T1 - Computer adaptive testing: A comparison of four item selection strategies when used with the golden section search strategy for estimating ability JF - Dissertation Abstracts International Y1 - 1993 A1 - Carlson, R. D. KW - computerized adaptive testing VL - 54 ER - TY - JOUR T1 - Moving in a new direction: Computerized adaptive testing (CAT) JF - Nursing Management Y1 - 1993 A1 - Jones-Dickson, C. A1 - Dorsey, D. A1 - Campbell-Warnock, J. A1 - Fields, F.
KW - *Computers KW - Accreditation/methods KW - Educational Measurement/*methods KW - Licensure, Nursing KW - United States VL - 24 SN - 0744-6314 (Print) N1 - Nurs Manage. 1993 Jan;24(1):80, 82. ER - TY - JOUR T1 - Computerized adaptive testing of music-related skills JF - Bulletin of the Council for Research in Music Education Y1 - 1992 A1 - Vispoel, W. P. A1 - Coffman, D. D. VL - 112 ER - TY - CHAP T1 - The development of alternative operational concepts Y1 - 1992 A1 - J. R. McBride A1 - Curran, L. T. CY - Proceedings of the 34th Annual Conference of the Military Testing Association. San Diego, CA: Navy Personnel Research and Development Center. ER - TY - BOOK T1 - Manual for the General Scholastic Aptitude Test (Senior) Computerized adaptive test Y1 - 1992 A1 - Von Tonder, M. A1 - Claassen, N. C. W. CY - Pretoria: Human Sciences Research Council ER - TY - JOUR T1 - Inter-subtest branching in computerized adaptive testing JF - Dissertation Abstracts International Y1 - 1991 A1 - Chang, S-H. KW - computerized adaptive testing VL - 52 ER - TY - CONF T1 - MusicCAT: An adaptive testing program to assess musical ability T2 - Paper presented at the ADCIS 32nd International Conference Y1 - 1990 A1 - Vispoel, W. P. A1 - Coffman, D. A1 - Scriven, D. JF - Paper presented at the ADCIS 32nd International Conference CY - San Diego CA ER - TY - JOUR T1 - Adaptive and conventional versions of the DAT: The first complete test battery comparison JF - Applied Psychological Measurement Y1 - 1989 A1 - Henly, S. J. A1 - Klebe, K. J. A1 - J. R. McBride A1 - Cudeck, R. VL - 13 ER - TY - JOUR T1 - Adaptive and Conventional Versions of the DAT: The First Complete Test Battery Comparison JF - Applied Psychological Measurement Y1 - 1989 A1 - Henly, S. J. A1 - Klebe, K. J. A1 - J. R. McBride A1 - Cudeck, R. VL - 13 IS - 4 ER - TY - BOOK T1 - Application of appropriateness measurement to a problem in computerized adaptive testing Y1 - 1988 A1 - Candell, G. L. CY - Unpublished doctoral dissertation, University of Illinois ER - TY - ABST T1 - Refinement of the Computerized Adaptive Screening Test (CAST) (Final Report, Contract No MDA203 06-C-0373) Y1 - 1988 A1 - Wise, L. L. A1 - McHenry, J.J. A1 - Chia, W.J. A1 - Szenas, P.L. A1 - J. R. McBride CY - Washington, DC: American Institutes for Research. ER - TY - CONF T1 - Equating the computerized adaptive edition of the Differential Aptitude Tests T2 - Paper presented at the meeting of the American Psychological Association Y1 - 1987 A1 - J. R. McBride A1 - Corpe, V. A. A1 - Wing, H. JF - Paper presented at the meeting of the American Psychological Association CY - New York ER - TY - JOUR T1 - A structural comparison of conventional and adaptive versions of the ASVAB JF - Multivariate Behavioral Research Y1 - 1985 A1 - Cudeck, R. AB - Examined several structural models of similarity between the Armed Services Vocational Aptitude Battery (ASVAB) and a battery of computerized adaptive tests designed to measure the same aptitudes. 12 plausible models were fitted to sample data in a double cross-validation design. 1,411 US Navy recruits completed 10 ASVAB subtests. A computerized adaptive test version of the ASVAB subtests was developed on item pools of approximately 200 items each. The items were pretested using applicants from military entrance processing stations across the US, resulting in a total calibration sample size of approximately 60,000 for the computerized adaptive tests.
Three of the 12 models provided reasonable summaries of the data. One model with a multiplicative structure (M. W. Browne; see record 1984-24964-001) performed quite well. This model provides an estimate of the disattenuated method correlation between conventional testing and adaptive testing. In the present data, this correlation was estimated to be 0.97 and 0.98 in the 2 halves of the data. Results support computerized adaptive tests as replacements for conventional tests. (33 ref) (PsycINFO Database Record (c) 2004 APA, all rights reserved). VL - 20 N1 - Lawrence Erlbaum, US ER - TY - JOUR T1 - Computerized diagnostic testing JF - Journal of Educational Measurement Y1 - 1984 A1 - McArthur, D. L. A1 - Choppin, B. H. VL - 21 ER - TY - ABST T1 - Computerized adaptive testing system design: Preliminary design considerations (Tech. Report 82-52) Y1 - 1982 A1 - Croll, P. R. CY - San Diego CA: Navy Personnel Research and Development Center. (AD A118 495) ER - TY - ABST T1 - A comparison of two methods of interactive testing: Final report. Y1 - 1981 A1 - Nicewander, W. A. A1 - Chang, H. S. A1 - Doody, E. N. CY - National Institute of Education Grant 79-1045 ER - TY - BOOK T1 - Effect of error in item parameter estimates on adaptive testing (Doctoral dissertation, University of Minnesota) Y1 - 1981 A1 - Crichton, L. I. CY - Dissertation Abstracts International, 42, 06-B N1 - (University Microfilms No. AAD81-25946) ER - TY - ABST T1 - Effects of computerized adaptive testing on Black and White students (Research Report 79-2) Y1 - 1980 A1 - Pine, S. M. A1 - Church, A. T. A1 - Gialluca, K. A. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program N1 - {PDF file, 2.323 MB} ER - TY - JOUR T1 - Implied orders tailored testing: Simulation with the Stanford-Binet JF - Applied Psychological Measurement Y1 - 1980 A1 - Cudeck, R. A1 - McCormick, D. A1 - Cliff, N. A. VL - 4 ER - TY - JOUR T1 - Implied Orders Tailored Testing: Simulation with the Stanford-Binet JF - Applied Psychological Measurement Y1 - 1980 A1 - Cudeck, R. A1 - McCormick, D. J. A1 - N. Cliff VL - 4 IS - 2 ER - TY - JOUR T1 - Evaluation of Implied Orders as a Basis for Tailored Testing with Simulation Data JF - Applied Psychological Measurement Y1 - 1979 A1 - N. Cliff A1 - Cudeck, R. A1 - McCormick, D. J. VL - 3 IS - 4 ER - TY - JOUR T1 - Evaluation of implied orders as a basis for tailored testing with simulation data JF - Applied Psychological Measurement Y1 - 1979 A1 - Cliff, N. A. A1 - McCormick, D. VL - 3 ER - TY - JOUR T1 - Monte Carlo evaluation of implied orders as a basis for tailored testing JF - Applied Psychological Measurement Y1 - 1979 A1 - Cudeck, R. A1 - McCormick, D. J. A1 - Cliff, N. A. VL - 3 ER - TY - JOUR T1 - Monte Carlo Evaluation of Implied Orders As a Basis for Tailored Testing JF - Applied Psychological Measurement Y1 - 1979 A1 - Cudeck, R. A1 - McCormick, D. A1 - N. Cliff VL - 3 IS - 1 ER - TY - JOUR T1 - Combining auditory and visual stimuli in the adaptive testing of speech discrimination JF - Journal of Speech and Hearing Disorders Y1 - 1978 A1 - Steele, J. A. A1 - Binnie, C. A. A1 - Cooper, W. A. VL - 43 ER - TY - ABST T1 - Evaluations of implied orders as a basis for tailored testing using simulations (Technical Report No. 4) Y1 - 1978 A1 - Cliff, N. A. A1 - Cudeck, R. A1 - McCormick, D. CY - Los Angeles CA: University of Southern California, Department of Psychology.
N1 - #CL77-04 ER - TY - ABST T1 - Implied orders as a basis for tailored testing (Technical Report No. 6) Y1 - 1978 A1 - Cliff, N. A. A1 - Cudeck, R. A1 - McCormick, D. CY - Los Angeles CA: University of Southern California, Department of Psychology. N1 - #CL78-06 ER - TY - CHAP T1 - An empirical evaluation of implied orders as a basis for tailored testing Y1 - 1977 A1 - Cliff, N. A. A1 - Cudeck, R. A1 - McCormick, D. CY - D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. ER - TY - JOUR T1 - TAILOR: A FORTRAN procedure for interactive tailored testing JF - Educational and Psychological Measurement Y1 - 1977 A1 - Cudeck, R. A. A1 - Cliff, N. A. A1 - Kehoe, J. VL - 37 ER - TY - JOUR T1 - TAILOR-APL: An interactive computer program for individual tailored testing JF - Educational and Psychological Measurement Y1 - 1977 A1 - McCormick, D. A1 - Cliff, N. A. VL - 37 ER - TY - JOUR T1 - A theory of consistency ordering generalizable to tailored testing JF - Psychometrika Y1 - 1977 A1 - Cliff, N. A. ER - TY - ABST T1 - Elements of a basic test theory generalizable to tailored testing Y1 - 1976 A1 - Cliff, N. A. CY - Unpublished manuscript ER - TY - BOOK T1 - Incomplete orders and computerized testing Y1 - 1976 A1 - Cliff, N. A. CY - In C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 18-23). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 373 KB} ER - TY - ABST T1 - Monte carlo results from a computer program for tailored testing (Technical Report No. 2) Y1 - 1976 A1 - Cudeck, R. A. A1 - Cliff, N. A. A1 - Reynolds, T. J. A1 - McCormick, D. J. CY - Los Angeles CA: University of California, Department of Psychology. N1 - #CU76-02 ER - TY - BOOK T1 - Proceedings of the first conference on computerized adaptive testing Y1 - 1976 A1 - Clark, C. K. CY - Washington DC: U.S. Government Printing Office N1 - {Complete document: PDF file, 7.494 MB; Table of contents and separate papers} ER - TY - CHAP T1 - Using computerized tests to add new dimensions to the measurement of abilities which are important for on-job performance: An exploratory study Y1 - 1976 A1 - Cory, C. H. CY - C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 64-74). Washington DC: U.S. Government Printing Office. N1 - {PDF file, 632 KB} ER - TY - ABST T1 - A basic test theory generalizable to tailored testing (Technical Report No 1) Y1 - 1975 A1 - Cliff, N. A. CY - Los Angeles CA: University of Southern California, Department of Psychology. ER - TY - JOUR T1 - Complete orders from incomplete data: Interactive ordering and tailored testing JF - Psychological Bulletin Y1 - 1975 A1 - Cliff, N. A. VL - 82 ER - TY - CONF T1 - Tailored testing: Maximizing validity and utility for job selection T2 - Paper presented at the 86th Annual Convention of the American Psychological Association. Toronto Y1 - 1975 A1 - Croll, P. R. A1 - Urry, V. W. JF - Paper presented at the 86th Annual Convention of the American Psychological Association. Toronto CY - Canada ER - TY - CONF T1 - The potential use of tailored testing for allocation to army employments T2 - NATO Conference on Utilisation of Human Resources Y1 - 1973 A1 - Killcross, M. C. A1 - Cassie, A JF - NATO Conference on Utilisation of Human Resources CY - Lisbon, Portugal ER - TY - JOUR T1 - Sequential testing for dichotomous decisions. 
JF - Educational and Psychological Measurement Y1 - 1972 A1 - Linn, R. L. A1 - Rock, D. A. A1 - Cleary, T. A. KW - CCAT KW - CLASSIFICATION Computerized Adaptive Testing KW - sequential probability ratio testing KW - SPRT VL - 32 ER - TY - ABST T1 - Sequential testing for dichotomous decisions. College Entrance Examination Board Research and Development Report (RDR 69-70, No. 3, and Educational Testing Service RB-70-31) Y1 - 1970 A1 - Linn, R. L. A1 - Rock, D. A. A1 - Cleary, T. A. CY - Princeton NJ: Educational Testing Service. N1 - #LI70-31 ER - TY - JOUR T1 - The development and evaluation of several programmed testing methods JF - Educational and Psychological Measurement Y1 - 1969 A1 - Linn, R. L. A1 - Cleary, T. A. VL - 29 ER - TY - JOUR T1 - An exploratory study of programmed tests JF - Educational and Psychological Measurement Y1 - 1969 A1 - Cleary, T. A. A1 - Linn, R. L. A1 - Rock, D. A. VL - 28 ER - TY - ABST T1 - The development and evaluation of several programmed testing methods (Research Bulletin 68-5) Y1 - 1968 A1 - Linn, R. L. A1 - Rock, D. A. A1 - Cleary, T. A. CY - Princeton NJ: Educational Testing Service N1 - #LI68-05 ER - TY - JOUR T1 - Reproduction of total test score through the use of sequential programmed tests JF - Journal of Educational Measurement Y1 - 1968 A1 - Cleary, T. A. A1 - Linn, R. L. A1 - Rock, D. A. VL - 5 ER - TY - CHAP T1 - New light on test strategy from decision theory Y1 - 1966 A1 - Cronbach, L. J. CY - A. Anastasi (Ed.). Testing problems in perspective. Washington DC: American Council on Education. ER - TY - JOUR T1 - An application of sequential sampling to testing students JF - Journal of the American Statistical Association Y1 - 1946 A1 - Cowden, D. J. VL - 41 ER -