00537nas a2200181 4500008004500000022001400045245004000059210004000099260001200139300000800151490000600159653003300165653003000198653002000228653002700248100002200275856005800297 2022 Engldsh a2165-659200aImproving Precision of CAT Measures0 aImproving Precision of CAT Measures c10/2022 a1-70 v910a: dichotomously scored items10aoption probability theory10ascoring methods10asubjective probability1 aBarnard, John, J. uhttp://www.iacat.org/improving-precision-cat-measures01969nas a2200145 4500008004100000245008700041210006900128260005500197520137700252653003201629653001801661653002401679100001701703856010301720 2017 eng d00aAdapting Linear Models for Optimal Test Design to More Complex Test Specifications0 aAdapting Linear Models for Optimal Test Design to More Complex T aNiigata, JapanbNiigata Seiryo Universityc08/20173 a

Combinatorial optimization (CO) has proven to be a very helpful approach for addressing test assembly issues and for providing solutions. Furthermore, CO has been applied to several test designs, including (1) the development of linear test forms; (2) computerized adaptive testing; and (3) multistage testing. In his seminal work, van der Linden (2006) laid out the basis for using linear models for simultaneously assembling exams and item pools in a variety of conditions: (1) for single tests and multiple tests; (2) with item sets, etc. However, for some testing programs, the number and complexity of test specifications can grow rapidly. Consequently, the mathematical representation of the test assembly problem goes beyond most approaches reported either in van der Linden’s book or in the majority of other publications related to test assembly. In this presentation, we extend van der Linden’s framework by including the concept of blocks for test specifications. We modify the usual mathematical notation of a test assembly problem by including this concept and we show how it can be applied to various test designs. Finally, we will demonstrate an implementation of this approach in stand-alone software called the ATASolver.

10aComplex Test Specifications10aLinear Models10aOptimal Test Design1 aMorin, Maxim uhttp://www.iacat.org/adapting-linear-models-optimal-test-design-more-complex-test-specifications-001506nas a2200133 4500008004100000245006000041210005900101260005500160520103000215653001501245653002001260100002101280856007101301 2017 eng d00aConcerto 5 Open Source CAT Platform: From Code to Nodes0 aConcerto 5 Open Source CAT Platform From Code to Nodes aNiigata, JapanbNiigata Seiryo Universityc08/20173 a

Concerto 5 is the newest version of the Concerto open source R-based Computer-Adaptive Testing platform, which is currently used in educational testing and in clinical trials. In our quest to make CAT accessible to all, the latest version uses flowchart nodes to connect different elements of a test, so that CAT test creation is an intuitive high-level process that does not require writing code.

A test creator might connect an Info Page node to a Consent Page node, then to a CAT node, and then to a Feedback node. After the creator uploads the items, the test is done.

This talk will show the new flowchart interface, and demonstrate the creation of a CAT test from scratch in less than 10 minutes.

Concerto 5 also includes a new Polytomous CAT node, so CATs with Likert items can be easily created in the flowchart interface. This node is currently used in depression and anxiety tests in a clinical trial.

Session Video

10aConcerto 510aOpen Source CAT1 aStillwell, David uhttps://drive.google.com/open?id=11eu1KKILQEoK5c-CYO1P1AiJgiQxX0E002154nas a2200145 4500008004100000245005100041210005100092260005500143520166100198653002301859653002001882100001901902700001301921856007401934 2017 eng d00aItem Parameter Drifting and Online Calibration0 aItem Parameter Drifting and Online Calibration aNiigata, JapanbNiigata Seiryo Universityc08/20173 a

Item calibration is one of the most important topics in item response theory (IRT). Since many large-scale testing programs have switched from paper and pencil (P&P) testing mode to computerized adaptive testing (CAT) mode, developing methods for efficiently calibrating new items has become vital. Among many proposed item calibration processes in CAT, online calibration is the most cost-effective. This presentation introduces an online (re)calibration design to detect item parameter drift for computerized adaptive testing (CAT) in both unidimensional and multidimensional environments. Specifically, for optimal online calibration design in a unidimensional computerized adaptive testing model, a two-stage design is proposed by implementing a proportional density index algorithm. For a multidimensional computerized adaptive testing model, a four-quadrant online calibration pretest item selection design with the proportional density index algorithm is proposed. Comparisons were made between different online calibration item selection strategies. Results showed that under unidimensional computerized adaptive testing, the proposed modified two-stage item selection criterion with the proportional density algorithm outperformed the other existing methods in terms of item parameter calibration and item parameter drift detection, and under multidimensional computerized adaptive testing, the online (re)calibration technique with the proposed four-quadrant item selection design with proportional density index outperformed other methods.

Session Video

10aonline calibration10aParameter Drift1 aChang, Hua-Hua1 aGuo, Rui uhttp://www.iacat.org/item-parameter-drifting-and-online-calibration-003598nas a2200169 4500008004100000245004300041210004100084260005500125520306600180653000803246653002303254653002303277100001703300700002003317700002003337856007103357 2017 eng d00aScripted On-the-fly Multistage Testing0 aScripted Onthefly Multistage Testing aNiigata, JapanbNiigata Seiryo Universityc08/20173 a

On-the-fly multistage testing (OMST) was introduced recently as a promising alternative to preassembled MST. A decidedly appealing feature of both is the reviewability of items within the current stage. However, the fundamental difference is that, instead of routing to a preassembled module, OMST adaptively assembles a module at each stage according to an interim ability estimate. This produces more individualized forms with finer measurement precision, but imposing nonstatistical constraints and controlling item exposure become more cumbersome. One recommendation is to use the maximum priority index followed by a remediation step to satisfy content constraints, and the Sympson-Hetter method with a stratified item bank for exposure control.

However, these methods can be computationally expensive, thereby impeding practical implementation. Therefore, this study investigated the script method as a simpler solution to the challenge of strict content balancing and effective item exposure control in OMST. The script method was originally devised as an item selection algorithm for CAT and generally proceeds as follows: For a test with m items, there are m slots to be filled, and an item is selected according to pre-defined rules for each slot. For the first slot, randomly select an item from a designated content area (collection). For each subsequent slot, 1) Discard any enemies of items already administered in previous slots; 2) Draw a designated number of candidate items (selection length) from the designated collection according to the current ability estimate; 3) Randomly select one item from the set of candidates. There are two distinct features of the script method. First, a predetermined sequence of collections guarantees meeting content specifications. The specific ordering may be determined either randomly or deliberately by content experts. Second, steps 2 and 3 depict a method of exposure control, in which selection length balances item usage at the possible expense of ability estimation accuracy. The adaptation of the script method to OMST is straightforward. For the first module, randomly select each item from a designated collection. For each subsequent module, the process is the same as in scripted CAT (SCAT) except the same ability estimate is used for the selection of all items within the module. A series of simulations was conducted to evaluate the performance of scripted OMST (SOMST, with 3 or 4 evenly divided stages) relative to SCAT under various item exposure restrictions. In all conditions, reliability was maximized by programming an optimization algorithm that searches for the smallest possible selection length for each slot within the constraints. 
Preliminary results indicated that SOMST is certainly a capable design with performance comparable to that of SCAT. The encouraging findings and ease of implementation highly motivate the prospect of operational use for large-scale assessments.
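
The slot-by-slot procedure described above can be sketched in a few lines of Python. This is only an illustrative reading of the abstract, not the authors' implementation: the item fields (`id`, `collection`, `b`), the `estimate_theta` callback, and the rule that "according to the current ability estimate" means "closest difficulty to the estimate" are all assumptions introduced here.

```python
import random

def scripted_cat(bank, script, selection_length, estimate_theta,
                 enemies=None, seed=None):
    """Fill one slot per entry of `script`, a sequence of collection names.

    bank            -- list of dicts with hypothetical keys "id", "collection", "b"
    selection_length -- number of candidate items drawn per slot
    estimate_theta  -- hypothetical callback: administered items -> ability estimate
    enemies         -- dict mapping an item id to the set of its enemy ids
                       (assumed symmetric)
    """
    rng = random.Random(seed)
    enemies = enemies or {}
    administered = []
    for slot, collection in enumerate(script):
        used = {item["id"] for item in administered}
        # Step 1: discard enemies of items administered in previous slots.
        blocked = set()
        for item in administered:
            blocked |= enemies.get(item["id"], set())
        pool = [item for item in bank
                if item["collection"] == collection
                and item["id"] not in used
                and item["id"] not in blocked]
        if not pool:
            raise ValueError(f"no eligible item for slot {slot} ({collection})")
        if slot == 0:
            # First slot: purely random draw from the designated collection.
            choice = rng.choice(pool)
        else:
            # Step 2: keep the `selection_length` items nearest the estimate
            # (assumed criterion: distance between difficulty b and theta).
            theta = estimate_theta(administered)
            pool.sort(key=lambda item: abs(item["b"] - theta))
            # Step 3: pick one candidate at random.
            choice = rng.choice(pool[:selection_length])
        administered.append(choice)
    return administered
```

Under these assumptions, a larger `selection_length` spreads item usage at some cost in precision, which is the trade-off the abstract attributes to steps 2 and 3. A scripted OMST variant could reuse the same function with an `estimate_theta` callback that only changes its value at module boundaries.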

Presentation Video

10aCAT10amultistage testing10aOn-the-fly testing1 aChoe, Edison1 aWilliams, Bruce1 aLee, Sung-Hyuck uhttps://drive.google.com/open?id=1wKuAstITLXo6BM4APf2mPsth1BymNl-y00516nas a2200193 4500008004500000022001400045245004500059210004200104300000900146490000600155653001300161653001500174653001300189653001200202653001100214653001200225100002100237856006400258 2015 Engldsh a2165-659200aImplementing a CAT: The AMC Experience 0 aImplementing a CAT The AMC Experience a1-120 v310aadaptive10aAssessment10acomputer10amedical10aonline10aTesting1 aBarnard, John, J uhttp://www.iacat.org/jcat/index.php/jcat/article/view/52/2502776nas a2200229 4500008004100000022001400041245015400055210006900209260000900278300000800287490000700295520193000302653001802232653003702250653001102287653002702298653002302325653003702348100002002385700001902405856012202424 2012 eng d a1471-228800aComparison of two Bayesian methods to detect mode effects between paper-based and computerized adaptive assessments: a preliminary Monte Carlo study.0 aComparison of two Bayesian methods to detect mode effects betwee c2012 a1240 v123 a

BACKGROUND: Computerized adaptive testing (CAT) is being applied to health outcome measures developed as paper-and-pencil (P&P) instruments. Differences in how respondents answer items administered by CAT vs. P&P can increase error in CAT-estimated measures if not identified and corrected.

METHOD: Two methods for detecting item-level mode effects are proposed using Bayesian estimation of posterior distributions of item parameters: (1) a modified robust Z (RZ) test, and (2) 95% credible intervals (CrI) for the CAT-P&P difference in item difficulty. A simulation study was conducted under the following conditions: (1) data-generating model (one- vs. two-parameter IRT model); (2) moderate vs. large DIF sizes; (3) percentage of DIF items (10% vs. 30%), and (4) mean difference in θ estimates across modes of 0 vs. 1 logits. This resulted in a total of 16 conditions with 10 generated datasets per condition.
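
As a rough illustration of the design described in the METHOD paragraph, the sketch below enumerates the fully crossed simulation conditions and shows one simple way to flag an item from a 95% credible interval. The factor labels and the percentile-based interval are assumptions made for this example; the abstract does not specify how the interval was computed.

```python
from itertools import product

# The four two-level factors named in the abstract (labels are illustrative).
factors = {
    "model": ["1PL", "2PL"],
    "dif_size": ["moderate", "large"],
    "dif_pct": [0.10, 0.30],
    "mean_theta_diff": [0.0, 1.0],
}

# Fully crossed design: 2 x 2 x 2 x 2 conditions.
conditions = [dict(zip(factors, combo)) for combo in product(*factors.values())]
datasets_per_condition = 10
total_datasets = len(conditions) * datasets_per_condition

def credible_interval(draws, level=0.95):
    """Equal-tailed interval from posterior draws (simple order-statistic rule)."""
    ordered = sorted(draws)
    lo_idx = int((1 - level) / 2 * (len(ordered) - 1))
    hi_idx = int((1 + level) / 2 * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]

def flags_dif(difference_draws, level=0.95):
    """Flag an item when the interval for the CAT-P&P difficulty difference excludes 0."""
    lo, hi = credible_interval(difference_draws, level)
    return lo > 0 or hi < 0
```

Here `difference_draws` stands for posterior samples of the CAT minus P&P difference in item difficulty for a single item.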

RESULTS: Both methods evidenced good to excellent false positive control, with RZ providing better control of false positives and with slightly higher power for CrI, irrespective of measurement model. False positives increased when items were very easy to endorse and when there were mode differences in mean trait level. True positives were predicted by CAT item usage, absolute item difficulty and item discrimination. RZ outperformed CrI, due to better control of false positive DIF.

CONCLUSIONS: Whereas false positives were well controlled, particularly for RZ, power to detect DIF was suboptimal. Research is needed to examine the robustness of these methods under varying prior assumptions concerning the distribution of item and person parameters and when data fail to conform to prior assumptions. False identification of DIF when items were very easy to endorse is a problem warranting additional investigation.

10aBayes Theorem10aData Interpretation, Statistical10aHumans10aMathematical Computing10aMonte Carlo Method10aOutcome Assessment (Health Care)1 aRiley, Barth, B1 aCarle, Adam, C uhttp://www.iacat.org/content/comparison-two-bayesian-methods-detect-mode-effects-between-paper-based-and-computerized01103nas a2200169 4500008004100000245006600041210006600107260001200173520055300185653002600738653000800764653002100772653001700793653001000810100002200820856009100842 2011 eng d00aOptimal Calibration Designs for Computerized Adaptive Testing0 aOptimal Calibration Designs for Computerized Adaptive Testing c10/20113 a

Optimization

How can we exploit the advantages of Balanced Block Design while keeping the logistics manageable?

Maximize number of item pairs
Subject to maximum number of test booklets
Subject to other constraints

Homogeneous Designs: Overlap between test booklets as regular as possible

Conclusions:

Establish overlaps as regular as possible between all test booklets
Or, at least as many test booklets as possible
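
The two quantities these notes refer to, the number of item pairs and the overlap between test booklets, can be computed for any candidate design with a short helper. This is a hedged sketch: it assumes an "item pair" means two items that appear together in at least one booklet, and that overlap is the number of items two booklets share. Neither definition is stated in the record itself.

```python
from itertools import combinations

def covered_item_pairs(booklets):
    """Set of item pairs that co-occur in at least one booklet (assumed objective)."""
    pairs = set()
    for booklet in booklets:
        pairs.update(combinations(sorted(booklet), 2))
    return pairs

def booklet_overlaps(booklets):
    """Number of shared items for every pair of booklets, keyed by booklet indices."""
    return {
        (i, j): len(set(a) & set(b))
        for (i, a), (j, b) in combinations(enumerate(booklets), 2)
    }

# Toy design: three blocks of two items, each booklet holding two blocks.
design = [[1, 2, 3, 4], [3, 4, 5, 6], [1, 2, 5, 6]]
pairs = covered_item_pairs(design)
overlaps = booklet_overlaps(design)
```

In this toy design every booklet pair shares the same number of items, which is the kind of regular overlap the conclusions recommend; comparing `len(pairs)` across candidate designs with a fixed number of booklets gives a crude view of the stated objective.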

10abalanced block design10aCAT10aitem calibration10aoptimization10aRasch1 aVerschoor, Angela uhttp://www.iacat.org/content/optimal-calibration-designs-computerized-adaptive-testing02749nas a2200409 4500008004100000020004600041245009400087210006900181250001500250260000800265300001200273490000700285520144500292653001501737653002001752653003101772653003001803653002001833653001901853653002601872653001101898653001101909653000901920653001601929653002601945653003701971653003002008653004402038653001802082653002002100653002802120100002002148700002302168700001602191700001702207856011502224 2009 eng d a1528-8447 (Electronic)1526-5900 (Linking)00aDevelopment and preliminary testing of a computerized adaptive assessment of chronic pain0 aDevelopment and preliminary testing of a computerized adaptive a a2009/07/15 cSep a932-9430 v103 aThe aim of this article is to report the development and preliminary testing of a prototype computerized adaptive test of chronic pain (CHRONIC PAIN-CAT) conducted in 2 stages: (1) evaluation of various item selection and stopping rules through real data-simulated administrations of CHRONIC PAIN-CAT; (2) a feasibility study of the actual prototype CHRONIC PAIN-CAT assessment system conducted in a pilot sample. Item calibrations developed from a US general population sample (N = 782) were used to program a pain severity and impact item bank (kappa = 45), and real data simulations were conducted to determine a CAT stopping rule. The CHRONIC PAIN-CAT was programmed on a tablet PC using QualityMetric's Dynamic Health Assessment (DYHNA) software and administered to a clinical sample of pain sufferers (n = 100). The CAT was completed in significantly less time than the static (full item bank) assessment (P < .001). On average, 5.6 items were dynamically administered by CAT to achieve a precise score. 
Scores estimated from the 2 assessments were highly correlated (r = .89), and both assessments discriminated across pain severity levels (P < .001, RV = .95). Patients' evaluations of the CHRONIC PAIN-CAT were favorable. PERSPECTIVE: This report demonstrates that the CHRONIC PAIN-CAT is feasible for administration in a clinic. The application has the potential to improve pain assessment and help clinicians manage chronic pain.10a*Computers10a*Questionnaires10aActivities of Daily Living10aAdaptation, Psychological10aChronic Disease10aCohort Studies10aDisability Evaluation10aFemale10aHumans10aMale10aMiddle Aged10aModels, Psychological10aOutcome Assessment (Health Care)10aPain Measurement/*methods10aPain, Intractable/*diagnosis/psychology10aPsychometrics10aQuality of Life10aUser-Computer Interface1 aAnatchkova, M D1 aSaris-Baglama, R N1 aKosinski, M1 aBjorner, J B uhttp://www.iacat.org/content/development-and-preliminary-testing-computerized-adaptive-assessment-chronic-pain02881nas a2200493 4500008004100000020004100041245014100082210006900223250001500292260000800307300001100315490000700326520125100333653003001584653001001614653000901624653004601633653003301679653001101712653003101723653001101754653000901765653003301774653001601807653002401823653004601847653005501893653005501948653004602003653001902049653003102068653001402099100001602113700001502129700001302144700001402157700001502171700001702186700001502203700001702218700001502235700001302250856012402263 2009 eng d a0090-5550 (Print)0090-5550 (Linking)00aDevelopment of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis0 aDevelopment of an item bank for the assessment of depression in a2009/05/28 cMay a186-970 v543 aOBJECTIVE: The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. 
The present study aimed at developing a new item bank that allows for assessing depression in persons with mental and persons with somatic diseases. METHOD: The sample consisted of 161 participants treated for a depressive syndrome, and 206 participants with somatic illnesses (103 cardiologic, 103 otorhinolaryngologic; overall mean age = 44.1 years, SD =14.0; 44.7% women) to allow for validation of the item bank in both groups. Persons answered a pool of 182 depression items on a 5-point Likert scale. RESULTS: Evaluation of Rasch model fit (infit < 1.3), differential item functioning, dimensionality, local independence, item spread, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 79 items with good psychometric properties. CONCLUSIONS: The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. It might also be useful for researchers who wish to develop new fixed-length scales for the assessment of depression in specific rehabilitation settings.10aAdaptation, Psychological10aAdult10aAged10aDepressive Disorder/*diagnosis/psychology10aDiagnosis, Computer-Assisted10aFemale10aHeart Diseases/*psychology10aHumans10aMale10aMental Disorders/*psychology10aMiddle Aged10aModels, Statistical10aOtorhinolaryngologic Diseases/*psychology10aPersonality Assessment/statistics & numerical data10aPersonality Inventory/*statistics & numerical data10aPsychometrics/statistics & numerical data10aQuestionnaires10aReproducibility of Results10aSick Role1 aForkmann, T1 aBoecker, M1 aNorra, C1 aEberle, N1 aKircher, T1 aSchauerte, P1 aMischke, K1 aWesthofen, M1 aGauggel, S1 aWirtz, M uhttp://www.iacat.org/content/development-item-bank-assessment-depression-persons-mental-illnesses-and-physical-diseases02435nas a2200385 
4500008004100000020004100041245009300082210006900175250001500244260000800259300001100267490000700278520128100285653003201566653002701598653002001625653002901645653001001674653000901684653001901693653003401712653001101746653001101757653000901768653001601777653004601793100001501839700001001854700001501864700001101879700001201890700001401902700001601916856011701932 2009 eng d a0962-9343 (Print)0962-9343 (Linking)00aReplenishing a computerized adaptive test of patient-reported daily activity functioning0 aReplenishing a computerized adaptive test of patientreported dai a2009/03/17 cMay a461-710 v183 aPURPOSE: Computerized adaptive testing (CAT) item banks may need to be updated, but before new items can be added, they must be linked to the previous CAT. The purpose of this study was to evaluate 41 pretest items prior to including them into an operational CAT. METHODS: We recruited 6,882 patients with spine, lower extremity, upper extremity, and nonorthopedic impairments who received outpatient rehabilitation in one of 147 clinics across 13 states of the USA. Forty-one new Daily Activity (DA) items were administered along with the Activity Measure for Post-Acute Care Daily Activity CAT (DA-CAT-1) in five separate waves. We compared the scoring consistency with the full item bank, test information function (TIF), person standard errors (SEs), and content range of the DA-CAT-1 to the new CAT (DA-CAT-2) with the pretest items by real data simulations. RESULTS: We retained 29 of the 41 pretest items. Scores from the DA-CAT-2 were more consistent (ICC = 0.90 versus 0.96) than DA-CAT-1 when compared with the full item bank. TIF and person SEs were improved for persons with higher levels of DA functioning, and ceiling effects were reduced from 16.1% to 6.1%. 
CONCLUSIONS: Item response theory and online calibration methods were valuable in improving the DA-CAT.10a*Activities of Daily Living10a*Disability Evaluation10a*Questionnaires10a*User-Computer Interface10aAdult10aAged10aCohort Studies10aComputer-Assisted Instruction10aFemale10aHumans10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods1 aHaley, S M1 aNi, P1 aJette, A M1 aTao, W1 aMoed, R1 aMeyers, D1 aLudlow, L H uhttp://www.iacat.org/content/replenishing-computerized-adaptive-test-patient-reported-daily-activity-functioning03436nas a2200481 4500008004100000020004600041245013800087210006900225250001500294260000800309300001200317490000700329520191400336653002702250653002302277653003102300653001502331653001602346653001002362653002102372653002402393653002302417653003802440653001102478653002202489653001102511653001102522653000902533653003702542653002102579653003102600653002602631653001702657653003202674653001602706653002802722100001602750700001502766700001002781700001502791700002502806856012302831 2008 eng d a1532-821X (Electronic)0003-9993 (Linking)00aAssessing self-care and social function using a computer adaptive testing version of the pediatric evaluation of disability inventory0 aAssessing selfcare and social function using a computer adaptive a2008/04/01 cApr a622-6290 v893 aOBJECTIVE: To examine score agreement, validity, precision, and response burden of a prototype computer adaptive testing (CAT) version of the self-care and social function scales of the Pediatric Evaluation of Disability Inventory compared with the full-length version of these scales. DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics; community-based day care, preschool, and children's homes. 
PARTICIPANTS: Children with disabilities (n=469) and 412 children with no disabilities (analytic sample); 38 children with disabilities and 35 children without disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from prototype CAT applications of each scale using 15-, 10-, and 5-item stopping rules; scores from the full-length self-care and social function scales; time (in seconds) to complete assessments and respondent ratings of burden. RESULTS: Scores from both computer simulations and field administration of the prototype CATs were highly consistent with scores from full-length administration (r range, .94-.99). Using computer simulation of retrospective data, discriminant validity, and sensitivity to change of the CATs closely approximated that of the full-length scales, especially when the 15- and 10-item stopping rules were applied. In the cross-validation study the time to administer both CATs was 4 minutes, compared with over 16 minutes to complete the full-length scales. 
CONCLUSIONS: Self-care and social function score estimates from CAT administration are highly comparable with those obtained from full-length scale administration, with small losses in validity and precision and substantial decreases in administration time.10a*Disability Evaluation10a*Social Adjustment10aActivities of Daily Living10aAdolescent10aAge Factors10aChild10aChild, Preschool10aComputer Simulation10aCross-Over Studies10aDisabled Children/*rehabilitation10aFemale10aFollow-Up Studies10aHumans10aInfant10aMale10aOutcome Assessment (Health Care)10aReference Values10aReproducibility of Results10aRetrospective Studies10aRisk Factors10aSelf Care/*standards/trends10aSex Factors10aSickness Impact Profile1 aCoster, W J1 aHaley, S M1 aNi, P1 aDumas, H M1 aFragala-Pinkham, M A uhttp://www.iacat.org/content/assessing-self-care-and-social-function-using-computer-adaptive-testing-version-pediatric03041nas a2200481 4500008004100000020004600041245012200087210006900209250001500278260000800293300001200301490000700313520155700320653003201877653003101909653002201940653002001962653001001982653000901992653002202001653002802023653003302051653001102084653001102095653002502106653000902131653001602140653004602156653002202202653002402224653003002248653002902278100001502307700001402322700001502336700002402351700001802375700001102393700001602404700001002420700001502430856011402445 2008 eng d a1532-821X (Electronic)0003-9993 (Linking)00aComputerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes0 aComputerized adaptive testing for followup after discharge from a2008/01/30 cFeb a275-2830 v893 aOBJECTIVES: To measure participation outcomes with a computerized adaptive test (CAT) and compare CAT and traditional fixed-length surveys in terms of score agreement, respondent burden, discriminant validity, and responsiveness. 
DESIGN: Longitudinal, prospective cohort study of patients interviewed approximately 2 weeks after discharge from inpatient rehabilitation and 3 months later. SETTING: Follow-up interviews conducted in patient's home setting. PARTICIPANTS: Adults (N=94) with diagnoses of neurologic, orthopedic, or medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Participation domains of mobility, domestic life, and community, social, & civic life, measured using a CAT version of the Participation Measure for Postacute Care (PM-PAC-CAT) and a 53-item fixed-length survey (PM-PAC-53). RESULTS: The PM-PAC-CAT showed substantial agreement with PM-PAC-53 scores (intraclass correlation coefficient, model 3,1, .71-.81). On average, the PM-PAC-CAT was completed in 42% of the time and with only 48% of the items as compared with the PM-PAC-53. Both formats discriminated across functional severity groups. The PM-PAC-CAT had modest reductions in sensitivity and responsiveness to patient-reported change over a 3-month interval as compared with the PM-PAC-53. 
CONCLUSIONS: Although continued evaluation is warranted, accurate estimates of participation status and responsiveness to change for group-level analyses can be obtained from CAT administrations, with a sizeable reduction in respondent burden.10a*Activities of Daily Living10a*Adaptation, Physiological10a*Computer Systems10a*Questionnaires10aAdult10aAged10aAged, 80 and over10aChi-Square Distribution10aFactor Analysis, Statistical10aFemale10aHumans10aLongitudinal Studies10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aPatient Discharge10aProspective Studies10aRehabilitation/*standards10aSubacute Care/*standards1 aHaley, S M1 aGandek, B1 aSiebens, H1 aBlack-Schaffer, R M1 aSinclair, S J1 aTao, W1 aCoster, W J1 aNi, P1 aJette, A M uhttp://www.iacat.org/content/computerized-adaptive-testing-follow-after-discharge-inpatient-rehabilitation-ii02560nas a2200313 4500008004100000020004100041245011500082210006900197250001500266300001100281490000700292520149300299653002701792653001001819653001401829653005301843653001501896653001101911653003701922653001801959653003101977653002602008653001402034653003202048100001502080700001002095700001502105856012602120 2008 eng d a0963-8288 (Print)0963-8288 (Linking)00aEfficiency and sensitivity of multidimensional computerized adaptive testing of pediatric physical functioning0 aEfficiency and sensitivity of multidimensional computerized adap a2008/02/26 a479-840 v303 aPURPOSE: Computerized adaptive tests (CATs) have efficiency advantages over fixed-length tests of physical functioning but may lose sensitivity when administering extremely low numbers of items. Multidimensional CATs may efficiently improve sensitivity by capitalizing on correlations between functional domains. Using a series of empirical simulations, we assessed the efficiency and sensitivity of multidimensional CATs compared to a longer fixed-length test. 
METHOD: Parent responses to the Pediatric Evaluation of Disability Inventory before and after intervention for 239 children at a pediatric rehabilitation hospital provided the data for this retrospective study. Reliability, effect size, and standardized response mean were compared between full-length self-care and mobility subscales and simulated multidimensional CATs with stopping rules at 40, 30, 20, and 10 items. RESULTS: Reliability was lowest in the 10-item CAT condition for the self-care (r = 0.85) and mobility (r = 0.79) subscales; all other conditions had high reliabilities (r > 0.94). All multidimensional CAT conditions had equivalent levels of sensitivity compared to the full set condition for both domains. CONCLUSIONS: Multidimensional CATs efficiently retain the sensitivity of longer fixed-length measures even with 5 items per dimension (10-item CAT condition). Measuring physical functioning with multidimensional CATs could enhance sensitivity following intervention while minimizing response burden.10a*Disability Evaluation10aChild10aComputers10aDisabled Children/*classification/rehabilitation10aEfficiency10aHumans10aOutcome Assessment (Health Care)10aPsychometrics10aReproducibility of Results10aRetrospective Studies10aSelf Care10aSensitivity and Specificity1 aAllen, D D1 aNi, P1 aHaley, S M uhttp://www.iacat.org/content/efficiency-and-sensitivity-multidimensional-computerized-adaptive-testing-pediatric-physical03157nas a2200493 4500008004100000020002200041245008900063210006900152250001500221260000800236300001000244490000700254520169600261653003401957653002001991653001502011653001002026653000902036653002602045653003202071653003102103653001102134653001102145653000902156653003202165653001602197653002902213653004402242653002902286653003102315653003102346653001702377100001702394700001402411700001602425700001302441700001702454700002202471700001702493700001402510700001402524700001702538856010802555 2008 eng d a1075-2730 (Print)00aUsing computerized adaptive 
testing to reduce the burden of mental health assessment0 aUsing computerized adaptive testing to reduce the burden of ment a2008/04/02 cApr a361-80 v593 aOBJECTIVE: This study investigated the combination of item response theory and computerized adaptive testing (CAT) for psychiatric measurement as a means of reducing the burden of research and clinical assessments. METHODS: Data were from 800 participants in outpatient treatment for a mood or anxiety disorder; they completed 616 items of the 626-item Mood and Anxiety Spectrum Scales (MASS) at two times. The first administration was used to design and evaluate a CAT version of the MASS by using post hoc simulation. The second confirmed the functioning of CAT in live testing. RESULTS: Tests of competing models based on item response theory supported the scale's bifactor structure, consisting of a primary dimension and four group factors (mood, panic-agoraphobia, obsessive-compulsive, and social phobia). Both simulated and live CAT showed a 95% average reduction (585 items) in items administered (24 and 30 items, respectively) compared with administration of the full MASS. The correlation between scores on the full MASS and the CAT version was .93. For the mood disorder subscale, differences in scores between two groups of depressed patients--one with bipolar disorder and one without--on the full scale and on the CAT showed effect sizes of .63 (p<.003) and 1.19 (p<.001) standard deviation units, respectively, indicating better discriminant validity for CAT. 
CONCLUSIONS: Instead of using small fixed-length tests, clinicians can create item banks with a large item pool, and a small set of the items most relevant for a given individual can be administered with no loss of information, yielding a dramatic reduction in administration time and patient and clinician burden.10a*Diagnosis, Computer-Assisted10a*Questionnaires10aAdolescent10aAdult10aAged10aAgoraphobia/diagnosis10aAnxiety Disorders/diagnosis10aBipolar Disorder/diagnosis10aFemale10aHumans10aMale10aMental Disorders/*diagnosis10aMiddle Aged10aMood Disorders/diagnosis10aObsessive-Compulsive Disorder/diagnosis10aPanic Disorder/diagnosis10aPhobic Disorders/diagnosis10aReproducibility of Results10aTime Factors1 aGibbons, R D1 aWeiss, DJ1 aKupfer, D J1 aFrank, E1 aFagiolini, A1 aGrochocinski, V J1 aBhaumik, D K1 aStover, A1 aBock, R D1 aImmekus, J C uhttp://www.iacat.org/content/using-computerized-adaptive-testing-reduce-burden-mental-health-assessment01798nas a2200217 4500008004100000020004600041245017800087210006900265260002500334300001200359490000700371520094900378653001201327653004301339653001801382653000901400100001701409700001501426700001501441856012401456 2007 eng d a1062-7197 (Print); 1532-6977 (Electronic)00aThe effect of including pretest items in an operational computerized adaptive test: Do different ability examinees spend different amounts of time on embedded pretest items?0 aeffect of including pretest items in an operational computerized bLawrence Erlbaum: US a161-1730 v123 aThe purpose of this study was to examine the effect of pretest items on response time in an operational, fixed-length, time-limited computerized adaptive test (CAT). These pretest items are embedded within the CAT, but unlike the operational items, are not tailored to the examinee's ability level. 
If examinees with higher ability levels need less time to complete these items than do their counterparts with lower ability levels, they will have more time to devote to the operational test questions. Data were from a graduate admissions test that was administered worldwide. Data from both quantitative and verbal sections of the test were considered. For the verbal section, examinees in the lower ability groups spent systematically more time on their pretest items than did those in the higher ability groups, though for the quantitative section the differences were less clear. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10aability10aoperational computerized adaptive test10apretest items10atime1 aFerdous, A A1 aPlake, B S1 aChang, S-R uhttp://www.iacat.org/content/effect-including-pretest-items-operational-computerized-adaptive-test-do-different-ability02875nas a2200313 4500008004100000020002200041245010100063210006900164250001500233260000800248300001200256490000700268520179500275653005102070653002002121653003702141653002602178653001902204653001102223653003002234653004602264653003502310653002802345653001302373100002102386700001702407700001502424856012202439 2007 eng d a0315-162X (Print)00aImproving patient reported outcomes using item response theory and computerized adaptive testing0 aImproving patient reported outcomes using item response theory a a2007/06/07 cJun a1426-310 v343 aOBJECTIVE: Patient reported outcomes (PRO) are considered central outcome measures for both clinical trials and observational studies in rheumatology. More sophisticated statistical models, including item response theory (IRT) and computerized adaptive testing (CAT), will enable critical evaluation and reconstruction of currently utilized PRO instruments to improve measurement precision while reducing item burden on the individual patient. 
METHODS: We developed a domain hierarchy encompassing the latent trait of physical function/disability from the more general to most specific. Items collected from 165 English-language instruments were evaluated by a structured process including trained raters, modified Delphi expert consensus, and then patient evaluation. Each item in the refined data bank will undergo extensive analysis using IRT to evaluate response functions and measurement precision. CAT will allow for real-time questionnaires of potentially smaller numbers of questions tailored directly to each individual's level of physical function. RESULTS: Physical function/disability domain comprises 4 subdomains: upper extremity, trunk, lower extremity, and complex activities. Expert and patient review led to consensus favoring use of present-tense "capability" questions using a 4- or 5-item Likert response construct over past-tense "performance" items. Floor and ceiling effects, attribution of disability, and standardization of response categories were also addressed. 
CONCLUSION: By applying statistical techniques of IRT through use of CAT, existing PRO instruments may be improved to reduce questionnaire burden on the individual patients while increasing measurement precision that may ultimately lead to reduced sample size requirements for costly clinical trials.10a*Rheumatic Diseases/physiopathology/psychology10aClinical Trials10aData Interpretation, Statistical10aDisability Evaluation10aHealth Surveys10aHumans10aInternational Cooperation10aOutcome Assessment (Health Care)/*methods10aPatient Participation/*methods10aResearch Design/*trends10aSoftware1 aChakravarty, E F1 aBjorner, J B1 aFries, J F uhttp://www.iacat.org/content/improving-patient-reported-outcomes-using-item-response-theory-and-computerized-adaptive03104nas a2200445 4500008004100000020002200041245007100063210006900134250001500203300001200218490000700230520183100237653003802068653001902106653002102125653002002146653001402166653001102180653003002191653001102221653000902232653002502241653004602266653001802312653002602330100001302356700001402369700001702383700001302400700001502413700001502428700001702443700001402460700001802474700002302492700001602515700001602531700001502547856009602562 2007 eng d a0962-9343 (Print)00aIRT health outcomes data analysis project: an overview and summary0 aIRT health outcomes data analysis project an overview and summar a2007/03/14 a121-1320 v163 aBACKGROUND: In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, "Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment." A component of the conference was presentation of a psychometric and content analysis of a secondary dataset. OBJECTIVES: A thorough psychometric and content analysis was conducted of two primary domains within a cancer health-related quality of life (HRQOL) dataset. 
RESEARCH DESIGN: HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared. SUBJECTS: The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites. MEASURES: Items from 4 HRQOL instruments were evaluated: Cancer Rehabilitation Evaluation System-Short Form, European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, Functional Assessment of Cancer Therapy and Medical Outcomes Study Short-Form Health Survey. RESULTS AND CONCLUSIONS: Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance in the measurement of HRQOL of construct definition. With respect to these lessons, areas for future research are suggested. 
The feasibility of developing item banks for broad definitions of health is discussed.10a*Data Interpretation, Statistical10a*Health Status10a*Quality of Life10a*Questionnaires10a*Software10aFemale10aHIV Infections/psychology10aHumans10aMale10aNeoplasms/psychology10aOutcome Assessment (Health Care)/*methods10aPsychometrics10aStress, Psychological1 aCook, KF1 aTeal, C R1 aBjorner, J B1 aCella, D1 aChang, C-H1 aCrane, P K1 aGibbons, L E1 aHays, R D1 aMcHorney, C A1 aOcepek-Welikson, K1 aRaczek, A E1 aTeresi, J A1 aReeve, B B uhttp://www.iacat.org/content/irt-health-outcomes-data-analysis-project-overview-and-summary02212nas a2200229 4500008004100000020004600041245008500087210006900172260004500241300001000286490000600296520140300302653003401705653002301739653002601762653001701788653002601805100001701831700001201848700001501860856010701875 2007 eng d a1614-1881 (Print); 1614-2241 (Electronic)00aMethods for restricting maximum exposure rate in computerized adaptative testing0 aMethods for restricting maximum exposure rate in computerized ad bHogrefe & Huber Publishers GmbH: Germany a14-230 v33 aThe Sympson-Hetter (1985) method provides a means of controlling maximum exposure rate of items in Computerized Adaptive Testing. Through a series of simulations, control parameters are set that mark the probability of administration of an item on being selected. This method presents two main problems: it requires a long computation time for calculating the parameters and the maximum exposure rate is slightly above the fixed limit. Van der Linden (2003) presented two alternatives which appear to solve both of the problems. The impact of these methods in the measurement accuracy has not been tested yet. We show how these methods over-restrict the exposure of some highly discriminating items and, thus, the accuracy is decreased. 
It is also shown that, when the desired maximum exposure rate is near the minimum possible value, these methods offer an empirical maximum exposure rate clearly above the goal. A new method, based on the initial estimation of the probability of administration and the probability of selection of the items with the restricted method (Revuelta & Ponsoda, 1998), is presented in this paper. It can be used with the Sympson-Hetter method and with the two van der Linden's methods. This option, when used with Sympson-Hetter, speeds the convergence of the control parameters without decreasing the accuracy. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive testing10aitem bank security10aitem exposure control10aoverlap rate10aSympson-Hetter method1 aBarrada, J R1 aOlea, J1 aPonsoda, V uhttp://www.iacat.org/content/methods-restricting-maximum-exposure-rate-computerized-adaptative-testing02744nas a2200541 4500008004100000020002200041245017000063210006900233250001500302260000800317300001100325490000700336520116200343653001901505653002501524653002101549653002101570653001501591653001001606653000901616653001601625653002301641653003201664653001101696653001101707653000901718653001601727653004601743653001801789653002901807653001801836100001501854700001401869700001701883700001301900700001501913700001601928700001501944700001701959700001401976700001801990700001102008700001602019700001502035700001302050700001302063856012602076 2007 eng d a0025-7079 (Print)00aPsychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS)0 aPsychometric evaluation and calibration of healthrelated quality a2007/04/20 cMay aS22-310 v453 aBACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) 
project. OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. ANALYSES: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment.10a*Health Status10a*Information Systems10a*Quality of Life10a*Self Disclosure10aAdolescent10aAdult10aAged10aCalibration10aDatabases as Topic10aEvaluation Studies as Topic10aFemale10aHumans10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aPsychometrics10aQuestionnaires/standards10aUnited States1 aReeve, B B1 aHays, R D1 aBjorner, J B1 aCook, KF1 aCrane, P K1 aTeresi, J A1 aThissen, D1 aRevicki, D A1 aWeiss, DJ1 aHambleton, RK1 aLiu, H1 aGershon, RC1 aReise, S P1 aLai, J S1 aCella, D uhttp://www.iacat.org/content/psychometric-evaluation-and-calibration-health-related-quality-life-item-banks-plans-patient02652nas a2200397 4500008004100000020002200041245013500063210006900198250001500267260000800282300001200290490000700302520140700309653002601716653003101742653001501773653001001788653000901798653002201807653002501829653003301854653001101887653001101898653000901909653001601918653004601934653003001980653003102010653001302041100001502054700001002069700001802079700001602097700001502113856012602128 2006 eng d a0895-4356 (Print)00aComputer adaptive 
testing improved accuracy and precision of scores over random item selection in a physical functioning item bank0 aComputer adaptive testing improved accuracy and precision of sco a2006/10/10 cNov a1174-820 v593 aBACKGROUND AND OBJECTIVE: Measuring physical functioning (PF) within and across postacute settings is critical for monitoring outcomes of rehabilitation; however, most current instruments lack sufficient breadth and feasibility for widespread use. Computer adaptive testing (CAT), in which item selection is tailored to the individual patient, holds promise for reducing response burden, yet maintaining measurement precision. We calibrated a PF item bank via item response theory (IRT), administered items with a post hoc CAT design, and determined whether CAT would improve accuracy and precision of score estimates over random item selection. METHODS: 1,041 adults were interviewed during postacute care rehabilitation episodes in either hospital or community settings. Responses for 124 PF items were calibrated using IRT methods to create a PF item bank. We examined the accuracy and precision of CAT-based scores compared to a random selection of items. RESULTS: CAT-based scores had higher correlations with the IRT-criterion scores, especially with short tests, and resulted in narrower confidence intervals than scores based on a random selection of items; gains, as expected, were especially large for low and high performing adults. 
CONCLUSION: The CAT design may have important precision and efficiency advantages for point-of-care functional assessment in rehabilitation practice settings.10a*Recovery of Function10aActivities of Daily Living10aAdolescent10aAdult10aAged10aAged, 80 and over10aConfidence Intervals10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aRehabilitation/*standards10aReproducibility of Results10aSoftware1 aHaley, S M1 aNi, P1 aHambleton, RK1 aSlavin, M D1 aJette, A M uhttp://www.iacat.org/content/computer-adaptive-testing-improved-accuracy-and-precision-scores-over-random-item-selectio-003329nas a2200469 4500008004100000020002200041245011600063210006900179250001500248260000800263300001200271490000700283520189400290653003202184653003102216653002202247653002002269653001002289653000902299653002202308653002802330653003302358653001102391653001102402653002502413653000902438653001602447653004602463653002202509653002402531653003002555653002902585100001502614700001502629700001602644700001102660700002402671700001402695700001802709700001002727856012202737 2006 eng d a0003-9993 (Print)00aComputerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes0 aComputerized adaptive testing for followup after discharge from a2006/08/01 cAug a1033-420 v873 aOBJECTIVE: To examine score agreement, precision, validity, efficiency, and responsiveness of a computerized adaptive testing (CAT) version of the Activity Measure for Post-Acute Care (AM-PAC-CAT) in a prospective, 3-month follow-up sample of inpatient rehabilitation patients recently discharged home. DESIGN: Longitudinal, prospective 1-group cohort study of patients followed approximately 2 weeks after hospital discharge and then 3 months after the initial home visit. SETTING: Follow-up visits conducted in patients' home setting. 
PARTICIPANTS: Ninety-four adults who were recently discharged from inpatient rehabilitation, with diagnoses of neurologic, orthopedic, and medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from AM-PAC-CAT, including 3 activity domains of movement and physical, personal care and instrumental, and applied cognition were compared with scores from a traditional fixed-length version of the AM-PAC with 66 items (AM-PAC-66). RESULTS: AM-PAC-CAT scores were in good agreement (intraclass correlation coefficient model 3,1 range, .77-.86) with scores from the AM-PAC-66. On average, the CAT programs required 43% of the time and 33% of the items compared with the AM-PAC-66. Both formats discriminated across functional severity groups. The standardized response mean (SRM) was greater for the movement and physical fixed form than the CAT; the effect size and SRM of the 2 other AM-PAC domains showed similar sensitivity between CAT and fixed formats. Using patients' own report as an anchor-based measure of change, the CAT and fixed length formats were comparable in responsiveness to patient-reported change over a 3-month interval. 
CONCLUSIONS: Accurate estimates for functional activity group-level changes can be obtained from CAT administrations, with a considerable reduction in administration time.10a*Activities of Daily Living10a*Adaptation, Physiological10a*Computer Systems10a*Questionnaires10aAdult10aAged10aAged, 80 and over10aChi-Square Distribution10aFactor Analysis, Statistical10aFemale10aHumans10aLongitudinal Studies10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aPatient Discharge10aProspective Studies10aRehabilitation/*standards10aSubacute Care/*standards1 aHaley, S M1 aSiebens, H1 aCoster, W J1 aTao, W1 aBlack-Schaffer, R M1 aGandek, B1 aSinclair, S J1 aNi, P uhttp://www.iacat.org/content/computerized-adaptive-testing-follow-after-discharge-inpatient-rehabilitation-i-activity02567nas a2200349 4500008004100000020002200041245016600063210006900229250001500298260000800313300001100321490000700332520142900339653002701768653001601795653001501811653001001826653002101836653001401857653005201871653001501923653001101938653001101949653003701960653001801997653001402015100001502029700001002044700001602054700002502070856012202095 2006 eng d a0003-9993 (Print)00aMeasurement precision and efficiency of multidimensional computer adaptive testing of physical functioning using the pediatric evaluation of disability inventory0 aMeasurement precision and efficiency of multidimensional compute a2006/08/29 cSep a1223-90 v873 aOBJECTIVE: To compare the measurement efficiency and precision of a multidimensional computer adaptive testing (M-CAT) application to a unidimensional CAT (U-CAT) comparison using item bank data from 2 of the functional skills scales of the Pediatric Evaluation of Disability Inventory (PEDI). 
DESIGN: Using existing PEDI mobility and self-care item banks, we compared the stability of item calibrations and model fit between unidimensional and multidimensional Rasch models and compared the efficiency and precision of the U-CAT- and M-CAT-simulated assessments to a random draw of items. SETTING: Pediatric rehabilitation hospital and clinics. PARTICIPANTS: Clinical and normative samples. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Not applicable. RESULTS: The M-CAT had greater levels of precision and efficiency than the separate mobility and self-care U-CAT versions when using a similar number of items for each PEDI subdomain. Equivalent estimation of mobility and self-care scores can be achieved with a 25% to 40% item reduction with the M-CAT compared with the U-CAT. CONCLUSIONS: M-CAT applications appear to have both precision and efficiency advantages compared with separate U-CAT assessments when content subdomains have a high correlation. Practitioners may also realize interpretive advantages of reporting test score information for each subdomain when separate clinical inferences are desired.10a*Disability Evaluation10a*Pediatrics10aAdolescent10aChild10aChild, Preschool10aComputers10aDisabled Persons/*classification/rehabilitation10aEfficiency10aHumans10aInfant10aOutcome Assessment (Health Care)10aPsychometrics10aSelf Care1 aHaley, S M1 aNi, P1 aLudlow, L H1 aFragala-Pinkham, M A uhttp://www.iacat.org/content/measurement-precision-and-efficiency-multidimensional-computer-adaptive-testing-physical02433nas a2200241 4500008004100000020004600041245008600087210006900173260002500242300001200267490000700279520159900286653001801885653002401903653002001927653002801947653002101975653004001996653001402036100001802050700001202068856011102080 2006 eng d a0895-7347 (Print); 1532-4818 (Electronic)00aOptimal and nonoptimal computer-based test designs for making pass-fail decisions0 aOptimal and nonoptimal computerbased test designs for making pas bLawrence 
Erlbaum: US a221-2390 v193 aNow that many credentialing exams are being routinely administered by computer, new computer-based test designs, along with item response theory models, are being aggressively researched to identify specific designs that can increase the decision consistency and accuracy of pass-fail decisions. The purpose of this study was to investigate the impact of optimal and nonoptimal multistage test (MST) designs, linear parallel-form test designs (LPFT), and computer adaptive test (CAT) designs on the decision consistency and accuracy of pass-fail decisions. Realistic testing situations matching those of one of the large credentialing agencies were simulated to increase the generalizability of the findings. The conclusions were clear: (a) With the LPFTs, matching test information functions (TIFs) to the mean of the proficiency distribution produced slightly better results than matching them to the passing score; (b) all of the test designs worked better than test construction using random selection of items, subject to content constraints only; (c) CAT performed better than the other test designs; and (d) if matching a TIF to the passing score, the MST design produced a bit better results than the LPFT design. If an argument for the MST design is to be made, it can be made on the basis of slight improvements over the LPFT design and better expected item bank utilization, candidate preference, and the potential for improved diagnostic feedback, compared with the feedback that is possible with fixed linear test forms. 
(PsycINFO Database Record (c) 2007 APA, all rights reserved)10aadaptive test10acredentialing exams10aDecision Making10aEducational Measurement10amultistage tests10aoptimal computer-based test designs10atest form1 aHambleton, RK1 aXing, D uhttp://www.iacat.org/content/optimal-and-nonoptimal-computer-based-test-designs-making-pass-fail-decisions03123nas a2200385 4500008004100000020002200041245012900063210006900192250001500261260000800276300001000284490000700294520188400301653002502185653002702210653001502237653001002252653002102262653002802283653003802311653001102349653001102360653001102371653000902382653004602391653002702437653003002464653003202494100001502526700001602541700001602557700001502573700002502588856012402613 2005 eng d a0003-9993 (Print)00aAssessing mobility in children using a computer adaptive testing version of the pediatric evaluation of disability inventory0 aAssessing mobility in children using a computer adaptive testing a2005/05/17 cMay a932-90 v863 aOBJECTIVE: To assess score agreement, validity, precision, and response burden of a prototype computerized adaptive testing (CAT) version of the Mobility Functional Skills Scale (Mob-CAT) of the Pediatric Evaluation of Disability Inventory (PEDI) as compared with the full 59-item version (Mob-59). DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; and cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics, community-based day care, preschool, and children's homes. PARTICIPANTS: Four hundred sixty-nine children with disabilities and 412 children with no disabilities (analytic sample); 41 children without disabilities and 39 with disabilities (cross-validation sample). INTERVENTIONS: Not applicable. 
MAIN OUTCOME MEASURES: Summary scores from a prototype Mob-CAT application and versions using 15-, 10-, and 5-item stopping rules; scores from the Mob-59; and number of items and time (in seconds) to administer assessments. RESULTS: Mob-CAT scores from both computer simulations (intraclass correlation coefficient [ICC] range, .94-.99) and field administrations (ICC=.98) were in high agreement with scores from the Mob-59. Using computer simulations of retrospective data, discriminant validity, and sensitivity to change of the Mob-CAT closely approximated that of the Mob-59, especially when using the 15- and 10-item stopping rule versions of the Mob-CAT. The Mob-CAT used no more than 15% of the items for any single administration, and required 20% of the time needed to administer the Mob-59. CONCLUSIONS: Comparable score estimates for the PEDI mobility scale can be obtained from CAT administrations, with losses in validity and precision for shorter forms, but with a considerable reduction in administration time.10a*Computer Simulation10a*Disability Evaluation10aAdolescent10aChild10aChild, Preschool10aCross-Sectional Studies10aDisabled Children/*rehabilitation10aFemale10aHumans10aInfant10aMale10aOutcome Assessment (Health Care)/*methods10aRehabilitation Centers10aRehabilitation/*standards10aSensitivity and Specificity1 aHaley, S M1 aRaczek, A E1 aCoster, W J1 aDumas, H M1 aFragala-Pinkham, M A uhttp://www.iacat.org/content/assessing-mobility-children-using-computer-adaptive-testing-version-pediatric-evaluation-002791nas a2200469 
4500008004100000020002200041245010400063210006900167250001500236260000800251300001200259490000700271520132800278653002201606653003101628653001501659653001601674653001001690653003401700653002101734653002401755653002501779653001501804653001101819653005301830653002901883653001101912653001101923653002001934653000901954653003101963653004601994653003102040653001402071653003202085100001502117700001002132700002502142700001702167700001302184856012402197 2005 eng d a0012-1622 (Print)00aA computer adaptive testing approach for assessing physical functioning in children and adolescents0 acomputer adaptive testing approach for assessing physical functi a2005/02/15 cFeb a113-1200 v473 aThe purpose of this article is to demonstrate: (1) the accuracy and (2) the reduction in amount of time and effort in assessing physical functioning (self-care and mobility domains) of children and adolescents using computer-adaptive testing (CAT). A CAT algorithm selects questions directly tailored to the child's ability level, based on previous responses. Using a CAT algorithm, a simulation study was used to determine the number of items necessary to approximate the score of a full-length assessment. We built simulated CAT (5-, 10-, 15-, and 20-item versions) for self-care and mobility domains and tested their accuracy in a normative sample (n=373; 190 males, 183 females; mean age 6y 11mo [SD 4y 2m], range 4mo to 14y 11mo) and a sample of children and adolescents with Pompe disease (n=26; 21 males, 5 females; mean age 6y 1mo [SD 3y 10mo], range 5mo to 14y 10mo). Results indicated that comparable score estimates (based on computer simulations) to the full-length tests can be achieved in a 20-item CAT version for all age ranges and for normative and clinical samples. No more than 13 to 16% of the items in the full-length tests were needed for any one administration. 
These results support further consideration of using CAT programs for accurate and efficient clinical assessments of physical functioning.10a*Computer Systems10aActivities of Daily Living10aAdolescent10aAge Factors10aChild10aChild Development/*physiology10aChild, Preschool10aComputer Simulation10aConfidence Intervals10aDemography10aFemale10aGlycogen Storage Disease Type II/physiopathology10aHealth Status Indicators10aHumans10aInfant10aInfant, Newborn10aMale10aMotor Activity/*physiology10aOutcome Assessment (Health Care)/*methods10aReproducibility of Results10aSelf Care10aSensitivity and Specificity1 aHaley, S M1 aNi, P1 aFragala-Pinkham, M A1 aSkrinar, A M1 aCorzo, D uhttp://www.iacat.org/content/computer-adaptive-testing-approach-assessing-physical-functioning-children-and-adolescents02123nas a2200253 4500008004100000245007900041210006900120300001200189490000700201520113300208653002701341653004601368653005201414653002901466653001101495653005601506653002501562653004101587653004501628653006201673100001501735700001501750856010401765 2005 eng d00aContemporary measurement techniques for rehabilitation outcomes assessment0 aContemporary measurement techniques for rehabilitation outcomes a339-3450 v373 aIn this article, we review the limitations of traditional rehabilitation functional outcome instruments currently in use within the rehabilitation field to assess Activity and Participation domains as defined by the International Classification of Function, Disability, and Health. These include a narrow scope of functional outcomes, data incompatibility across instruments, and the precision vs feasibility dilemma. 
Following this, we illustrate how contemporary measurement techniques, such as item response theory methods combined with computer adaptive testing methodology, can be applied in rehabilitation to design functional outcome instruments that are comprehensive in scope, accurate, allow for compatibility across instruments, and are sensitive to clinically important change without sacrificing their feasibility. Finally, we present some of the pressing challenges that need to be overcome to provide effective dissemination and training assistance to ensure that current and future generations of rehabilitation professionals are familiar with and skilled in the application of contemporary outcomes measurement.10a*Disability Evaluation10aActivities of Daily Living/classification10aDisabled Persons/classification/*rehabilitation10aHealth Status Indicators10aHumans10aOutcome Assessment (Health Care)/*methods/standards10aRecovery of Function10aResearch Support, N.I.H., Extramural10aResearch Support, U.S. 
Gov't, Non-P.H.S.10aSensitivity and Specificity computerized adaptive testing1 aJette, A M1 aHaley, S M uhttp://www.iacat.org/content/contemporary-measurement-techniques-rehabilitation-outcomes-assessment03707nas a2200481 4500008004100000245005200041210005200093300001200145490000700157520221100164653001902375653002902394653005802423653001002481653005302491653000902544653001102553653002502564653002602589653003302615653001102648653001002659653000902669653001602678653002402694653007402718653001802792653002902810653005802839653003102897653003202928653003602960653003202996100001503028700001603043700001603059700001603075700001003091700001403101700001803115700001503133856007703148 2004 eng d00aActivity outcome measurement for postacute care0 aActivity outcome measurement for postacute care aI49-1610 v423 aBACKGROUND: Efforts to evaluate the effectiveness of a broad range of postacute care services have been hindered by the lack of conceptually sound and comprehensive measures of outcomes. It is critical to determine a common underlying structure before employing current methods of item equating across outcome instruments for future item banking and computer-adaptive testing applications. OBJECTIVE: To investigate the factor structure, reliability, and scale properties of items underlying the Activity domains of the International Classification of Functioning, Disability and Health (ICF) for use in postacute care outcome measurement. METHODS: We developed a 41-item Activity Measure for Postacute Care (AM-PAC) that assessed an individual's execution of discrete daily tasks in his or her own environment across major content domains as defined by the ICF. 
We evaluated the reliability and discriminant validity of the prototype AM-PAC in 477 individuals in active rehabilitation programs across 4 rehabilitation settings using factor analyses, tests of item scaling, internal consistency reliability analyses, Rasch item response theory modeling, residual component analysis, and modified parallel analysis. RESULTS: Results from an initial exploratory factor analysis produced 3 distinct, interpretable factors that accounted for 72% of the variance: Applied Cognition (44%), Personal Care & Instrumental Activities (19%), and Physical & Movement Activities (9%); these 3 activity factors were verified by a confirmatory factor analysis. Scaling assumptions were met for each factor in the total sample and across diagnostic groups. Internal consistency reliability was high for the total sample (Cronbach alpha = 0.92 to 0.94), and for specific diagnostic groups (Cronbach alpha = 0.90 to 0.95). Rasch scaling, residual factor, differential item functioning, and modified parallel analyses supported the unidimensionality and goodness of fit of each unique activity domain. CONCLUSIONS: This 3-factor model of the AM-PAC can form the conceptual basis for common-item equating and computer-adaptive applications, leading to a comprehensive system of outcome instruments for postacute care settings.10a*Self Efficacy10a*Sickness Impact Profile10aActivities of Daily Living/*classification/psychology10aAdult10aAftercare/*standards/statistics & numerical data10aAged10aBoston10aCognition/physiology10aDisability Evaluation10aFactor Analysis, Statistical10aFemale10aHuman10aMale10aMiddle Aged10aMovement/physiology10aOutcome Assessment (Health Care)/*methods/statistics & numerical data10aPsychometrics10aQuestionnaires/standards10aRehabilitation/*standards/statistics & numerical data10aReproducibility of Results10aSensitivity and Specificity10aSupport, U.S. Gov't, Non-P.H.S.10aSupport, U.S. 
Gov't, P.H.S.1 aHaley, S M1 aCoster, W J1 aAndres, P L1 aLudlow, L H1 aNi, P1 aBond, T L1 aSinclair, S J1 aJette, A M uhttp://www.iacat.org/content/activity-outcome-measurement-postacute-care04032nas a2200433 4500008004100000245012300041210006900164260000800233300001200241490000700253520252400260653001902784653002902803653005802832653001002890653000902900653002202909653002602931653003302957653001102990653001103001653000903012653001603021653007403037653003003111653003603141653005803177653003103235653004503266653004103311653003203352100001603384700001503400700001603415700001603431700001403447700001203461856012503473 2004 eng d00aRefining the conceptual basis for rehabilitation outcome measurement: personal care and instrumental activities domain0 aRefining the conceptual basis for rehabilitation outcome measure cJan aI62-1720 v423 aBACKGROUND: Rehabilitation outcome measures routinely include content on performance of daily activities; however, the conceptual basis for item selection is rarely specified. These instruments differ significantly in format, number, and specificity of daily activity items and in the measurement dimensions and type of scale used to specify levels of performance. We propose that a requirement for upper limb and hand skills underlies many activities of daily living (ADL) and instrumental activities of daily living (IADL) items in current instruments, and that items selected based on this definition can be placed along a single functional continuum. OBJECTIVE: To examine the dimensional structure and content coverage of a Personal Care and Instrumental Activities item set and to examine the comparability of items from existing instruments and a set of new items as measures of this domain. 
METHODS: Participants (N = 477) from 3 different disability groups and 4 settings representing the continuum of postacute rehabilitation care were administered the newly developed Activity Measure for Post-Acute Care (AM-PAC), the SF-8, and an additional setting-specific measure: FIM (in-patient rehabilitation); MDS (skilled nursing facility); MDS-PAC (postacute settings); OASIS (home care); or PF-10 (outpatient clinic). Rasch (partial-credit model) analyses were conducted on a set of 62 items covering the Personal Care and Instrumental domain to examine item fit, item functioning, and category difficulty estimates and unidimensionality. RESULTS: After removing 6 misfitting items, the remaining 56 items fit acceptably along the hypothesized continuum. Analyses yielded different difficulty estimates for the maximum score (eg, "Independent performance") for items with comparable content from different instruments. Items showed little differential item functioning across age, diagnosis, or severity groups, and 92% of the participants fit the model. CONCLUSIONS: ADL and IADL items from existing rehabilitation outcomes instruments that depend on skilled upper limb and hand use can be located along a single continuum, along with the new personal care and instrumental items of the AM-PAC addressing gaps in content. 
Results support the validity of the proposed definition of the Personal Care and Instrumental Activities dimension of function as a guide for future development of rehabilitation outcome instruments, such as linked, setting-specific short forms and computerized adaptive testing approaches.10a*Self Efficacy10a*Sickness Impact Profile10aActivities of Daily Living/*classification/psychology10aAdult10aAged10aAged, 80 and over10aDisability Evaluation10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods/statistics & numerical data10aQuestionnaires/*standards10aRecovery of Function/physiology10aRehabilitation/*standards/statistics & numerical data10aReproducibility of Results10aResearch Support, U.S. Gov't, Non-P.H.S.10aResearch Support, U.S. Gov't, P.H.S.10aSensitivity and Specificity1 aCoster, W J1 aHaley, S M1 aAndres, P L1 aLudlow, L H1 aBond, T L1 aNi, P S uhttp://www.iacat.org/content/refining-conceptual-basis-rehabilitation-outcome-measurement-personal-care-and-instrumental02888nas a2200301 4500008004100000020002200041245013700063210006900200250001500269260000800284300001000292490000700302520186600309653001102175653003302186653001102219653004602230653002402276653002902300653003002329653002902359100001502388700001602403700001602419700001602435700001002451856012502461 2004 eng d a0003-9993 (Print)00aScore comparability of short forms and computerized adaptive testing: Simulation study with the activity measure for post-acute care0 aScore comparability of short forms and computerized adaptive tes a2004/04/15 cApr a661-60 v853 aOBJECTIVE: To compare simulated short-form and computerized adaptive testing (CAT) scores to scores obtained from complete item sets for each of the 3 domains of the Activity Measure for Post-Acute Care (AM-PAC). DESIGN: Prospective study. 
SETTING: Six postacute health care networks in the greater Boston metropolitan area, including inpatient acute rehabilitation, transitional care units, home care, and outpatient services. PARTICIPANTS: A convenience sample of 485 adult volunteers who were receiving skilled rehabilitation services. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Inpatient and community-based short forms and CAT applications were developed for each of 3 activity domains (physical & mobility, personal care & instrumental, applied cognition) using item pools constructed from new items and items from existing postacute care instruments. RESULTS: Simulated CAT scores correlated highly with score estimates from the total item pool in each domain (4- and 6-item CAT r range,.90-.95; 10-item CAT r range,.96-.98). Scores on the 10-item short forms constructed for inpatient and community settings also provided good estimates of the AM-PAC item pool scores for the physical & movement and personal care & instrumental domains, but were less consistent in the applied cognition domain. Confidence intervals around individual scores were greater in the short forms than for the CATs. CONCLUSIONS: Accurate scoring estimates for AM-PAC domains can be obtained with either the setting-specific short forms or the CATs. The strong relationship between CAT and item pool scores can be attributed to the CAT's ability to select specific items to match individual responses. 
The CAT may have additional advantages over short forms in practicality, efficiency, and the potential for providing more precise scoring estimates for individuals.10aBoston10aFactor Analysis, Statistical10aHumans10aOutcome Assessment (Health Care)/*methods10aProspective Studies10aQuestionnaires/standards10aRehabilitation/*standards10aSubacute Care/*standards1 aHaley, S M1 aCoster, W J1 aAndres, P L1 aKosinski, M1 aNi, P uhttp://www.iacat.org/content/score-comparability-short-forms-and-computerized-adaptive-testing-simulation-study-activity01774nas a2200289 4500008004100000245007700041210006900118300001400187490000700201520080000208653002501008653003101033653003701064653003801101653001901139653001001158653002701168653004601195653002001241653002801261653003201289653001801321100001401339700001701353700001501370856009901385 2000 eng d00aItem response theory and health outcomes measurement in the 21st century0 aItem response theory and health outcomes measurement in the 21st aII28-II420 v383 aItem response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. 
These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods.10a*Models, Statistical10aActivities of Daily Living10aData Interpretation, Statistical10aHealth Services Research/*methods10aHealth Surveys10aHuman10aMathematical Computing10aOutcome Assessment (Health Care)/*methods10aResearch Design10aSupport, Non-U.S. Gov't10aSupport, U.S. Gov't, P.H.S.10aUnited States1 aHays, R D1 aMorales, L S1 aReise, S P uhttp://www.iacat.org/content/item-response-theory-and-health-outcomes-measurement-21st-century01772nas a2200265 4500008004100000245004900041210004800090300001000138490000600148520092600154653002501080653005701105653001501162653001201177653001001189653002101199653006401220653001101284653002201295653001101317653000901328653007801337100001701415856007401432 1999 eng d00aCompetency gradient for child-parent centers0 aCompetency gradient for childparent centers a35-520 v33 aThis report describes an implementation of the Rasch model during the longitudinal evaluation of a federally-funded early childhood preschool intervention program. An item bank is described for operationally defining a psychosocial construct called community life-skills competency, an expected teenage outcome of the preschool intervention. This analysis examined the position of teenage students on this scale structure, and investigated a pattern of cognitive operations necessary for students to pass community life-skills test items. Then this scale structure was correlated with nationally standardized reading and math achievement scores, teacher ratings, and school records to assess its validity as a measure of the community-related outcome goal for this intervention. 
The results show a functional relationship between years of early intervention and magnitude of effect on the life-skills competency variable.10a*Models, Statistical10aActivities of Daily Living/classification/psychology10aAdolescent10aChicago10aChild10aChild, Preschool10aEarly Intervention (Education)/*statistics & numerical data10aFemale10aFollow-Up Studies10aHumans10aMale10aOutcome and Process Assessment (Health Care)/*statistics & numerical data1 aBezruczko, N uhttp://www.iacat.org/content/competency-gradient-child-parent-centers01910nas a2200229 4500008004100000245008600041210006900127300001000196490000700206520111400213653003201327653003701359653001001396653003401406653003001440653002901470653003201499100001601531700001801547700001301565856010201578 1999 eng d00aThe use of Rasch analysis to produce scale-free measurement of functional ability0 ause of Rasch analysis to produce scalefree measurement of functi a83-900 v533 aInnovative applications of Rasch analysis can lead to solutions for traditional measurement problems and can produce new assessment applications in occupational therapy and health care practice. First, Rasch analysis is a mechanism that translates scores across similar functional ability assessments, thus enabling the comparison of functional ability outcomes measured by different instruments. This will allow for the meaningful tracking of functional ability outcomes across the continuum of care. Second, once the item-difficulty order of an instrument or item bank is established by Rasch analysis, computerized adaptive testing can be used to target items to the patient's ability level, reducing assessment length by as much as one half. More importantly, Rasch analysis can provide the foundation for "equiprecise" measurement or the potential to have precise measurement across all levels of functional ability. 
The use of Rasch analysis to create scale-free measurement of functional ability demonstrates how this methodlogy can be used in practical applications of clinical and outcome assessment.10a*Activities of Daily Living10aDisabled Persons/*classification10aHuman10aOccupational Therapy/*methods10aPredictive Value of Tests10aQuestionnaires/standards10aSensitivity and Specificity1 aVelozo, C A1 aKielhofner, G1 aLai, J-S uhttp://www.iacat.org/content/use-rasch-analysis-produce-scale-free-measurement-functional-ability