Comparison of two Bayesian methods to detect mode effects between paper-based and computerized adaptive assessments: a preliminary Monte Carlo study. | IACAT

Submitted by bbriley on Mon, 09/09/2013 - 10:48

Title	Comparison of two Bayesian methods to detect mode effects between paper-based and computerized adaptive assessments: a preliminary Monte Carlo study.
Publication Type	Journal Article
Year of Publication	2012
Authors	Riley, BB, Carle, AC
Journal	BMC Med Res Methodol
Volume	12
Pagination	124
Date Published	2012
Publication Language	eng
ISSN	1471-2288
Keywords	Bayes Theorem, Data Interpretation, Statistical, Humans, Mathematical Computing, Monte Carlo Method, Outcome Assessment (Health Care)
Abstract	BACKGROUND: Computerized adaptive testing (CAT) is being applied to health outcome measures developed as paper-and-pencil (P&P) instruments. Differences in how respondents answer items administered by CAT vs. P&P can increase error in CAT-estimated measures if not identified and corrected. METHOD: Two methods for detecting item-level mode effects are proposed using Bayesian estimation of posterior distributions of item parameters: (1) a modified robust Z (RZ) test, and (2) 95% credible intervals (CrI) for the CAT-P&P difference in item difficulty. A simulation study was conducted under the following conditions: (1) data-generating model (one- vs. two-parameter IRT model); (2) moderate vs. large differential item functioning (DIF) sizes; (3) percentage of DIF items (10% vs. 30%); and (4) mean difference in θ estimates across modes of 0 vs. 1 logits. This resulted in a total of 16 conditions with 10 generated datasets per condition. RESULTS: Both methods evidenced good to excellent false positive control, with RZ providing better control of false positives and CrI showing slightly higher power, irrespective of measurement model. False positives increased when items were very easy to endorse and when there were mode differences in mean trait level. True positives were predicted by CAT item usage, absolute item difficulty and item discrimination. RZ outperformed CrI because of its better control of false-positive DIF. CONCLUSIONS: Whereas false positives were well controlled, particularly for RZ, power to detect DIF was suboptimal. Research is needed to examine the robustness of these methods under varying prior assumptions concerning the distribution of item and person parameters and when data fail to conform to prior assumptions. False identification of DIF when items were very easy to endorse is a problem warranting additional investigation.
DOI	10.1186/1471-2288-12-124
Alternate Journal	BMC Med Res Methodol
PubMed ID	22900979
PubMed Central ID	PMC3552735
Grant List	R21 DA 025371 / DA / NIDA NIH HHS / United States
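The abstract names the two item-level detection rules and the simulation design but gives no formulas. The sketch below is illustrative only and rests on stated assumptions: posterior draws of each item's CAT-minus-P&P difficulty difference are taken as already available from some Bayesian IRT fit. The RZ rule uses the conventional robust Z statistic, RZ = (d minus median(d)) / (0.74 × IQR(d)), flagged at |RZ| > 1.96. This is an assumed stand-in, because the abstract does not describe the paper's "modified" version. The CrI rule flags an item when its equal-tailed 95% interval excludes 0. The function names `robust_z_flags` and `credible_interval_flags` are hypothetical, and the toy inputs are not data from the study.

```python
from itertools import product

import numpy as np


def robust_z_flags(diff_means, crit=1.96):
    """Conventional robust Z on per-item difficulty differences (assumed form).

    diff_means: posterior-mean (CAT minus P&P) difficulty difference per item,
    e.g. diff_draws.mean(axis=0). RZ_i = (d_i - median(d)) / (0.74 * IQR(d));
    0.74 rescales the IQR to approximate an SD under normality. This is NOT
    claimed to be the paper's exact "modified" RZ. Assumes IQR(d) > 0.
    """
    d = np.asarray(diff_means, dtype=float)
    q1, q3 = np.percentile(d, [25, 75])
    rz = (d - np.median(d)) / (0.74 * (q3 - q1))
    return rz, np.abs(rz) > crit


def credible_interval_flags(diff_draws, level=0.95):
    """Flag items whose equal-tailed credible interval for the difference excludes 0.

    diff_draws: array of shape (n_draws, n_items) of posterior draws of
    b_CAT - b_PP per item (assumed to come from a fitted Bayesian IRT model).
    """
    draws = np.asarray(diff_draws, dtype=float)
    tail = 100 * (1 - level) / 2
    lo, hi = np.percentile(draws, [tail, 100 - tail], axis=0)
    return lo, hi, (lo > 0) | (hi < 0)


# The 2 x 2 x 2 x 2 simulation design described in the abstract.
DESIGN = {
    "model": ["1PL", "2PL"],
    "dif_size": ["moderate", "large"],
    "pct_dif_items": [0.10, 0.30],
    "theta_mean_diff_logits": [0.0, 1.0],
}
conditions = list(product(*DESIGN.values()))
n_datasets = len(conditions) * 10  # 10 generated datasets per condition


if __name__ == "__main__":
    # Toy input: seven near-zero differences and one item shifted by 1.5 logits.
    d = [0.0, 0.1, -0.1, 0.05, -0.05, 0.02, -0.02, 1.5]
    rz, flagged = robust_z_flags(d)
    print(np.round(rz, 2), flagged)
    print(len(conditions), n_datasets)  # 16 160
```

The median and IQR are used because a few large mode effects can inflate the mean and SD of the differences, while these robust statistics stay anchored to the bulk of non-DIF items. The CrI rule instead judges each item only against its own posterior uncertainty.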