03113nas a2200145 4500008004100000245004900041210004500090260005500135520264100190653002802831653000802859653002102867100001702888856006202905 2017 eng d00aIs CAT Suitable for Automated Speaking Test?0 aCAT Suitable for Automated Speaking Test aNiigata, JapanbNiigata Seiryo Universityc08/20173 a
We have developed an automated scoring system for Japanese speaking proficiency, SJ-CAT (Speaking Japanese Computerized Adaptive Test), which has been operational for the past few months. One of the unique features of the test is that it is an adaptive test based on polytomous IRT.
SJ-CAT consists of two sections: Section 1 contains sentence reading-aloud tasks and multiple-choice reading tasks, and Section 2 contains sentence generation tasks and open answer tasks. In a reading-aloud task, a test taker reads a phoneme-balanced sentence on the screen after listening to a model reading. In a multiple-choice reading task, a test taker sees a picture and reads aloud the one of three sentences on the screen that describes the scene most appropriately. In a sentence generation task, a test taker sees a picture or watches a video clip and describes the scene in his or her own words for about ten seconds. In an open answer task, the test taker expresses support for or opposition to a proposition, e.g., nuclear power generation, with reasons, for about 30 seconds.
In the course of developing the test, we found many unexpected and unique characteristics of a speaking CAT that are not found in the usual multiple-choice CATs. In this presentation, we discuss some factors that went unnoticed in our previous project, the dichotomous J-CAT (Japanese Computerized Adaptive Test), which consists of vocabulary, grammar, reading, and listening sections. First, we claim that the distribution of item difficulty parameters depends on the type of item. With an item pool of unrestricted item types, such as open questions, it is difficult to achieve an ideal distribution, whether normal or uniform. Second, contrary to our expectations, open questions are not necessarily more difficult to handle in an automated scoring system than more restricted questions such as sentence reading, as long as one can set up a suitable scoring algorithm for open questions. Third, we show that the standard deviation of the posterior distribution, i.e., the standard error of the theta parameter, converges faster under the polytomous IRT used for SJ-CAT than under the dichotomous IRT used in J-CAT. Fourth, we discuss problems in equating items in SJ-CAT and suggest introducing deep learning with reinforcement learning in place of equating. Finally, we discuss issues in operating SJ-CAT on the web, including scoring speed, operation costs, and security, among others.
10aAutomated Speaking Test10aCAT10alanguage testing1 aImai, Shingo uhttp://www.iacat.org/cat-suitable-automated-speaking-test00583nas a2200121 4500008004600000245017700046210006900223300001100292490000600303100001100309700001700320856012400337 2013 eng d00aA Comparison of Four Methods for Obtaining Information Functions for Scores From Computerized Adaptive Tests With Normally Distributed Item Difficulties and Discriminations0 aComparison of Four Methods for Obtaining Information Functions f a88-1070 v11 aIto, K1 aSegall, D.O. uhttp://www.iacat.org/content/comparison-four-methods-obtaining-information-functions-scores-computerized-adaptive-tests00616nas a2200121 4500008004100000245012600041210006900167260009700236100001100333700001700344700001400361856011900375 2009 eng d00aAn evaluation of a new procedure for computing information functions for Bayesian scores from computerized adaptive tests0 aevaluation of a new procedure for computing information function aD. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing.1 aIto, K1 aPommerich, M1 aSegall, D uhttp://www.iacat.org/content/evaluation-new-procedure-computing-information-functions-bayesian-scores-computerized00633nas a2200181 4500008004100000245006000041210005700101260009700158100001200255700001100267700001600278700001500294700001300309700001600322700001300338700001600351856008400367 2009 eng d00aFeatures of J-CAT (Japanese Computerized Adaptive Test)0 aFeatures of JCAT Japanese Computerized Adaptive Test aD. J. 
Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing.1 aImai, S1 aIto, S1 aNakamura, Y1 aKikuchi, K1 aAkagi, Y1 aNakasono, H1 aHonda, A1 aHiramura, T uhttp://www.iacat.org/content/features-j-cat-japanese-computerized-adaptive-test01614nas a2200145 4500008003900000245009500039210006900134300001200203490000700215520113800222100001801360700001801378700001901396856005301415 2008 d00aA Strategy for Controlling Item Exposure in Multidimensional Computerized Adaptive Testing0 aStrategy for Controlling Item Exposure in Multidimensional Compu a215-2320 v683 aAlthough computerized adaptive tests have enjoyed tremendous growth, solutions for important problems remain unavailable. One problem is the control of item exposure rate. Because adaptive algorithms are designed to select optimal items, they choose items with high discriminating power. Thus, these items are selected more often than others, leading to both overexposure and underutilization of some parts of the item pool. Overused items are often compromised, creating a security problem that could threaten the validity of a test. Building on a previously proposed stratification scheme to control the exposure rate for one-dimensional tests, the authors extend their method to multidimensional tests. A strategy is proposed based on stratification in accordance with a functional of the vector of the discrimination parameter, which can be implemented with minimal computational overhead. Both theoretical and empirical validation studies are provided. Empirical results indicate significant improvement over the commonly used method of controlling exposure rate that requires only a reasonable sacrifice in efficiency.
1 aLee, Yi-Hsuan1 aIp, Edward, H1 aFuh, Cheng-Der uhttp://epm.sagepub.com/content/68/2/215.abstract03157nas a2200493 4500008004100000020002200041245008900063210006900152250001500221260000800236300001000244490000700254520169600261653003401957653002001991653001502011653001002026653000902036653002602045653003202071653003102103653001102134653001102145653000902156653003202165653001602197653002902213653004402242653002902286653003102315653003102346653001702377100001702394700001402411700001602425700001302441700001702454700002202471700001702493700001402510700001402524700001702538856010802555 2008 eng d a1075-2730 (Print)00aUsing computerized adaptive testing to reduce the burden of mental health assessment0 aUsing computerized adaptive testing to reduce the burden of ment a2008/04/02 cApr a361-80 v593 aOBJECTIVE: This study investigated the combination of item response theory and computerized adaptive testing (CAT) for psychiatric measurement as a means of reducing the burden of research and clinical assessments. METHODS: Data were from 800 participants in outpatient treatment for a mood or anxiety disorder; they completed 616 items of the 626-item Mood and Anxiety Spectrum Scales (MASS) at two times. The first administration was used to design and evaluate a CAT version of the MASS by using post hoc simulation. The second confirmed the functioning of CAT in live testing. RESULTS: Tests of competing models based on item response theory supported the scale's bifactor structure, consisting of a primary dimension and four group factors (mood, panic-agoraphobia, obsessive-compulsive, and social phobia). Both simulated and live CAT showed a 95% average reduction (585 items) in items administered (24 and 30 items, respectively) compared with administration of the full MASS. The correlation between scores on the full MASS and the CAT version was .93. 
For the mood disorder subscale, differences in scores between two groups of depressed patients--one with bipolar disorder and one without--on the full scale and on the CAT showed effect sizes of .63 (p<.003) and 1.19 (p<.001) standard deviation units, respectively, indicating better discriminant validity for CAT. CONCLUSIONS: Instead of using small fixed-length tests, clinicians can create item banks with a large item pool, and a small set of the items most relevant for a given individual can be administered with no loss of information, yielding a dramatic reduction in administration time and patient and clinician burden.10a*Diagnosis, Computer-Assisted10a*Questionnaires10aAdolescent10aAdult10aAged10aAgoraphobia/diagnosis10aAnxiety Disorders/diagnosis10aBipolar Disorder/diagnosis10aFemale10aHumans10aMale10aMental Disorders/*diagnosis10aMiddle Aged10aMood Disorders/diagnosis10aObsessive-Compulsive Disorder/diagnosis10aPanic Disorder/diagnosis10aPhobic Disorders/diagnosis10aReproducibility of Results10aTime Factors1 aGibbons, R D1 aWeiss, DJ1 aKupfer, D J1 aFrank, E1 aFagiolini, A1 aGrochocinski, V J1 aBhaumik, D K1 aStover, A1 aBock, R D1 aImmekus, J C uhttp://www.iacat.org/content/using-computerized-adaptive-testing-reduce-burden-mental-health-assessment00655nas a2200121 4500008004100000245015200041210006900193260009700262100001700359700001700376700001400393856012600407 2007 eng d00aPatient-reported outcomes measurement and computerized adaptive testing: An application of post-hoc simulation to a diagnostic screening instrument0 aPatientreported outcomes measurement and computerized adaptive t aD. J. Weiss (Ed.). 
Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing.1 aImmekus, J C1 aGibbons, R D1 aRush, J A uhttp://www.iacat.org/content/patient-reported-outcomes-measurement-and-computerized-adaptive-testing-application-post-hoc10201nas a2200553 4500008004100000245016300041210006900204300001400273490000700287520862500294653001808919653003208937100001608969700001608985700001409001700001609015700001409031700001509045700001609060700001409076700001509090700001509105700001409120700001709134700001709151700001709168700001609185700001709201700001609218700001509234700001609249700001709265700002309282700001609305700002209321700001609343700001609359700001709375700001309392700001509405700001309420700001609433700001209449700001409461700001609475700002009491700001209511856012409523 2005 eng d00aToward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire0 aToward efficient and comprehensive measurement of the alcohol pr a1180-11890 v293 aBackground: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias., Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis. 
An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample., Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items., Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students. To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided., (C) 2005 Research Society on Alcoholism. An important, sometimes controversial feature of all psychological phenomena is whether they are categorical or dimensional. A conceptual and psychometric framework is described for distinguishing whether the latent structure behind manifest categories (e.g., psychiatric diagnoses, attitude groups, or stages of development) is category-like or dimension-like. Being dimension-like requires (a) within-category heterogeneity and (b) between-category quantitative differences. Being category-like requires (a) within-category homogeneity and (b) between-category qualitative differences. The relation between this classification and abrupt versus smooth differences is discussed. Hybrid structures are possible. 
Being category-like is itself a matter of degree; the authors offer a formalized framework to determine this degree. Empirical applications to personality disorders, attitudes toward capital punishment, and stages of cognitive development illustrate the approach., (C) 2005 by the American Psychological Association. The authors conducted Rasch model ( G. Rasch, 1960) analyses of items from the Young Adult Alcohol Problems Screening Test (YAAPST; S. C. Hurlbut & K. J. Sher, 1992) to examine the relative severity and ordering of alcohol problems in 806 college students. Items appeared to measure a single dimension of alcohol problem severity, covering a broad range of the latent continuum. Items fit the Rasch model well, with less severe symptoms reliably preceding more severe symptoms in a potential progression toward increasing levels of problem severity. However, certain items did not index problem severity consistently across demographic subgroups. A shortened, alternative version of the YAAPST is proposed, and a norm table is provided that allows for a linking of total YAAPST scores to expected symptom expression., (C) 2004 by the American Psychological Association. A didactic on latent growth curve modeling for ordinal outcomes is presented. The conceptual aspects of modeling growth with ordinal variables and the notion of threshold invariance are illustrated graphically using a hypothetical example. The ordinal growth model is described in terms of 3 nested models: (a) multivariate normality of the underlying continuous latent variables (yt) and its relationship with the observed ordinal response pattern (Yt), (b) threshold invariance over time, and (c) growth model for the continuous latent variable on a common scale. Algebraic implications of the model restrictions are derived, and practical aspects of fitting ordinal growth models are discussed with the help of an empirical example and Mx script ( M. C. Neale, S. M. Boker, G. Xie, & H. H. Maes, 1999). 
The necessary conditions for the identification of growth models with ordinal data and the methodological implications of the model of threshold invariance are discussed., (C) 2004 by the American Psychological Association. Recent research points toward the viability of conceptualizing alcohol problems as arrayed along a continuum. Nevertheless, modern statistical techniques designed to scale multiple problems along a continuum (latent trait modeling; LTM) have rarely been applied to alcohol problems. This study applies LTM methods to data on 110 problems reported during in-person interviews of 1,348 middle-aged men (mean age = 43) from the general population. The results revealed a continuum of severity linking the 110 problems, ranging from heavy and abusive drinking, through tolerance and withdrawal, to serious complications of alcoholism. These results indicate that alcohol problems can be arrayed along a dimension of severity and emphasize the relevance of LTM to informing the conceptualization and assessment of alcohol problems., (C) 2004 by the American Psychological Association. Item response theory (IRT) is supplanting classical test theory as the basis for measures development. This study demonstrated the utility of IRT for evaluating DSM-IV diagnostic criteria. Data on alcohol, cannabis, and cocaine symptoms from 372 adult clinical participants interviewed with the Composite International Diagnostic Interview-Expanded Substance Abuse Module (CIDI-SAM) were analyzed with Mplus ( B. Muthen & L. Muthen, 1998) and MULTILOG ( D. Thissen, 1991) software. Tolerance and legal problems criteria were dropped because of poor fit with a unidimensional model. Item response curves, test information curves, and testing of variously constrained models suggested that DSM-IV criteria in the CIDI-SAM discriminate between only impaired and less impaired cases and may not be useful to scale case severity. 
IRT can be used to study the construct validity of DSM-IV diagnoses and to identify diagnostic criteria with poor performance., (C) 2004 by the American Psychological Association. This study examined the psychometric characteristics of an index of substance use involvement using item response theory. The sample consisted of 292 men and 140 women who qualified for a Diagnostic and Statistical Manual of Mental Disorders (3rd ed., rev.; American Psychiatric Association, 1987) substance use disorder (SUD) diagnosis and 293 men and 445 women who did not qualify for a SUD diagnosis. The results indicated that men had a higher probability of endorsing substance use compared with women. The index significantly predicted health, psychiatric, and psychosocial disturbances as well as level of substance use behavior and severity of SUD after a 2-year follow-up. Finally, this index is a reliable and useful prognostic indicator of the risk for SUD and the medical and psychosocial sequelae of drug consumption., (C) 2002 by the American Psychological Association. Comparability, validity, and impact of loss of information of a computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 140 Veterans Affairs hospital patients. The countdown method ( Butcher, Keller, & Bacon, 1985) was used to adaptively administer Scales L (Lie) and F (Frequency), the 10 clinical scales, and the 15 content scales. Participants completed the MMPI-2 twice, in 1 of 2 conditions: computerized conventional test-retest, or computerized conventional-computerized adaptive. Mean profiles and test-retest correlations across modalities were comparable. Correlations between MMPI-2 scales and criterion measures supported the validity of the countdown method, although some attenuation of validity was suggested for certain health-related items. Loss of information incurred with this mode of adaptive testing has minimal impact on test validity. 
Item and time savings were substantial., (C) 1999 by the American Psychological Association10aPsychometrics10aSubstance-Related Disorders1 aKahler, C W1 aStrong, D R1 aRead, J P1 aDe Boeck, P1 aWilson, M1 aActon, G S1 aPalfai, T P1 aWood, M D1 aMehta, P D1 aNeale, M C1 aFlay, B R1 aConklin, C A1 aClayton, R R1 aTiffany, S T1 aShiffman, S1 aKrueger, R F1 aNichol, P E1 aHicks, B M1 aMarkon, K E1 aPatrick, C J1 aIacono, William, G1 aMcGue, Matt1 aLangenbucher, J W1 aLabouvie, E1 aMartin, C S1 aSanjuan, P M1 aBavly, L1 aKirisci, L1 aChung, T1 aVanyukov, M1 aDunn, M1 aTarter, R1 aHandel, R W1 aBen-Porath, Y S1 aWatt, M uhttp://www.iacat.org/content/toward-efficient-and-comprehensive-measurement-alcohol-problems-continuum-college-students00618nas a2200157 4500008004100000245012800041210006900169300001300238490000700251100001700258700001600275700001500291700002100306700001300327856012000340 2002 eng d00aCan examinees use judgments of item difficulty to improve proficiency estimates on computerized adaptive vocabulary tests? 0 aCan examinees use judgments of item difficulty to improve profic a 311-3300 v391 aVispoel, W P1 aClough, S J1 aBleiler, T1 aHendrickson, A B1 aIhrig, D uhttp://www.iacat.org/content/can-examinees-use-judgments-item-difficulty-improve-proficiency-estimates-computerized00562nas a2200121 4500008004100000245009500041210006900136260008200205100001300287700001200300700001300312856011500325 2002 eng d00aA strategy for controlling item exposure in multidimensional computerized adaptive testing0 astrategy for controlling item exposure in multidimensional compu aAvailable from http://www3. 
stat.sinica.edu.tw/library/c_tec_rep/c-2002-11.pdf1 aLee, Y H1 aIp, E H1 aFuh, C D uhttp://www.iacat.org/content/strategy-controlling-item-exposure-multidimensional-computerized-adaptive-testing00572nas a2200133 4500008004100000245012700041210006900168260001600237100001700253700001600270700001700286700001300303856012200316 2001 eng d00aCan examinees use judgments of item difficulty to improve proficiency estimates on computerized adaptive vocabulary tests?0 aCan examinees use judgments of item difficulty to improve profic a Seattle WA1 aVispoel, W P1 aClough, S J1 aBleiler, A B1 aIhrig, D uhttp://www.iacat.org/content/can-examinees-use-judgments-item-difficulty-improve-proficiency-estimates-computerized-000627nas a2200157 4500008004100000245011800041210006900159260002100228100001700249700001900266700001500285700001600300700001500316700001300331856012500344 1999 eng d00aLimiting answer review and change on computerized adaptive vocabulary tests: Psychometric and attitudinal results0 aLimiting answer review and change on computerized adaptive vocab aMontreal, Canada1 aVispoel, W P1 aHendrickson, A1 aBleiler, T1 aWidiatmo, H1 aShrairi, S1 aIhrig, D uhttp://www.iacat.org/content/limiting-answer-review-and-change-computerized-adaptive-vocabulary-tests-psychometric-and-000488nas a2200121 4500008004100000245008200041210006900123260002100192100001700213700001900230700001500249856010200264 1999 eng d00aStudy of methods to detect aberrant response patterns in computerized testing0 aStudy of methods to detect aberrant response patterns in compute aMontreal, Canada1 aIwamoto, C K1 aNungester, R J1 aLuecht, RM uhttp://www.iacat.org/content/study-methods-detect-aberrant-response-patterns-computerized-testing00401nas a2200097 4500008004100000245007400041210006900115100001300184700001100197856009500208 1995 eng d00aEstimation of item difficulty from restricted CAT calibration samples0 aEstimation of item difficulty from restricted CAT calibration sa1 aSykes, R1 aIto, K 
uhttp://www.iacat.org/content/estimation-item-difficulty-restricted-cat-calibration-samples00512nas a2200109 4500008004100000245013000041210006900171260001600240100001100256700001500267856012000282 1994 eng d00aThe effect of restricting ability distributions in the estimation of item difficulties: Implications for a CAT implementation0 aeffect of restricting ability distributions in the estimation of aNew Orleans1 aIto, K1 aSykes, R C uhttp://www.iacat.org/content/effect-restricting-ability-distributions-estimation-item-difficulties-implications-cat00525nas a2200121 4500008003900000245008900039210006900128260004700197100001900244700001600263700001500279856010900294 1988 d00aThe four generations of computerized educational measurement (Research Report 98-35)0 afour generations of computerized educational measurement Researc aPrinceton NJ: Educational Testing Service.1 aBunderson, C V1 aInouye, D K1 aOlsen, J B uhttp://www.iacat.org/content/four-generations-computerized-educational-measurement-research-report-98-3500522nas a2200121 4500008004100000245006500041210006100106260009600167100001900263700001600282700001500298856008700313 1986 eng d00aThe four generations of computerized educational measurement0 afour generations of computerized educational measurement aIn R. L. Linn (Ed.), Educational Measurement (3rd ed and pp. 367-407). New York: Macmillan.1 aBunderson, C V1 aInouye, D K1 aOlsen, J B uhttp://www.iacat.org/content/four-generations-computerized-educational-measurement00533nam a2200097 4500008004100000245013000041210006900171260005700240100001700297856012100314 1977 eng d00aAn application of the Rasch one-parameter logistic model to individual intelligence testing in a tailored testing environment0 aapplication of the Rasch oneparameter logistic model to individu aDissertation Abstracts International, 37 (9-A), 57661 aIreland, C M uhttp://www.iacat.org/content/application-rasch-one-parameter-logistic-model-individual-intelligence-testing-tailored