PRESIDENTIAL ADDRESS: How Adaptive Is an Adaptive Test: Are All Adaptive Tests Adaptive?
Prof. Mark Reckase, Distinguished Professor Emeritus, Michigan State University & Incoming IACAT President
Abstract: There are many different kinds of adaptive tests, but they all share the characteristic that some feature of the test is customized to the purpose of the test. In the time allotted, it is impossible to consider the adaptation of all of these types, so this address will focus on the “classic” adaptive test that matches the difficulty of the test to the capabilities of the person being tested. This address will first present information on the maximum level of adaptation that can occur and then compare the amount of adaptation that typically occurs on an operational adaptive test to that maximum level. An index is proposed to summarize the amount of adaptation, and it is argued that this type of index should be reported for operational adaptive tests to show the amount of adaptation that typically occurs.
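The abstract does not specify the proposed index, but one simple illustrative measure of adaptation (an assumption for illustration, not necessarily the index proposed in the address) is the correlation between each examinee's ability estimate and the mean difficulty of the items that examinee was administered:

```python
import numpy as np

def adaptation_index(theta_hat, mean_b):
    """Correlation between examinees' ability estimates and the mean
    difficulty of the items each examinee received.  Near 1.0 means
    item difficulty closely tracked ability; near 0 means the test
    behaved like a fixed form."""
    return float(np.corrcoef(theta_hat, mean_b)[0, 1])

# Toy illustration: a highly adaptive test administers items whose
# mean difficulty is close to each examinee's ability; a fixed form
# gives everyone items of about the same difficulty.
rng = np.random.default_rng(0)
theta = rng.normal(size=500)
adaptive_b = theta + rng.normal(scale=0.1, size=500)  # tracks ability
fixed_b = rng.normal(scale=0.1, size=500)             # fixed difficulty

print(round(adaptation_index(theta, adaptive_b), 2))  # near 1
print(round(adaptation_index(theta, fixed_b), 2))     # near 0
```

The same examinee-level data would be available from any operational CAT's response records, which is what makes reporting such an index practical.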
Bio: Mark D. Reckase is a University Distinguished Professor Emeritus at Michigan State University where he has taught courses in psychometric theory and various aspects of item response theory. He has also done work on standard setting procedures for educational and licensure tests, the use of statistical models for evaluating the performance of teachers, international studies of the preparation of teachers of mathematics, and the design and implementation of computerized adaptive tests. He has been the editor of Applied Psychological Measurement and the Journal of Educational Measurement. He has been the president of the National Council on Measurement in Education (NCME), the vice president of Division D of the American Educational Research Association, and the secretary of the Psychometric Society.
KEYNOTE 1: From blueprints to systems: an integrated approach to adaptive testing
Gage Kingsbury, Psychometric Consultant & Tony Zara, Pearson VUE
Abstract: For years, test blueprints have told test developers how many items and what types of items will be included in a test. Adaptive testing adopted this approach from paper testing, and it is reasonably useful. Unfortunately, 'how many items and what types of items' are not all the elements one should consider when choosing items for an adaptive test. To fill the gaps, practitioners have developed tools to allow an adaptive test to behave appropriately (e.g., exposure control, content balancing, and item drift procedures). Each of these tools involves a separate process external to the primary item selection process.
The use of these subsidiary processes makes item selection less optimal and makes it difficult to prioritize aspects of selection. This discussion describes systems-based adaptive testing. This approach uses metadata concerning items, test takers and test elements to select items. These elements are weighted by the stakeholders to shape an expanded blueprint designed for adaptive testing.
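A minimal sketch of what a single weighted, metadata-driven selection step might look like (the criteria, weights, and field names below are illustrative assumptions, not the authors' actual system):

```python
import math

def p_correct(theta, a, b):
    """2PL response probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def utility(item, theta, weights, content_need):
    """One weighted score folding several selection criteria into the
    primary item selection step, instead of handling each criterion in
    a separate subsidiary process."""
    p = p_correct(theta, item["a"], item["b"])
    information = item["a"] ** 2 * p * (1 - p)    # Fisher information
    exposure_penalty = item["exposure_rate"]      # overused items score lower
    content_bonus = content_need.get(item["content"], 0.0)
    return (weights["info"] * information
            - weights["exposure"] * exposure_penalty
            + weights["content"] * content_bonus)

bank = [
    {"a": 1.2, "b": 0.0, "exposure_rate": 0.40, "content": "algebra"},
    {"a": 1.0, "b": 0.1, "exposure_rate": 0.05, "content": "geometry"},
    {"a": 0.8, "b": -1.5, "exposure_rate": 0.02, "content": "algebra"},
]
weights = {"info": 1.0, "exposure": 0.5, "content": 0.3}
content_need = {"geometry": 1.0}  # blueprint still calls for geometry

best = max(bank, key=lambda it: utility(it, theta=0.0,
                                        weights=weights,
                                        content_need=content_need))
print(best["content"])  # the under-exposed geometry item wins here
```

Because stakeholders set the weights, the same mechanism lets a program decide explicitly how much accuracy it will trade for exposure control or content coverage, rather than leaving that trade-off implicit in layered external rules.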
Bio: Gage Kingsbury is a private consultant providing advice and development work in the application of modern psychometrics and technology to practical assessment situations. He has developed procedures for item selection that are used in many operational adaptive tests. Gage helped to design the NWEA MAP system which delivers over 50,000,000 adaptive achievement tests yearly to K-12 students.
Bio: Tony Zara has been working with computerized adaptive testing for almost 30 years and has served as a vice president for Pearson VUE for the last 16 years. He was responsible for one of the first large-scale, high-stakes applications of computerized adaptive testing to professional licensing. His creative development and management have consistently aided the development and growth of adaptive testing from a theoretical concept to a widely accepted assessment tool around the world.
KEYNOTE 2: Grow a tiger out of your cat
Angela Verschoor, CITO
Abstract: The main focus in the community of test developers and researchers is on improving adaptive test procedures and methodologies. Yet, the transition from research projects to larger-scale operational CATs faces its own challenges. Usually, these operational CATs find their origin in government tenders. “Scalability”, “Interoperability” and “Transparency” are three keywords often found in these documents. Scalability is concerned with parallel system architectures that are based upon stateless selection algorithms; design capacities often range from 10,000 to well over 100,000 concurrent students. Interoperability is implemented in standards like QTI, standards that were not designed with adaptive testing in mind. Transparency is realized by open source software: the adaptive test should not be a black box. These three requirements often complicate the development of an adaptive test, and sometimes even conflict with one another.
Bio: Angela Verschoor is Senior Researcher at CITO, the Netherlands. With a background in discrete optimization, her interest is the development and application of automated test assembly (ATA), optimal design and computerized adaptive testing (CAT). She has been responsible for the design of pretests for large-scale projects such as the Final Primary Education Test in the Netherlands. Other recent projects included the introduction of ATA and CAT in, amongst others, the Netherlands, Georgia, Russia, Kazakhstan, the Philippines and Switzerland.
KEYNOTE 3: item selection rules and test security in computerized adaptive testing
Juan Barrada, University of Zaragoza
Abstract: The four objectives that must be optimized by a CAT are (a) accuracy, (b) item bank security, (c) content balance, and (d) test maintenance. Probably the largest amount of research has been devoted to item selection rules for increasing test accuracy. In this session, I will review the differences among the proposed methods and present two new selection rules. I will discuss whether there is room to increase accuracy beyond that obtained with the more traditional selection rules. With respect to test security, I will review the different proposed item selection rules, a general limitation on obtaining the optimal rule when accuracy and security are considered simultaneously, and a relevant limitation when assessing test security; finally, I will present a new method for improving test security.
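As one concrete example of the accuracy/security trade-off the abstract refers to, a classic family of selection rules chooses randomly among the most informative items rather than always administering the single best one (a generic "randomesque" sketch for illustration, not one of the new rules presented in the session):

```python
import math
import random

def fisher_info(theta, a, b):
    """Item information under the 2PL model."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_item(theta, bank, administered, k=5, rng=random):
    """Randomesque selection: pick at random among the k most
    informative unadministered items, trading a little accuracy for
    lower exposure of the very best items."""
    available = [i for i in range(len(bank)) if i not in administered]
    available.sort(key=lambda i: fisher_info(theta, *bank[i]),
                   reverse=True)
    return rng.choice(available[:k])

# Toy bank of (a, b) pairs with difficulties spread from -2 to 2.
bank = [(1.0, b / 4.0) for b in range(-8, 9)]
picked = select_item(0.0, bank, administered=set(), k=3)
print(bank[picked])  # one of the three items with b nearest 0
```

With k=1 this reduces to pure maximum-information selection (best accuracy, worst exposure); raising k spreads exposure across more of the bank at some cost in information per item.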
Bio: Juan Ramón Barrada is an Assistant Professor at the Universidad de Zaragoza (Spain). Dr. Barrada’s expertise lies in psychometrics and psychological testing, with a particular focus on computerized adaptive testing and the development and validation of instruments for psychological assessment. In the field of CAT, his research has focused on item selection rules and test security, using item response theory and cognitive diagnostic models as psychometric frameworks. Since version 3, he has been a co-author, with David Magis, of the R package catR. He is a member of the editorial boards of the Journal of Computerized Adaptive Testing, the Journal of Educational Measurement, and the International Journal of Testing.
KEYNOTE 4: computerized adaptive testing and English for specific purposes
Yukie Koyama, Nagoya Institute of Technology
Abstract: Computerized Adaptive Testing (CAT) in the language-learning field now has a history of approximately 30 years, and many different kinds of CAT developments for second language (L2) assessment have been reported (Chalhoub-Deville, 2001). In the meantime, English for Specific Purposes (ESP) has also attracted attention as an efficient way of L2 learning and teaching. With this point in mind, this speech introduces a method of CAT development that uses ESP corpus analysis for item writing in order to ensure content validity.
Despite significant progress in CAT, and although its advantages are widely known among testing and language-learning professionals, CAT development for L2 has in most cases been conducted by large test developers such as Educational Testing Service (ETS) and the University of Cambridge Local Examinations Syndicate (UCLES). This is because, for a small institute or a classroom unit, the process of CAT development seems more complicated and demands more time and expertise than they can manage (Dunkel, 1997).
To help solve these problems, this speech introduces a small-scale in-house computerized adaptive test that should be manageable for ordinary classroom teachers. Using this method, one can construct an ESP adaptive test aligned with the purposes of the institute or the course, provided appropriate corpora of the genre are compiled or selected for analysis. This kind of small-scale adaptive test gives specific information about the learners’ ability and thus can contribute to clarifying ESP curriculum contents. Based on its implementation, some limitations and future implications of the test are also discussed.
Bio: Yukie Koyama is a Professor Emeritus of Nagoya Institute of Technology. She taught at two engineering universities for eighteen years, and English for Specific Purposes, especially for engineering students, is her main research interest. Her other interests include language testing (including CAT development), corpus linguistics and its applications, and intercultural understanding through language classes. She has developed several computerized adaptive vocabulary tests based on corpus analysis for engineering students.