Context / problem statement or rationale
Problem statement (including theoretical foundation and research question(s)):
The theory that assessment drives learning is well recognized and still holds (Schuwirth & Van der Vleuten 2020). Multiple-choice questions (MCQs) are often used for large-group assessments because they assess cognitive knowledge efficiently and objectively, with high validity and reliability. However, MCQs are susceptible to cueing, are sensitive to item quality, and tend to test factual knowledge. If the aim of testing is less to recognize correct answers and more to synthesize them, other test formats may provide greater validity than MCQs. Computer-based assessment (CBA) offers the technical possibility to present a variety of content-rich item types, supplemented with multimedia such as videos and images. Here we present a novel multimodal test containing alternative item types in a CBA format, designated Proxy-CBA. Two research questions were posed: 1) how do the MCQ-CBA and Proxy-CBA formats compare in terms of reliability and validity, and 2) how do test-takers experience the cognitive demands of both CBA formats?
Method:
The MCQ-CBA consisted of 65 MCQ items in plain-text format, while the Proxy-CBA consisted of 65 items: 47 very short answer questions (5 of which had more than one answer option), 8 MCQs, 5 multiple-response questions, 4 key-feature questions, and 1 extended-matching question. The Proxy-CBA was compared to the MCQ-CBA with regard to validity, reliability, standard error of measurement (SEM), and cognitive load, using a quasi-experimental cross-over design. Biomedical students were randomized into 2 groups to sit a 65-item formative exam, starting either with the MCQ-CBA followed by the Proxy-CBA (group 1, n=38) or in the reverse order (group 2, n=35). Subsequently, a questionnaire on perceived cognitive load, based on an adapted Assessment Preference Inventory, was administered and answered by 71 participants. Results from both CBA formats and the questionnaire were analysed using parameters from classical test theory (CTT) and the Rasch model.
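The abstract reports the CTT analysis only at the level of its outcomes. As a minimal sketch in Python of that kind of item analysis, assuming dichotomously scored items and Cronbach's alpha as the reliability estimate (the function name and demo data are hypothetical, not the study's actual pipeline), the CTT statistics compared below could be computed as follows:

    import numpy as np

    def ctt_item_stats(scores):
        """CTT statistics for a persons-by-items matrix of 0/1 scores."""
        scores = np.asarray(scores, dtype=float)
        n_items = scores.shape[1]
        total = scores.sum(axis=1)
        # Item difficulty: proportion of correct answers per item.
        difficulty = scores.mean(axis=0)
        # Item-rest correlation: each item against the total score without that item.
        item_rest = np.array([
            np.corrcoef(scores[:, i], total - scores[:, i])[0, 1]
            for i in range(n_items)
        ])
        # Cronbach's alpha as the reliability estimate (assumption for this sketch).
        alpha = n_items / (n_items - 1) * (
            1 - scores.var(axis=0, ddof=1).sum() / total.var(ddof=1)
        )
        # Standard error of measurement: SD of totals times sqrt(1 - reliability).
        sem = total.std(ddof=1) * np.sqrt(1 - alpha)
        return difficulty, item_rest, alpha, sem

    # Hypothetical usage with random data (73 test-takers, 65 items).
    rng = np.random.default_rng(0)
    demo = (rng.random((73, 65)) > 0.4).astype(int)
    difficulty, item_rest, alpha, sem = ctt_item_stats(demo)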
Results (and conclusion):
According to the CTT percentage of correct answers (t(64) = -3.20, p = 0.002, Cohen's d = 0.397) and the Rasch difficulty-based b parameters (t(63) = -7.61, p < 0.001, Cohen's d = -0.951), items from the Proxy-CBA were more difficult than those from the MCQ-CBA. Based on a comparison of item-rest correlations (t(64) = -2.67, p = 0.010, Cohen's d = 0.331), the Proxy-CBA showed higher discrimination than the MCQ-CBA. Compared to the MCQ-CBA, the Proxy-CBA had lower raw scores (p < 0.001, η² = 0.276), higher reliability estimates (p < 0.001, η² = 0.498), lower SEM estimates (p < 0.001, η² = 0.807), and lower theta ability scores (p < 0.001, η² = 0.288). The questionnaire revealed no significant differences between the two CBA formats regarding perceived cognitive load.
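The joint pattern of higher reliability and lower SEM estimates follows directly from the classical relation between the two quantities (standard CTT notation, added here for clarity rather than taken from the study's report):

    \mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}}

where \sigma_X is the standard deviation of the observed scores and \rho_{XX'} the reliability estimate; for a comparable score spread, higher reliability therefore implies a lower SEM.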
Discussion (reflection on results and conclusion in light of the theory):
The results of this study demonstrate increased validity and reliability of the Proxy-CBA compared to the MCQ-CBA, without a higher perceived cognitive load. As such, this study provides evidence that the Proxy-CBA may be used as a multimodal assessment instrument with the potential to improve existing assessment programs. The use of the Rasch model with samples smaller than 50 may yield paradoxical results compared to larger samples, which limits the generalizability of the findings. Still, despite the methodological differences, the results obtained in this study align with those of Sam et al. (2018) in demonstrating the weaker psychometric performance of MCQs.
References:
Schuwirth LW, van der Vleuten CP. 2020. A history of assessment in medical education. Adv Health Sci Educ. 25:1045-1056.
Sam AH, Field SM, Collares CF, van der Vleuten CPM, Wass VJ, Melville C, Harris J, Meeran K. 2018. Very-short-answer questions: reliability, discrimination and acceptability. Med Educ. 52:447-455.