Evaluating the reliability of gestalt quality ratings of medical education podcasts: A METRIQ study

Andrew Tagg
Brent Thoma
Damian Roland
Jason M. Woods
Jeff Riddell
Teresa M. Chan
Article type
Original Research

Introduction Podcasts are increasingly being used for medical education. Studies have found that the assessment of the quality of online resources can be challenging. We sought to determine the reliability of gestalt quality assessment of education podcasts in emergency medicine.

Methods An international, interprofessional sample of raters was recruited through social media, direct contact, and the extended personal network of the study team. Each participant listened to eight podcasts (selected to include a variety of accents, numbers of speakers, and topics) and rated the quality of each podcast on a seven-point Likert scale. Phi coefficients were calculated within each group and overall. Decision studies were conducted using a phi threshold of 0.8.
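
To illustrate what a decision study of this kind computes, the sketch below finds the smallest number of raters whose averaged rating reaches a target phi. It assumes a fully crossed podcast-by-rater design, which the abstract does not state, and uses the standard absolute-error formula phi = var_object / (var_object + (var_rater + var_residual) / n). The function names and the variance components are hypothetical and are not taken from the study.

```python
def phi_coefficient(var_object, var_rater, var_residual, n_raters):
    """Phi (absolute dependability) for the mean of n_raters ratings.

    Assumes a fully crossed object-by-rater design (an assumption here,
    not a detail reported in the abstract).
    """
    absolute_error = (var_rater + var_residual) / n_raters
    return var_object / (var_object + absolute_error)


def raters_needed(var_object, var_rater, var_residual, target=0.8):
    """Smallest number of raters whose averaged rating reaches the target phi."""
    if var_object <= 0 or not 0 < target < 1:
        raise ValueError("need var_object > 0 and 0 < target < 1")
    n = 1
    # Phi rises toward 1 as raters are added, so this loop always terminates.
    while phi_coefficient(var_object, var_rater, var_residual, n) < target:
        n += 1
    return n


# Hypothetical variance components, chosen only for illustration.
n = raters_needed(var_object=1.0, var_rater=0.3, var_residual=2.0)
print(n, round(phi_coefficient(1.0, 0.3, 2.0, n), 3))
```

In this sketch, larger rater and residual variance relative to podcast variance raises the number of raters required, which matches the reasoning that groups with lower individual-level variance need fewer raters.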

Results A total of 240 collaborators completed all eight surveys and were included in the analysis. Attendings, medical students, and physician assistants had the lowest individual-level variance and thus the lowest number of required raters to reliably evaluate quality (phi >0.80). Overall, 20 raters were required to reliably evaluate the quality of emergency medicine podcasts.

Discussion Gestalt ratings of quality from approximately 20 health professionals are required to reliably assess the quality of a podcast. This finding should inform future work focused on developing and validating tools to support the evaluation of quality in these resources.
