Congress: ECR25
Poster Number: C-25294
Type: Poster: EPOS Radiologist (scientific)
Authorblock: J. F. Ojeda Esparza1, D. Botta1, A. Fitisiori1, C. Santarosa1, M. Pucci1, C. Meinzer2, Y-C. Yun2, K-O. Loevblad1, F. T. Kurz1; 1Geneva/CH, 2Heidelberg/DE
Disclosures:
Jose Federico Ojeda Esparza: Nothing to disclose
Daniele Botta: Nothing to disclose
Aikaterini Fitisiori: Nothing to disclose
Corrado Santarosa: Nothing to disclose
Marcella Pucci: Nothing to disclose
Clara Meinzer: Nothing to disclose
Yeong-Chul Yun: Nothing to disclose
Karl-Olof Loevblad: Nothing to disclose
Felix T Kurz: Nothing to disclose
Keywords: Artificial Intelligence, CNS, Catheter arteriography, CT, MR, Computer Applications-General, Diagnostic procedure, Technology assessment, Education and training, Image verification
Conclusion

Expert neuroradiologists outperformed all other groups, although the global response data appeared more accurate than those of the residents, possibly because of variability in response volume. LLM-GPT achieved higher accuracy than the residents, whereas LLM-GAG showed the lowest performance, underscoring the need for further improvement of AI tools. These findings are consistent with previous studies evaluating the accuracy of large language models (LLMs) in medical question-answering tasks [5–7]. In this study, the inclusion of images in the questions did not appear to affect the success rate, either positively or negatively.

The performance difference between LLM-GPT and LLM-GAG may be attributed to the longer development period of LLM-GPT; LLM-GAG is a more recent release, and this difference likely affects the models' maturity and optimization.

It is important to note that the small sample size of experts and residents may limit the generalizability of these results.

