Chest radiography plays an essential role in the diagnostic workup of emergency departments (EDs), providing critical insights into a wide range of medical conditions. However, interpreting these images poses significant challenges for radiology departments, primarily because of the increasing volume of radiographs and the consequent workload. Moreover, the need for specialized expertise and stringent quality control to minimize diagnostic errors becomes particularly acute during off-hours, when access to qualified radiologists may be limited. This shortfall in the availability of expert radiological evaluation highlights the potential of artificial intelligence (AI) to improve both the accuracy and efficiency of chest radiograph assessment. By using AI to triage chest radiographs effectively, the workload on radiologists could be alleviated, enabling them to prioritize the most critical cases. This approach not only reduces pressure on radiology departments but also minimizes the risk of patient mismanagement, thereby enhancing overall patient care.
In recent years, the development and regulatory approval of AI tools for tasks such as diagnosis and worklist triage in radiology signify a pivotal shift towards integrating AI into clinical practice. However, the deployment of these technologies in real-world clinical settings necessitates rigorous evaluation to ascertain their effectiveness and reliability. Previous studies have highlighted the potential of AI to improve diagnostic performance, especially among less experienced readers. Nonetheless, the clinical application of AI in radiology is still in its nascent stages, with a notable gap in the evaluation of these tools in consecutive, real-life patient samples. Consequently, there is a pressing need to assess the diagnostic accuracy of AI tools in real-life scenarios, which more closely reflect the conditions under which they would be deployed in clinical environments.
Before investigating the impact on various end users, we aimed to bridge this gap by evaluating the standalone diagnostic performance of an AI solution (MilvueSuite v2.1) in identifying five common chest findings, classified as critical (pneumothorax, fracture, and pleural effusion) or relevant (non-nodular pulmonary opacity and pulmonary nodule). The evaluation was conducted in a real-life emergency department setting at a large university hospital in France, independent of the training and validation of the AI solution.