We first evaluated our approach on the test set of 300 samples. Across all 23 labels, it achieved an overall accuracy of 90% and an area under the receiver operating characteristic curve (AUROC) of 70.8%. For the eight top-performing labels, accuracy was 97% and AUROC was 84.1%, indicating that the model differentiates well between normal and abnormal cases for these labels.
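As a minimal sketch of how per-label accuracy and AUROC of this kind could be computed, consider the following. It is not necessarily the evaluation procedure used here: the 0.5 decision threshold, the macro-averaging across labels, the ranking of labels by AUROC to select a top-k subset, and all function names are assumptions made for illustration, and the toy arrays are not data from this study.

```python
from typing import Dict, List, Optional, Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case is scored
    above a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC is undefined when only one class is present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the label.
    The 0.5 default is an assumption, not a value from the paper."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def per_label_metrics(y_true: List[List[int]], y_score: List[List[float]],
                      threshold: float = 0.5) -> List[Dict[str, float]]:
    """Compute accuracy and AUROC per label; rows are samples, columns are labels."""
    metrics = []
    for j in range(len(y_true[0])):
        t = [row[j] for row in y_true]
        s = [row[j] for row in y_score]
        metrics.append({"label": j,
                        "accuracy": accuracy(t, s, threshold),
                        "auroc": auroc(t, s)})
    return metrics


def macro_average(metrics: List[Dict[str, float]], key: str,
                  top_k: Optional[int] = None) -> float:
    """Unweighted mean of a metric over all labels, or over the top_k
    labels ranked by AUROC (the ranking criterion is an assumption)."""
    ranked = sorted(metrics, key=lambda m: m["auroc"], reverse=True)
    chosen = ranked[:top_k] if top_k else ranked
    return sum(m[key] for m in chosen) / len(chosen)


# Toy example: 4 samples, 2 labels (illustrative values only).
toy_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_score = [[0.9, 0.4], [0.2, 0.6], [0.7, 0.3], [0.4, 0.8]]
toy_metrics = per_label_metrics(toy_true, toy_score)
print("macro accuracy:", macro_average(toy_metrics, "accuracy"))
print("macro AUROC:", macro_average(toy_metrics, "auroc"))
print("top-1 AUROC:", macro_average(toy_metrics, "auroc", top_k=1))
```

In an evaluation like the one reported above, `y_true` and `y_score` would have 300 rows and 23 columns, and `top_k=8` would restrict the average to the eight best-performing labels. Selecting the top labels on the same data used to report their scores makes the subset figures optimistic, which is worth keeping in mind when reading them.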
To further assess the reliability of our method in a real-world clinical setting, we tested it on a private dataset of 60 CT scans. A radiologist reviewed the AI-generated results and compared them with expert assessments. This review indicated that our approach provides clinically relevant and accurate predictions, supporting its potential integration into diagnostic workflows.
Figure 2 presents an example of an AI-generated report that includes comprehensive findings on chest abnormalities along with lung nodule detection results. The automated report is designed to assist radiologists by providing a structured summary of detected abnormalities, which can improve efficiency in clinical decision-making. By presenting the abnormality findings and nodule detections together, the system helps streamline the diagnostic process and keeps critical findings easily accessible for review.