Congress: ECR25
Poster Number: C-27006
Type: Poster: EPOS Radiologist (scientific)
Authorblock: M. Balaguer-Montero1, A. Marcos Morales1, M. Ligero2, D. Leiva1, L. M. Atlagich1, N. Staikoglou1, C. Zatse1, C. Monreal1, R. Perez Lopez1; 1Barcelona/ES, 2Dresden/DE
Disclosures:
Maria Balaguer-Montero: Nothing to disclose
Adrià Marcos Morales: Nothing to disclose
Marta Ligero: Nothing to disclose
David Leiva: Nothing to disclose
Luz Maria Atlagich: Nothing to disclose
Nikolaos Staikoglou: Nothing to disclose
Christina Zatse: Nothing to disclose
Camilo Monreal: Nothing to disclose
Raquel Perez Lopez: Nothing to disclose
Keywords: Artificial Intelligence, Liver, Oncology, CT, Computer Applications-General, Observer performance, Segmentation, Cancer
Results

Accuracy of SALSA

The nnU-Net framework clearly outperformed the other two architectures in segmentation performance (Figure 3) and was therefore chosen as the representative model for SALSA.

Fig 3: Delineation performance, calculated using the Dice Similarity Coefficient (DSC), for the three trained architectures in the test (orange) and external validation (blue) sets, at both the patient-wise (left) and tumor-wise (right) levels.

Our proposed model shows high accuracy in liver tumor detection with a patient-wise precision of 99.65%, and a recall of 94.17% for the external validation cohort. When considering each lesion individually, SALSA obtained a lesion-by-lesion detection precision of 81.72% and a recall of 57.92% in the same dataset (Figure 4, Table 2).
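As a reminder of how the detection metrics relate to one another, the following is a minimal sketch that derives precision, recall and F1-score from counts of true positives, false positives and false negatives. The function name and the example counts are illustrative assumptions, not values or code from the SALSA study, and the criterion for deciding when a lesion counts as detected is not covered here.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from detection counts.

    tp: lesions (or patients) correctly detected
    fp: detections with no matching ground-truth lesion
    fn: ground-truth lesions that were missed
    """
    # Guard against empty denominators; returning 0.0 is a convention chosen here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
precision, recall, f1 = detection_metrics(tp=8, fp=2, fn=2)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

The same function applies at either level of analysis: counts are taken over patients for the patient-wise metrics and over individual lesions for the lesion-by-lesion metrics.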

Fig 4: Results for all models and approaches. Spider plots displaying all metrics for the evaluation of detection (precision, recall and F1-score) and delineation (DSC, JI).

Table 2: Liver tumor detection and delineation performance of SALSA at both the patient and tumor-wise levels. Dice Similarity Coefficient (DSC); Jaccard's Index (JI)

In parallel, the tumor masks automatically generated by SALSA exhibit good overlap with the ground truth (Figure 5). Both approaches reported high values in segmentation metrics for both the test (patient-wise Dice Similarity Coefficient (DSC) of 0.738 and tumor-wise DSC of 0.761) and external validation (patient-wise DSC of 0.737 and tumor-wise DSC of 0.760) cohorts (Table 2).
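For readers less familiar with the overlap metrics, here is a minimal sketch of how the DSC and JI can be computed from two binary masks. It assumes the masks are given as flattened sequences of 0/1 voxels of equal length; the function name, the toy masks and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details taken from the SALSA pipeline.

```python
from typing import Sequence


def overlap_metrics(pred: Sequence[int], truth: Sequence[int]) -> tuple[float, float]:
    """Return (DSC, JI) for two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Count voxels labelled as tumor in both masks, and in each mask separately.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection
    if union == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0, 1.0
    dsc = 2 * intersection / (pred_size + truth_size)
    ji = intersection / union
    return dsc, ji


# Toy example: two 4-voxel masks sharing 2 tumor voxels.
dsc, ji = overlap_metrics([1, 1, 1, 0], [0, 1, 1, 1])
print(f"DSC={dsc:.3f} JI={ji:.3f}")
```

The two metrics are monotonically related (JI = DSC / (2 - DSC)), so they rank segmentations identically; the DSC simply gives more weight to the overlapping region.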

Fig 5: Visual inspection of the automatically delineated contours. Representative cases of liver tumors delineated by SALSA (red) alongside the ground truth (blue) segmented masks, yellow dashed boxes indicate the magnification done for better visualization.

 

Benchmark to the state-of-the-art

Moreover, we benchmarked our tool against the current best model in the literature [2], which it outperformed in both detection and segmentation tasks. With a liver tumor detection precision of 28.06%, a recall of 82.61%, and a DSC of 0.714 in our test set, and 54.03%, 85.47%, and 0.690, respectively, in the external validation cohort (Table 3), SALSA has been shown to outperform the top-performing model, benefitting from a larger, more heterogeneous, real-world set of liver tumor cohorts.

Table 3: SALSA benchmarked against the LiTS top-performing model in both detecting and delineating liver tumors. (*) The external validation set considered for Table 3 excludes the LiTS training dataset, so it comprises the MSD Hepatic Vessels, TCIA-CRLM and HCC-TACE-Seg datasets, for a total of 582 scans. This avoids biasing the results, as the LiTS model was trained on the excluded data.

 

Radiologists’ inter-variability assessment

To explore the variability among radiologists in detecting and delineating liver tumors, we randomly selected a group of 25 patients from our test cohort. Three radiologists, blinded to the ground truth, delineated all liver tumors in each case. All outlines created by the radiologists and by the models were compared against the ground truth, namely the masks manually segmented by the reference expert radiologist (Rad 1), allowing us to assess intra- and inter-radiologist variability.

Our findings revealed that SALSA's performance in outlining liver tumors (DSC of 0.763) was comparable to, or even better than, the agreement achieved by the two blinded radiologists (Rad 2 and Rad 3) in the inter-radiologist variability assessment, who obtained DSCs of 0.777 and 0.703, respectively (Figure 6 and Table 4).

Fig 6: Comparison of tumor delineation overlaps in the variability substudy, made by three expert radiologists and the three trained models.

Table 4: Comparison of tumor segmentation overlaps (DSC) between delineations conducted by the same radiologist (Radiologist 1 on two separate occasions) and those performed by two different radiologists (Radiologist 2 and Radiologist 3), who were blinded to the ground truth. The outputs of the explored computational models are also shown for comparison with the manual delineations. Dice Similarity Coefficient (DSC); Jaccard's Index (JI)

Moreover, for expert validation purposes, a user-friendly web application was developed to allow direct comparison of radiologist preferences between manual segmentations and those generated by the SALSA tool. The application, available at https://radiomics.vhio.net/salsa/, featured the entire liver volume as a scrollable element and allowed for window adjustment and navigation to aid radiologists in accurately evaluating the quality of the contours, depicted over the scan using random colors in order to avoid biasing the choice. Also, a second module for the use of SALSA on any scan is available at the same domain.

Prognostic power of automatic tumor burden quantification

The prognostic value of liver cancer burden was assessed using the Total Tumor Volume (TTV) automatically quantified by SALSA. The analysis included the test set (141 patients) and all cases from the TCIA-CRLM external validation cohort (197 patients). We studied the association between TTV and clinical outcome, measured as Overall Survival (OS). A higher liver cancer burden was associated with a poorer prognosis (p = 0.028; HR = 1.692, 95% CI 1.055-2.715) in both datasets explored (Figure 7).
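To make the TTV quantity concrete, the sketch below shows one common way to derive a volume from a binary tumor mask: count the tumor voxels and multiply by the physical volume of a single voxel. The function name, the flattened-mask input and the example voxel spacing are hypothetical and chosen for illustration; the source does not describe how SALSA computes TTV internally or how patients were stratified for the survival analysis.

```python
from typing import Iterable, Tuple


def total_tumor_volume_ml(mask: Iterable[int], spacing_mm: Tuple[float, float, float]) -> float:
    """Return the tumor volume in millilitres for a flattened binary mask.

    mask: iterable of 0/1 voxel labels (1 = tumor), all lesions combined
    spacing_mm: voxel size along each axis, in millimetres
    """
    sx, sy, sz = spacing_mm
    voxel_volume_mm3 = sx * sy * sz
    tumor_voxels = sum(1 for v in mask if v)
    # 1 mL equals 1000 cubic millimetres.
    return tumor_voxels * voxel_volume_mm3 / 1000.0


# Hypothetical scan: 12,000 tumor voxels at 0.8 x 0.8 x 2.5 mm spacing.
example_mask = [1] * 12_000 + [0] * 50_000
ttv = total_tumor_volume_ml(example_mask, (0.8, 0.8, 2.5))
print(f"TTV = {ttv:.1f} mL")
```

Because the mask merges all lesions in the liver, the result is a single per-patient burden value, which is the kind of quantity that can then be related to overall survival.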

Fig 7: Total Tumor Volume as a prognostic biomarker. Kaplan-Meier curves and log-rank test results for overall survival (OS) in the 141 patients from the test set (left) and the 197 patients from The Cancer Imaging Archive (TCIA) - Colorectal Liver Metastases (CRLM) dataset of the external validation cohort (right).
