Patient Demographics
A total of 120 patients with IPMNs (mean age, 65.90 years ± 10.49 [standard deviation]) were included in our study, of whom 67 (55.83%) had benign IPMNs and 53 (44.17%) had malignant IPMNs (22 [18.33%] …) (Table 2). The mean interval between CT examination and surgery or biopsy was 31 days (range, 1–80 days). Of the 120 patients, 104 (86.67%) underwent surgery and 16 (13.33%) underwent endoscopic US-guided biopsy. All 16 patients whose IPMNs were confirmed by endoscopic US-guided biopsy had high-grade dysplasia or invasive carcinoma.
Interobserver Agreement and Number of CT Features of Pancreatic IPMNs
Regarding high-risk stigmata (Table 2), interobserver agreement was substantial for MPD size ≥ 10 mm (κ = 0.64) and moderate for enhancing mural nodules ≥ 5 mm (κ = 0.51) and infiltrative masses (κ = 0.54).
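The agreement descriptors used in this section (poor, fair, moderate, substantial) match the Landis and Koch bands for κ. Assuming that is the scale applied here, the mapping can be sketched as below; the function name `landis_koch` is illustrative only.

```python
def landis_koch(kappa: float) -> str:
    """Return the Landis and Koch (1977) descriptor for a kappa value."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0.0:
        return "poor"
    # Upper bound of each band, checked from lowest to highest.
    bands = ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial"))
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# High-risk stigmata kappas reported above.
for feature, k in (("MPD >= 10 mm", 0.64), ("nodule >= 5 mm", 0.51), ("infiltrative mass", 0.54)):
    print(f"{feature}: kappa = {k} -> {landis_koch(k)}")
```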
Regarding worrisome features, agreement was moderate for MPD size of 5–9 mm (κ = 0.52) and cyst size ≥ 3 cm (κ = 0.44); fair for abrupt MPD caliber change with distal pancreatic atrophy (κ = 0.30), pancreatitis (κ = 0.27), and thickened or enhancing cyst walls (κ = 0.25); and poor for enhancing mural nodules < 5 mm (κ = -0.012). Of note, enhancing mural nodules < 5 mm showed the highest concordance rate (81.7%), whereas thickened or enhancing cyst walls showed the lowest (10%).
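A high concordance rate alongside a near-zero κ, as seen for enhancing mural nodules < 5 mm, is consistent with a known property of κ: when nearly all ratings fall into one category, chance-expected agreement is already high, so κ can be near zero or negative despite high raw agreement. The sketch below illustrates this with Fleiss' κ on hypothetical toy data (10 patients, 8 reviewers). It assumes that Fleiss' κ is the multi-reader statistic and that "concordance rate" means the share of patients on whom all reviewers gave the same rating; this section specifies neither.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa; rows are patients, columns are reviewers."""
    n_raters = len(ratings[0])
    if any(len(row) != n_raters for row in ratings):
        raise ValueError("every patient needs the same number of ratings")
    n_subjects = len(ratings)
    category_totals: Counter = Counter()
    p_bar = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Proportion of agreeing reviewer pairs for this patient.
        p_bar += (sum(c * c for c in counts.values()) - n_raters) / (n_raters * (n_raters - 1))
    p_bar /= n_subjects
    total = n_subjects * n_raters
    # Chance-expected agreement from the pooled category proportions.
    p_e = sum((c / total) ** 2 for c in category_totals.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)


def complete_agreement_rate(ratings) -> float:
    """Share of patients on whom every reviewer gave the same rating."""
    return sum(len(set(row)) == 1 for row in ratings) / len(ratings)


# Hypothetical toy data: 10 patients x 8 reviewers, 1 = "nodule < 5 mm present".
ratings = [[0] * 8 for _ in range(10)]
ratings[0][2] = 1  # one reviewer flags patient 0
ratings[1][5] = 1  # a different reviewer flags patient 1

print(complete_agreement_rate(ratings))  # 0.8
print(round(fleiss_kappa(ratings), 4))   # -0.0256
```

Here 80% of patients receive unanimous ratings, yet κ is slightly negative because the "absent" category accounts for 97.5% of all ratings.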
The four more experienced reviewers showed higher κ values than their less experienced counterparts for enhancing mural nodules ≥ 5 mm (κ = 0.72 vs. 0.44), cyst size ≥ 3 cm (κ = 0.62 vs. 0.32), infiltrative masses (κ = 0.63 vs. 0.54), lymphadenopathy (κ = 0.51 vs. 0.41), pancreatitis (κ = 0.39 vs. 0.17), and abrupt MPD caliber change with distal pancreatic atrophy (κ = 0.38 vs. 0.27) (all P < .001).
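This section does not state how the P values for between-group κ comparisons were obtained. One common option is a paired bootstrap over patients, sketched below under that assumption: patients are resampled with replacement and Fleiss' κ is recomputed for each reviewer group on the same resample. The helper `bootstrap_kappa_difference` and the toy data are hypothetical, and this is not necessarily the method used in the study.

```python
import random
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa; rows are patients, columns are reviewers."""
    n = len(ratings[0])
    totals = Counter()
    p_bar = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        p_bar += (sum(c * c for c in counts.values()) - n) / (n * (n - 1))
    p_bar /= len(ratings)
    p_e = sum((c / (len(ratings) * n)) ** 2 for c in totals.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is identical")
    return (p_bar - p_e) / (1.0 - p_e)


def bootstrap_kappa_difference(ratings, group_a, group_b, n_boot=2000, seed=0):
    """Hypothetical helper: observed kappa(A) - kappa(B) and a 95% percentile CI.

    group_a and group_b are lists of reviewer (column) indices. Patients (rows)
    are resampled with replacement; both groups are scored on the same resample.
    """
    def kappa_diff(rows):
        a = fleiss_kappa([[row[c] for c in group_a] for row in rows])
        b = fleiss_kappa([[row[c] for c in group_b] for row in rows])
        return a - b

    observed = kappa_diff(ratings)
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        sample = [rng.choice(ratings) for _ in ratings]
        try:
            diffs.append(kappa_diff(sample))
        except ValueError:
            continue  # skip resamples in which kappa is undefined
    diffs.sort()
    lower = diffs[int(0.025 * (len(diffs) - 1))]
    upper = diffs[int(0.975 * (len(diffs) - 1))]
    return observed, (lower, upper)


# Toy data: 10 patients x 8 reviewers (columns 0-3 experienced, 4-7 residents).
truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
ratings = [[t] * 8 for t in truth]
for patient, reviewer in ((0, 4), (2, 6), (5, 5), (7, 7)):
    ratings[patient][reviewer] = 1 - ratings[patient][reviewer]  # a resident disagrees

observed, ci = bootstrap_kappa_difference(ratings, [0, 1, 2, 3], [4, 5, 6, 7])
print(round(observed, 2), ci)
```

Resampling patients rather than reviewers keeps the two groups paired on the same cases, which is why a single resample feeds both κ estimates.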
Regarding the number of imaging features (Table 3), agreement was substantial for at least one high-risk stigma (κ = 0.65). Interobserver agreement ranged from fair to moderate for at least one, two, three, or four worrisome features in the absence of high-risk stigmata (κ = 0.23–0.41). More experienced reviewers showed higher interobserver agreement for at least one high-risk stigma than did less experienced reviewers (κ = 0.75 vs. 0.64; P < .001).
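Counting features per patient reduces each reviewer's read to the categories analyzed above: at least one high-risk stigma, or at least k worrisome features with no high-risk stigmata. The sketch below encodes the three high-risk stigmata and seven worrisome features named in this section under hypothetical identifiers; the coding scheme itself is an assumption.

```python
from collections.abc import Iterable

# Hypothetical identifiers for the CT features named in this section.
HIGH_RISK_STIGMATA = frozenset({
    "mpd_ge_10mm", "enhancing_nodule_ge_5mm", "infiltrative_mass",
})
WORRISOME_FEATURES = frozenset({
    "mpd_5_to_9mm", "cyst_ge_3cm", "enhancing_nodule_lt_5mm",
    "thick_or_enhancing_wall", "abrupt_mpd_change_with_atrophy",
    "lymphadenopathy", "pancreatitis",
})


def count_features(findings: Iterable[str]) -> tuple[int, int]:
    """Return (number of high-risk stigmata, number of worrisome features)."""
    found = set(findings)
    unknown = found - HIGH_RISK_STIGMATA - WORRISOME_FEATURES
    if unknown:
        raise ValueError(f"unrecognized features: {sorted(unknown)}")
    return len(found & HIGH_RISK_STIGMATA), len(found & WORRISOME_FEATURES)


def has_hrs(findings: Iterable[str]) -> bool:
    """At least one high-risk stigma."""
    return count_features(findings)[0] >= 1


def wf_without_hrs(findings: Iterable[str], k: int) -> bool:
    """At least k worrisome features and no high-risk stigmata."""
    n_hrs, n_wf = count_features(findings)
    return n_hrs == 0 and n_wf >= k


read = {"cyst_ge_3cm", "pancreatitis"}  # one reviewer's findings for one patient
print(has_hrs(read), wf_without_hrs(read, 2), wf_without_hrs(read, 3))  # False True False
```

Each dichotomized category can then be treated as a binary rating per reviewer and passed to the same κ computation used for individual features.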
Interobserver Agreement on the Five-Point Scale for Diagnosis of Malignant IPMNs Based on IAP Guidelines and Reviewer Experience
Based on the 2017 IAP guidelines, agreement for the diagnosis of malignant IPMNs was substantial for the five-point scale (κ = 0.66), binary scale I (κ = 0.65), and binary scale II (κ = 0.61). More experienced reviewers showed higher agreement than their less experienced counterparts (five-point scale, κ = 0.73 vs. 0.66; binary scale I, κ = 0.74 vs. 0.64; binary scale II, κ = 0.70 vs. 0.57; all P < .001) (Table 3).
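The binary scales are not defined in this section. Assuming that binary scale I labels scores 4–5 as malignant and binary scale II labels scores 3–5 as malignant (an assumption consistent with binary scale II yielding higher sensitivity and lower specificity in the diagnostic performance results), each reviewer's five-point scores can be dichotomized as sketched below. The cut-off constants and function name are hypothetical.

```python
# Assumed cut-offs (hypothetical; the binary scales are not defined in this section).
BINARY_I_CUTOFF = 4   # scores 4-5 -> malignant
BINARY_II_CUTOFF = 3  # scores 3-5 -> malignant


def dichotomize(scores, cutoff):
    """Map five-point scores (1-5) to 1 = malignant, 0 = benign."""
    labels = []
    for s in scores:
        if s not in (1, 2, 3, 4, 5):
            raise ValueError(f"{s!r} is not a five-point score")
        labels.append(1 if s >= cutoff else 0)
    return labels


scores = [1, 3, 4, 5, 2, 3]  # one reviewer, six hypothetical patients
print(dichotomize(scores, BINARY_I_CUTOFF))   # [0, 0, 1, 1, 0, 0]
print(dichotomize(scores, BINARY_II_CUTOFF))  # [0, 1, 1, 1, 0, 1]
```

Under these assumed cut-offs, only a score of 3 changes class between the two binary scales.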
Based on reviewer experience alone, agreement for the diagnosis of malignant IPMNs was moderate for the five-point scale (κ = 0.58) and binary scale I (κ = 0.55; 95% CI: 0.52, 0.58) and fair for binary scale II (κ = 0.34). More experienced reviewers showed significantly higher agreement than their less experienced counterparts (five-point scale, κ = 0.81 vs. 0.43; binary scale I, κ = 0.77 vs. 0.48; binary scale II, κ = 0.68 vs. 0.08; all P < .001) (Table 3).
Diagnostic Performance for Predicting Malignant IPMNs Based on IAP Guidelines and Reviewer Experience
Based on the IAP guidelines, the AUC for diagnosing malignant IPMNs with the five-point scale ranged from 0.75 to 0.89 among the eight reviewers (median, 0.84) (Table 4). The AUCs of the more experienced reviewers tended to be slightly higher than those of the radiology residents (range, 0.82–0.89 vs. 0.75–0.85).
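For an ordinal five-point scale, the AUC is often estimated nonparametrically as the probability that a randomly chosen malignant IPMN receives a higher score than a randomly chosen benign one, with ties counted as one half (the Mann–Whitney formulation). The section does not say which AUC estimator was used, so the function name and toy scores below are hypothetical and illustrative only.

```python
def empirical_auc(scores, labels):
    """Nonparametric AUC: P(malignant score > benign score) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied ordinal scores count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical five-point scores for six patients (1 = malignant at reference standard).
scores = [5, 4, 3, 2, 3, 1]
labels = [1, 1, 1, 0, 0, 0]
print(round(empirical_auc(scores, labels), 3))  # 0.944
```

With only five score levels, ties are common, so the half-credit rule has a visible effect on the estimate.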
Based on reviewer experience alone, the AUC for diagnosing malignant IPMNs with the five-point scale ranged from 0.71 to 0.92 among the eight reviewers (median, 0.84). The AUCs of the more experienced reviewers were consistently higher than those of the radiology residents, with no overlap between the two ranges (0.88–0.92 vs. 0.71–0.80).
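The reported median can be checked from the two ranges. Assuming the four radiology residents are the four less experienced reviewers, the eight AUCs sorted in ascending order place every resident value (at most 0.80) before every more experienced reviewer's value (at least 0.88), so the median is the mean of the fourth and fifth ordered values $A_{(4)}$ and $A_{(5)}$:

```latex
\[
\operatorname{median} = \frac{A_{(4)} + A_{(5)}}{2}
= \frac{\max(\text{residents}) + \min(\text{experienced})}{2}
= \frac{0.80 + 0.88}{2} = 0.84
\]
```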
Based on the IAP guidelines, the median accuracies with binary scales I and II were 83% and 77%, respectively. With binary scale I, the median sensitivity was 87% (range, 64–96%) and the median specificity was 80%; with binary scale II, the median sensitivity was 90% and the median specificity was 69%.
Based on reviewer experience alone, the median accuracies with binary scales I and II were 78% and 71%, respectively. With binary scale I, the median sensitivity was 79% and the median specificity was 82%; with binary scale II, the median sensitivity was 94% and the median specificity was 54%.
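Each reviewer's sensitivity, specificity, and accuracy follow from a 2 × 2 table of binary calls against the reference standard (surgery or biopsy), and medians are then taken across reviewers. The sketch below uses hypothetical calls from three reviewers; the function name is illustrative.

```python
from statistics import median


def diagnostic_metrics(predicted, truth):
    """Sensitivity, specificity, and accuracy of binary calls (1 = malignant)."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    tp = sum(p == 1 and t == 1 for p, t in zip(predicted, truth))
    tn = sum(p == 0 and t == 0 for p, t in zip(predicted, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(predicted, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(predicted, truth))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(truth),
    }


# Hypothetical reference standard (1 = malignant) and three reviewers' binary calls.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reviewers = {
    "A": [1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
    "B": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    "C": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
per_reviewer = {name: diagnostic_metrics(calls, truth) for name, calls in reviewers.items()}

# Medians are taken metric by metric across reviewers.
for metric in ("sensitivity", "specificity", "accuracy"):
    print(metric, round(median(m[metric] for m in per_reviewer.values()), 3))

# For any single reviewer, accuracy is the prevalence-weighted mix of sensitivity and specificity.
prevalence = sum(truth) / len(truth)
a = per_reviewer["A"]
assert abs(a["accuracy"] - (prevalence * a["sensitivity"] + (1 - prevalence) * a["specificity"])) < 1e-12
```

Because each median is computed separately across reviewers, a reported median accuracy need not equal the prevalence-weighted combination of the median sensitivity and median specificity.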