In total, 170 lesions were present, of which 28 (16%), 77 (45%), 31 (18%), 29 (17%), and 5 (3%) were GGG 1, 2, 3, 4, and 5, respectively. The train (test) set comprised 78 (34) patients with 119 (51) lesions, of which 17 (11), 55 (22), 22 (9), 22 (7), and 3 (2) were GGG 1, 2, 3, 4, and 5, respectively.
On the lesion level, EEE attained an AUC [95% CI] of 0.89 [0.82, 0.96] and an ICC(3,1) [95% CI] of 0.94 [0.89, 0.96], which is considered excellent repeatability. Its performance was significantly better than that of Shannon entropy of ADCm (AUC=0.82 [0.74, 0.95], ICC=0.92 [0.86, 0.95]), Shannon entropy of ADCk (AUC=0.83 [0.75, 0.96], ICC=0.93 [0.87, 0.96]), and Shannon entropy of K (AUC=0.83 [0.75, 0.95], ICC=0.92 [0.86, 0.95]).
On the patient level, EEE attained an AUC [95% CI] of 0.94 [0.87, 0.97] and an ICC(3,1) [95% CI] of 0.95 [0.90, 0.97], which is considered excellent repeatability. Its performance was significantly better than that of Shannon entropy of ADCm (AUC=0.75 [0.61, 0.95], ICC=0.89 [0.80, 0.95]), Shannon entropy of ADCk (AUC=0.78 [0.64, 0.96], ICC=0.93 [0.87, 0.96]), and Shannon entropy of K (AUC=0.79 [0.67, 0.95], ICC=0.90 [0.81, 0.95]).
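For readers unfamiliar with the repeatability metric reported above, the following is a minimal sketch of how ICC(3,1) (two-way mixed effects, consistency, single measurement) can be computed from a two-way ANOVA decomposition. The function name `icc_3_1` and the input layout (one row per lesion or patient, one column per repeated scan) are assumptions made for illustration; this is not the analysis code used in this work, and it does not compute the confidence interval.

```python
def icc_3_1(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    scores: list of rows, one per subject (lesion or patient); each row
    holds the k repeated measurements of that subject (hypothetical layout).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of repeated measurements per subject
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between repeats
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
```

Because ICC(3,1) measures consistency rather than absolute agreement, a constant offset between repeated scans does not lower it: `icc_3_1([[1, 2], [2, 3], [3, 4]])` evaluates to 1.0.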
Compared with a previous work [15], which uses exactly the same dataset and exactly the same 70%-30% train-test split, EEE performs at least as well as machine learning (ML) applied to first- and second-order statistics of both ADCk and K. In [15], 1694 radiomic features were studied, including Sobel, Kirsch, gradient, Zernike moments, Gabor, Haralick, CoLIAGe, Haar wavelet coefficients, a 3D analogue of Laws features, 2D contours, and corner detectors. The best AUC [95% CI] reported there for patient-level detection of GGG > 1 is 0.77 [0.64, 0.89], which is lower than the patient-level AUC of EEE (0.94 [0.87, 0.97]). This suggests that EEE is highly efficient for PCa detection even without ML.
The current work has some limitations. First, we did not correct for possible correlation between multiple lesions in individual patients; more than one lesion was present in 46 of 112 patients (41%). Second, the dataset is not the largest available, so it may not capture the data variability needed for a more comprehensive assessment (though it is the largest dataset with short-term repeatability DWI). Third, we did not consider other DWI postprocessing models, such as the biexponential and stretched exponential models.
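One possible way to account for the within-patient correlation noted above is to resample whole patients, rather than individual lesions, when estimating the confidence interval of the lesion-level AUC. The sketch below illustrates this cluster bootstrap under stated assumptions: the function names, the `(patient_id, score, label)` input layout, and the percentile interval are all hypothetical choices for illustration, with label 1 standing for a positive lesion (for example GGG > 1). It was not applied in this work.

```python
import random


def auc(scores, labels):
    """AUC via the Mann-Whitney statistic; ties between classes count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when one class is missing
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cluster_bootstrap_auc_ci(lesions, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for lesion-level AUC, resampling patients as clusters.

    lesions: list of (patient_id, score, label) tuples (hypothetical layout).
    """
    rng = random.Random(seed)
    by_patient = {}
    for pid, score, label in lesions:
        by_patient.setdefault(pid, []).append((score, label))
    patient_ids = list(by_patient)

    estimates = []
    for _ in range(n_boot):
        # Draw patients with replacement; all lesions of a patient move together.
        drawn = rng.choices(patient_ids, k=len(patient_ids))
        sample = [rec for pid in drawn for rec in by_patient[pid]]
        value = auc([s for s, _ in sample], [y for _, y in sample])
        if value is not None:  # skip resamples that contain a single class
            estimates.append(value)

    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[min(len(estimates) - 1,
                          int((1 - alpha / 2) * len(estimates)))]
    return lower, upper
```

Keeping all lesions of a patient together in each resample preserves the dependence between them, so the resulting interval is typically wider than one obtained by treating lesions as independent.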