Congress: ECR24
Poster Number: C-19422
Type: EPOS Radiologist (scientific)
Authorblock: T. Santner1, C. Ruppert2, S. Gianolini3, J-G. Stalheim4, S. Frei5, M. Hondl Adametz6, V. Fröhlich7, S. Hofvind8, G. Widmann1; 1Innsbruck/AT, 2Zürich/CH, 3Glattpark/CH, 4Bergen/NO, 5Lausanne/CH, 6Vienna/AT, 7Wiener Neustadt/AT, 8Oslo/NO
Disclosures:
Tina Santner: Nothing to disclose
Carlotta Ruppert: Employee: b-rayZ AG
Stefano Gianolini: Nothing to disclose
Johanne-Gro Stalheim: Nothing to disclose
Stephanie Frei: Nothing to disclose
Michaela Hondl Adametz: Nothing to disclose
Vanessa Fröhlich: Nothing to disclose
Solveig Hofvind: Nothing to disclose
Gerlig Widmann: Nothing to disclose
Keywords: Artificial Intelligence, Breast, Mammography, Screening, Quality assurance
Methods and materials

Background:

It has already been extensively discussed that breast diagnostics, and breast cancer screening in particular, require the highest possible quality in all aspects of the diagnostic pathway in order to achieve the necessary cancer detection rate and overall reliability.

With regard to technical image quality, there are usually clear guidelines on how, by whom, when and to what extent checks must be carried out (e.g., constancy checks). Diagnostic image quality, however, which involves the optimal positioning of the breast and the visualisation of the tissue on the images, is much more difficult to measure. Performing optimal mammograms requires a high level of skill. When interpreting a mammogram for diagnosis, incorrect positioning is the most common problem [2]. Radiographers are therefore not only responsible for producing the images; they should also be able to monitor their own performance in order to achieve and maintain a high level of quality over time.

A quality check can be performed at the time of screening to provide immediate feedback or, following the original concept of PGMI (Perfect, Good, Moderate, Inadequate), retrospectively for quality assurance purposes or for regular monitoring of image quality. When carried out comprehensively, PGMI grading is a time-consuming task, and only a sample of each radiographer's images can be reviewed. In addition, there is always a subjective component to image quality grading, as the assessment differs between assessors [3]. Even when attempts have been made to define the rules precisely, it has sometimes proved difficult to make consistently clear statements and to obtain reproducible values. These uncertainties and this variability have understandably led to much debate about such assessment tools [4,5]. However, this quality aspect cannot simply be ignored: inhomogeneous performance in this regard would have a negative impact on the sensitivity of the examination and on the entire screening process.

Artificial intelligence (AI) is playing an increasing role in breast diagnostics. Convolutional neural networks (CNNs) have already been shown to mimic human decision-making and to detect subtle features in mammograms [6]. A quality analysis based on pattern recognition should therefore also be possible, and the first programmes offering such an assessment are being developed. This could eliminate the problems of subjectivity and variability, as well as the need for extensive human and time resources. At the same time, all of a radiographer's images can be included in the evaluation, making the result more representative and allowing potential problems to be identified more quickly.

Methods:

Image data collection

A retrospective dataset of 520 anonymised standard mammography and tomosynthesis examinations (comprising cranio-caudal and mediolateral-oblique views) was created, randomly selected from representative subsets of 13 imaging centres in two European countries (Austria, Switzerland). The images originate from fully digital devices of different generations from three manufacturers (GE, Hologic, Siemens). When selecting the participating sites, great importance was attached to covering a certain range in terms of unit size, throughput, team constellation and clientele, in order to achieve a heterogeneity of the data that reflects reality.
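
As an illustration of how a balanced, multi-centre random sample of this kind could be drawn, the following Python sketch selects a fixed number of anonymised examinations per centre. The directory layout, file naming and per-centre quota are assumptions for illustration only and do not describe the actual collection procedure.

# Minimal sketch (not the actual pipeline): drawing a balanced random sample of
# anonymised examinations from several imaging centres. Paths and the
# 40-examinations-per-centre quota are illustrative assumptions (13 x 40 = 520).
import random
from pathlib import Path

random.seed(42)                       # reproducible selection
DATA_ROOT = Path("anonymised_exams")  # assumed layout: one sub-folder per centre
EXAMS_PER_CENTRE = 40

selected = []
for centre_dir in sorted(p for p in DATA_ROOT.iterdir() if p.is_dir()):
    # assumed layout: one sub-folder per examination (CC + MLO views)
    exams = sorted(p for p in centre_dir.iterdir() if p.is_dir())
    selected.extend(random.sample(exams, k=min(EXAMS_PER_CENTRE, len(exams))))

print(f"Selected {len(selected)} examinations")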

Human reading

Five experts from three European countries with different screening traditions (Austria, Norway, Switzerland) were chosen to manually evaluate the diagnostic image quality of the cases. In order to document the individual background of the participants and to identify any related connections or patterns, all readers were asked to fill out a questionnaire recording primarily their experience and working environment. All five readers were provided with the same image data set and a corresponding case report form in which to record their results. Every reader received comprehensive training to ensure a uniform approach.

AI reading

As a sixth reader, a dedicated AI software (b-rayZ AG, Switzerland) was used. For each distinct quality feature, the software integrates a tailored solution, utilising primarily deep learning techniques complemented by classical image processing. The software modules were adjusted to adhere to the rules underlying the report form completed by the human readers, ensuring alignment between the AI and human assessments. Finally, the software predicted the overall image quality based on the predictions of the individual quality features.
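
The internals of the commercial software are not part of this work; the following Python sketch merely illustrates the general idea of deriving an overall PGMI-style grade from individual per-feature predictions. The feature names, rules and grading logic are hypothetical assumptions for illustration only.

# Hypothetical sketch: rule-based aggregation of per-feature quality predictions
# into an overall PGMI-style grade. Feature names, rules and thresholds are
# illustrative assumptions and do not describe the commercial software.
from typing import Dict

def overall_quality(features: Dict[str, bool]) -> str:
    """Map per-feature pass/fail predictions to an overall PGMI-style grade."""
    critical = ["pectoral_muscle_adequate", "nipple_in_profile", "no_skin_folds"]
    minor = ["breast_symmetry", "inframammary_angle_shown"]

    failed_critical = [f for f in critical if not features.get(f, False)]
    failed_minor = [f for f in minor if not features.get(f, False)]

    if not failed_critical and not failed_minor:
        return "Perfect"
    if not failed_critical:
        return "Good"
    if len(failed_critical) == 1:
        return "Moderate"
    return "Inadequate"

# Example: a hypothetical MLO image with one minor deficiency -> "Good"
print(overall_quality({
    "pectoral_muscle_adequate": True,
    "nipple_in_profile": True,
    "no_skin_folds": True,
    "breast_symmetry": False,
    "inframammary_angle_shown": True,
}))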

Statistics and Evaluation

A comprehensive evaluation was undertaken, using metrics such as accuracy, Cohen's kappa and confusion matrices to compare the predictions of the AI software against the individual assessments of the readers, as well as to quantify potential discrepancies among the readers themselves.
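
For illustration, the agreement metrics named above can be computed with standard tooling. The sketch below uses scikit-learn on small, hypothetical label vectors; it is not the study's analysis code, and the grades shown are not study data.

# Illustrative computation of accuracy, Cohen's kappa and a confusion matrix
# with scikit-learn. The label vectors are hypothetical, not study data.
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

LABELS = ["Perfect", "Good", "Moderate", "Inadequate"]

# Hypothetical overall-quality grades assigned to the same six images
reader = ["Good", "Good", "Moderate", "Perfect", "Inadequate", "Good"]
ai     = ["Good", "Moderate", "Moderate", "Perfect", "Inadequate", "Perfect"]

print("Accuracy:     ", accuracy_score(reader, ai))
print("Cohen's kappa:", cohen_kappa_score(reader, ai, labels=LABELS))
print("Confusion matrix (rows = reader, columns = AI):")
print(confusion_matrix(reader, ai, labels=LABELS))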
