Millions of radiological examinations are conducted annually in developed countries, accumulating large repositories of digital health data and diagnostic images.
These data, stored in Electronic Medical Records (EMR) and Picture Archiving and Communication Systems (PACS), represent a significant resource for medical research and clinical practice.
However, this resource remains largely underutilized, both because standardized structured reporting has seen limited adoption in radiology and because categorized information is difficult to retrieve from reports written in free-form, non-standardized natural language.
Automatic categorization systems would make these data usable for research, clinical practice, and healthcare policy planning.
In recent years, there has been increasing interest in using Artificial Intelligence (AI) and Natural Language Processing (NLP) models to analyze radiological reports, so that radiological findings can be systematically organized and labeled in a standardized format.
A significant challenge in this field is the absence of generally accessible NLP models for Italian medical and radiological language.
This work addresses the categorization of unstructured radiological reports written in Italian, in the domain of thoracic CT. It develops an AI model by fine-tuning a BERT (Bidirectional Encoder Representations from Transformers) model on Italian chest CT reports to classify them and extract relevant information.