Generative AI refers to a set of AI techniques and models designed to learn the underlying patterns and structure of a dataset and to generate new data points that could plausibly belong to the original dataset. During training, a generative model estimates the probability distribution of the data, which makes it particularly well suited to learning complex distributions such as those of high-dimensional imaging data. Recent developments have introduced new generative model frameworks that expand beyond traditional models such as generative adversarial networks (GANs) and variational autoencoders (VAEs).
1) Variational autoencoders (VAEs) and generative adversarial networks (GANs)
Generative Adversarial Networks (GANs), introduced by Ian Goodfellow in 2014, are among the most well-known generative models. A GAN consists of two neural networks, a Generator and a Discriminator, that compete against each other during training. The Generator produces synthetic (fake) data from random noise, while the Discriminator classifies inputs as real or fake. As training progresses, the Discriminator learns to separate real data from the fake data produced by the Generator, while the Generator simultaneously learns to produce fake data that fools the Discriminator into classifying it as real. Through this adversarial learning process, the Discriminator becomes adept at differentiating between real and fake data, while the Generator produces data that closely resembles real data. Once the Generator's output is consistently classified as real by the Discriminator, the Generator is effectively creating realistic data and can be used as a generative model (Fig 1).
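To make the adversarial objective concrete, the following PyTorch sketch performs one Discriminator update and one Generator update. The network sizes, data dimensions, and the stand-in batch of real data are illustrative assumptions, not details from the text.

```python
import torch
import torch.nn as nn

# Dimensions and architectures are illustrative placeholders.
latent_dim, data_dim = 64, 784

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

bce = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

real = torch.rand(32, data_dim) * 2 - 1        # stand-in batch of real data
fake = generator(torch.randn(32, latent_dim))  # Generator: noise -> fake data

# Discriminator step: label real data 1 and fake data 0.
opt_d.zero_grad()
loss_d = bce(discriminator(real), torch.ones(32, 1)) + \
         bce(discriminator(fake.detach()), torch.zeros(32, 1))
loss_d.backward()
opt_d.step()

# Generator step: try to fool the Discriminator into predicting 1 for fakes.
opt_g.zero_grad()
loss_g = bce(discriminator(fake), torch.ones(32, 1))
loss_g.backward()
opt_g.step()
```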
One variant of the GAN, the Swin Transformer GAN, is a generative adversarial network built on the Swin Transformer (Fig 2). This model excels at generating and transforming high-resolution images. The Swin Transformer processes image data with local (window-based) attention and a hierarchical structure; integrating it into the Generator and Discriminator of the GAN improves the quality of the generated data. Specifically, the Generator leverages the representational power of the Swin Transformer to learn multi-resolution features, enabling the creation of detailed high-resolution images, while the Discriminator uses the Swin Transformer's local attention to distinguish generated images from real ones more accurately. The result is significantly better image quality and detail than the vanilla GAN.
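The local attention at the heart of this design can be sketched as follows. This is a simplified illustration of window-based self-attention only; the shifted windows and hierarchical patch merging of the full Swin Transformer, and its integration into a GAN, are omitted, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

def window_partition(x, ws):
    """Split a feature map (B, H, W, C) into non-overlapping ws x ws windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    # -> (num_windows * B, ws*ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

class WindowAttention(nn.Module):
    """Multi-head self-attention applied independently within each local window."""
    def __init__(self, dim, heads=4, ws=8):
        super().__init__()
        self.ws = ws
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                         # x: (B, H, W, C)
        B, H, W, C = x.shape
        win = window_partition(x, self.ws)        # (B * num_windows, ws*ws, C)
        out, _ = self.attn(win, win, win)         # attention stays local to each window
        out = out.reshape(B, H // self.ws, W // self.ws, self.ws, self.ws, C)
        return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

# Example: a 32x32 feature map with 96 channels.
feats = torch.randn(2, 32, 32, 96)
print(WindowAttention(96)(feats).shape)           # torch.Size([2, 32, 32, 96])
```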
Variational Autoencoders (VAEs), proposed by Kingma and Welling in 2013, are another representative generative model characterized by their probabilistic approach to data generation. VAEs are composed of two networks: an Encoder and a Decoder (Fig 3). The Encoder transforms input data into a lower-dimensional latent space with a probabilistic distribution, while the Decoder reconstructs data similar to the original input based on samples from this latent space. VAEs constrain the latent space to follow a specific distribution, such as a normal distribution, enabling the generation of new data. During training, VAEs minimize two types of losses: Reconstruction Loss and KL Divergence Loss. Reconstruction Loss reduces the difference between the input data and the data reconstructed by the Decoder, while KL Divergence Loss ensures that the latent space conforms to the specified probabilistic distribution. Once training is complete, the Decoder can generate data resembling the original input by sampling from the latent space.
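A minimal PyTorch sketch of this two-part objective is shown below. The architecture and dimensions are illustrative assumptions, but the structure follows the description above: the Encoder outputs the mean and log-variance of a Gaussian latent distribution, a sample is drawn via the reparameterization trick, and training sums the Reconstruction Loss and the KL Divergence Loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal VAE: Encoder -> Gaussian latent distribution -> Decoder."""
    def __init__(self, data_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, data_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    # Reconstruction Loss: distance between the input and the Decoder output.
    recon_loss = F.binary_cross_entropy(recon, x, reduction="sum")
    # KL Divergence Loss: pulls N(mu, sigma^2) toward the standard normal prior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

model = VAE()
x = torch.rand(8, 784)               # illustrative batch of inputs in [0, 1]
recon, mu, logvar = model(x)
print(vae_loss(x, recon, mu, logvar))
```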
2) Diffusion model and variant models based on the diffusion model
Diffusion Models, first proposed by Sohl-Dickstein et al. in 2015 and brought to prominence by Ho et al. in 2020, are generative models that frame data generation as the reversal of a gradual noising process (a Markov chain in the discrete-time formulation, or a stochastic differential equation in the continuous-time view). These models consist of a forward process, which progressively adds noise to the data, and a reverse process, which removes this noise to reconstruct the data. In the forward process, noise is added incrementally over many steps until the data approaches a normal distribution. In the reverse process, a trained probabilistic model removes the noise step by step. Training minimizes the mismatch between the forward noising process and the learned reverse denoising process, typically via a variational bound. Once trained, Diffusion Models generate new data by progressively removing noise from samples drawn from a normal distribution. Diffusion Models address some limitations of earlier models like GANs, such as the unstable training that arises when the equilibrium between the two competing networks breaks down. As a result, Diffusion Models exhibit better training stability and can generate high-quality data, though at the cost of increased computational requirements.
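Because the forward process is a fixed noising schedule, a noisy sample at any step t can be drawn in closed form rather than by iterating. The sketch below illustrates this; the linear beta schedule and the data dimensions are illustrative assumptions.

```python
import torch

# Linear beta schedule over T steps (illustrative values).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal retention

def forward_noise(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = torch.randn_like(x0)
    a = alphas_bar[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1 - a).sqrt() * eps, eps

x0 = torch.randn(4, 1, 28, 28)        # illustrative image batch
t = torch.randint(0, T, (4,))         # a random timestep per sample
xt, eps = forward_noise(x0, t)
print(xt.shape)                       # torch.Size([4, 1, 28, 28]); nearly pure noise for large t
```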
Denoising Diffusion Probabilistic Models (DDPMs), introduced by Ho et al. in 2020, refine the original diffusion framework with better computational efficiency and improved training and generation performance. DDPM follows the basic structure of progressively adding and removing noise to generate data (Fig 4), but it improves the reverse denoising process and the loss function: in its simplified objective, the network is trained to predict the noise added at each step, which enhances the quality of generated data as well as the efficiency and convergence of training.
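Under this simplified objective, one training step reduces to plain mean-squared error on the injected noise. The self-contained sketch below shows a single step; the tiny MLP is a stand-in for the U-Net used in practice, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Forward-process sampler from the previous sketch, repeated for completeness.
T = 1000
alphas_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)

def forward_noise(x0, t):
    eps = torch.randn_like(x0)
    a = alphas_bar[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1 - a).sqrt() * eps, eps

class TinyEpsModel(nn.Module):
    """Stand-in for the U-Net used in practice: maps (x_t, t) to predicted noise."""
    def __init__(self, dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.ReLU(),
                                 nn.Linear(256, dim))

    def forward(self, xt, t):
        t_feat = (t.float() / T).unsqueeze(1)      # crude timestep conditioning
        return self.net(torch.cat([xt.flatten(1), t_feat], 1)).view_as(xt)

eps_model = TinyEpsModel()
opt = torch.optim.Adam(eps_model.parameters(), lr=1e-4)

x0 = torch.randn(4, 1, 28, 28)                     # illustrative training batch
t = torch.randint(0, T, (4,))
xt, eps = forward_noise(x0, t)

# DDPM's simplified objective: predict the injected noise with plain MSE.
loss = nn.functional.mse_loss(eps_model(xt, t), eps)
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item())
```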
3) L-Former
L-Former is a Transformer-based generative model designed to learn and generate high-dimensional data effectively. Building on the representational learning capabilities of the standard Transformer, L-Former is optimized for generative tasks through its specialized architecture and attention mechanisms. To learn to generate data, L-Former employs techniques such as multi-head attention and layer normalization, and during generation it draws on the learned data distribution to create new data (Fig 5). One of L-Former's key strengths is its ability to handle dependencies within sequences efficiently during generation, enabling it to learn complex patterns in the data. Its computational efficiency also allows it to excel across a variety of generative tasks, including natural language, image, and video generation, especially in domains where sequential structure plays a significant role.
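Since the exact L-Former architecture is not detailed here, the sketch below shows only the generic building block that Transformer-based generators of this kind rely on: multi-head self-attention and layer normalization with residual connections, plus a causal mask so generation proceeds autoregressively over the sequence. All dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Generic pre-norm Transformer block: multi-head self-attention followed
    by a feed-forward layer, each wrapped in layer normalization and a
    residual connection. A causal mask keeps generation autoregressive."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                nn.Linear(4 * dim, dim))

    def forward(self, x):                       # x: (batch, seq_len, dim)
        n = x.size(1)
        # True entries are masked out: each position attends only to the past.
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + attn_out                        # residual around attention
        return x + self.ff(self.norm2(x))       # residual around feed-forward

tokens = torch.randn(2, 16, 256)                # illustrative token embeddings
print(DecoderBlock()(tokens).shape)             # torch.Size([2, 16, 256])
```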
4) StableDiffusionImg2ImgPipeline
Generative models are widely used not only for data augmentation and the generation of new data but also for processes such as data transformation. StableDiffusionImg2ImgPipeline is a Diffusion-based generative model specifically designed for image-to-image transformation. This pipeline takes an existing image as input and generates a new image with a different style or transformation. The model operates within the latent space of Stable Diffusion, preserving the structural information of the input image while transforming it according to the desired style, details, or conditions specified by the user. The process involves passing the latent representation of the input image through a trained Diffusion network, which progressively removes noise to create a transformed image (Fig 6). This capability makes the StableDiffusionImg2ImgPipeline highly effective for various image transformation tasks.
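A minimal usage sketch with Hugging Face's diffusers library is shown below. The checkpoint name, prompt, strength, and file paths are illustrative choices, not prescribed by the text; any Stable Diffusion checkpoint compatible with the pipeline works, and a GPU is assumed.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Illustrative checkpoint; substitute any compatible Stable Diffusion model.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("input.png").convert("RGB").resize((512, 512))

# `strength` controls how much of the input's latent structure is preserved:
# low values stay close to the input, high values allow larger changes.
result = pipe(
    prompt="a watercolor painting of the same scene",
    image=init_image,
    strength=0.6,
    guidance_scale=7.5,
).images[0]
result.save("output.png")
```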
Applications in Medical Imaging
The use of generative AI in medical imaging is rapidly evolving. Applications of Variational Autoencoders (VAEs) have grown significantly, with an 81% increase from 2017 to 2022, while research involving GANs has plateaued. More complex architectures that combine multiple generative models are emerging, offering improved quality and mode coverage. Additionally, the use of diffusion models has surged since 2022, demonstrating strong potential for synthesizing high-quality images with diverse features.
Medical imaging datasets often suffer from limitations or imbalances, making it challenging to obtain sufficient training samples that represent a wide range of cases. Generative AI addresses this by synthesizing additional data from existing datasets, creating diverse and high-quality samples. This not only helps mitigate data scarcity but also enhances the performance of predictive AI models in medical applications. Table 1 summarizes various applications of generative AI for addressing data scarcity in medical imaging.
Future Directions for Generative AI in Medical Imaging
Despite its promise, generative AI in medical imaging faces critical challenges. Misinterpretation of models can lead to unfair or biased outcomes, while accountability for harmful outputs remains ambiguous. Ethical concerns, such as the misuse of AI for deepfakes or misinformation, further underscore the need for safeguards. Robust validation, ethical guidelines, and regulatory frameworks are essential to address these issues.
Future advancements may focus on integrating multimodal data, such as genomics and electronic medical records, to provide a comprehensive view of patient health. By doing so, generative AI could transform medical imaging, enabling faster and more accurate diagnoses and treatments while fostering holistic patient care.