Class-Conditioned Image Synthesis with Diffusion for Imbalanced Diabetic Retinopathy Grading

Published in International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2025

Public diabetic retinopathy (DR) datasets such as DDR are severely class-imbalanced. As a result, state-of-the-art classifiers achieve high overall accuracy but poor balanced accuracy, and they perform especially poorly on rare DR stages. This work addresses the problem by using diffusion models to generate synthetic fundus images that rebalance the training set.
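The sketch below shows one simple reading of "balancing the training set". It assumes, hypothetically, that each DR grade is topped up with synthetic images until it matches the largest grade. The per-grade counts are made-up placeholders, not DDR statistics, and `synthetic_quota` is an illustrative helper rather than part of the authors' code.

```python
def synthetic_quota(class_counts):
    """Return how many synthetic images each class needs so that every class
    reaches the size of the largest class (an assumed balancing target)."""
    target = max(class_counts.values())
    return {grade: target - n for grade, n in class_counts.items()}


# Hypothetical per-grade image counts for an imbalanced training split
# (placeholder numbers, not the real DDR distribution).
counts = {0: 3000, 1: 300, 2: 2000, 3: 120, 4: 450}

quota = synthetic_quota(counts)
print(quota)  # {0: 0, 1: 2700, 2: 1000, 3: 2880, 4: 2550}
```

Random oversampling would reach the same per-grade counts by duplicating existing rare-grade images. The diffusion approach fills the same quota with newly generated images.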

T2I Diffusion Finetuning. We efficiently fine-tune a text-to-image diffusion model on fundus data using a DreamBooth-style framework, in which $MSE_{DDR}$ serves as the subject-instance loss and $MSE_{prior}$ as the class-specific prior preservation loss. The prior term keeps the model from drifting away from its pretrained distribution while it fits the limited fundus data, which enables stable adaptation to the fundus domain.
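For reference, DreamBooth combines the two terms as a weighted sum. The form below is a sketch in the ε-prediction parameterisation used by common DreamBooth implementations. The symbols $z_0$, $c$, $z'_0$, $c_{pr}$, $t$, $\epsilon$ and the weight $\lambda$ are introduced here for illustration, and the paper's exact formulation may differ.

$$
\mathcal{L}(\theta) = \underbrace{\mathbb{E}_{z_0, c, \epsilon, t}\left[\lVert \epsilon - \epsilon_\theta(z_t, t, c) \rVert_2^2\right]}_{MSE_{DDR}} + \lambda \, \underbrace{\mathbb{E}_{z'_0, c_{pr}, \epsilon', t'}\left[\lVert \epsilon' - \epsilon_\theta(z'_{t'}, t', c_{pr}) \rVert_2^2\right]}_{MSE_{prior}}, \qquad z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon
$$

In this assumed setup, $z_0$ is a DDR fundus image (or its latent) paired with its text prompt $c$. $z'_0$ is a prior-class image sampled from the frozen pretrained model using a generic class prompt $c_{pr}$, and $z'_{t'}$ is its noised version. $\lambda \ge 0$ sets how strongly prior preservation counteracts overfitting to the small fundus set.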

Overview of our text-to-image diffusion model fine-tuning framework. The top section illustrates how diffusion model fine-tuning is stabilized on a fundus dataset with limited data. The bottom right depicts our proposed semantic quality evaluation and filtering pipeline. Overall, the figure shows how semantic quality is enhanced through explicit class conditioning during diffusion model training.

Semantic Quality–Based Filtering. Prior studies show that higher visual realism does not necessarily translate into better classifier performance, so we introduce a semantic quality–based filtering strategy. An ensemble of pretrained DR classifiers evaluates each generated sample, and the sample's score is the ensemble's average likelihood for the class it was generated to depict. Low-confidence samples harm training, while near-perfect samples add little new information. Therefore, only samples with scores in [0.7, 0.9] are selected.
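The filtering rule can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code. It assumes that the score is the mean softmax probability of the conditioning grade across ensemble members and that the interval is inclusive. The ensemble outputs and the helper names `semantic_score` and `keep_sample` are hypothetical.

```python
from statistics import mean

# Each candidate maps to one softmax vector per ensemble member
# (5 DR grades: 0 = no DR ... 4 = proliferative DR).


def semantic_score(ensemble_probs, target_class):
    """Average probability the ensemble assigns to the grade the image was
    generated for (an assumed reading of 'average class likelihood')."""
    return mean(probs[target_class] for probs in ensemble_probs)


def keep_sample(ensemble_probs, target_class, low=0.7, high=0.9):
    """Keep samples whose score falls in [low, high]: on-class enough to be
    useful, but not so easy that they add little new information."""
    return low <= semantic_score(ensemble_probs, target_class) <= high


# Hypothetical outputs of a 3-model ensemble for images generated as grade 1 (mild DR).
candidates = {
    "sample_a": [[0.10, 0.80, 0.05, 0.03, 0.02],
                 [0.15, 0.75, 0.05, 0.03, 0.02],
                 [0.05, 0.85, 0.05, 0.03, 0.02]],  # score 0.80 -> kept
    "sample_b": [[0.50, 0.40, 0.05, 0.03, 0.02],
                 [0.40, 0.50, 0.05, 0.03, 0.02],
                 [0.60, 0.30, 0.05, 0.03, 0.02]],  # score 0.40 -> too ambiguous
    "sample_c": [[0.01, 0.98, 0.005, 0.003, 0.002],
                 [0.00, 0.99, 0.005, 0.003, 0.002],
                 [0.02, 0.97, 0.005, 0.003, 0.002]],  # score 0.98 -> too easy
}

kept = [name for name, probs in candidates.items() if keep_sample(probs, target_class=1)]
print(kept)  # ['sample_a']
```

An upper bound below 1.0 is what separates this from plain confidence filtering. It deliberately discards samples that the classifiers already find trivial.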

Explicit Class Conditioning. We further integrate semantic quality control directly into diffusion fine-tuning via a self-supervised explicit class conditioning (ECC) scheme, allowing the model to generate high-semantic-quality samples without hard filtering.

Experimental results show that diffusion-generated data outperforms oversampling, while naive diffusion without semantic control performs worst. Semantic filtering significantly improves performance, and ECC achieves comparable or better gains without strict filtering, indicating that soft semantic control is preferable. Overall, balanced accuracy consistently increases while overall accuracy slightly decreases, reflecting reduced class-prior bias.
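A toy example shows why overall accuracy can fall while balanced accuracy rises. Balanced accuracy here is the mean of per-class recall, which matches the default definition in scikit-learn's `balanced_accuracy_score`. The labels and predictions are invented for illustration and are not results from the paper.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly (dominated by frequent classes)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy test set: 90 majority-class (grade 0) and 10 rare-class (grade 1) images.
y_true = [0] * 90 + [1] * 10

# Classifier A: always predicts the majority class (strong class-prior bias).
pred_a = [0] * 100
# Classifier B: makes 10 majority-class errors but recovers 8 of the 10 rare cases.
pred_b = [0] * 80 + [1] * 10 + [1] * 8 + [0] * 2

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, round(accuracy(y_true, pred), 3), round(balanced_accuracy(y_true, pred), 3))
# A 0.9 0.5
# B 0.88 0.844
```

Classifier B loses two points of overall accuracy but gains substantially in balanced accuracy. This is the same qualitative pattern described above when the classifier's bias toward frequent grades is reduced.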

In the first row, the samples on the left achieve a higher semantic score and display microaneurysms—key diagnostic features of mild DR. In contrast, the samples on the right either lack visible microaneurysms or suffer from poor image quality. In the second row, the left-side samples demonstrate characteristic features of severe nonproliferative DR, including diffuse retinal hemorrhages, microaneurysms across all four quadrants, and venous beading in at least two quadrants. In comparison, the right-side samples are more indicative of proliferative DR, as they show signs of neovascularization with intra- or subretinal hemorrhage.

Qualitative results confirm that high-semantic-quality samples exhibit clinically meaningful features, such as microaneurysms in mild DR and diffuse hemorrhages and venous beading in severe DR, whereas low-quality samples either miss key lesions or display ambiguous pathology.

Paper | Poster | Code

Citation:

@inproceedings{zhang2025class,
  title={Class-Conditioned Image Synthesis with Diffusion for Imbalanced Diabetic Retinopathy Grading},
  author={Zhang, Haochen and Heinke, Anna and Nagel, Ines D and Bartsch, Dirk-Uwe G and Freeman, William R and Nguyen, Truong Q and An, Cheolhong},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
  pages={56--66},
  year={2025}
}

Haochen Zhang