Class-Conditioned Image Synthesis with Diffusion for Imbalanced Diabetic Retinopathy Grading
Published in International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2025
Public diabetic retinopathy (DR) datasets (e.g., DDR) are severely class-imbalanced, leading state-of-the-art classifiers to achieve high overall accuracy but poor balanced accuracy, particularly on rare DR stages. This work addresses the issue by using diffusion models to generate synthetic fundus images to balance the training set.
T2I Diffusion Fine-Tuning. We efficiently fine-tune a text-to-image (T2I) diffusion model on fundus data using a DreamBooth-style framework, where $MSE_{DDR}$ serves as the subject-instance loss and $MSE_{prior}$ as the class-specific prior preservation loss, enabling stable adaptation to the fundus domain.
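
One common way to write a DreamBooth-style objective with the two terms named above is sketched here. The weight $\lambda$, the noise-prediction form of each term, and the symbols $\epsilon_\theta$, $z_t$, $c$, and $c_{prior}$ are assumptions borrowed from the usual DreamBooth formulation, not details stated in this summary.

$$
\mathcal{L} = MSE_{DDR} + \lambda \, MSE_{prior}
$$

$$
MSE_{DDR} = \mathbb{E}\left[ \lVert \epsilon - \epsilon_\theta(z_t, t, c) \rVert_2^2 \right],
\qquad
MSE_{prior} = \mathbb{E}\left[ \lVert \epsilon' - \epsilon_\theta(z'_{t'}, t', c_{prior}) \rVert_2^2 \right]
$$

Here $z_t$ would be a noised DDR fundus image paired with its class prompt $c$, and $z'_{t'}$ a noised image produced by the original model for the prior prompt $c_{prior}$.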

Semantic Quality–Based Filtering. Since prior studies show that higher visual realism does not necessarily translate to better classifier performance, we introduce a semantic quality–based filtering strategy. An ensemble of pretrained DR classifiers evaluates generated samples, scoring each image by the average class likelihood. Low-confidence samples harm training, while near-perfect samples add little new information; thus, only samples with scores in $[0.7, 0.9]$ are selected.
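
The filtering rule can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it assumes that "average class likelihood" means the mean probability the ensemble assigns to the class the image was generated for, and the names `semantic_score`, `filter_samples`, and the toy three-class probabilities are hypothetical.

```python
from statistics import mean

LOW, HIGH = 0.7, 0.9  # score band quoted in the text


def semantic_score(ensemble_probs, target_class):
    """Mean probability assigned to the intended class across the ensemble.

    ensemble_probs: one probability vector per classifier (assumed softmax outputs).
    """
    return mean(probs[target_class] for probs in ensemble_probs)


def filter_samples(samples, low=LOW, high=HIGH):
    """Keep the ids of samples whose score falls inside [low, high].

    samples: list of (image_id, target_class, ensemble_probs) tuples.
    """
    return [
        image_id
        for image_id, target_class, ensemble_probs in samples
        if low <= semantic_score(ensemble_probs, target_class) <= high
    ]


# Toy data: two classifiers, three classes, all generated for class 1.
toy_samples = [
    ("img_a", 1, [[0.5, 0.4, 0.1], [0.6, 0.3, 0.1]]),      # low confidence
    ("img_b", 1, [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]]),      # inside the band
    ("img_c", 1, [[0.0, 0.99, 0.01], [0.02, 0.97, 0.01]]),  # near-perfect
]
kept = filter_samples(toy_samples)
```

With these toy values only `img_b` is kept: `img_a` scores too low to be trusted, and `img_c` is so confidently classified that it would add little new information.
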
Explicit Class Conditioning. We further integrate semantic quality control directly into diffusion fine-tuning via a self-supervised explicit class conditioning (ECC) scheme, allowing the model to generate high-semantic-quality samples without hard filtering.

Experimental results show that diffusion-generated data outperforms oversampling, while naive diffusion without semantic control performs worst. Semantic filtering significantly improves performance, and ECC achieves comparable or better gains without strict filtering, indicating that soft semantic control is preferable. Overall, balanced accuracy consistently increases while overall accuracy slightly decreases, reflecting reduced class-prior bias.
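
To make the trade-off between the two metrics concrete, the sketch below computes overall accuracy and balanced accuracy (the mean of per-class recall) on made-up labels. The label counts are toy values chosen for illustration, not results from the paper, and the function names are hypothetical.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / n for c, n in support.items()) / len(support)


# Toy imbalanced test set: 90 samples of class 0, 10 of class 1.
y_true = [0] * 90 + [1] * 10

# A classifier biased toward the majority class always predicts 0.
biased_pred = [0] * 100

# A less biased classifier: 80/90 correct on class 0, 8/10 on class 1.
rebalanced_pred = [0] * 80 + [1] * 10 + [1] * 8 + [0] * 2

biased = (overall_accuracy(y_true, biased_pred),
          balanced_accuracy(y_true, biased_pred))          # (0.9, 0.5)
rebalanced = (overall_accuracy(y_true, rebalanced_pred),
              balanced_accuracy(y_true, rebalanced_pred))  # about (0.88, 0.84)
```

In this toy case overall accuracy drops slightly (0.90 to 0.88) while balanced accuracy rises sharply (0.50 to about 0.84), the same pattern the text attributes to reduced class-prior bias.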

Qualitative results confirm that high-semantic-quality samples exhibit clinically meaningful features, such as microaneurysms in mild DR and diffuse hemorrhages and venous beading in severe DR, whereas low-quality samples either miss key lesions or display ambiguous pathology.
Citation:
@inproceedings{zhang2025class,
title={Class-Conditioned Image Synthesis with Diffusion for Imbalanced Diabetic Retinopathy Grading},
author={Zhang, Haochen and Heinke, Anna and Nagel, Ines D and Bartsch, Dirk-Uwe G and Freeman, William R and Nguyen, Truong Q and An, Cheolhong},
booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
pages={56--66},
year={2025}
}
