Researchers Develop Efficient Controllable Image Generation Method for Diffusion Transformers

Dec 09, 2025 | By ZHANG Jie; ZHAO Weiwei

A research team led by XIE Chengjun and ZHANG Jie at the Hefei Institutes of Physical Science of the Chinese Academy of Sciences has developed a new controllable image generation solution for Diffusion Transformers (DiT). 

Their method, based on an analysis of control–condition relevance, has been accepted for publication in the Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-26).

Diffusion Transformers have recently become a core backbone for tasks such as text-to-image and text-to-video generation, thanks to their strong scalability. However, existing controllable DiT approaches often rely on heavy control branches, which add substantial parameters and computational overhead. They also tend to overlook that different Transformer layers respond differently to control signals, so control capacity is allocated inefficiently across layers.

In this study, the researchers propose RelaCtrl, a relevance-guided, efficient controllable generation framework that integrates control signals into DiT more compactly. RelaCtrl introduces a ControlNet relevance score that evaluates the impact of control applied at each layer. By skipping control branches one layer at a time during inference, it measures how much each layer contributes to image quality and control accuracy. This score then guides the placement of control layers, their parameter scale, and their modeling capacity, reducing redundancy and balancing controllability against efficiency.
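The layer-skipping idea can be illustrated with a small leave-one-out sketch. This is not the team's implementation: the evaluator `toy_evaluate`, its per-layer `importance` values, the equal weighting of the two metrics, and the top-k selection rule are all assumptions made for illustration. In practice the evaluator would run the controlled model with one control branch disabled and measure image quality and control accuracy on real data.

```python
from typing import Callable, FrozenSet, List, Tuple

# (quality_error, control_error); lower is better for both. Hypothetical metrics.
Metrics = Tuple[float, float]
Evaluator = Callable[[FrozenSet[int]], Metrics]


def relevance_scores(num_layers: int, evaluate: Evaluator,
                     weight: float = 0.5) -> List[float]:
    """Score each layer by how much both errors grow when its control branch is skipped."""
    all_layers = frozenset(range(num_layers))
    base_quality, base_control = evaluate(all_layers)
    scores = []
    for layer in range(num_layers):
        # Disable control at exactly one layer and re-evaluate.
        quality, control = evaluate(all_layers - {layer})
        degradation = (weight * (quality - base_quality)
                       + (1.0 - weight) * (control - base_control))
        scores.append(degradation)
    return scores


def select_control_layers(scores: List[float], budget: int) -> List[int]:
    """Keep control branches only at the `budget` most relevant layers."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def toy_evaluate(active: FrozenSet[int]) -> Metrics:
    """Stand-in evaluator with made-up per-layer importance (assumption, not measured data)."""
    importance = [0.9, 0.7, 0.4, 0.2, 0.1, 0.05]
    missing = sum(v for i, v in enumerate(importance) if i not in active)
    return 10.0 + missing, 5.0 + 2.0 * missing


scores = relevance_scores(6, toy_evaluate)
selected = select_control_layers(scores, budget=3)
print(selected)
```

With the toy evaluator, the layers whose removal hurts most are the ones that keep their control branch. The same ranking could also be used to give more relevant layers larger control modules, as the article describes.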

Architecturally, the team further replaces the conventional self-attention and feed-forward network components in replication modules with a tailored Two-Dimensional Stochastic Mixing (TDSM) module. TDSM builds efficient token and channel mixers, substantially reducing computation while preserving expressive power.
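To make the terms "token mixer" and "channel mixer" concrete, the sketch below shows a generic grouped mixing step applied first along the token axis and then along the channel axis, after a random shuffle. It is not the TDSM module itself, whose design is not detailed in this article. The group sizes, the 50/50 residual blend, and the use of plain averaging are assumptions chosen only to show why mixing within small groups costs far less than full self-attention over all token pairs.

```python
import random
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are channels


def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]


def mix_rows(x: Matrix, group_size: int, rng: random.Random) -> Matrix:
    """Shuffle the rows into random groups and blend each row with its group mean."""
    order = list(range(len(x)))
    rng.shuffle(order)
    width = len(x[0])
    out = [row[:] for row in x]
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        mean = [sum(x[i][c] for i in group) / len(group) for c in range(width)]
        for i in group:
            # Residual-style blend: keep half of the row, take half from the group.
            out[i] = [0.5 * x[i][c] + 0.5 * mean[c] for c in range(width)]
    return out


def mix_block(x: Matrix, token_group: int, channel_group: int,
              rng: random.Random) -> Matrix:
    """Token mixing followed by channel mixing; output has the same shape as the input."""
    x = mix_rows(x, token_group, rng)                          # mix across tokens
    x = transpose(mix_rows(transpose(x), channel_group, rng))  # mix across channels
    return x


tokens = [[float(t * 4 + c) for c in range(4)] for t in range(6)]  # 6 tokens, 4 channels
mixed = mix_block(tokens, token_group=3, channel_group=2, rng=random.Random(0))
```

Each pass visits every entry a constant number of times, so the cost grows linearly with the number of tokens and channels rather than quadratically with the number of tokens.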

Experiments show that, compared with typical ControlNet-style methods, RelaCtrl achieves superior generation performance while using only about 15% of the parameters and computational complexity. The approach shows clear advantages in both qualitative and quantitative evaluations.

"By reducing model size and compute cost, our work provides a lightweight and efficient controllable generation solution for the AIGC community," said Prof. Xie.   

The proposed relevance-guided controllable generation framework. (Image by ZHANG Jie)

