Researchers Advance Cross-Modality Smart Security with Transformer Model

Aug 27, 2025 | By WANG Hongqiang; ZHAO Weiwei

A research team led by Professor WANG Hongqiang at the Hefei Institutes of Physical Science of the Chinese Academy of Sciences has proposed a Global-Local Alignment Attention model based on an Asymmetric Siamese Transformer (AST-GLAA), which markedly improves performance on visible-X-ray cross-modality package re-identification.

This research was published in IEEE Transactions on Information Forensics and Security.

Visible-X-ray cross-modality package re-identification, which matches a package in a visible-light image to the same package in an X-ray image, is a core technology in security inspection. The challenge lies in the large pixel-level differences between images from the two modalities, which make it difficult for traditional methods to extract robust cross-modality invariant features.

In this study, the team introduced an asymmetric design into the Siamese Transformer architecture, proposing a Cross-modality Asymmetric Siamese Transformer structure. Embedding LayerNorm layers and modality-aware encoding in one branch strengthens the model's ability to extract cross-modality invariant features.
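The asymmetric idea can be illustrated with a minimal sketch. This is not the team's implementation: the class name, the single shared projection standing in for a Transformer encoder, the choice of the X-ray branch as the modified one, and the random "learned" embedding are all assumptions made for illustration.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x, weight):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


class AsymmetricSiameseSketch:
    """Hypothetical two-branch encoder: shared weights, extras in one branch."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # Shared projection: sharing weights is what makes the branches Siamese.
        self.shared_weight = [
            [rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)
        ]
        # Assumed modality embedding, used by one branch only.
        self.modality_embedding = [rng.gauss(0, 0.1) for _ in range(dim)]

    def encode_visible(self, tokens):
        # Plain branch: shared projection only.
        return [linear(t, self.shared_weight) for t in tokens]

    def encode_xray(self, tokens):
        # Asymmetric branch (assumed to be X-ray): extra LayerNorm plus a
        # modality-aware offset before the same shared projection.
        encoded = []
        for t in tokens:
            normed = layer_norm(t)
            aware = [a + b for a, b in zip(normed, self.modality_embedding)]
            encoded.append(linear(aware, self.shared_weight))
        return encoded
```

Calling `encode_visible` and `encode_xray` on the same token list returns features of the same shape but different values, which is the sense in which the two branches are asymmetric despite sharing weights.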

They further designed a Global-Local Cross-modality Alignment Attention module. By modeling the interaction between global and local features, it enhances fine-grained feature representation while addressing the spatial misalignment issue in cross-modality images.
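As a rough illustration of global-local interaction, the sketch below lets a global feature from one image attend over the local (patch) features of the other using ordinary scaled dot-product attention. The function names, the attention form, and the toy vectors are assumptions; the article does not specify how the module is actually built.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def global_local_align(global_feat, local_feats):
    """Attend from one image's global feature over another image's local features.

    Returns the attention-weighted local feature and the attention weights.
    """
    scale = math.sqrt(len(global_feat))
    weights = softmax([dot(global_feat, lf) / scale for lf in local_feats])
    dim = len(global_feat)
    aligned = [
        sum(w * lf[i] for w, lf in zip(weights, local_feats)) for i in range(dim)
    ]
    return aligned, weights


# Toy usage: a visible-light global feature queries three X-ray patch features.
visible_global = [1.0, 0.0]
xray_patches = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
aligned, weights = global_local_align(visible_global, xray_patches)
```

Because the weights depend on feature similarity rather than patch position, the most similar patch receives the largest weight wherever it sits in the image, which is one simple way to tolerate spatial misalignment between the two modalities.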

Experiments on a dedicated cross-modality package re-identification dataset show that the model significantly outperforms current state-of-the-art methods on key metrics, providing reliable technical support for intelligent security inspection.

According to the team, this work is the first to introduce the Transformer architecture into cross-modality package re-identification, overcoming the limitations of existing methods that rely on symmetric convolutional networks.

Illustration of the VX-ReID task and main idea (Image by WANG Hongqiang)

An overview of the proposed Asymmetric Siamese Transformer with Global-Local Alignment Attention (AST-GLAA) (Image by WANG Hongqiang)

Reference

An Asymmetric Siamese Transformer With Global-Local Alignment Attention for Visible-X-Ray Cross-Modality Package Re-Identification