Recently, a research team led by Professor Wang Hongqiang from the Hefei Institutes of Physical Science of the Chinese Academy of Sciences proposed a Global-Local Alignment Attention model based on an Asymmetric Siamese Transformer (AST-GLAA), which markedly improves performance on Visible-X-ray cross-modality package re-identification.
This research was published in IEEE Transactions on Information Forensics and Security.
Visible-X-ray cross-modality package re-identification is a core technology in security inspection. Its main challenge is the large pixel-level gap between images of the two modalities, which makes it difficult for traditional methods to extract robust cross-modality invariant features.
In this study, the team introduced asymmetry into the Siamese Transformer architecture, yielding a Cross-modality Asymmetric Siamese Transformer. Embedding LayerNorm layers and modality-aware encoding in only one of the two branches effectively enhances the model's ability to extract cross-modality invariant features.
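The asymmetric idea can be illustrated with a deliberately simplified sketch. This is not the team's implementation: the Transformer blocks are replaced by a single shared linear projection, and the names `AsymmetricSiameseEncoder`, `encode_visible`, `encode_xray` and `xray_embedding` are hypothetical. The article says only that the LayerNorm and modality-aware encoding sit in "one branch"; placing them in the X-ray branch here is an arbitrary assumption.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x, weight):
    """Multiply a token vector by a weight matrix given as a list of rows: y = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


class AsymmetricSiameseEncoder:
    """Hypothetical toy encoder: both branches share `shared_weight`;
    only the X-ray branch (an assumption) adds a modality embedding and a LayerNorm."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # Shared projection used by both branches (the "Siamese" part).
        self.shared_weight = [
            [rng.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)] for _ in range(dim)
        ]
        # Modality-aware encoding: a vector added only to X-ray tokens (assumption).
        self.xray_embedding = [rng.gauss(0, 0.02) for _ in range(dim)]

    def encode_visible(self, tokens):
        # Visible branch: shared weights only.
        return [linear(t, self.shared_weight) for t in tokens]

    def encode_xray(self, tokens):
        # X-ray branch: add the modality embedding, apply the extra LayerNorm,
        # then pass through the same shared weights as the visible branch.
        out = []
        for t in tokens:
            t = [v + e for v, e in zip(t, self.xray_embedding)]
            t = layer_norm(t)
            out.append(linear(t, self.shared_weight))
        return out
```

Both branches use the same `shared_weight`, which is what makes the toy Siamese; the extra embedding and normalization are its only branch-specific components, which is what makes it asymmetric.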
They further designed a Global-Local Cross-modality Alignment Attention module. By modeling interactions between global and local features, the module strengthens fine-grained feature representation and addresses spatial misalignment between cross-modality images.
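One common way to let features interact across scales and modalities is scaled dot-product attention. The sketch below is a hypothetical, pure-Python illustration of that general mechanism, not the published module: the function `global_local_alignment`, its inputs (one global vector and a list of local patch vectors per modality) and the residual sum used to fuse the global feature are all assumptions. Because attention matches patches by content rather than by grid position, a part that appears at different locations in the two images can still be paired.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.
    Returns the attended vector and the attention weights."""
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights


def global_local_alignment(global_a, local_a, local_b):
    """Hypothetical sketch: modality A's global feature queries modality B's local
    patches, and each patch of A queries B's patches. Matching is by content,
    not position, so a patch that moved in B can still be found."""
    # Global-to-local: A's global vector gathers the relevant local details of B,
    # then is fused with them by a residual sum (assumption).
    global_ctx, _ = attend(global_a, local_b, local_b)
    fused_global = [g + c for g, c in zip(global_a, global_ctx)]
    # Local-to-local: each A patch is re-expressed as a weighted mix of B patches.
    aligned_local = [attend(p, local_b, local_b)[0] for p in local_a]
    return fused_global, aligned_local
```

For example, with `local_a = [[10.0, 0.0], [0.0, 10.0]]` and `local_b` holding the same two patches in swapped order, `aligned_local` comes out approximately equal to `local_a`, since each patch of A attends almost entirely to its counterpart in B.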
Experiments on a dedicated cross-modality package re-identification dataset show that the model significantly outperforms current state-of-the-art methods on key metrics, providing reliable technical support for intelligent security inspection.
According to the team, this work is the first to introduce the Transformer architecture into the cross-modality package re-identification task, overcoming the limitations of existing methods that rely on symmetric convolutional networks.
Illustration of the VX-ReID task and main idea (Image by WANG Hongqiang)
An overview of the proposed Asymmetric Siamese Transformer with Global-Local Alignment Attention (AST-GLAA) (Image by WANG Hongqiang)