
Researchers Enhance Scene Perception with Innovative Framework

May 07, 2024 | By WANG Fan; ZHAO Weiwei

A computer vision group led by Prof. LIU Yong at the Hefei Institutes of Physical Science of the Chinese Academy of Sciences has proposed a novel framework called CLIP-based Knowledge Transfer and Relational Context Mining (CKT-RCM), which aims to address the long-tail distribution problem in computer vision.

The research results were published at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024).

Panoptic Scene Graph (PSG) generation is a prominent research direction within Scene Graph Generation. It requires a model to output all relationships in an image together with accurate segmentation for object localization. PSG aims to improve how computer vision models understand scenes, supporting downstream tasks such as scene description and visual reasoning.

In this study, the researchers examined how humans perceive relationships between objects and identified two key perspectives. First, humans anticipate relationships based on common sense or prior knowledge. Second, they deduce relationships from the contextual information between subjects and objects. Both perspectives underscore the importance of prior knowledge: the first corrects data biases using external data that humans have previously observed, while the second relies on the conditional prior distribution between objects.

"That's why we believe that sufficient prior knowledge and contextual information are crucial for PSG predictions," said Dr. WANG Fan, a member of the team.

Based on these observations, the team developed the CKT-RCM network framework. Leveraging the pre-trained vision-language model CLIP, CKT-RCM facilitates relationship inference during PSG generation. It also integrates a cross-attention mechanism to extract relational context, ensuring a balance between value and quality in relationship predictions.
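For readers unfamiliar with cross-attention, the general idea can be illustrated with a small sketch. This is not the team's implementation: the function names, the toy feature vectors, and the framing of "pair queries" attending over "context regions" are all assumptions made purely for illustration. It shows only standard scaled dot-product cross-attention, in which each query vector gathers a weighted mix of context vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: list of query vectors (here, hypothetical subject-object pair features)
    keys, values: lists of context vectors (here, hypothetical region features)
    Returns one context-weighted vector per query, plus the attention weights.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every context key, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        attn = softmax(scores)
        # Weighted sum of the value vectors, one output component at a time.
        mixed = [sum(a * v[j] for a, v in zip(attn, values)) for j in range(len(values[0]))]
        outputs.append(mixed)
        weights.append(attn)
    return outputs, weights


# Toy, made-up features: one pair query attending over three context regions.
pair_queries = [[1.0, 0.0]]
context_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
context_values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]

relational_context, attention = cross_attention(pair_queries, context_keys, context_values)
```

In this toy example the query is most similar to the first context region, so that region receives the largest attention weight and contributes most to the resulting context vector. A real model would learn the query, key, and value projections rather than use fixed vectors.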

This research contributes to the understanding and perception of scenes by robots and autonomous vehicles.

The proposed framework, CLIP-based Knowledge Transfer and Relational Context Mining (CKT-RCM). (Image by WANG Fan)

 

Copyright © Hefei Institutes of Physical Science, CAS All Rights Reserved