MovieChat: From Dense Token to Sparse Memory in Long Video Understanding
Enxin Song*, Wenhao Chai*♡, Guanhong Wang*, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Tian Ye, Jenq-Neng Hwang, Gaoang Wang✉
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[Website]
[Paper]
[Dataset]
[Code]
MovieChat achieves state-of-the-art performance in long video understanding by introducing a memory mechanism.
Solving the Catastrophic Forgetting Problem in Generalized Category Discovery
Xinzi Cao, Xiawu Zheng, Guanhong Wang, Weijiang Yu, Yunhang Shen, Ke Li, Yutong Lu✉, Yonghong Tian✉
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
We propose LegoGCD, which can be seamlessly integrated into previous methods to enhance the discrimination of novel classes while maintaining performance on previously encountered known classes.
Knowledge-guided Pre-training and Fine-tuning: Video Representation Learning for Action Recognition
Guanhong Wang*, Yang Zhou, Zhanhao He, Keyu Lu, Yang Feng, Zuozhu Liu, Gaoang Wang✉
Neurocomputing, 2023
We propose a novel video representation learning method with knowledge-guided pre-training and fine-tuning for action recognition.
User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning
Xuan Wang*, Guanhong Wang*, Wenhao Chai, Jiayu Zhou, Gaoang Wang✉
Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2023
Personalized image captioning incorporates user prior knowledge, such as writing styles and preferred vocabularies, into the model.
A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision
Zhonghan Zhao*, Wenhao Chai*, Shengyu Hao, Wenhao Hu, Guanhong Wang, Shidong Cao, Gaoang Wang✉, Mingli Song, Jenq-Neng Hwang
arXiv Preprint.
[Paper]
Our survey provides reference material for researchers interested in deep learning applications in sports, and sheds light on the potential of deep learning to use sports data for analysis.
Missing Modality meets Meta Sampling (M3S): An Efficient Universal Approach for Multimodal Sentiment Analysis with Missing Modality
Haozhe Chi*, Minghua Yang, Junhao Zhu, Guanhong Wang, Gaoang Wang✉
Asia-Pacific Chapter of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing (AACL-IJCNLP), 2022
[Paper]
In this paper, we propose a simple yet effective meta-sampling approach for multimodal sentiment analysis with missing modalities, namely Missing Modality-based Meta Sampling (M3S).
Human-centered Prior-guided and Task-dependent Multi-task Representation Learning for Action Recognition Pre-training
Guanhong Wang*, Keyu Lu, Yang Zhou, Zhanhao He, Gaoang Wang✉
IEEE International Conference on Multimedia and Expo (ICME), 2022
[Paper]
We distill knowledge from a human parsing model to enrich the semantic capability of the learned representation.
Multi-feature fusion refine network for video captioning
Guanhong Wang, Hongbo Zhang, Jixiang Du✉
Journal of Experimental & Theoretical Artificial Intelligence (JETAI), 2022
[Paper]
In this paper, we propose a video captioning approach based on a multi-feature fusion refine network.