👀 About me

I am Zhenyu Yang (杨振宇), a fourth-year Ph.D. student (2022-2027) at the State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, advised by Prof. Changsheng Xu. Previously, I earned my Bachelor’s degree from Beijing University of Posts and Telecommunications in 2022.

My research interests include (1) Streaming Video Understanding, (2) Multimodal Large Language Models, and (3) Multimodal Retrieval. I have previously worked as a research intern with the Tencent Hunyuan team, the Kuaishou Keye team, and the 360 AI Department. I welcome collaboration and am always open to discussing research opportunities, so feel free to reach out via email!

🔥 News

2025.11: 🏆🏆 Our paper “Multi-View Captioning with Semantic Delta Re-Ranking for Zero-Shot Composed Video Retrieval” won the Best Paper Award at ICIG 2025!
2025.09: 🎉🎉 Our paper “LiveStar: Live Streaming Assistant for Real-World Online Video Understanding” about streaming Video-LLMs has been accepted to NeurIPS 2025!
2025.08: 🎉🎉 Our paper “StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA” has been accepted to ACM MM 2025 Datasets!
2025.07: 🎉🎉 I was supported by the CIE-Tencent Doctoral Research Incentive Project, a competitive grant awarded to only 23 recipients nationwide, with a research fund of 100,000 RMB.

📝 Publications

NeurIPS 2025

🔥 [NeurIPS’2025] Poster

LiveStar: Live Streaming Assistant for Real-World Online Video Understanding

Zhenyu Yang, Kairui Zhang, Yuhang Hu, Bing Wang, Shengsheng Qian, Bin Wen, Fan Yang, Tingting Gao, Weiming Dong, Changsheng Xu

[Code] [Project] [Paper] [中文解读]

ICLR 2025

🚀 [ICLR’2025] Spotlight

SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding

Zhenyu Yang, Yuhang Hu, Zemin Du, Dizhan Xue, Shengsheng Qian, Jiahong Wu, Fan Yang, Weiming Dong, Changsheng Xu

[Code] [Project] [Paper] [Dataset] [Model] [Leaderboard] [Submission] [中文解读]

SIGIR 2024

🏆 [SIGIR’2024] Best Paper Honorable Mention

LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval

Zhenyu Yang, Dizhan Xue, Shengsheng Qian, Weiming Dong, Changsheng Xu

[Code] [Paper] [Video]

ACM MM 2024

🎉 [ACM MM’2024] Poster

Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval

Zhenyu Yang, Shengsheng Qian, Dizhan Xue, Jiahong Wu, Fan Yang, Weiming Dong, Changsheng Xu

[Code] [Paper]

ACM MM 2025

🎉 [ACM MM’2025] Poster

StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA

Yuhang Hu, Zhenyu Yang, Shihan Wang, Shengsheng Qian, Bin Wen, Fan Yang, Tingting Gao, Changsheng Xu

[Code] [Paper]

ICIG 2025

🏆 [ICIG’2025] Best Paper Award

Multi-View Captioning with Semantic Delta Re-Ranking for Zero-Shot Composed Video Retrieval

Zhixiang Ding, Lilong Liu, Zhenyu Yang, Shengsheng Qian

[Code] [Project] [Paper]

🎖 Honors and Awards

Best Paper Honorable Mention (5/791), SIGIR, 2024
Best Paper Award, ICIG, 2025
Spotlight Paper (~3.27%), ICLR, 2025
CIE-Tencent Doctoral Research Incentive Project / 混元学者 (中国电子学会-腾讯博士生科研激励计划), 2025
National Scholarship, Ministry of Education, China, 2024
Outstanding Graduate, Beijing, 2022
Outstanding Graduate, Beijing University of Posts and Telecommunications, 2022
First-Class Scholarship, Beijing University of Posts and Telecommunications, 2020/2021
First Prize in American Mathematical Contest in Modeling (MCM), Top 6.7% Globally, 2020

📖 Education

2022.09 - 2027.06: Ph.D., State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing. Major: Computer Applied Technology.
2018.09 - 2022.06: Undergraduate, School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing. Major: Intelligent Science and Technology.

🙋 Services

Conference Reviewer: CVPR 2026, ICLR 2026, AAAI 2026, NeurIPS 2025, ICCV 2025, ACML 2025, etc.
Journal Reviewer: IEEE Transactions on Image Processing (TIP), ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Neurocomputing, Pattern Recognition.