Manyuan Zhang

Staff Researcher at Meituan-M17, Hong Kong
Ph.D. from Multimedia Laboratory
Department of Electronic Engineering
The Chinese University of Hong Kong
Email: zhangmanyuan@link.cuhk.edu.hk
Scholar CV GitHub LinkedIn

About me

I am currently a Staff Researcher at Meituan-M17, Hong Kong. I received my Ph.D. from the Multimedia Lab (MMLab) at the Chinese University of Hong Kong, supervised by Prof. Hongsheng Li and Prof. Xiaogang Wang. I received my bachelor's degree from the University of Electronic Science and Technology of China (UESTC) in 2019. Previously, I was a Researcher at SenseTime Research.

During my six years at SenseTime Research, I helped build many projects from scratch. We built the most reliable face recognition system in the world at that time (champion of FRVT and ICCV MFR) and the best video recognition model (champion of the ActivityNet Challenge Kinetics700 track), reimplemented the StarCraft II AI from scratch (DI-star), and developed an autonomous driving algorithm based on reinforcement learning (DI-drive). Most recently, we launched the text-to-image AIGC product SenseMirage, whose DAU exceeded one million within a week for the first time in SenseTime’s history, earning a special commendation from the CEO. If you are interested in my work or career, please feel free to contact me. Now hiring self-motivated interns! We provide >1000 H-series GPUs!

News

[2025-08] One paper accepted to EMNLP 2025.
[2025-06] One paper accepted to ICCV 2025.
[2025-05] I successfully defended my PhD thesis and officially became Dr. Zhang!
[2025-03] One paper accepted to CVPR 2025.
[2024-07] Two papers accepted to ECCV 2024.
[2024-03] One paper accepted to SIGGRAPH 2024.
[2023-07] Two papers accepted to ICCV 2023.
[2023-07] I passed the PhD candidacy exam.
[2023-05] I was invited to be a reviewer for NeurIPS 2023 and ICLR 2023.
[2023-02] One paper accepted to CVPR 2023.
[2022-12] I was invited to be a reviewer for CVPR 2023 and ICCV 2023.
[2022-07] One paper accepted to ECCV 2022.
[2022-04] I was invited to be a reviewer for ECCV 2022 and NeurIPS 2022.
[2022-04] I was invited by ’智东西’ to give a talk about imitation learning in autonomous driving.
[2021-10] We won all three tracks of the ICCV 2021 Masked Face Recognition Challenge: the glint360k track, the unconstrained track, and the WebFace260M track. Code and solutions will be released very soon.
[2021-07] We released DI-drive, the decision intelligence platform for autonomous driving simulation. I am responsible for the imitation learning part.
[2021-07] One paper accepted to ICCV 2021.
[2021-05] We won the championship of NIST FRVT 1:1.
[2020-12] We won the championship of NIST FRVT 1:N.
[2020-06] We won two championships at ActivityNet, on the Spatio-temporal Action Localization (AVA) track and the Trimmed Activity Recognition (Kinetics 700) track.
[2020-06] One paper accepted to ECCV 2020.
[2020-04] We released X-Temporal, a toolkit for easily implementing SOTA video understanding methods with PyTorch on multiple machines and GPUs.
[2019-10] One paper accepted to ICCV 2019 LFR workshop.
[2019-10] We won the championship of the ICCV19 Multi-Moments in Time (MIT) Challenge.
[2019-10] We won the championship of the ICCV19 Lightweight Face Recognition Challenge.

Challenge Awards

Won 1st place in the ICCV21 Masked Face Recognition Challenge (WebFace260M, InsightFace Unconstrained and InsightFace glint360k tracks)
Won 1st place in the CVPR20 ActivityNet Challenge (Kinetics700 track and AVA track)
Won 1st place in NIST FRVT, held by the US government (1:1 Verification and 1:N Identification)
Won 1st place in the ICCV19 Multi-Moments in Time (MIT) Challenge
Won 1st place in the ICCV19 Lightweight Face Recognition Challenge

Technical Report

Large-scale Masked Face Recognition (Top-1 Solution)
Manyuan Zhang, Bingqi Ma, Guanglu Song, Yunxiao Wang, Hongsheng Li, Yu Liu
1st place solution for AVA-Kinetics Crossover in AcitivityNet Challenge 2020 (Top-1 Solution)
Siyu Chen, Junting Pan, Guanglu Song, Manyuan Zhang, Hao Shao, Ziyi Lin, Jing Shao, Hongsheng Li, Yu Liu
Top-1 Solution of Multi-Moments in Time Challenge 2019 (Top-1 Solution)
Manyuan Zhang, Hao Shao, Guanglu Song, Yu Liu, Junjie Yan

Recent Publications

*equal contribution

LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding
Yuxuan Hu, Jihao Liu, Ke Wang, Jinliang Zheng, Weikang Shi, Manyuan Zhang, Qi Dou, Rui Liu, Aojun Zhou, Hongsheng Li
2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Lumina-image 2.0: A unified and efficient image generative framework
Qi Qin, Le Zhuo, Yi Xin, Ruoyi Du, Zhen Li, Bin Fu, Yiting Lu, Jiakang Yuan, Xinyue Li, Dongyang Liu, Xiangyang Zhu, Manyuan Zhang, Will Beddow, Erwann Millon, Victor Perez, Wenhai Wang, Conghui He, Bo Zhang, Xiaohong Liu, Hongsheng Li, Yu Qiao, Chang Xu, Peng Gao
2025 International Conference on Computer Vision (ICCV)

Let's Verify and Reinforce Image Generation Step by Step
Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao, Ziyu Guo, Haoquan Zhang, Manyuan Zhang, Jiaming Liu, Peng Gao, Hongsheng Li
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks
Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li
2024 European Conference on Computer Vision (ECCV)

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
Xiaoshi Wu, Yiming Hao, Manyuan Zhang, Keqiang Sun, Zhaoyang Huang, Guanglu Song, Yu Liu, Hongsheng Li
2024 European Conference on Computer Vision (ECCV)

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
Xiaoyu Shi, Zhaoyang Huang, Fu-Yun Wang, Weikang Bian, Dasong Li, Yi Zhang, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li
2024 ACM SIGGRAPH

Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection
Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li
2023 International Conference on Computer Vision (ICCV)

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
Xiaoyu Shi, Zhaoyang Huang, Weikang Bian, Dasong Li, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li
2023 International Conference on Computer Vision (ICCV)

FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
Xiaoyu Shi, Zhaoyang Huang, Dasong Li, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Towards Robust Face Recognition with Comprehensive Search
Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li
2022 European Conference on Computer Vision (ECCV)

Switchable K-class Hyperplanes for Noise-robust Representation Learning
Boxiao Liu, Guanglu Song, Manyuan Zhang, Haihang You, Yu Liu
2021 International Conference on Computer Vision (ICCV)

Discriminability Distillation in Group Representation Learning
Manyuan Zhang, Guanglu Song, Hang Zhou, Yu Liu
2020 European Conference on Computer Vision (ECCV)

Towards Flops-constrained Face Recognition
Yu Liu*, Guanglu Song*, Manyuan Zhang*, Jihao Liu*, Yucong Zhou, Junjie Yan
2019 ICCV Lightweight Face Recognition Challenge & Workshop

Tensor sensing for RF tomographic imaging
Tao Deng, Feng Qian, Xiao-Yang Liu, Manyuan Zhang, Anwar Walid
2018 IEEE International Conference on Multimedia and Expo (ICME)

Privacy-preserving sensory data recovery
Cai Chen, Manyuan Zhang, Huanzhi Zhang, Zhenyun Huang, Yong Li
2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications

Selected Projects

X-Temporal
Easily implement SOTA video understanding methods with PyTorch on multiple machines and GPUs.
DI-drive
Decision Intelligence Platform for Autonomous Driving simulation.

Working Experience

Research intern at SenseTime Research (since Feb 2019)
Working on large-scale face recognition and video understanding with Yu Liu and Guanglu Song.
Research intern at Megvii Research (from Aug 2018 to Feb 2019)
Working on style transfer with Shuaicheng Liu.
Research intern at ByteDance AI Lab (from May 2018 to Aug 2018)
Working on large-scale face recognition.