Manyuan Zhang
About me
I am currently a Staff Researcher at Meituan-M17, Hong Kong. I received my Ph.D. from the Multimedia Lab (MMLab) at the Chinese University of Hong Kong, supervised by Prof. Hongsheng Li and Prof. Xiaogang Wang, and my bachelor's degree from the University of Electronic Science and Technology of China (UESTC) in 2019. Previously, I was a Researcher at SenseTime Research.
During my six years at SenseTime Research, I helped build many projects from scratch. We built the most reliable face recognition system in the world at that time (the champion of FRVT and ICCV MFR) and the best video recognition model (the champion of the ActivityNet Challenge Kinetics700 track), reimplemented the AI of StarCraft2 (DI-star) from scratch, developed an autonomous driving algorithm based on reinforcement learning (DI-drive), and most recently launched the text-to-image AIGC product SenseMirage (its DAU exceeded one million within a week, a first in SenseTime's history, earning a special commendation from the CEO).
Currently, at Meituan-M17 Hong Kong, I am involved in the development of the LongCat series foundation models, such as LongCat-Flash. My work focuses on native multimodality, modeling multiple modalities in a unified autoregressive manner. If you are interested in my work or career, please feel free to contact me. We are hiring self-motivated interns, with access to more than 1,000 H-series GPUs!
News
[2026-01] Two papers accepted to ICLR 2026.
[2025-08] One paper accepted to EMNLP 2025.
[2025-06] One paper accepted to ICCV 2025.
[2025-05] I successfully defended my PhD thesis and officially became Dr. Zhang!
[2025-03] One paper accepted to CVPR 2025.
[2024-07] Two papers accepted to ECCV 2024.
[2024-03] One paper accepted to SIGGRAPH 2024.
[2023-07] Two papers accepted to ICCV 2023.
[2023-07] I passed the PhD candidacy exam.
[2023-05] I was invited to be a reviewer for NeurIPS 2023 and ICLR 2023.
[2023-02] One paper accepted to CVPR 2023.
[2022-12] I was invited to be a reviewer for CVPR 2023 and ICCV 2023.
[2022-07] One paper accepted to ECCV 2022.
[2022-04] I was invited to be a reviewer for ECCV 2022 and NeurIPS 2022.
[2022-04] I was invited by 智东西 to give a talk on imitation learning for autonomous driving.
[2021-10] We won three championships in the ICCV 2021 Masked Face Recognition Challenge, on the glint360k track, the unconstrained track, and the Webface260M track. Code and solutions will be released very soon.
[2021-07] We released DI-drive, the decision intelligence platform for autonomous driving simulation. I am responsible for the imitation learning part.
[2021-07] One paper accepted to ICCV 2021.
[2021-05] We won the championship of NIST FRVT 1:1.
[2020-12] We won the championship of NIST FRVT 1:N.
[2020-06] We won two championships at ActivityNet, on the Spatio-temporal Action Localization (AVA) track and the Trimmed Activity Recognition (Kinetics 700) track.
[2020-06] One paper accepted to ECCV 2020.
[2020-04] We released X-Temporal for easily implementing SOTA video understanding methods with PyTorch on multiple machines and GPUs.
[2019-10] One paper accepted to ICCV 2019 LFR workshop.
[2019-10] We won the championship of the ICCV 2019 Multi-Moments in Time (MIT) Challenge.
[2019-10] We won the championship of the ICCV 2019 Lightweight Face Recognition Challenge.
Challenge Awards
Technical Report
LongCat-Flash Technical Report
Meituan LongCat Team (including Manyuan Zhang), et al.
Large-scale Masked Face Recognition (Top-1 Solution)
Manyuan Zhang, Bingqi Ma, Guanglu Song, Yunxiao Wang, Hongsheng Li, Yu Liu
1st place solution for AVA-Kinetics Crossover in AcitivityNet Challenge 2020 (Top-1 Solution)
Siyu Chen, Junting Pan, Guanglu Song, Manyuan Zhang, Hao Shao, Ziyi Lin, Jing Shao, Hongsheng Li, Yu Liu
Top-1 Solution of Multi-Moments in Time Challenge 2019 (Top-1 Solution)
Manyuan Zhang, Hao Shao, Guanglu Song, Yu Liu, Junjie Yan
Preprint Papers
*equal contribution ^+project lead/corresponding author
OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation
Yexin Liu, Manyuan Zhang^+, Yueze Wang, Hongyu Li, Dian Zheng, Weiming Zhang, Changsheng Lu, Xunliang Cai, Yan Feng, Peng Pei, Harry Yang
EditThinker: Unlocking Iterative Reasoning for Any Image Editor
Hongyu Li, Manyuan Zhang^+, Dian Zheng, Ziyu Guo, Yimeng Jia, Kaituo Feng, Hao Yu, Yexin Liu, Yan Feng, Peng Pei, Xunliang Cai, Linjiang Huang, Hongsheng Li, Si Liu
OneThinker: All-in-one Reasoning Model for Image and Video
Kaituo Feng, Manyuan Zhang^+, Hongyu Li, Kaixuan Fan, Shuang Chen, Yilei Jiang, Dian Zheng, Peiwen Sun, Yiyuan Zhang, Haoze Sun, Yan Feng, Peng Pei, Xunliang Cai, Xiangyu Yue
Architecture Decoupling is Not All You Need for Unified Multimodal Model
Dian Zheng, Manyuan Zhang^+, Hongyu Li, Kai Zou, Hongbo Liu, Ziyu Guo, Kaituo Feng, Yexin Liu, Ying Luo, Yan Feng, Peng Pei, Xunliang Cai, Hongsheng Li
CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-driven Images
Chengqi Duan, Kaiyue Sun, Rongyao Fang, Manyuan Zhang^+, Yan Feng, Ying Luo, Yufang Liu, Ke Wang, Peng Pei, Xunliang Cai, Hongsheng Li, Yi Ma, Xihui Liu
Recent Publications
*equal contribution
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
Hao Li, Zhengyu Zou, Fangfu Liu, Xuanyang Zhang, Fangzhou Hong, Yukang Cao, Yushi Lan, Manyuan Zhang, Gang Yu, Dingwen Zhang, Ziwei Liu
2026 International Conference on Learning Representations (ICLR)
CTR3D: Cross-view Token Reduction for Dense Multi-view Generation
Kunming Luo, Hongyu Yan, Yuan Liu, Zihao Zhang, Manyuan Zhang, Wenping Wang, Ping Tan
2026 International Conference on 3D Vision (3DV) (Best Paper Award Nomination)
Lumina-image 2.0: A unified and efficient image generative framework
Qi Qin, Le Zhuo, Yi Xin, Ruoyi Du, Zhen Li, Bin Fu, Yiting Lu, Jiakang Yuan, Xinyue Li, Dongyang Liu, Xiangyang Zhu, Manyuan Zhang, Will Beddow, Erwann Millon, Victor Perez, Wenhai Wang, Conghui He, Bo Zhang, Xiaohong Liu, Hongsheng Li, Yu Qiao, Chang Xu, Peng Gao
2025 International Conference on Computer Vision (ICCV)
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
Xiaoyu Shi, Zhaoyang Huang, Fu-Yun Wang, Weikang Bian, Dasong Li, Yi Zhang, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li
2024 ACM SIGGRAPH
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
Xiaoyu Shi, Zhaoyang Huang, Dasong Li, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Towards Robust Face Recognition with Comprehensive Search
Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li
2022 European Conference on Computer Vision (ECCV)
Switchable K-class Hyperplanes for Noise-robust Representation Learning
Boxiao Liu, Guanglu Song, Manyuan Zhang, Haihang You, Yu Liu
2021 International Conference on Computer Vision (ICCV)
Discriminability Distillation in Group Representation Learning
Manyuan Zhang, Guanglu Song, Hang Zhou, Yu Liu
2020 European Conference on Computer Vision (ECCV)
Towards Flops-constrained Face Recognition
Yu Liu*, Guanglu Song*, Manyuan Zhang*, Jihao Liu*, Yucong Zhou, Junjie Yan
2019 ICCV Lightweight Face Recognition Challenge & Workshop
Tensor sensing for RF tomographic imaging
Tao Deng, Feng Qian, Xiao-Yang Liu, Manyuan Zhang, Anwar Walid
2018 IEEE International Conference on Multimedia and Expo (ICME)
Privacy-preserving sensory data recovery
Cai Chen, Manyuan Zhang, Huanzhi Zhang, Zhenyun Huang, Yong Li
2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)
Selected Projects
Working Experience