Jinyi (James) Hu

I am a fourth-year Ph.D. student in the THUNLP lab at Tsinghua University, advised by Prof. Maosong Sun. I am currently a visiting student at MIT Han Lab, advised by Prof. Song Han and Yao Lu. My main research interests lie in foundation models for multimodal learning and embodied AI. My long-term research goal is to build a versatile agent that can interact with the physical world.

Previously, I interned at ByteDance, working with Mingxuan Wang. During my undergraduate studies, I interned at Mila Québec, advised by Jian Tang, and at Alibaba DAMO Academy, advised by Boxing Chen.

I am open to collaboration and discussion; feel free to contact me via email.

Email  /  CV  /  Google Scholar  /  Github

Education

B.E. Department of Computer Science and Technology, Tsinghua University, 2017-2021

Ph.D. Department of Computer Science and Technology, Tsinghua University, 2021-2026 (expected)

News

[2024.6] The open embodied agent platform LEGENT is accepted to ACL 2024 as a demo paper.

[2024.1] VisCPM is accepted to ICLR 2024 as a spotlight (5%).

[2023.6] The SOTA Zh-En multimodal model VisCPM is open-sourced!

[2022.10] Two papers are accepted to EMNLP 2022.

[2022.4] DELLA is accepted to NAACL 2022.

[2020.9] One paper is accepted to NeurIPS 2020.

Publications

(* indicates equal contribution)

Embodied AI
LEGENT: Open Platform for Embodied Agents
Zhili Cheng, Zhitong Wang, Jinyi Hu, Shengding Hu, An Liu, Yuge Tu, Pengkai Li, Lei Shi, Zhiyuan Liu, Maosong Sun
ACL 2024 demo
[paper] [code] [doc]

Multimodal Learning
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu, Yuan Yao, Chongyi Wang, Shan Wang, Yinxu Pan, Qianyu Chen, Tianyu Yu, Hanghao Wu, Yue Zhao, Haoye Zhang, Xu Han, Yankai Lin, Jiao Xue, Dahai Li, Zhiyuan Liu, Maosong Sun
ICLR 2024 (spotlight, top 5%)
[paper] [code (VisCPM)] [code (MiniCPM-V)]

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Tianyu Yu, Yuan Yao, Haoye Zhang, Taiwen He, Yifeng Han, Ganqu Cui, Jinyi Hu, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun, Tat-Seng Chua
CVPR 2024
[paper] [project]

Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis
Zanlin Ni, Yulin Wang, Renping Zhou, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Shiji Song, Yuan Yao, Gao Huang
CVPR 2024
[paper]

AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
Zanlin Ni, Yulin Wang, Renping Zhou, Rui Lu, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Yuan Yao, Gao Huang
ECCV 2024
[paper]

GUICourse: From General Vision Language Model to Versatile GUI Agent
Wentong Chen*, Junbo Cui*, Jinyi Hu*, Yujia Qin, Junjie Fang, Yue Zhao, Chongyi Wang, Jun Liu, Guirong Chen, Yupeng Huo, Yuan Yao, Yankai Lin, Zhiyuan Liu, Maosong Sun
arxiv, 2024
[paper]

Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
Tianyu Yu*, Jinyi Hu*, Yuan Yao, Haoye Zhang, Yue Zhao, Chongyi Wang, Shan Wang, Yinxv Pan, Jiao Xue, Dahai Li, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun
arxiv, 2023
[paper] [code (Muffin)] [dataset (UniMM-Chat)]

Efficient Cross-Lingual Transfer for Chinese Stable Diffusion with Images as Pivots
Jinyi Hu, Xu Han, Xiaoyuan Yi, Yutong Chen, Wenhao Li, Zhiyuan Liu, Maosong Sun
arxiv, 2023
[paper]

Text Reasoning and Generation
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie Huang, Yuxiang Zhang, Jie Liu, Lei Qi, Zhiyuan Liu, Maosong Sun
ACL 2024
[paper] [code]

Fuse It More Deeply! A Variational Transformer with Layer-Wise Latent Variable Inference for Text Generation
Jinyi Hu, Xiaoyuan Yi, Wenhao Li, Maosong Sun, Xing Xie
NAACL 2022 (oral)
[paper] [code (DELLA)]

Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in Transformer-Based Variational AutoEncoder for Diverse Text Generation
Jinyi Hu, Xiaoyuan Yi, Wenhao Li, Maosong Sun, Xing Xie
Findings of EMNLP 2022
[paper]

Towards Interpretable Natural Language Understanding with Explanations as Latent Variables
Wangchunshu Zhou*, Jinyi Hu*, Hanlin Zhang*, Xiaodan Liang, Maosong Sun, Chenyan Xiong, Jian Tang
NeurIPS 2020
[paper] [code]

Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text Generation via Concentrating Attention
Wenhao Li, Xiaoyuan Yi, Jinyi Hu, Maosong Sun, Xing Xie
EMNLP 2022
[paper]

Aspect-Level Sentiment-Controllable Review Generation with Mutual Learning Framework
Huimin Chen, Yankai Lin, Fanchao Qi, Jinyi Hu, Peng Li, Jie Zhou, Maosong Sun
AAAI 2021
[paper]

Service

Reviewer: NeurIPS, ICLR, CVPR, ACL, EMNLP, NAACL

Honors & Awards
  • Comprehensive Scholarship, 2023
  • Longhu Scholarship, 2023
  • Comprehensive Scholarship, 2022
  • Merit Student in Beijing, 2020
  • Tang Lixin Scholarship, 2020
  • Evergrande Scholarship, 2018
Miscellaneous

I usually go to the gym in my spare time. I also love the NBA, snooker, and tennis, and I am a big fan of LeBron James, Ronnie O'Sullivan, and Roger Federer. I love the number 7.