Yifei Wang

I am interested in understanding and building AI models with good representations of the world, with a focus on self-supervised learning and foundation models (see research). I obtained my PhD in Applied Mathematics from Peking University in 2023, advised by Yisen Wang, Zhouchen Lin, and Jiansheng Yang. I also completed my undergraduate studies at the School of Mathematical Sciences at Peking University. My first-author papers have received 4 best paper awards, and I served as an Area Chair for ICLR 2024 and 2025.

Science often arises from alchemy, and I enjoy distilling the alchemy and mystery of deep learning into scientific understanding. For instance, we established a mathematical understanding of why overthinking harms LLM reasoning, why Transformers have position bias, why DINO features won’t collapse, why MAE learns good features, why adversarial training severely overfits, and why robust models become generative.

news

June, 2025	I gave an invited talk at the ASAP seminar on Your Next-Token Prediction and Transformers Are Biased for Long-Context Modeling (slides and recording).
May, 2025	3 papers were accepted at ICML 2025. We proposed CSR (Oral Presentation), which builds state-of-the-art shortenable embeddings (image/text/multimodal) with sparse coding. We also characterized the reasons behind Transformers’ position bias and enhanced length generalization with output space alignment.
April, 2025	Our recent work When More is Less: Understanding Chain-of-Thought Length in LLMs received the Best Paper Runner-Up Award 🏆 at ICLR 2025 Workshop on Reasoning and Planning for LLMs.
April, 2025	I will give a tutorial on the Principles of Self-supervised Learning in the Foundation Model Era at IJCAI 2025 (Aug 16 - Aug 22). See you in Montreal.
April, 2025	I gave an invited talk at the Self-Supervised Learning Workshop hosted by Flatiron Institute (Simons Foundation) on Contextual Self-supervised Learning: A Lesson from LLMs (recording).

recent highlights

ICML Oral

Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation

Tiansheng Wen^*, Yifei Wang^*, Zequn Zeng, Zhong Peng, Yudi Su, Xinyang Liu, Bo Chen, Hongwei Liu, Stefanie Jegelka, and Chenyu You

ICML Oral Presentation (1%), 2025

PDF Code

ICLR Workshop Best Paper Runner-up

When More is Less: Understanding Chain-of-Thought Length in LLMs

Yuyang Wu^*, Yifei Wang^*, Ziyu Ye, Tianqi Du, Stefanie Jegelka, and Yisen Wang

ICLR 2025 Workshop on Reasoning and Planning for LLMs, 2025

🏆 Best Paper Runner-up Award

PDF

ICLR LLM training and eval

What is Wrong with Perplexity for Long-context Language Modeling?

Lizhe Fang^*, Yifei Wang^*, Zhaoyang Liu, Chenheng Zhang, Stefanie Jegelka, Jinyang Gao, Bolin Ding, and Yisen Wang

ICLR, 2025

PDF Video Code