Yifei Wang

I work on understanding and advancing the machine learning principles in foundation models (especially LLMs), with major interests in self-supervised learning, long-context learning, reasoning, and safety (overview). My recent papers on these topics received four best paper awards and were featured by MIT News, CSAIL News, and Anthropic. I serve as an Area Chair for ICLR and review for ICML, NeurIPS, JMLR, and PNAS.

I earned my Ph.D. in Applied Mathematics from Peking University, advised by Yisen Wang, Zhouchen Lin, and Jiansheng Yang. Previously, I received a B.S. in Data Science from the School of Mathematical Sciences, Peking University, advised by Tong Lin, and a B.A. in Philosophy from Peking University, advised by Zengding Wu.

news

September, 2025	Four papers were accepted at NeurIPS 2025, including RL-driven graph reasoning for LLMs (G1), signed graph propagation, hierarchical diffusion LMs, and ranking-based LLM decoding.
August, 2025	I gave an invited talk, "Two New Dimensions of Sparsity for Scaling LLMs," to Google DeepMind’s Gemini team, covering our recent work on sparse long-context training (ICLR 2025) and sparse embedding (ICML 2025 Oral).
June, 2025	Our ICML 2025 paper, which identifies and theoretically proves the root causes of position bias in Transformers, was featured in the MIT News article "Unpacking the bias of large language models."
June, 2025	I gave an invited talk at the ASAP Seminar on "Your Next-Token Prediction and Transformers Are Biased for Long-Context Modeling." The recording is available on YouTube.
May, 2025	Three papers were accepted to ICML 2025. Our oral presentation (top 1%) introduces contrastive sparse representations (CSR) to compress state-of-the-art embedding models to just 32 active dimensions, enabling ~100× faster retrieval with minimal accuracy loss and low training cost for large-scale vector databases and RAG systems.
April, 2025	Our recent work "When More is Less: Understanding Chain-of-Thought Length in LLMs" received the Best Paper Runner-up Award 🏆 at the ICLR 2025 Workshop on Reasoning and Planning for LLMs.

recent highlights

NeurIPS

G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning

Xiaojun Guo*, Ang Li*, Yifei Wang*, Stefanie Jegelka, and Yisen Wang

NeurIPS, 2025

PDF Code

ICML Oral

Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation

Tiansheng Wen*, Yifei Wang*, Zequn Zeng, Zhong Peng, Yudi Su, Xinyang Liu, Bo Chen, Hongwei Liu, Stefanie Jegelka, and Chenyu You

ICML Oral Presentation (top 1%), 2025

PDF Code

ICML

On the Emergence of Position Bias in Transformers

Xinyi Wu, Yifei Wang, Stefanie Jegelka, and Ali Jadbabaie

ICML, 2025

Featured by MIT News 📰.

PDF Video

ICLR Workshop Best Paper Runner-up

When More is Less: Understanding Chain-of-Thought Length in LLMs

Yuyang Wu*, Yifei Wang*, Ziyu Ye, Tianqi Du, Stefanie Jegelka, and Yisen Wang

ICLR 2025 Workshop on Reasoning and Planning for LLMs, 2025

🏆 Best Paper Runner-up Award

PDF

ICLR LLM training and eval

What is Wrong with Perplexity for Long-context Language Modeling?

Lizhe Fang*, Yifei Wang*, Zhaoyang Liu, Chenheng Zhang, Stefanie Jegelka, Jinyang Gao, Bolin Ding, and Yisen Wang

ICLR, 2025

PDF Video Code