Yifei Wang (on the job market!)

Postdoc at MIT CSAIL


I am a machine learning researcher at MIT CSAIL, advised by Stefanie Jegelka. My research has contributed to theoretical principles of foundation models (generative and representation models) and efficient algorithms for model capabilities and safety:

  • Mathematical Principles of Foundation Models. I developed theoretical foundations for a broad spectrum of self-supervised learning (SSL) methods used in pretraining, encompassing contrastive [1, 2], non-contrastive [3], reconstructive [4], autoregressive [5], and predictive [6, 7] approaches. I also analyzed the feature dynamics in deep neural networks [8, 9] and pioneered a rigorous understanding [10] of LLMs’ self-correction ability, a critical component for test-time reasoning.
  • Improving Model Capabilities. I leveraged these principles to “debug” foundation models: I addressed the rank collapse issue in neural networks [11, 12], generalized self-supervised learning with unsupervised world models [13], alleviated data scarcity by adaptively utilizing AI-generated data [14], and enhanced LLM long-context understanding with novel perplexity measures [15].
  • Trustworthy Foundation Models. I contributed theory-inspired algorithms for building trustworthy foundation models with respect to adversarial robustness [16, 17, 18, 11], feature interpretability [19, 20], and domain generalization [21, 22, 23]. In particular, I contributed to the use of LLMs’ own emergent abilities (such as in-context learning [24] and self-correction [10]) for jailbreaking and defending LLMs (featured and scaled up by Anthropic).

My first-author papers received the Best ML Paper Award at ECML-PKDD 2021, the Silver Best Paper Award at the ICML 2021 AdvML workshop, and the Best Paper Award at the ICML 2024 ICL workshop. My thesis won the CAAI Outstanding Ph.D. Dissertation Runner-Up Award. I have published 33 papers at NeurIPS, ICLR, and ICML, and I am (co-)first author on 22 of them.

I served as an area chair for ICLR 2024 and 2025 and as a regular reviewer for major AI/ML conferences (NeurIPS, ICML, ECML, AISTATS, LoG, CVPR, ACL). I co-organized the NeurIPS 2024 Workshop on Red Teaming GenAI and the MIT ML Tea Seminar.

I obtained my PhD in Applied Mathematics from Peking University in 2023, advised by Yisen Wang, Zhouchen Lin, and Jiansheng Yang. Prior to that, I completed my undergraduate studies at the School of Mathematical Sciences, Peking University.

news

December, 2024 I gave a talk on Principles of Foundation Models at Johns Hopkins University.
November, 2024 I gave a guest lecture on Towards Test-time Self-supervised Learning (slides) at Boston College.
October, 2024 3 new preprints are out, exploring 1) how existing long-context training of LLMs is problematic and how to address it (paper), 2) how sparse autoencoders can significantly improve robustness in noisy and few-shot scenarios (paper), and 3) whether ICL can truly extrapolate to OOD scenarios (paper).
October, 2024 6 papers were accepted to NeurIPS 2024. We investigated how LLMs perform self-correction at test time (paper), how to build dynamic world models through joint embedding methods (paper), how Transformers avoid feature collapse with LayerNorm and attention masks (paper), and why equivariant prediction of data corruptions helps learn good representations (paper).
September, 2024 I gave a talk at NYU Tandon on Building Safe Foundation Models from Principled Understanding.
August, 2024 I gave a talk at Princeton University on Reimagining Self-Supervised Learning with Context.

selected publications

  1. arXiv LLM training & eval
    What is Wrong with Perplexity for Long-context Language Modeling?
    Lizhe Fang*, Yifei Wang*, Zhaoyang Liu, Chenheng Zhang, Stefanie Jegelka, Jinyang Gao, Bolin Ding, and Yisen Wang
    arXiv preprint arXiv:2410.23771, 2024
    We proposed a long-context perplexity measure that emphasizes long-context-relevant tokens during training and evaluation, improving benchmark scores on LongBench, LongEval, and RULER by up to 22%.
  2. NeurIPS Best Paper Award at ICML-W’24
    A Theoretical Understanding of Self-Correction through In-context Alignment
    Yifei Wang*, Yuyang Wu*, Zeming Wei, Stefanie Jegelka, and Yisen Wang
    In NeurIPS, 2024
    Best Paper Award at ICML 2024 ICL Workshop
    We proposed the first theoretical explanation of how LLM self-correction works (as in OpenAI o1) and showed its effectiveness against social bias and jailbreak attacks.
  3. NeurIPS Oral at NeurIPS-W’24
    In-Context Symmetries: Self-Supervised Learning through Contextual World Models
    Sharut Gupta*, Chenyu Wang*, Yifei Wang*, Tommi Jaakkola, and Stefanie Jegelka
    In NeurIPS, 2024
    Oral Presentation (top 4) at NeurIPS 2024 SSL Workshop
    We introduced unsupervised test-time adaptation ability to self-supervised learning through a contextual world model designed for joint embedding (JEPA) models.
  4. arXiv Featured by Anthropic
    Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
    Zeming Wei, Yifei Wang, and Yisen Wang
    arXiv preprint arXiv:2310.06387, 2023
    Cited over 140 times. Featured and scaled up in Anthropic’s research blog, where it successfully jailbroke prominent LLMs, including GPT and Claude.
  5. ICLR
    Non-negative Contrastive Learning
    Yifei Wang*, Qi Zhang*, Yaoyu Guo, and Yisen Wang
    In ICLR, 2024
    Inspired by NMF, we introduced a one-line technique that attains 90% feature sparsity and a 10x improvement in feature interpretability in contrastive learning models.
  6. ICLR
    Rethinking the Effect of Data Augmentation in Adversarial Contrastive Learning
    Rundong Luo*, Yifei Wang*, and Yisen Wang
    In ICLR, 2023
    We improved adversarial robustness under AutoAttack by 9% in the unsupervised setting with a dynamic training schedule, at no extra computational cost.
  7. NeurIPS Spotlight
    How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders
    Qi Zhang*, Yifei Wang*, and Yisen Wang
    In NeurIPS, 2022
    Spotlight Presentation (Top 5%)
    We theoretically explained how masked autoencoders work and revealed their mathematical connection to joint embedding methods, unifying the two paradigms.
  8. ICLR
    Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap
    Yifei Wang*, Qi Zhang*, Yisen Wang, Jiansheng Yang, and Zhouchen Lin
    In ICLR, 2022
    Cited over 120 times. We derived tight generalization bounds for contrastive learning under a new, more realistic theoretical framework, which also yields unsupervised evaluation metrics with 97% correlation to downstream performance.
  9. ICLR Silver Best Paper at ICML-W’21
    A Unified Contrastive Energy-based Model for Understanding the Generative Ability of Adversarial Training
    Yifei Wang, Yisen Wang, Jiansheng Yang, and Zhouchen Lin
    In ICLR, 2022
    Silver Best Paper Award at ICML 2021 AdvML workshop
    From an energy-based perspective, we formulated contrastive learning as a generative model and established a connection between adversarial training and maximum likelihood, thus bridging generative and discriminative models.
  10. ECML-PKDD Best ML Paper Award
    Reparameterized Sampling for Generative Adversarial Networks
    Yifei Wang, Yisen Wang, Jiansheng Yang, and Zhouchen Lin
    In ECML-PKDD, 2021
    Best ML Paper Award (1/685), invited to Machine Learning
    We explored using the GAN discriminator (a good reward model) to bootstrap sample quality through an efficient MCMC algorithm, which not only guarantees convergence in theory but also improves sample efficiency and quality in practice.