Yifei Wang (on the job market!)
Postdoctoral researcher at MIT CSAIL, advised by Stefanie Jegelka.
My goal is to develop models that learn from massive data with minimal human effort, which drives my long-standing interest in self-supervised foundation models. My research has contributed to unveiling the key principles underlying these foundation models and to designing efficient algorithms that improve their capabilities and safety:
- Mathematical Principles of Foundation Models. We established theoretical foundations for a broad spectrum of Self-Supervised Learning (SSL) methods at the heart of foundation models, spanning contrastive [1, 2], non-contrastive [3], autoregressive [4], reconstructive [5], and predictive [6] approaches. Our recent work further pioneered the first rigorous theory [7] of the test-time self-correction ability of LLMs, a key mechanism for scaling reasoning at inference time.
- Improving Model Capabilities. We leveraged these principles to “debug” and “boost” foundation models. We generalized self-supervised learning so that models can self-adapt to new tasks without retraining [8] (featured by MIT), proposed adaptive training with AI-generated data to circumvent data shortages [9], and significantly enhanced LLMs’ long-context understanding by having models self-identify key tokens [10].
- Safe and Trustworthy AI. We developed principled understandings of, and algorithms for, adversarial robustness [11, 12, 13, 14], interpretability [15, 16], and domain generalization [17, 18, 19]. With DynACL [20], we built the first self-supervised model that is as robust as its supervised counterpart. We were the first to show that LLMs’ core emergent abilities, in-context learning [21] and self-correction [7], can play important roles in safety tasks such as jailbreaking, which was featured and scaled up by Anthropic.
My first-author papers received the Best ML Paper Award (1/685) at ECML-PKDD 2021, the Silver Best Paper Award at an ICML 2021 workshop, and the Best Paper Award at an ICML 2024 workshop. My thesis won the CAAI Outstanding Ph.D. Dissertation Runner-Up Award. I have published 44 peer-reviewed papers (39 in NeurIPS, ICLR, and ICML), and I am a (co-)first author on 28 of them.
I served as an organizer of the NeurIPS 2024 Workshop on Red Teaming GenAI and the ML Tea Seminar at MIT, as an Area Chair for ICLR 2024 and 2025, and as a reviewer for major AI conferences (NeurIPS, ICML, ECML, AISTATS, LoG, CVPR, ACL).
I obtained my Ph.D. in Applied Mathematics from Peking University in 2023, advised by Yisen Wang, Zhouchen Lin, and Jiansheng Yang. Before that, I completed my undergraduate studies at the School of Mathematical Sciences, Peking University.
I am on the 2024-2025 job market and actively looking for positions! Links: CV | Research Statement
news
- January 2025: 6 papers were accepted at ICLR 2025 (3 as a co-first author)! We proposed long-context perplexity, invariant in-context learning, and constrained tool decoding for better training and use of LLMs. We also investigated fundamental questions, such as the OOD generalization of in-context learning, the interplay between monosemanticity and robustness, and the nature of projection heads.
- January 2025: I will give a talk at the CILVR seminar at NYU CDS on Feb 5.
- January 2025: I will give a talk at Boston University on Jan 29.
- December 2024: Our NeurIPS’24 work ContextSSL was featured by MIT 📰: Machines that Self-adapt to New Tasks without Re-training. It was also selected as an oral presentation (top 4) at the NeurIPS’24 SSL workshop.
- December 2024: I gave a talk on Principles of Foundation Models at Johns Hopkins University.
selected publications
- A Theoretical Understanding of Self-Correction through In-context Alignment. In NeurIPS, 2024. 🏆 Best Paper Award at ICML 2024 ICL Workshop. We proposed the first theoretical explanation of how LLM self-correction works (as in OpenAI o1) and showed its effectiveness against social bias and jailbreak attacks.
- In-Context Symmetries: Self-Supervised Learning through Contextual World Models. In NeurIPS, 2024. Oral Presentation (top 4) at NeurIPS 2024 SSL Workshop; featured by MIT 📰. We introduced an unsupervised test-time adaptation ability to self-supervised learning through a contextual world model designed for joint embedding (JEPA) models.
- Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations. arXiv preprint arXiv:2310.06387, 2023. Cited over 160 times. Featured and scaled up in Anthropic’s blog 📰, where the in-context attack successfully jailbroke prominent LLMs including GPT and Claude.
- Non-negative Contrastive Learning. In ICLR, 2024. Inspired by NMF, we introduced a simple one-line technique that attains 90% feature sparsity and 10x feature interpretability for self-supervised contrastive learning, with theoretical guarantees on its disentanglement and performance (a minimal sketch of the idea appears after this list).
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap. In ICLR, 2022. Cited over 130 times. We derived tight generalization bounds for contrastive learning under a new, more realistic theoretical framework, which also yielded unsupervised evaluation metrics with 97% correlation to downstream performance.
- A Unified Contrastive Energy-based Model for Understanding the Generative Ability of Adversarial Training. In ICLR, 2022. 🏆 Silver Best Paper Award at ICML 2021 AdvML Workshop. From an energy-based perspective, we formulated contrastive learning as a generative model and established a connection between adversarial training and maximum likelihood, thus bridging generative and discriminative models.
- Reparameterized Sampling for Generative Adversarial Networks. In ECML-PKDD, 2021. 🏆 Best ML Paper Award (1/685), invited to the Machine Learning journal. We explored using the GAN discriminator (a good reward model) to bootstrap sample quality through an efficient MCMC algorithm, which guarantees convergence in theory and improves sample efficiency and quality in practice (sketched below after this list).
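For readers curious what the one-line change in Non-negative Contrastive Learning looks like in practice, here is a minimal, illustrative sketch. It assumes a generic PyTorch encoder and a standard InfoNCE loss; the function names and hyperparameters are placeholders, not the paper's code.

```python
# Hypothetical sketch (PyTorch): the "one-line" non-negativity idea.
import torch
import torch.nn.functional as F

def nonneg_features(encoder, x):
    # The one-line change: project features onto the non-negative orthant.
    # Everything else in the contrastive pipeline stays the same.
    return F.relu(encoder(x))

def info_nce(z1, z2, temperature=0.5):
    """A generic InfoNCE loss between two augmented views (placeholder)."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                   # pairwise similarities
    labels = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Usage (shapes only):
# z1, z2 = nonneg_features(enc, aug1(x)), nonneg_features(enc, aug2(x))
# loss = info_nce(z1, z2)
```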
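And here is a hedged sketch of the idea behind Reparameterized Sampling: use the discriminator's density-ratio estimate to filter generator samples via MCMC. For simplicity it uses a plain independence Metropolis-Hastings sampler (in the style of MH-GAN-type methods) rather than REP-GAN's latent-space reparameterized proposal; all names below are placeholders.

```python
# Hypothetical sketch (PyTorch): discriminator-guided MCMC sampling for GANs.
# An independence Metropolis-Hastings sampler, not REP-GAN's exact algorithm.
import torch

@torch.no_grad()
def mh_sample(generator, discriminator, latent_dim=128, steps=50, device="cpu"):
    """Draw one refined sample: propose from the generator, accept/reject
    using the discriminator's density-ratio estimate d / (1 - d)."""
    def ratio(x):
        # For a calibrated discriminator D(x) ~ p_data / (p_data + p_gen),
        # D / (1 - D) estimates the density ratio p_data / p_gen.
        d = torch.sigmoid(discriminator(x)).clamp(1e-6, 1 - 1e-6)
        return (d / (1 - d)).squeeze()

    x = generator(torch.randn(1, latent_dim, device=device))
    r_x = ratio(x)
    for _ in range(steps):
        x_new = generator(torch.randn(1, latent_dim, device=device))  # independent proposal
        r_new = ratio(x_new)
        # MH acceptance for an independence sampler targeting p_data:
        # accept with probability min(1, r(x_new) / r(x)).
        if torch.rand((), device=device) < (r_new / r_x).clamp(max=1.0):
            x, r_x = x_new, r_new
    return x
```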