research
* denotes shared first authorship
2025
- Rethinking Invariance in In-context Learning. In ICLR, 2025. We discovered an expressive invariant in-context learning scheme (InvICL) that achieves permutation invariance over in-context demonstrations while preserving the model's autoregressive nature and full context awareness.
- Tool Decoding: A Plug-and-Play Approach to Enhancing Language Models for Tool Usage. In ICLR, 2025. We proposed a simple training-free, plug-and-play constrained decoding scheme that significantly improves LLM performance at tool use (e.g., a 7B model rivals GPT-4o).
- Can In-context Learning Really Generalize to Out-of-distribution Tasks? In ICLR, 2025. With controlled experiments, we found that in-context learning works only on in-domain tasks and hardly generalizes to novel OOD tasks; in other words, LLMs' in-context abilities are essentially learned from training data containing similar tasks.
- Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness. In ICLR, 2025. We found that the merits of feature monosemanticity (as studied in mechanistic interpretability) extend beyond interpretability to improved robustness under challenges such as noisy data and limited training examples.
- Projection Head is Secretly an Information Bottleneck. In ICLR, 2025. We showed that projection heads serve as an information bottleneck that prevents features from collapsing toward the pretraining task (e.g., instance classification).
2024
- A Theoretical Understanding of Self-Correction through In-context Alignment. In NeurIPS, 2024. 🏆 Best Paper Award at ICML 2024 ICL Workshop. We proposed the first theoretical explanation of how LLM self-correction works (as in OpenAI o1) and showed its effectiveness against social bias and jailbreak attacks.
- In-Context Symmetries: Self-Supervised Learning through Contextual World Models. In NeurIPS, 2024. Oral Presentation (top 4) at NeurIPS 2024 SSL Workshop; featured by MIT 📰. We introduced unsupervised test-time adaptation to self-supervised learning through a contextual world model designed for joint-embedding (JEPA) models.
- Reasoning in Reasoning: A Hierarchical Framework for Better and Faster Neural Theorem Proving. In NeurIPS 2024 Workshop on Mathematical Reasoning and AI, 2024.
- The Multi-faceted Monosemanticity in Multimodal Representations. In NeurIPS 2024 Workshop on Responsibly Building the Next Generation of Multimodal Foundational Models, 2024.
- Rethinking Invariance in In-context Learning. In ICML Workshop on Theoretical Foundations of Foundation Models (TF2M), 2024.
- Non-negative Contrastive Learning. In ICLR, 2024. Inspired by NMF, we introduced a simple one-line technique that attains 90% feature sparsity and 10x feature interpretability for self-supervised contrastive learning, with theoretical guarantees on its disentanglement and performance.
2023
- Jailbreak and Guard Aligned Language Models with Only Few In-context Demonstrations. arXiv preprint arXiv:2310.06387, 2023. Cited over 160 times. Featured and scaled up in Anthropic's blog 📰, where the in-context attack successfully jailbroke prominent LLMs including GPT and Claude.
- Equilibrium Image Denoising with Implicit Differentiation. IEEE Transactions on Image Processing (IEEE TIP), 2023.
- Unbiased Stochastic Proximal Solver for Graph Neural Networks with Equilibrium States. In ICLR, 2023.
- On the Connection between Invariant Learning and Adversarial Training for Out-of-Distribution Generalization. In AAAI, 2023. Oral Presentation.
2022
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap. In ICLR, 2022. Cited over 130 times. We derived tight generalization bounds for contrastive learning under a new, realistic theoretical framework, which also yields unsupervised evaluation metrics with 97% correlation to downstream performance.
- A Unified Contrastive Energy-based Model for Understanding the Generative Ability of Adversarial Training. In ICLR, 2022. 🏆 Silver Best Paper Award at ICML 2021 AdvML Workshop. From an energy-based perspective, we formulated contrastive learning as a generative model and established a connection between adversarial training and maximum likelihood, thus bridging generative and discriminative models.
2021
- Reparameterized Sampling for Generative Adversarial Networks. In ECML-PKDD, 2021. 🏆 Best ML Paper Award (1/685), invited to Machine Learning. We explored using the GAN discriminator (as a good reward model) to bootstrap sample quality through an efficient MCMC algorithm, which not only guarantees theoretical convergence but also improves sample efficiency and quality in practice.