news

January, 2026 New papers were accepted at ICLR 2026, explaining scaling laws of CoT length, showing AR models rival diffusion models for any-order generation, and exploring the real-world benefits of sparsity through ultra-sparse embeddings, sparse feature attention, and LLM transferability prediction.
December, 2025 Our paper G1 has received the Best Paper Award at the NeurIPS 2025 NPGML Workshop.
August, 2025 I gave an invited talk, Two New Dimensions of Sparsity for Scaling LLMs, at Google DeepMind, covering sparse long-context training (ICLR 2025) and sparse embeddings (ICML 2025 Oral).
June, 2025 Our ICML 2025 paper was featured in an MIT News article, Unpacking the bias of large language models, where we identified and theoretically proved the root causes of position bias in Transformers.
June, 2025 I gave an invited talk at the ASAP Seminar on Your Next-Token Prediction and Transformers Are Biased for Long-Context Modeling; the recording is available on YouTube.
May, 2025 Three papers were accepted to ICML 2025. Our oral presentation (top 1%) introduces contrastive sparse representations (CSR) to compress state-of-the-art embedding models to just 32 active dimensions, enabling ~100× faster retrieval with minimal accuracy loss and low training cost for large-scale vector databases and RAG systems.
April, 2025 Our recent work When More is Less: Understanding Chain-of-Thought Length in LLMs received the Best Paper Runner-Up Award 🏆 at ICLR 2025 Workshop on Reasoning and Planning for LLMs.
April, 2025 I will give a tutorial on the Principles of Self-supervised Learning in the Foundation Model Era at IJCAI 2025 (Aug 16 - Aug 22).
April, 2025 I gave an invited talk at the Self-Supervised Learning Workshop hosted by Flatiron Institute (Simons Foundation) on Contextual Self-supervised Learning: A Lesson from LLMs (video).
April, 2025 I gave an invited talk at the CSE 600 Seminar at Stony Brook University on April 18.
March, 2025 Our recent work Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation has sparked significant interest as a promising alternative paradigm for efficient embedding retrieval.
January, 2025 I gave an invited talk at the University of Michigan on Feb 12.
January, 2025 5 papers were accepted at ICLR 2025 (3 as co-first author)! We proposed long-context perplexity and invariant in-context learning for better training and usage of LLMs. We also looked into some fundamental questions, such as OOD generalization of in-context learning, the interplay between monosemanticity and robustness, and the nature of projection heads.
January, 2025 I gave an invited talk at the CILVR Seminars at New York University on Feb 5.
December, 2024 Our NeurIPS’24 work ContextSSL was featured by MIT 📰: Machines that Self-adapt to New Tasks without Re-training. It was also selected as an oral presentation (top 4) at the NeurIPS’24 SSL Workshop.
October, 2024 6 papers were accepted to NeurIPS 2024. We investigated how LLMs perform self-correction at test time (paper), how to build dynamic world models through joint embedding methods (paper), how Transformers avoid feature collapse with LayerNorm and attention masks (paper), and why equivariant prediction of data corruptions helps learn good representations (paper).
September, 2024 I gave an invited talk at NYU Tandon on Building Safe Foundation Models from Principled Understanding.
August, 2024 I gave an invited talk at Princeton University on Reimagining Self-Supervised Learning with Context.