news

May, 2025 3 papers were accepted at ICML 2025. We investigated how position bias emerges in Transformers (lost-in-the-middle, attention sink, RoPE) and how to improve length generalization with output alignment. We also proposed CSR as a new paradigm for adaptive embeddings with sparsity.
April, 2025 Our recent work When More is Less: Understanding Chain-of-Thought Length in LLMs received the Best Paper Runner-Up Award 🏆 at the ICLR 2025 Workshop on Reasoning and Planning for LLMs.
April, 2025 I will be giving a tutorial on Principles of Self-supervised Learning in the Foundation Model Era at IJCAI 2025 (Aug 16 - Aug 22). We will progressively release the accompanying materials—stay tuned!
April, 2025 I will be giving a talk at the Self-Supervised Learning Workshop at the Flatiron Institute on April 29.
April, 2025 I will be giving a talk at the CSE 600 Seminar at Stony Brook University on April 18.
March, 2025 Our recent work Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation has sparked significant interest as a promising alternative paradigm for efficient embedding retrieval.
January, 2025 I will give a talk at the University of Michigan on Feb 12.
January, 2025 5 papers were accepted at ICLR 2025 (3 as a co-first author)! We proposed long-context perplexity and invariant in-context learning for better training and usage of LLMs. We also looked into some fundamental questions, such as the OOD generalization of in-context learning, the interplay between monosemanticity and robustness, and the nature of projection heads.
January, 2025 I will give a talk at the CILVR Seminars at New York University on Feb 5.
December, 2024 Our NeurIPS’24 work ContextSSL was featured by MIT 📰: Machines that Self-adapt to New Tasks without Re-training. It was also selected as an oral presentation (top 4) at NeurIPS’24 SSL workshop.
October, 2024 6 papers were accepted to NeurIPS 2024. We investigated how LLMs perform self-correction at test time (paper), how to build dynamic world models through joint embedding methods (paper), how Transformers avoid feature collapse with LayerNorm and attention masks (paper), and why equivariant prediction of data corruptions helps learn good representations (paper).
September, 2024 I gave a talk at NYU Tandon on Building Safe Foundation Models from Principled Understanding.
August, 2024 I gave a talk at Princeton University on Reimagining Self-Supervised Learning with Context.