January, 2025 | 6 papers were accepted at ICLR 2025 (3 as a co-first author)! We proposed long-context perplexity, invariant in-context learning, and constrained tool decoding for better training and usage of LLMs. We also looked into some fundamental questions, such as the OOD generalization of in-context learning, the interplay between monosemanticity and robustness, and the nature of projection heads. |
January, 2025 | I will give a talk at the CILVR seminar at NYU CDS on Feb 5. |
January, 2025 | I will give a talk at Boston University on Jan 29. |
December, 2024 | Our NeurIPS’24 work ContextSSL was featured by MIT 📰: Machines that Self-adapt to New Tasks without Re-training. It was also selected as an oral presentation (top 4) at the NeurIPS’24 SSL workshop. |
December, 2024 | I gave a talk on Principles of Foundation Models at Johns Hopkins University. |
November, 2024 | I gave a guest lecture on Towards Test-time Self-supervised Learning (slides) at Boston College. |
October, 2024 | 3 new papers are on arXiv, exploring 1) how existing long-context training of LLMs is problematic and how to address it (paper), 2) how sparse autoencoders can significantly improve robustness in noisy and few-shot scenarios (paper), and 3) whether ICL can truly extrapolate to OOD scenarios (paper). |
October, 2024 | 6 papers were accepted to NeurIPS 2024. We investigated how LLMs perform self-correction at test time (paper), how to build dynamic world models through joint embedding methods (paper), how Transformers avoid feature collapse with LayerNorm and attention masks (paper), and why equivariant prediction of data corruptions helps learn good representations (paper). |
September, 2024 | I gave a talk at NYU Tandon on Building Safe Foundation Models from Principled Understanding. |
August, 2024 | I gave a talk at Princeton University on Reimagining Self-Supervised Learning with Context. |