2026-01-27
New papers have been accepted at ICLR 2026. They explain scaling laws of chain-of-thought (CoT) length, show that autoregressive (AR) models rival diffusion models for any-order generation, and explore the real-world benefits of sparsity via ultra-sparse embeddings, sparse feature attention, and LLM transferability prediction.