research

My research spans the theory and methodology of the following core areas of modern machine learning:

Research Methodology. Although deep learning is often seen as resting on piles of empirical intuition, I’d rather believe that “nothing is more intuitive than a good theory.” Formalizing your intuition into a theory helps you question, sharpen, and validate your understanding. I follow this principle in my work and hope you find it helpful as well.


I. Contextual Understanding and Reasoning with LLMs

How LLMs understand, adapt to, and reason with contexts.

I.1 Contextual Understanding: In-context Learning, Long-context Modeling, Length Generalization

I.2 Chain-of-thought and Reasoning: Self-correction, Optimal Chain-of-thought Length, Reasoning on Graph

I.3 Transformers: Position Bias, Multi-layer Attention, Dimensional Collapse
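
To make the attention-related topics in I.3 concrete, here is a minimal sketch of single-head causal self-attention in PyTorch. It is an illustrative toy, not code from any of my papers; the tensor shapes, weight initialization, and variable names are assumptions chosen for readability.

```python
# Minimal single-head causal self-attention (illustrative sketch only).
import torch

def causal_self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                # query/key/value projections
    scores = (q @ k.T) / (k.shape[-1] ** 0.5)          # scaled dot-product scores
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))   # causal mask: attend only to the prefix
    attn = scores.softmax(dim=-1)                      # rows sum to 1 over visible positions
    return attn @ v                                    # weighted sum of value vectors

# Toy usage: 8 tokens, model width 16, head width 8.
torch.manual_seed(0)
x = torch.randn(8, 16)
w_q, w_k, w_v = (torch.randn(16, 8) / 16 ** 0.5 for _ in range(3))
print(causal_self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([8, 8])
```

The causal mask is the ingredient I keep returning to in I.3: each token only sees its prefix, which already constrains where attention mass can go.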


II. Unsupervised Representation Learning

How to pretrain powerful foundation models from massive unlabeled data.

II.1 Self-supervised Learning (SSL): Contrastive, Non-contrastive, Equivariant, Contextual (see the contrastive-loss sketch after this list)

II.2 Generative Models: Masked Autoencoders, Autoregressive, Energy-based Models, Generative Adversarial Networks

II.3 Key SSL Components: Predictor, Discrete Tokenization, Projector, Generated Data

II.4 Feature Sparsity, Identifiability, Interpretability: Non-negative CL, triCL, Robustness Gains, CSR Embedding
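
As a concrete touchstone for the contrastive methods in II.1, here is a minimal sketch of the InfoNCE (NT-Xent-style) objective. The random embeddings stand in for an encoder’s outputs, and the temperature value is an illustrative assumption rather than a recommendation.

```python
# Minimal InfoNCE contrastive loss between two augmented views (illustrative sketch).
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same batch of inputs."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)  # unit-norm embeddings
    logits = z1 @ z2.T / temperature                           # pairwise cosine similarities
    targets = torch.arange(z1.shape[0])                        # the positive pair shares the index
    # Pull matching pairs (z1[i], z2[i]) together, push mismatched pairs apart.
    return F.cross_entropy(logits, targets)

# Toy usage with random embeddings standing in for an encoder.
torch.manual_seed(0)
print(info_nce(torch.randn(32, 128), torch.randn(32, 128)).item())
```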


III. Robust Representation Learning

How to build models robust to adversarial attacks and reliable across distribution shifts.

III.1 Adversarial Attack and Defense: Adversarial Examples, Unsupervised Adversarial Training, Robust Overfitting (see the FGSM sketch after this list)

III.2 LLM Jailbreak: In-context Attack and Defense, Safety of Chain-of-thought Reasoning

III.3 Out-of-distribution (OOD) Generalization: OODRobustBench, Adversarial Training for OOD
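
To ground the adversarial-example side of III.1, here is a minimal sketch of the fast gradient sign method (FGSM). The tiny linear “model,” the input range, and the epsilon value are illustrative placeholders, not a realistic threat model.

```python
# Minimal FGSM attack: one signed-gradient step inside an L_inf ball (illustrative sketch).
import torch
import torch.nn as nn

def fgsm_attack(model, x, y, eps=8 / 255):
    """Return a perturbed copy of x that increases the cross-entropy loss on labels y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()                                   # gradient of the loss w.r.t. the input
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()       # one step in the sign of the gradient
        x_adv = x_adv.clamp(0.0, 1.0)                 # keep inputs in the valid range
    return x_adv.detach()

# Toy usage: a linear classifier on flattened 8x8 "images".
torch.manual_seed(0)
model = nn.Linear(64, 10)
x, y = torch.rand(4, 64), torch.randint(0, 10, (4,))
x_adv = fgsm_attack(model, x, y)
print((x_adv - x).abs().max().item())  # at most eps
```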


IV. Structural Representation Learning

How to learn from structured data (e.g., graphs) efficiently with structured models such as Graph Neural Networks (GNNs).

IV.1 Feature Dynamics of GNNs: Oversmoothing, Graph Equilibrium, Unbiased Graph Sampling, GraphSSL (see the propagation sketch after this list)

IV.2 Learning with Symmetry: Laplacian Canonicalization, Theory of Canonicalization
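
To illustrate the oversmoothing phenomenon listed in IV.1, here is a minimal sketch of GCN-style symmetrically normalized propagation with the learned weights and nonlinearities stripped away. The toy path graph, the depths, and the spread measure are illustrative assumptions; the only point is that repeated neighbor averaging makes node features increasingly similar.

```python
# Minimal GCN-style propagation showing oversmoothing (illustrative sketch).
import torch

def propagate(adj, x, num_layers):
    """Apply x <- D^{-1/2} (A + I) D^{-1/2} x for num_layers rounds (no weights, no nonlinearity)."""
    a_hat = adj + torch.eye(adj.shape[0])                   # add self-loops
    d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)                 # D^{-1/2} from node degrees
    p = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]   # symmetric normalization
    for _ in range(num_layers):
        x = p @ x                                           # one round of neighbor averaging
    return x

# Toy usage: a 4-node path graph with random 3-dimensional node features.
torch.manual_seed(0)
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [0., 0., 1., 0.]])
x = torch.randn(4, 3)
for depth in (1, 4, 16):
    spread = propagate(adj, x, depth).std(dim=0).mean().item()
    print(depth, spread)  # the spread of features across nodes shrinks with depth
```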