Caiming Xiong

50 Papers
1,561 Total Citations

Papers (50)

Learned in Translation: Contextualized Word Vectors

NeurIPS 2017 · arXiv
932
citations

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

CVPR 2024
192
citations

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

ICML 2025
165
citations

HIVE: Harnessing Human Feedback for Instructional Visual Editing

CVPR 2024
164
citations

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

ICLR 2024
104
citations

ViUniT: Visual Unit Tests for More Robust Visual Programming

CVPR 2025
2
citations

Trust but Verify: Programmatic VLM Evaluation in the Wild

ICCV 2025
2
citations

Can Humans Fly? Action Understanding With Multiple Classes of Actors

CVPR 2015
0
citations

Recognizing Car Fluents From Video

CVPR 2016
0
citations

Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning

CVPR 2017 · arXiv
0
citations

End-to-End Dense Video Captioning With Masked Transformer

CVPR 2018 · arXiv
0
citations

AdaFrame: Adaptive Frame Selection for Fast Video Recognition

CVPR 2019
0
citations

The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation

CVPR 2019
0
citations

Learning From Noisy Anchors for One-Stage Object Detection

CVPR 2020 · arXiv
0
citations

WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos

CVPR 2021 · arXiv
0
citations

Use All the Labels: A Hierarchical Multi-Label Contrastive Learning Framework

CVPR 2022 · arXiv
0
citations

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

CVPR 2023
0
citations

StartNet: Online Detection of Action Start in Untrimmed Videos

ICCV 2019
0
citations

Learning From Noisy Data With Robust Representation Learning

ICCV 2021
0
citations

CoMatch: Semi-Supervised Learning With Contrastive Graph Regularization

ICCV 2021 · arXiv
0
citations

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation

ICCV 2023
0
citations

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

ECCV 2022
0
citations

Structured Scene Memory for Vision-Language Navigation

CVPR 2021 · arXiv
0
citations

Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning

NeurIPS 2025
0
citations

Text2Data: Low-Resource Data Generation with Textual Control

AAAI 2025
0
citations

Diffusion Model Alignment Using Direct Preference Optimization

CVPR 2024
0
citations

Unified Training of Universal Time Series Forecasting Transformers

ICML 2024
0
citations

Position: TrustLLM: Trustworthiness in Large Language Models

ICML 2024
0
citations

Joint Action Recognition and Pose Estimation From Video

CVPR 2015
0
citations

Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards

NeurIPS 2019
0
citations

LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition

NeurIPS 2019
0
citations

Online Structured Meta-learning

NeurIPS 2020
0
citations

Theory-Inspired Path-Regularized Differential Network Architecture Search

NeurIPS 2020
0
citations

Towards Theoretically Understanding Why SGD Generalizes Better Than Adam in Deep Learning

NeurIPS 2020
0
citations

Towards Understanding Hierarchical Learning: Benefits of Neural Representations

NeurIPS 2020
0
citations

A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning

NeurIPS 2021
0
citations

Evaluating State-of-the-Art Classification Models Against Bayes Optimality

NeurIPS 2021
0
citations

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

NeurIPS 2021
0
citations

Understanding the Under-Coverage Bias in Uncertainty Estimation

NeurIPS 2021
0
citations

Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games

NeurIPS 2021
0
citations

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

NeurIPS 2021
0
citations

Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization

NeurIPS 2022
0
citations

Policy Optimization for Markov Games: Unified Framework and Faster Convergence

NeurIPS 2022
0
citations

Preference-grounded Token-level Guidance for Language Model Fine-tuning

NeurIPS 2023
0
citations

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

NeurIPS 2023
0
citations

Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection

NeurIPS 2023
0
citations

Dynamic Memory Networks for Visual and Textual Question Answering

ICML 2016
0
citations

Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting

ICML 2019
0
citations

Taming MAML: Efficient Unbiased Meta-Reinforcement Learning

ICML 2019
0
citations

On the Generalization Gap in Reparameterizable Reinforcement Learning

ICML 2019
0
citations