Caiming Xiong
50
Papers
1,561
Total Citations
Papers (50)
Learned in Translation: Contextualized Word Vectors
NeurIPS 2017 arXiv
932
citations
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding
CVPR 2024
192
citations
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
ICML 2025
165
citations
HIVE: Harnessing Human Feedback for Instructional Visual Editing
CVPR 2024
164
citations
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
ICLR 2024
104
citations
ViUniT: Visual Unit Tests for More Robust Visual Programming
CVPR 2025
2
citations
Trust but Verify: Programmatic VLM Evaluation in the Wild
ICCV 2025
2
citations
Can Humans Fly? Action Understanding With Multiple Classes of Actors
CVPR 2015
0
citations
Recognizing Car Fluents From Video
CVPR 2016
0
citations
Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning
CVPR 2017 arXiv
0
citations
End-to-End Dense Video Captioning With Masked Transformer
CVPR 2018 arXiv
0
citations
AdaFrame: Adaptive Frame Selection for Fast Video Recognition
CVPR 2019
0
citations
The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation
CVPR 2019
0
citations
Learning From Noisy Anchors for One-Stage Object Detection
CVPR 2020 arXiv
0
citations
WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos
CVPR 2021 arXiv
0
citations
Use All the Labels: A Hierarchical Multi-Label Contrastive Learning Framework
CVPR 2022 arXiv
0
citations
ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding
CVPR 2023
0
citations
StartNet: Online Detection of Action Start in Untrimmed Videos
ICCV 2019
0
citations
Learning From Noisy Data With Robust Representation Learning
ICCV 2021
0
citations
CoMatch: Semi-Supervised Learning With Contrastive Graph Regularization
ICCV 2021 arXiv
0
citations
GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation
ICCV 2023
0
citations
Open Vocabulary Object Detection with Pseudo Bounding-Box Labels
ECCV 2022
0
citations
Structured Scene Memory for Vision-Language Navigation
CVPR 2021 arXiv
0
citations
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
NeurIPS 2025
0
citations
Text2Data: Low-Resource Data Generation with Textual Control
AAAI 2025
0
citations
Diffusion Model Alignment Using Direct Preference Optimization
CVPR 2024
0
citations
Unified Training of Universal Time Series Forecasting Transformers
ICML 2024
0
citations
Position: TrustLLM: Trustworthiness in Large Language Models
ICML 2024
0
citations
Joint Action Recognition and Pose Estimation From Video
CVPR 2015
0
citations
Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards
NeurIPS 2019
0
citations
LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition
NeurIPS 2019
0
citations
Online Structured Meta-learning
NeurIPS 2020
0
citations
Theory-Inspired Path-Regularized Differential Network Architecture Search
NeurIPS 2020
0
citations
Towards Theoretically Understanding Why SGD Generalizes Better Than Adam in Deep Learning
NeurIPS 2020
0
citations
Towards Understanding Hierarchical Learning: Benefits of Neural Representations
NeurIPS 2020
0
citations
A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning
NeurIPS 2021
0
citations
Evaluating State-of-the-Art Classification Models Against Bayes Optimality
NeurIPS 2021
0
citations
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
NeurIPS 2021
0
citations
Understanding the Under-Coverage Bias in Uncertainty Estimation
NeurIPS 2021
0
citations
Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games
NeurIPS 2021
0
citations
Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
NeurIPS 2021
0
citations
Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization
NeurIPS 2022
0
citations
Policy Optimization for Markov Games: Unified Framework and Faster Convergence
NeurIPS 2022
0
citations
Preference-grounded Token-level Guidance for Language Model Fine-tuning
NeurIPS 2023
0
citations
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild
NeurIPS 2023
0
citations
Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection
NeurIPS 2023
0
citations
Dynamic Memory Networks for Visual and Textual Question Answering
ICML 2016
0
citations
Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting
ICML 2019
0
citations
Taming MAML: Efficient unbiased meta-reinforcement learning
ICML 2019
0
citations
On the Generalization Gap in Reparameterizable Reinforcement Learning
ICML 2019
0
citations