Ran Xu

20 Papers

462 Total Citations

Papers (20)

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

HIVE: Harnessing Human Feedback for Instructional Visual Editing

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

Trust but Verify: Programmatic VLM Evaluation in the Wild

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

Position: TrustLLM: Trustworthiness in Large Language Models

WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos

Use All the Labels: A Hierarchical Multi-Label Contrastive Learning Framework

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

Mask-Free OVIS: Open-Vocabulary Instance Segmentation Without Manual Mask Annotations

Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

Burn after Reading: Online Adaptation for Cross-Domain Streaming Data

SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles

Structured Policy Optimization: Enhance Large Vision-Language Model via Self-referenced Dialogue

Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting

Text2Data: Low-Resource Data Generation with Textual Control

Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild