Xiaohu Qie

18 Papers

1,423 Total Citations

Papers (18)

T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion

Bridging Video-Text Retrieval With Multiple Choice Questions

Object-Aware Video-Language Pre-Training for Retrieval

BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild

UMT: Unified Multi-Modal Transformers for Joint Video Moment Retrieval and Highlight Detection

Accelerating Vision-Language Pretraining With Free Language Modeling

All in One: Exploring Unified Video-Language Pre-Training

ViLEM: Visual-Language Error Modeling for Image-Text Retrieval

RILS: Masked Visual Reconstruction in Language Semantic Space

Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Order-Prompted Tag Sequence Generation for Video Tagging

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video

OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval

Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems

DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes