Qi Chen

32 Papers

152 Total Citations

Papers (32)

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

WebVLN: Vision-and-Language Navigation on Websites

CREAD: A Classification-Restoration Framework with Error Adaptive Discretization for Watch Time Prediction in Video Recommender Systems

PairAug: What Can Augmented Image-Text Pairs Do for Radiology?

From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing

Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning

IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation

Separation of Powers: On Segregating Knowledge from Observation in LLM-enabled Knowledge-based Visual Question Answering

Dual Energy-Based Model with Open-World Uncertainty Estimation for Out-of-distribution Detection

Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution

Intelligent Home 3D: Automatic 3D-House Design From Linguistic Descriptions Only

Contrastive Neural Architecture Search With Neural Architecture Comparators

V2C: Visual Voice Cloning

Self-Supervised Image-Specific Prototype Exploration for Weakly Supervised Semantic Segmentation

Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval

Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots

G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images

Training-Free Class Purification for Open-Vocabulary Semantic Segmentation

Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video

Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data

OVG-HQ: Online Video Grounding with Hybrid-modal Queries

Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding

VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization

Enhancing Large Language Model Performance with Gradient-Based Parameter Selection

Towards Generalizable Tumor Synthesis

NAT: Neural Architecture Transformer for Accurate and Compact Architectures

Every View Counts: Cross-View Consistency in 3D Object Detection with Hybrid-Cylindrical-Spherical Voxelization

SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search

PolarStream: Streaming Object Detection and Segmentation with Polar Pillars

Learning Distinct and Representative Modes for Image Captioning

A Neural Corpus Indexer for Document Retrieval

Model-enhanced Vector Index