Qi Chen

Papers: 18

Total Citations: 152

Papers (18)

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

WebVLN: Vision-and-Language Navigation on Websites

CREAD: A Classification-Restoration Framework with Error Adaptive Discretization for Watch Time Prediction in Video Recommender Systems

PairAug: What Can Augmented Image-Text Pairs Do for Radiology?

From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing

Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning

IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation

Dual Energy-Based Model with Open-World Uncertainty Estimation for Out-of-distribution Detection

Separation of Powers: On Segregating Knowledge from Observation in LLM-enabled Knowledge-based Visual Question Answering

Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding

VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization

OVG-HQ: Online Video Grounding with Hybrid-modal Queries

Enhancing Large Language Model Performance with Gradient-Based Parameter Selection

Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data

Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video

Training-Free Class Purification for Open-Vocabulary Semantic Segmentation

G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images

Towards Generalizable Tumor Synthesis