Qi Wu

25 Papers
500 Total Citations
1 Affiliation

Affiliations

Carnegie Mellon University

Papers (25)

NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

AAAI 2024, arXiv
276
citations

Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval

AAAI 2024, arXiv
57
citations

3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

CVPR 2025
51
citations

Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning

CVPR 2024
42
citations

Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs

ICLR 2025, arXiv
30
citations

WebVLN: Vision-and-Language Navigation on Websites

AAAI 2024, arXiv
19
citations

PairAug: What Can Augmented Image-Text Pairs Do for Radiology?

CVPR 2024
12
citations

General Scene Adaptation for Vision-and-Language Navigation

ICLR 2025, arXiv
10
citations

Invariant Random Forest: Tree-Based Model Solution for OOD Generalization

AAAI 2024, arXiv
3
citations

Augmented Commonsense Knowledge for Remote Object Grounding

AAAI 2024, arXiv
0
citations

The Causal Impact of Credit Lines on Spending Distributions

AAAI 2024
0
citations

Sparse Bayesian Deep Learning for Cross Domain Medical Image Reconstruction

AAAI 2024
0
citations

KPA-Tracker: Towards Robust and Real-Time Category-Level Articulated Object 6D Pose Tracking

AAAI 2024
0
citations

G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images

CVPR 2024
0
citations

Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors

CVPR 2024
0
citations

Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework

CVPR 2024
0
citations

ModaVerse: Efficiently Transforming Modalities with LLMs

CVPR 2024
0
citations

Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval

CVPR 2025
0
citations

Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval

CVPR 2025
0
citations

EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling

CVPR 2025
0
citations

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

ICCV 2025
0
citations

COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation

ICCV 2025
0
citations

MFL-Owner: Ownership Protection for Multi-modal Federated Learning via Orthogonal Transform Watermark

AAAI 2025
0
citations

Realistic Noise Synthesis with Diffusion Models

AAAI 2025
0
citations

Distributionally Robust Policy Evaluation and Learning for Continuous Treatment with Observational Data

AAAI 2025
0
citations