Qi Wu

78 Papers · 627 Total Citations · 1 Affiliation

Affiliations

Carnegie Mellon University

Papers (78)

NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

AAAI 2024 · arXiv · 276 citations

Object-and-Action Aware Model for Visual Language Navigation

ECCV 2020 · 127 citations

Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval

AAAI 2024 · arXiv · 57 citations

3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

CVPR 2025 · 51 citations

Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning

CVPR 2024 · 42 citations

Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs

ICLR 2025 · arXiv · 30 citations

WebVLN: Vision-and-Language Navigation on Websites

AAAI 2024 · arXiv · 19 citations

PairAug: What Can Augmented Image-Text Pairs Do for Radiology?

CVPR 2024 · 12 citations

General Scene Adaptation for Vision-and-Language Navigation

ICLR 2025 · 10 citations

Invariant Random Forest: Tree-Based Model Solution for OOD Generalization

AAAI 2024 · arXiv · 3 citations

The Causal Impact of Credit Lines on Spending Distributions

AAAI 2024 · 0 citations

Sparse Bayesian Deep Learning for Cross Domain Medical Image Reconstruction

AAAI 2024 · 0 citations

KPA-Tracker: Towards Robust and Real-Time Category-Level Articulated Object 6D Pose Tracking

AAAI 2024 · 0 citations

G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images

CVPR 2024 · 0 citations

Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors

CVPR 2024 · 0 citations

Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework

CVPR 2024 · 0 citations

ModaVerse: Efficiently Transforming Modalities with LLMs

CVPR 2024 · 0 citations

What Value Do Explicit High Level Concepts Have in Vision to Language Problems?

CVPR 2016 · 0 citations

Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge From External Sources

CVPR 2016 · 0 citations

The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions

CVPR 2017 · 0 citations

Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments

CVPR 2018 · arXiv · 0 citations

Parallel Attention: A Unified Framework for Visual Object Discovery Through Dialogs and Queries

CVPR 2018 · arXiv · 0 citations

Are You Talking to Me? Reasoned Visual Dialog Generation Through Adversarial Learning

CVPR 2018 · arXiv · 0 citations

Learning Semantic Concepts and Order for Image and Sentence Matching

CVPR 2018 · arXiv · 0 citations

Visual Question Answering With Memory-Augmented Networks

CVPR 2018 · arXiv · 0 citations

Visual Grounding via Accumulated Attention

CVPR 2018 · 0 citations

Neighbourhood Watch: Referring Expression Comprehension via Language-Guided Graph Attention Networks

CVPR 2019 · 0 citations

Mind Your Neighbours: Image Annotation With Metadata Neighbourhood Graph Co-Attention Networks

CVPR 2019 · 0 citations

What's to Know? Uncertainty as a Guide to Asking Goal-Oriented Questions

CVPR 2019 · 0 citations

Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs

CVPR 2020 · arXiv · 0 citations

Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension

CVPR 2020 · 0 citations

Gold Seeker: Information Gain From Policy Distributions for Goal-Oriented Vision-and-Langauge Reasoning

CVPR 2020 · arXiv · 0 citations

Intelligent Home 3D: Automatic 3D-House Design From Linguistic Descriptions Only

CVPR 2020 · arXiv · 0 citations

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

CVPR 2020 · arXiv · 0 citations

Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning

CVPR 2020 · arXiv · 0 citations

Sketch, Ground, and Refine: Top-Down Dense Video Captioning

CVPR 2021 · 0 citations

Towards Accurate Text-Based Image Captioning With Content Diversity Exploration

CVPR 2021 · arXiv · 0 citations

Jo-SRC: A Contrastive Approach for Combating Noisy Labels

CVPR 2021 · 0 citations

Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression

CVPR 2021 · 0 citations

Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation

CVPR 2021 · arXiv · 0 citations

VLN BERT: A Recurrent Vision-and-Language BERT for Navigation

CVPR 2021 · 0 citations

V2C: Visual Voice Cloning

CVPR 2022 · arXiv · 0 citations

Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation

CVPR 2022 · arXiv · 0 citations

MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering

CVPR 2022 · arXiv · 0 citations

Maintaining Reasoning Consistency in Compositional Visual Question Answering

CVPR 2022 · 0 citations

HOP: History-and-Order Aware Pre-Training for Vision-and-Language Navigation

CVPR 2022 · arXiv · 0 citations

Learning To Dub Movies via Hierarchical Prosody Models

CVPR 2023 · arXiv · 0 citations

S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning

CVPR 2023 · 0 citations

The Road To Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

ICCV 2021 · 0 citations

AerialVLN: Vision-and-Language Navigation for UAVs

ICCV 2023 · arXiv · 0 citations

VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation

ICCV 2023 · 0 citations

Scaling Data Generation in Vision-and-Language Navigation

ICCV 2023 · arXiv · 0 citations

Identity-Consistent Aggregation for Video Object Detection

ICCV 2023 · arXiv · 0 citations

ShapeScaffolder: Structure-Aware 3D Shape Generation from Text

ICCV 2023 · 0 citations

March in Chat: Interactive Prompting for Remote Embodied Referring Expression

ICCV 2023 · arXiv · 0 citations

NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping

ICCV 2023 · 0 citations

Soft Expert Reward Learning for Vision-and-Language Navigation

ECCV 2020 · 0 citations

Length-Controllable Image Captioning

ECCV 2020 · 0 citations

Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

ECCV 2020 · 0 citations

UniMiSS: Universal Medical Self-Supervised Learning via Breaking Dimensionality Barrier

ECCV 2022 · 0 citations

A Simple and Robust Correlation Filtering Method for Text-Based Person Search

ECCV 2022 · 0 citations

Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval

ICCV 2023 · arXiv · 0 citations

Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval

CVPR 2025 · 0 citations

Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval

CVPR 2025 · 0 citations

EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling

CVPR 2025 · 0 citations

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

ICCV 2025 · 0 citations

COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation

ICCV 2025 · 0 citations

MFL-Owner: Ownership Protection for Multi-modal Federated Learning via Orthogonal Transform Watermark

AAAI 2025 · 0 citations

Realistic Noise Synthesis with Diffusion Models

AAAI 2025 · 0 citations

Distributionally Robust Policy Evaluation and Learning for Continuous Treatment with Observational Data

AAAI 2025 · 0 citations

Augmented Commonsense Knowledge for Remote Object Grounding

AAAI 2024 · arXiv · 0 citations

Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning

NeurIPS 2018 · 0 citations

Cross-sectional Learning of Extremal Dependence among Financial Assets

NeurIPS 2019 · 0 citations

Language and Visual Entity Relationship Graph for Agent Navigation

NeurIPS 2020 · 0 citations

Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision

NeurIPS 2021 · 0 citations

Debiased Visual Question Answering from Feature and Sample Perspectives

NeurIPS 2021 · 0 citations

Learning Distinct and Representative Modes for Image Captioning

NeurIPS 2022 · 0 citations

LoRA: A Logical Reasoning Augmented Dataset for Visual Question Answering

NeurIPS 2023 · 0 citations