Qi Wu
78
Papers
627
Total Citations
1
Affiliations
Carnegie Mellon University
Papers (78)
NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
AAAI 2024 (arXiv)
276
citations
Object-and-Action Aware Model for Visual Language Navigation
ECCV 2020
127
citations
Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval
AAAI 2024 (arXiv)
57
citations
3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting
CVPR 2025
51
citations
Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning
CVPR 2024
42
citations
Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs
ICLR 2025 (arXiv)
30
citations
WebVLN: Vision-and-Language Navigation on Websites
AAAI 2024 (arXiv)
19
citations
PairAug: What Can Augmented Image-Text Pairs Do for Radiology?
CVPR 2024
12
citations
General Scene Adaptation for Vision-and-Language Navigation
ICLR 2025
10
citations
Invariant Random Forest: Tree-Based Model Solution for OOD Generalization
AAAI 2024 (arXiv)
3
citations
The Causal Impact of Credit Lines on Spending Distributions
AAAI 2024
0
citations
Sparse Bayesian Deep Learning for Cross Domain Medical Image Reconstruction
AAAI 2024
0
citations
KPA-Tracker: Towards Robust and Real-Time Category-Level Articulated Object 6D Pose Tracking
AAAI 2024
0
citations
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images
CVPR 2024
0
citations
Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors
CVPR 2024
0
citations
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework
CVPR 2024
0
citations
ModaVerse: Efficiently Transforming Modalities with LLMs
CVPR 2024
0
citations
What Value Do Explicit High Level Concepts Have in Vision to Language Problems?
CVPR 2016
0
citations
Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge From External Sources
CVPR 2016
0
citations
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
CVPR 2017
0
citations
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
CVPR 2018 (arXiv)
0
citations
Parallel Attention: A Unified Framework for Visual Object Discovery Through Dialogs and Queries
CVPR 2018 (arXiv)
0
citations
Are You Talking to Me? Reasoned Visual Dialog Generation Through Adversarial Learning
CVPR 2018 (arXiv)
0
citations
Learning Semantic Concepts and Order for Image and Sentence Matching
CVPR 2018 (arXiv)
0
citations
Visual Question Answering With Memory-Augmented Networks
CVPR 2018 (arXiv)
0
citations
Visual Grounding via Accumulated Attention
CVPR 2018
0
citations
Neighbourhood Watch: Referring Expression Comprehension via Language-Guided Graph Attention Networks
CVPR 2019
0
citations
Mind Your Neighbours: Image Annotation With Metadata Neighbourhood Graph Co-Attention Networks
CVPR 2019
0
citations
What's to Know? Uncertainty as a Guide to Asking Goal-Oriented Questions
CVPR 2019
0
citations
Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs
CVPR 2020 (arXiv)
0
citations
Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
CVPR 2020
0
citations
Gold Seeker: Information Gain From Policy Distributions for Goal-Oriented Vision-and-Langauge Reasoning
CVPR 2020 (arXiv)
0
citations
Intelligent Home 3D: Automatic 3D-House Design From Linguistic Descriptions Only
CVPR 2020 (arXiv)
0
citations
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
CVPR 2020 (arXiv)
0
citations
Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning
CVPR 2020 (arXiv)
0
citations
Sketch, Ground, and Refine: Top-Down Dense Video Captioning
CVPR 2021
0
citations
Towards Accurate Text-Based Image Captioning With Content Diversity Exploration
CVPR 2021 (arXiv)
0
citations
Jo-SRC: A Contrastive Approach for Combating Noisy Labels
CVPR 2021
0
citations
Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression
CVPR 2021
0
citations
Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation
CVPR 2021 (arXiv)
0
citations
VLN BERT: A Recurrent Vision-and-Language BERT for Navigation
CVPR 2021
0
citations
V2C: Visual Voice Cloning
CVPR 2022 (arXiv)
0
citations
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
CVPR 2022 (arXiv)
0
citations
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering
CVPR 2022 (arXiv)
0
citations
Maintaining Reasoning Consistency in Compositional Visual Question Answering
CVPR 2022
0
citations
HOP: History-and-Order Aware Pre-Training for Vision-and-Language Navigation
CVPR 2022 (arXiv)
0
citations
Learning To Dub Movies via Hierarchical Prosody Models
CVPR 2023 (arXiv)
0
citations
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
CVPR 2023
0
citations
The Road To Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
ICCV 2021
0
citations
AerialVLN: Vision-and-Language Navigation for UAVs
ICCV 2023 (arXiv)
0
citations
VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation
ICCV 2023
0
citations
Scaling Data Generation in Vision-and-Language Navigation
ICCV 2023 (arXiv)
0
citations
Identity-Consistent Aggregation for Video Object Detection
ICCV 2023 (arXiv)
0
citations
ShapeScaffolder: Structure-Aware 3D Shape Generation from Text
ICCV 2023
0
citations
March in Chat: Interactive Prompting for Remote Embodied Referring Expression
ICCV 2023 (arXiv)
0
citations
NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping
ICCV 2023
0
citations
Soft Expert Reward Learning for Vision-and-Language Navigation
ECCV 2020
0
citations
Length-Controllable Image Captioning
ECCV 2020
0
citations
Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering
ECCV 2020
0
citations
UniMiSS: Universal Medical Self-Supervised Learning via Breaking Dimensionality Barrier
ECCV 2022
0
citations
A Simple and Robust Correlation Filtering Method for Text-Based Person Search
ECCV 2022
0
citations
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
ICCV 2023 (arXiv)
0
citations
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval
CVPR 2025
0
citations
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
CVPR 2025
0
citations
EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling
CVPR 2025
0
citations
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
ICCV 2025
0
citations
COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation
ICCV 2025
0
citations
MFL-Owner: Ownership Protection for Multi-modal Federated Learning via Orthogonal Transform Watermark
AAAI 2025
0
citations
Realistic Noise Synthesis with Diffusion Models
AAAI 2025
0
citations
Distributionally Robust Policy Evaluation and Learning for Continuous Treatment with Observational Data
AAAI 2025
0
citations
Augmented Commonsense Knowledge for Remote Object Grounding
AAAI 2024 (arXiv)
0
citations
Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning
NeurIPS 2018
0
citations
Cross-sectional Learning of Extremal Dependence among Financial Assets
NeurIPS 2019
0
citations
Language and Visual Entity Relationship Graph for Agent Navigation
NeurIPS 2020
0
citations
Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision
NeurIPS 2021
0
citations
Debiased Visual Question Answering from Feature and Sample Perspectives
NeurIPS 2021
0
citations
Learning Distinct and Representative Modes for Image Captioning
NeurIPS 2022
0
citations
LoRA: A Logical Reasoning Augmented Dataset for Visual Question Answering
NeurIPS 2023
0
citations