Zihao Wang

Papers: 25

Total Citations: 55

Papers (25)

Where am I? Cross-View Geo-localization with Natural Language Descriptions

ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting

ACE: Anti-Editing Concept Erasure in Text-to-Image Models

NestE: Modeling Nested Relational Structures for Knowledge Graph Reasoning

Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

Learning Hierarchical Polynomials with Three-Layer Neural Networks

Transtreaming: Adaptive Delay-aware Transformer for Real-time Streaming Perception

Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing

OnePose: One-Shot Object Pose Estimation Without CAD Models

Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction

Learning Transformation-Predictive Representations for Detection and Description of Local Features

CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval

Weakly-supervised 3D Shape Completion in the Wild

Quasi-Balanced Self-Training on Noise-Aware Synthesis of Object Point Clouds for Closing Domain Gap

Transforming and Combining Rewards for Aligning Large Language Models

Open-World Skill Discovery from Unsegmented Demonstration Videos

MSV-PCT: Multi-Sparse-View Enhanced Transformer Framework for Salient Object Detection in Point Clouds

ESEG: Event-Based Segmentation Boosted by Explicit Edge-Semantic Guidance

ProAgent: Building Proactive Cooperative Agents with Large Language Models

A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image

Selecting Large Language Model to Fine-tune via Rectified Scaling Law

Posterior Collapse of a Linear Latent Variable Model

Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents

Concept Algebra for (Score-Based) Text-Controlled Generative Models

Theoretical Analysis of the Inductive Biases in Deep Convolutional Networks