Shanghang Zhang

31
Papers
423
Total Citations

Papers (31)

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

ICML 2025
190
citations

RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete

CVPR 2025
89
citations

Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction

AAAI 2024 (arXiv)
32
citations

Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation

AAAI 2024
25
citations

FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection

AAAI 2024 (arXiv)
22
citations

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

NeurIPS 2025
18
citations

Cloud-Device Collaborative Learning for Multimodal Large Language Models

CVPR 2024
18
citations

Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation

CVPR 2025
6
citations

PINNsAgent: Automated PDE Surrogation with Large Language Models

ICML 2025
5
citations

Subgraph Aggregation for Out-of-Distribution Generalization on Graphs

AAAI 2025
4
citations

SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents

NeurIPS 2025
4
citations

Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation

CVPR 2025 (arXiv)
4
citations

4D Visual Pre-training for Robot Learning

ICCV 2025
3
citations

Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression

CVPR 2025
3
citations

FreeKD: Knowledge Distillation via Semantic Frequency Prompt

CVPR 2024
0
citations

Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting

ICML 2024
0
citations

VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model

ICML 2024
0
citations

Compositional Few-Shot Class-Incremental Learning

ICML 2024
0
citations

MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders

CVPR 2025
0
citations

Segment Any Motion in Videos

CVPR 2025
0
citations

Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs

ICCV 2025
0
citations

EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting

ICCV 2025
0
citations

Authentic 4D Driving Simulation with a Video Generation Model

ICCV 2025
0
citations

DesignEdit: Unify Spatial-Aware Image Editing via Training-free Inpainting with a Multi-Layered Latent Diffusion Framework

AAAI 2025
0
citations

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding

AAAI 2025
0
citations

Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection

AAAI 2024
0
citations

Gradient-based Parameter Selection for Efficient Fine-Tuning

CVPR 2024
0
citations

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

CVPR 2024
0
citations

PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought

CVPR 2024
0
citations

NTO3D: Neural Target Object 3D Reconstruction with Segment Anything

CVPR 2024
0
citations

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation

CVPR 2024
0
citations