Shanghang Zhang
31
Papers
423
Total Citations
Papers (31)
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
ICML 2025
190
citations
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
CVPR 2025
89
citations
Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction
AAAI 2024 (arXiv)
32
citations
Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation
AAAI 2024
25
citations
FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection
AAAI 2024 (arXiv)
22
citations
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
NeurIPS 2025
18
citations
Cloud-Device Collaborative Learning for Multimodal Large Language Models
CVPR 2024
18
citations
Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation
CVPR 2025
6
citations
PINNsAgent: Automated PDE Surrogation with Large Language Models
ICML 2025
5
citations
Subgraph Aggregation for Out-of-Distribution Generalization on Graphs
AAAI 2025
4
citations
SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents
NeurIPS 2025
4
citations
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
CVPR 2025 (arXiv)
4
citations
4D Visual Pre-training for Robot Learning
ICCV 2025
3
citations
Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression
CVPR 2025
3
citations
FreeKD: Knowledge Distillation via Semantic Frequency Prompt
CVPR 2024
0
citations
Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting
ICML 2024
0
citations
VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model
ICML 2024
0
citations
Compositional Few-Shot Class-Incremental Learning
ICML 2024
0
citations
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders
CVPR 2025
0
citations
Segment Any Motion in Videos
CVPR 2025
0
citations
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
ICCV 2025
0
citations
EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting
ICCV 2025
0
citations
Authentic 4D Driving Simulation with a Video Generation Model
ICCV 2025
0
citations
DesignEdit: Unify Spatial-Aware Image Editing via Training-free Inpainting with a Multi-Layered Latent Diffusion Framework
AAAI 2025
0
citations
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
AAAI 2025
0
citations
Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection
AAAI 2024
0
citations
Gradient-based Parameter Selection for Efficient Fine-Tuning
CVPR 2024
0
citations
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
CVPR 2024
0
citations
PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought
CVPR 2024
0
citations
NTO3D: Neural Target Object 3D Reconstruction with Segment Anything
CVPR 2024
0
citations
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
CVPR 2024
0
citations