Shanghang Zhang

62
Papers
423
Total Citations

Papers (62)

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

ICML 2025
190
citations

RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete

CVPR 2025
89
citations

Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction

AAAI 2024
32
citations

Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation

AAAI 2024
25
citations

FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection

AAAI 2024
22
citations

Cloud-Device Collaborative Learning for Multimodal Large Language Models

CVPR 2024
18
citations

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

NeurIPS 2025
18
citations

Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation

CVPR 2025
6
citations

PINNsAgent: Automated PDE Surrogation with Large Language Models

ICML 2025
5
citations

SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents

NeurIPS 2025
4
citations

Subgraph Aggregation for Out-of-Distribution Generalization on Graphs

AAAI 2025
4
citations

Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation

CVPR 2025
4
citations

4D Visual Pre-training for Robot Learning

ICCV 2025
3
citations

Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression

CVPR 2025
3
citations

FreeKD: Knowledge Distillation via Semantic Frequency Prompt

CVPR 2024
0
citations

Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting

ICML 2024
0
citations

VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model

ICML 2024
0
citations

Compositional Few-Shot Class-Incremental Learning

ICML 2024
0
citations

Understanding Traffic Density From Large-Scale Web Camera Data

CVPR 2017
0
citations

Learning to Understand Image Blur

CVPR 2018
0
citations

Prototypical Cross-Domain Self-Supervised Learning for Few-Shot Unsupervised Domain Adaptation

CVPR 2021
0
citations

Learning Invariant Representations and Risks for Semi-Supervised Domain Adaptation

CVPR 2021
0
citations

Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts

CVPR 2022
0
citations

Annealing-Based Label-Transfer Learning for Open World Object Detection

CVPR 2023
0
citations

NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers

CVPR 2023
0
citations

Improving Generalization of Meta-Learning With Inverted Regularization at Inner-Level

CVPR 2023
0
citations

Open-Vocabulary Point-Cloud Object Detection Without 3D Annotation

CVPR 2023
0
citations

BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks

CVPR 2023
0
citations

Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-World

CVPR 2023
0
citations

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection

CVPR 2023
0
citations

FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras

ICCV 2017
0
citations

Unsupervised Domain Adaptive 3D Detection With Multi-Level Consistency

ICCV 2021
0
citations

Contrastive Multimodal Fusion With TupleInfoNCE

ICCV 2021
0
citations

Q-Diffusion: Quantizing Diffusion Models

ICCV 2023
0
citations

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning

ICCV 2023
0
citations

QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection

ICCV 2023
0
citations

TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning

ECCV 2020
0
citations

Instance Adaptive Self-Training for Unsupervised Domain Adaptation

ECCV 2020
0
citations

MTTrans: Cross-Domain Object Detection with Mean Teacher Transformer

ECCV 2022
0
citations

Efficient Meta-Tuning for Content-Aware Neural Video Delivery

ECCV 2022
0
citations

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID

CVPR 2023
0
citations

MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders

CVPR 2025
0
citations

Segment Any Motion in Videos

CVPR 2025
0
citations

Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs

ICCV 2025
0
citations

EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting

ICCV 2025
0
citations

Authentic 4D Driving Simulation with a Video Generation Model

ICCV 2025
0
citations

DesignEdit: Unify Spatial-Aware Image Editing via Training-free Inpainting with a Multi-Layered Latent Diffusion Framework

AAAI 2025
0
citations

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding

AAAI 2025
0
citations

Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection

AAAI 2024
0
citations

Gradient-based Parameter Selection for Efficient Fine-Tuning

CVPR 2024
0
citations

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

CVPR 2024
0
citations

PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought

CVPR 2024
0
citations

NTO3D: Neural Target Object 3D Reconstruction with Segment Anything

CVPR 2024
0
citations

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation

CVPR 2024
0
citations

Adversarial Multiple Source Domain Adaptation

NeurIPS 2018
0
citations

MaCow: Masked Convolutional Generative Flow

NeurIPS 2019
0
citations

Dual Adversarial Semantics-Consistent Network for Generalized Zero-Shot Learning

NeurIPS 2019
0
citations

Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks

NeurIPS 2021
0
citations

Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models

NeurIPS 2022
0
citations

Jump Self-attention: Capturing High-order Statistics in Transformers

NeurIPS 2022
0
citations

Margin-Based Few-Shot Class-Incremental Learning with Class-Level Overfitting Mitigation

NeurIPS 2022
0
citations

PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection

NeurIPS 2023
0
citations