Wei Zhang
Papers: 46
Total Citations: 386
Papers (46)
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI 2024, arXiv
58
citations
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
CVPR 2024
45
citations
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
CVPR 2025
44
citations
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
ICCV 2025
43
citations
Latent Space Editing in Transformer-Based Flow Matching
AAAI 2024, arXiv
38
citations
Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation
CVPR 2024
32
citations
Language-Driven Anchors for Zero-Shot Adversarial Robustness
CVPR 2024
21
citations
Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection
CVPR 2024
19
citations
Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset
CVPR 2025
18
citations
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
CVPR 2025
13
citations
Gaussian Process Neural Additive Models
AAAI 2024, arXiv
11
citations
LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement
AAAI 2024, arXiv
10
citations
GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement
CVPR 2024
8
citations
MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
ICCV 2025
7
citations
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
NeurIPS 2025
4
citations
PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection
ICCV 2025
3
citations
As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
ICLR 2025
3
citations
Less Attention is More: Prompt Transformer for Generalized Category Discovery
CVPR 2025
3
citations
EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting
CVPR 2025
2
citations
Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment
NeurIPS 2025
1
citations
SleepSMC: Ubiquitous Sleep Staging via Supervised Multimodal Coordination
ICLR 2025
1
citations
Context Guided Transformer Entropy Modeling for Video Compression
ICCV 2025
1
citations
Learning Implicit Features with Flow-Infused Transformations for Realistic Virtual Try-On
ICCV 2025
1
citations
Symbolic Cognitive Diagnosis via Hybrid Optimization for Intelligent Education Systems
AAAI 2024
0
citations
Decoupled Motion Expression Video Segmentation
CVPR 2025
0
citations
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
CVPR 2025
0
citations
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
ICCV 2025
0
citations
AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving
ICCV 2025
0
citations
VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving
ICCV 2025
0
citations
LaneDiffusion: Improving Centerline Graph Learning via Prior Injected BEV Feature Generation
ICCV 2025
0
citations
General Compression Framework for Efficient Transformer Object Tracking
ICCV 2025
0
citations
Efficient Event Camera Data Pretraining with Adaptive Prompt Fusion
ICCV 2025
0
citations
PerReactor: Offline Personalised Multiple Appropriate Facial Reaction Generation
AAAI 2025
0
citations
In2NeCT: Inter-class and Intra-class Neural Collapse Tuning for Semantic Segmentation of Imbalanced Remote Sensing Images
AAAI 2025
0
citations
Coherency Improved Explainable Recommendation via Large Language Model
AAAI 2025
0
citations
STAIR: Manipulating Collaborative and Multimodal Information for E-Commerce Recommendation
AAAI 2025
0
citations
CGMGM: A Cross-Gaussian Mixture Generative Model for Few-Shot Semantic Segmentation
AAAI 2024
0
citations
SFOD: Spiking Fusion Object Detector
CVPR 2024
0
citations
EVS-assisted Joint Deblurring Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling
CVPR 2024
0
citations
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
CVPR 2024
0
citations
Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning
CVPR 2024
0
citations
Event-based Visible and Infrared Fusion via Multi-task Collaboration
CVPR 2024
0
citations
Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models
CVPR 2024
0
citations
HetSSNet: Spatial-Spectral Heterogeneous Graph Learning Network for Panchromatic and Multispectral Images Fusion
ICML 2025
0
citations
ESNet: Evolution and Succession Network for High-Resolution Salient Object Detection
ICML 2024
0
citations
Interpreting and Improving Large Language Models in Arithmetic Calculation
ICML 2024
0
citations