Wei Zhang

46 Papers · 386 Total Citations

Papers (46)

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

AAAI 2024 · arXiv · 58 citations

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

CVPR 2024 · 45 citations

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

CVPR 2025 · 44 citations

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

ICCV 2025 · 43 citations

Latent Space Editing in Transformer-Based Flow Matching

AAAI 2024 · arXiv · 38 citations

Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

CVPR 2024 · 32 citations

Language-Driven Anchors for Zero-Shot Adversarial Robustness

CVPR 2024 · 21 citations

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

CVPR 2024 · 19 citations

Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset

CVPR 2025 · 18 citations

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

CVPR 2025 · 13 citations

Gaussian Process Neural Additive Models

AAAI 2024 · arXiv · 11 citations

LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement

AAAI 2024 · arXiv · 10 citations

GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement

CVPR 2024 · 8 citations

MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI

ICCV 2025 · 7 citations

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

NeurIPS 2025 · 4 citations

PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection

ICCV 2025 · 3 citations

As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss

ICLR 2025 · 3 citations

Less Attention is More: Prompt Transformer for Generalized Category Discovery

CVPR 2025 · 3 citations

EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting

CVPR 2025 · 2 citations

Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment

NeurIPS 2025 · 1 citation

SleepSMC: Ubiquitous Sleep Staging via Supervised Multimodal Coordination

ICLR 2025 · 1 citation

Context Guided Transformer Entropy Modeling for Video Compression

ICCV 2025 · 1 citation

Learning Implicit Features with Flow-Infused Transformations for Realistic Virtual Try-On

ICCV 2025 · 1 citation

Symbolic Cognitive Diagnosis via Hybrid Optimization for Intelligent Education Systems

AAAI 2024 · 0 citations

Decoupled Motion Expression Video Segmentation

CVPR 2025 · 0 citations

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill

CVPR 2025 · 0 citations

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

ICCV 2025 · 0 citations

AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving

ICCV 2025 · 0 citations

VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving

ICCV 2025 · 0 citations

LaneDiffusion: Improving Centerline Graph Learning via Prior Injected BEV Feature Generation

ICCV 2025 · 0 citations

General Compression Framework for Efficient Transformer Object Tracking

ICCV 2025 · 0 citations

Efficient Event Camera Data Pretraining with Adaptive Prompt Fusion

ICCV 2025 · 0 citations

PerReactor: Offline Personalised Multiple Appropriate Facial Reaction Generation

AAAI 2025 · 0 citations

In2NeCT: Inter-class and Intra-class Neural Collapse Tuning for Semantic Segmentation of Imbalanced Remote Sensing Images

AAAI 2025 · 0 citations

Coherency Improved Explainable Recommendation via Large Language Model

AAAI 2025 · 0 citations

STAIR: Manipulating Collaborative and Multimodal Information for E-Commerce Recommendation

AAAI 2025 · 0 citations

CGMGM: A Cross-Gaussian Mixture Generative Model for Few-Shot Semantic Segmentation

AAAI 2024 · 0 citations

SFOD: Spiking Fusion Object Detector

CVPR 2024 · 0 citations

EVS-assisted Joint Deblurring Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling

CVPR 2024 · 0 citations

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

CVPR 2024 · 0 citations

Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning

CVPR 2024 · 0 citations

Event-based Visible and Infrared Fusion via Multi-task Collaboration

CVPR 2024 · 0 citations

Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models

CVPR 2024 · 0 citations

HetSSNet: Spatial-Spectral Heterogeneous Graph Learning Network for Panchromatic and Multispectral Images Fusion

ICML 2025 · 0 citations

ESNet: Evolution and Succession Network for High-Resolution Salient Object Detection

ICML 2024 · 0 citations

Interpreting and Improving Large Language Models in Arithmetic Calculation

ICML 2024 · 0 citations