Wei Wu

54 Papers

137 Total Citations

Papers (54)

Language-Image Pre-training with Long Captions

Theoretical Benefit and Limitation of Diffusion Language Model

NeurIPS 2025, arXiv

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models

OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models

ProtCLIP: Function-Informed Protein Multi-Modal Learning

Learning Visual Generative Priors without Text

DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion

SwiftPillars: High-Efficiency Pillar Encoder for Lidar-Based 3D Detection

HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative

End-to-End Flow Correlation Tracking With Spatial-Temporal Attention

Practical Block-Wise Neural Network Architecture Generation

High Performance Visual Tracking With Siamese Region Proposal Network

Feedback Network for Image Super-Resolution

SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks

IRLAS: Inverse Reinforcement Learning for Architecture Search

Selective Sensor Fusion for Neural Visual-Inertial Odometry

Adaptive Dilated Network With Self-Correction Supervision for Counting

Improving One-Shot NAS by Suppressing the Posterior Fading

Hierarchical Feature Embedding for Attribute Recognition

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

Learning Statistical Texture for Semantic Segmentation

Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection

Unsupervised Learning of Accurate Siamese Tracking

Learning Video Representations of Human Motion From Synthetic Data

Cross Domain Object Detection by Target-Perceived Dual Branch Distillation

Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification

MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos

LidarGait: Benchmarking 3D Gait Recognition With Point Clouds

STM: SpatioTemporal and Motion Encoding for Action Recognition

Dynamic Curriculum Learning for Imbalanced Data Classification

Online Hyper-Parameter Learning for Auto-Augmentation Strategy

RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments

Incorporating Convolution Designs Into Visual Transformers

Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-Trained Vision-Language Models

ActFormer: A GAN-based Transformer towards General Action-Conditioned 3D Human Motion Generation

Scalable Video Object Segmentation with Simplified Framework

Class-wise Dynamic Graph Convolution for Semantic Segmentation

L-Tracing: Fast Light Visibility Estimation on Neural Surfaces by Sphere Tracing

Backbone Is All Your Need: A Simplified Architecture for Visual Object Tracking

AM-LFS: AutoML for Loss Function Search

UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection

InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation

GeoFormer: Geometry Point Encoder for 3D Object Detection with Graph-based Transformer

FreqPDE: Rethinking Positional Depth Embedding for Multi-View 3D Object Detection Transformers

DynaAct: Large Language Model Reasoning with Dynamic Action Spaces

Causal Inference over Visual-Semantic-Aligned Graph for Image Classification

Synergy of GFlowNet and Protein Language Model Makes a Diverse Antibody Designer

PointCNN: Convolution On X-Transformed Points

Glyce: Glyph-vectors for Chinese Character Representations

Zero-Resource Knowledge-Grounded Dialogue Generation

Moderate-fitting as a Natural Backdoor Defender for Pre-trained Language Models

Orthogonality-Promoting Distance Metric Learning: Convex Relaxation and Theoretical Analysis