Jiahao Wang

28 Papers

136 Total Citations

Papers (28)

Structure-Aware Sparse-View X-ray 3D Reconstruction

Universal Segmentation at Arbitrary Granularity with Language Instruction

CoopTrack: Exploring End-to-End Learning for Efficient Cooperative Sequential Perception

DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability

SpotActor: Training-Free Layout-Controlled Consistent Image Generation

SAUI: Scale-Aware Unseen Imagineer for Zero-Shot Object Detection

SceneCrafter: Controllable Multi-View Driving Scene Editing

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content

Mamba-Reg: Vision Mamba Also Needs Registers

Imbalance in Balance: Online Concept Balancing in Generation Models

LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation

Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling

NeurIPS 2025, arXiv

IWRN: A Robust Blind Watermarking Method for Artwork Image Copyright Protection Against Noise Attack

ViLT-CLIP: Video and Language Tuning CLIP with Multimodal Prompt Learning and Scenario-guided Optimization

CRA-PCN: Point Cloud Completion with Intra- and Inter-level Cross-Resolution Transformers

RepKPU: Point Cloud Upsampling with Kernel Point Representation and Deformation

RobustLight: Improving Robustness via Diffusion Reinforcement Learning for Traffic Signal Control

Learning Adaptive Warping for Real-World Rolling Shutter Correction

Accelerating Neural Network Optimization Through an Automated Control Theory Lens

RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer

Memory-and-Anticipation Transformer for Online Action Understanding

Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape

SAGA: Stochastic Whole-Body Grasping with Contact

Global Spectral Filter Memory Network for Video Object Segmentation

Adder Attention for Vision Transformer

Towards Precise Scaling Laws for Video Diffusion Transformers