Chen Zhao

Papers: 44

Total Citations: 247

Papers (44)

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields

TexOct: Generating Textures of 3D Models with Octree-based Diffusion

Towards Automated Movie Trailer Generation

Splatter-360: Generalizable 360 Gaussian Splatting for Wide-baseline Panoramic Images

UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset

SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning

Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis

BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation

Auto-Regressively Generating Multi-View Consistent Images

SolverLLM: Leveraging Test-Time Scaling for Optimization Problem via LLM-Guided Search

TexGarment: Consistent Garment UV Texture Generation via Efficient 3D Structure-Guided Diffusion Transformer

Metric-Agnostic Continual Learning for Sustainable Group Fairness

Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration

OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction

Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses

From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective

TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

NM-Net: Mining Reliable Neighbors for Robust Feature Correspondences

G-TAD: Sub-Graph Localization for Temporal Action Detection

Attention Convolutional Binary Neural Tree for Fine-Grained Visual Categorization

MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Large-Capacity and Flexible Video Steganography via Invertible Neural Network

Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

Open Set Action Recognition via Multi-Label Evidential Learning

Video Self-Stitching Graph Network for Temporal Action Localization

Progressive Correspondence Pruning by Consensus Learning

EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries

A Unified Continual Learning Framework with General Parameter-Efficient Tuning

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

Learning Semantic Neural Tree for Human Parsing

Sparse-to-Dense Depth Completion Revisited: Sampling Strategy and Graph Construction

Fusing Local Similarities for Retrieval-Based 3D Orientation Estimation of Unseen Objects

Unsupervised Learning of 3D Semantic Keypoints with Mutual Reconstruction

R-DFCIL: Relation-Guided Representation Learning for Data-Free Class Incremental Learning

End-to-End Active Speaker Detection