Dahua Lin

133
Papers
3,162
Total Citations

Papers (133)

VBench: Comprehensive Benchmark Suite for Video Generative Models

CVPR 2024
996
citations

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

CVPR 2024
589
citations

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

CVPR 2024
365
citations

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

ICLR 2024
209
citations

Contrastive Learning for Image Captioning

NeurIPS 2017
203
citations

Recognize Complex Events From Static Images by Fusing Deep Channels

CVPR 2015
127
citations

VideoBooth: Diffusion-based Video Generation with Image Prompts

CVPR 2024
118
citations

Unified Human-Scene Interaction via Prompted Chain-of-Contacts

ICLR 2024
100
citations

Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation

ECCV 2020
95
citations

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

CVPR 2024
62
citations

Long Context Tuning for Video Generation

ICCV 2025
56
citations

LEGION: Learning to Ground and Explain for Synthetic Image Detection

ICCV 2025
32
citations

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

CVPR 2025
31
citations

Online Multi-modal Person Search in Videos

ECCV 2020
29
citations

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

AAAI 2025
26
citations

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

ICML 2025
21
citations

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

ICLR 2025
18
citations

IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations

ICLR 2025
15
citations

Learn to Propagate Reliably on Noisy Affinity Graphs

ECCV 2020
13
citations

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs

ICCV 2025
12
citations

Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes

CVPR 2025
11
citations

GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography

ICCV 2025
7
citations

HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

NeurIPS 2025
6
citations

Keyframe-Guided Creative Video Inpainting

CVPR 2025
6
citations

Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning

AAAI 2025
6
citations

Multi-identity Human Image Animation with Structural Video Diffusion

ICCV 2025
5
citations

Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data

ICCV 2025
2
citations

VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization

ICCV 2025
2
citations

Adapting Object Detectors via Selective Cross-Domain Alignment

CVPR 2019
0
citations

Libra R-CNN: Towards Balanced Learning for Object Detection

CVPR 2019
0
citations

Learning a Unified Classifier Incrementally via Rebalancing

CVPR 2019
0
citations

Self-Supervised Learning via Conditional Motion Propagation

CVPR 2019
0
citations

Learning to Cluster Faces on an Affinity Graph

CVPR 2019
0
citations

Region Proposal by Guided Anchoring

CVPR 2019
0
citations

Hybrid Task Cascade for Instance Segmentation

CVPR 2019
0
citations

IRLAS: Inverse Reinforcement Learning for Architecture Search

CVPR 2019
0
citations

FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding

CVPR 2020
0
citations

Self-Supervised Scene De-Occlusion

CVPR 2020
0
citations

Intra- and Inter-Action Understanding via Temporal Action Parsing

CVPR 2020
0
citations

When NAS Meets Robustness: In Search of Robust Architectures Against Adversarial Attacks

CVPR 2020
0
citations

A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation

CVPR 2020
0
citations

Learning to Cluster Faces via Confidence and Connectivity Estimation

CVPR 2020
0
citations

DSNAS: Direct Neural Architecture Search Without Parameter Retraining

CVPR 2020
0
citations

Open Compound Domain Adaptation

CVPR 2020
0
citations

Prime Sample Attention in Object Detection

CVPR 2020
0
citations

Visually Informed Binaural Audio Generation without Binaural Audios

CVPR 2021
0
citations

Scene-Aware Generative Network for Human Motion Synthesis

CVPR 2021
0
citations

Adversarial Robustness Under Long-Tailed Distribution

CVPR 2021
0
citations

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation

CVPR 2021
0
citations

Seesaw Loss for Long-Tailed Instance Segmentation

CVPR 2021
0
citations

Towards Evaluating and Training Verifiably Robust Neural Networks

CVPR 2021
0
citations

TransRank: Self-Supervised Video Representation Learning via Ranking-Based Transformation Recognition

CVPR 2022
0
citations

OCSampler: Compressing Videos to One Clip With Single-Step Sampling

CVPR 2022
0
citations

Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis

CVPR 2022
0
citations

SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition

CVPR 2022
0
citations

Revisiting Skeleton-Based Action Recognition

CVPR 2022
0
citations

Multi-Level Logit Distillation

CVPR 2023
0
citations

OmniCity: Omnipotent City Understanding With Multi-Level and Multi-View Images

CVPR 2023
0
citations

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

CVPR 2023
0
citations

Controllable Mesh Generation Through Sparse Latent Point Diffusion Models

CVPR 2023
0
citations

OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation

CVPR 2023
0
citations

RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer

CVPR 2023
0
citations

Grid-Guided Neural Radiance Fields for Large Urban Scenes

CVPR 2023
0
citations

Be Your Own Prada: Fashion Synthesis With Structural Coherence

ICCV 2017
0
citations

Temporal Action Detection With Structured Segment Networks

ICCV 2017
0
citations

Towards Diverse and Natural Image Descriptions via a Conditional GAN

ICCV 2017
0
citations

Recursive Visual Sound Separation Using Minus-Plus Net

ICCV 2019
0
citations

CARAFE: Content-Aware ReAssembly of FEatures

ICCV 2019
0
citations

Convolutional Sequence Generation for Skeleton-Based Action Synthesis

ICCV 2019
0
citations

A Graph-Based Framework to Bridge Movies and Synopses

ICCV 2019
0
citations

Online Hyper-Parameter Learning for Auto-Augmentation Strategy

ICCV 2019
0
citations

Vision Transformer With Progressive Sampling

ICCV 2021
0
citations

BlockPlanner: City Block Generation With Vectorized Graph Representation

ICCV 2021
0
citations

3D Building Reconstruction From Monocular Remote Sensing Images

ICCV 2021
0
citations

MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond

ICCV 2023
0
citations

SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling

ICCV 2023
0
citations

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

CVPR 2025
0
citations

DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-Centric Rendering

ICCV 2023
0
citations

Scene as Occupancy

ICCV 2023
0
citations

AssetField: Assets Mining and Reconfiguration in Ground Feature Plane Representation

ICCV 2023
0
citations

Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos

ICCV 2023
0
citations

V3Det: Vast Vocabulary Visual Detection Dataset

ICCV 2023
0
citations

Learning Human Dynamics in Autonomous Driving Scenarios

ICCV 2023
0
citations

Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation

ECCV 2020
0
citations

Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets

ECCV 2020
0
citations

Side-Aware Boundary Localization for More Precise Object Detection

ECCV 2020
0
citations

MovieNet: A Holistic Dataset for Movie Understanding

ECCV 2020
0
citations

A Unified Framework for Shot Type Classification Based on Subject Centric Lens

ECCV 2020
0
citations

Motion Guided 3D Pose Estimation from Videos

ECCV 2020
0
citations

Omni-sourced Webly-supervised Learning for Video Recognition

ECCV 2020
0
citations

Caption-Supervised Face Recognition: Training a State-of-the-Art Face Model without Manual Annotation

ECCV 2020
0
citations

Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations

ECCV 2020
0
citations

Monocular 3D Object Detection with Depth from Motion

ECCV 2022
0
citations

Static and Dynamic Concepts for Self-Supervised Video Representation Learning

ECCV 2022
0
citations

BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-Scale Scene Rendering

ECCV 2022
0
citations

Improving Pixel-based MIM by Reducing Wasted Modeling Capability

ICCV 2023
0
citations

Conical Visual Concentration for Efficient Large Vision-Language Models

CVPR 2025
0
citations

ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way

CVPR 2025
0
citations

MM-IFEngine: Towards Multimodal Instruction Following

ICCV 2025
0
citations

Visual-RFT: Visual Reinforcement Fine-Tuning

ICCV 2025
0
citations

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

ICCV 2025
0
citations

X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting

ICCV 2025
0
citations

Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go

NeurIPS 2025
0
citations

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

CVPR 2024
0
citations

OneLLM: One Framework to Align All Modalities with Language

CVPR 2024
0
citations

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

CVPR 2024
0
citations

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

CVPR 2024
0
citations

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

CVPR 2024
0
citations

Towards Text-guided 3D Scene Composition

CVPR 2024
0
citations

Cinematic Behavior Transfer via NeRF-based Differentiable Filming

CVPR 2024
0
citations

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

CVPR 2024
0
citations

MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving

ICML 2024
0
citations

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

ICML 2024
0
citations

PolyNet: A Pursuit of Structural Diversity in Very Deep Networks

CVPR 2017
0
citations

Detecting Visual Relationships With Deep Relational Networks

CVPR 2017
0
citations

Discover and Learn New Objects From Documentaries

CVPR 2017
0
citations

UntrimmedNets for Weakly Supervised Action Recognition and Detection

CVPR 2017
0
citations

Unifying Identification and Context Learning for Person Recognition

CVPR 2018
0
citations

Unsupervised Feature Learning via Non-Parametric Instance Discrimination

CVPR 2018
0
citations

Low-Latency Video Semantic Segmentation

CVPR 2018
0
citations

Learning Globally Optimized Object Detector via Policy Gradient

CVPR 2018
0
citations

Recognize Actions by Disentangling Components of Dynamics

CVPR 2018
0
citations

Optimizing Video Object Detection via a Scale-Time Lattice

CVPR 2018
0
citations

Trajectory Convolution for Action Recognition

NeurIPS 2018
0
citations

A Neural Compositional Paradigm for Image Captioning

NeurIPS 2018
0
citations

Policy Continuation with Hindsight Inverse Dynamics

NeurIPS 2019
0
citations

Few-Shot Object Detection via Association and DIscrimination

NeurIPS 2021
0
citations

Generative Occupancy Fields for 3D Surface-Aware Image Synthesis

NeurIPS 2021
0
citations

Balanced Chamfer Distance as a Comprehensive Metric for Point Cloud Completion

NeurIPS 2021
0
citations

Semi-Supervised Semantic Segmentation via Gentle Teaching Assistant

NeurIPS 2022
0
citations

Audio-Driven Co-Speech Gesture Video Generation

NeurIPS 2022
0
citations

RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars

NeurIPS 2023
0
citations

POPQORN: Quantifying Robustness of Recurrent Neural Networks

ICML 2019
0
citations