Dahua Lin
133
Papers
3,162
Total Citations
Papers (133)
VBench: Comprehensive Benchmark Suite for Video Generative Models
CVPR 2024
996
citations
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
CVPR 2024
589
citations
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
CVPR 2024
365
citations
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
ICLR 2024
209
citations
Contrastive Learning for Image Captioning
NeurIPS 2017 (arXiv)
203
citations
Recognize Complex Events From Static Images by Fusing Deep Channels
CVPR 2015
127
citations
VideoBooth: Diffusion-based Video Generation with Image Prompts
CVPR 2024
118
citations
Unified Human-Scene Interaction via Prompted Chain-of-Contacts
ICLR 2024
100
citations
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation
ECCV 2020
95
citations
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
CVPR 2024
62
citations
Long Context Tuning for Video Generation
ICCV 2025
56
citations
LEGION: Learning to Ground and Explain for Synthetic Image Detection
ICCV 2025
32
citations
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
CVPR 2025
31
citations
Online Multi-modal Person Search in Videos
ECCV 2020
29
citations
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
AAAI 2025
26
citations
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
ICML 2025
21
citations
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
ICLR 2025
18
citations
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
ICLR 2025
15
citations
Learn to Propagate Reliably on Noisy Affinity Graphs
ECCV 2020
13
citations
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs
ICCV 2025 (arXiv)
12
citations
Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes
CVPR 2025
11
citations
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
ICCV 2025
7
citations
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
NeurIPS 2025
6
citations
Keyframe-Guided Creative Video Inpainting
CVPR 2025
6
citations
Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning
AAAI 2025
6
citations
Multi-identity Human Image Animation with Structural Video Diffusion
ICCV 2025
5
citations
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
ICCV 2025
2
citations
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
ICCV 2025
2
citations
Adapting Object Detectors via Selective Cross-Domain Alignment
CVPR 2019
0
citations
Libra R-CNN: Towards Balanced Learning for Object Detection
CVPR 2019
0
citations
Learning a Unified Classifier Incrementally via Rebalancing
CVPR 2019
0
citations
Self-Supervised Learning via Conditional Motion Propagation
CVPR 2019
0
citations
Learning to Cluster Faces on an Affinity Graph
CVPR 2019
0
citations
Region Proposal by Guided Anchoring
CVPR 2019
0
citations
Hybrid Task Cascade for Instance Segmentation
CVPR 2019
0
citations
IRLAS: Inverse Reinforcement Learning for Architecture Search
CVPR 2019
0
citations
FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding
CVPR 2020 (arXiv)
0
citations
Self-Supervised Scene De-Occlusion
CVPR 2020 (arXiv)
0
citations
Intra- and Inter-Action Understanding via Temporal Action Parsing
CVPR 2020
0
citations
When NAS Meets Robustness: In Search of Robust Architectures Against Adversarial Attacks
CVPR 2020 (arXiv)
0
citations
A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation
CVPR 2020 (arXiv)
0
citations
Learning to Cluster Faces via Confidence and Connectivity Estimation
CVPR 2020 (arXiv)
0
citations
DSNAS: Direct Neural Architecture Search Without Parameter Retraining
CVPR 2020 (arXiv)
0
citations
Open Compound Domain Adaptation
CVPR 2020 (arXiv)
0
citations
Prime Sample Attention in Object Detection
CVPR 2020 (arXiv)
0
citations
Visually Informed Binaural Audio Generation without Binaural Audios
CVPR 2021 (arXiv)
0
citations
Scene-Aware Generative Network for Human Motion Synthesis
CVPR 2021 (arXiv)
0
citations
Adversarial Robustness Under Long-Tailed Distribution
CVPR 2021 (arXiv)
0
citations
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
CVPR 2021 (arXiv)
0
citations
Seesaw Loss for Long-Tailed Instance Segmentation
CVPR 2021 (arXiv)
0
citations
Towards Evaluating and Training Verifiably Robust Neural Networks
CVPR 2021 (arXiv)
0
citations
TransRank: Self-Supervised Video Representation Learning via Ranking-Based Transformation Recognition
CVPR 2022 (arXiv)
0
citations
OCSampler: Compressing Videos to One Clip With Single-Step Sampling
CVPR 2022 (arXiv)
0
citations
Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis
CVPR 2022 (arXiv)
0
citations
SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition
CVPR 2022 (arXiv)
0
citations
Revisiting Skeleton-Based Action Recognition
CVPR 2022 (arXiv)
0
citations
Multi-Level Logit Distillation
CVPR 2023
0
citations
OmniCity: Omnipotent City Understanding With Multi-Level and Multi-View Images
CVPR 2023 (arXiv)
0
citations
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
CVPR 2023
0
citations
Controllable Mesh Generation Through Sparse Latent Point Diffusion Models
CVPR 2023 (arXiv)
0
citations
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation
CVPR 2023 (arXiv)
0
citations
RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer
CVPR 2023
0
citations
Grid-Guided Neural Radiance Fields for Large Urban Scenes
CVPR 2023 (arXiv)
0
citations
Be Your Own Prada: Fashion Synthesis With Structural Coherence
ICCV 2017 (arXiv)
0
citations
Temporal Action Detection With Structured Segment Networks
ICCV 2017 (arXiv)
0
citations
Towards Diverse and Natural Image Descriptions via a Conditional GAN
ICCV 2017 (arXiv)
0
citations
Recursive Visual Sound Separation Using Minus-Plus Net
ICCV 2019
0
citations
CARAFE: Content-Aware ReAssembly of FEatures
ICCV 2019
0
citations
Convolutional Sequence Generation for Skeleton-Based Action Synthesis
ICCV 2019
0
citations
A Graph-Based Framework to Bridge Movies and Synopses
ICCV 2019
0
citations
Online Hyper-Parameter Learning for Auto-Augmentation Strategy
ICCV 2019
0
citations
Vision Transformer With Progressive Sampling
ICCV 2021 (arXiv)
0
citations
BlockPlanner: City Block Generation With Vectorized Graph Representation
ICCV 2021
0
citations
3D Building Reconstruction From Monocular Remote Sensing Images
ICCV 2021
0
citations
MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond
ICCV 2023
0
citations
SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling
ICCV 2023 (arXiv)
0
citations
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
CVPR 2025
0
citations
DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-Centric Rendering
ICCV 2023
0
citations
Scene as Occupancy
ICCV 2023 (arXiv)
0
citations
AssetField: Assets Mining and Reconfiguration in Ground Feature Plane Representation
ICCV 2023 (arXiv)
0
citations
Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos
ICCV 2023 (arXiv)
0
citations
V3Det: Vast Vocabulary Visual Detection Dataset
ICCV 2023 (arXiv)
0
citations
Learning Human Dynamics in Autonomous Driving Scenarios
ICCV 2023
0
citations
Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation
ECCV 2020
0
citations
Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets
ECCV 2020
0
citations
Side-Aware Boundary Localization for More Precise Object Detection
ECCV 2020
0
citations
MovieNet: A Holistic Dataset for Movie Understanding
ECCV 2020
0
citations
A Unified Framework for Shot Type Classification Based on Subject Centric Lens
ECCV 2020
0
citations
Motion Guided 3D Pose Estimation from Videos
ECCV 2020
0
citations
Omni-sourced Webly-supervised Learning for Video Recognition
ECCV 2020
0
citations
Caption-Supervised Face Recognition: Training a State-of-the-Art Face Model without Manual Annotation
ECCV 2020
0
citations
Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations
ECCV 2020
0
citations
Monocular 3D Object Detection with Depth from Motion
ECCV 2022
0
citations
Static and Dynamic Concepts for Self-Supervised Video Representation Learning
ECCV 2022
0
citations
BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-Scale Scene Rendering
ECCV 2022
0
citations
Improving Pixel-based MIM by Reducing Wasted Modeling Capability
ICCV 2023 (arXiv)
0
citations
Conical Visual Concentration for Efficient Large Vision-Language Models
CVPR 2025
0
citations
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
CVPR 2025
0
citations
MM-IFEngine: Towards Multimodal Instruction Following
ICCV 2025
0
citations
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
0
citations
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
ICCV 2025
0
citations
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting
ICCV 2025
0
citations
Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go
NeurIPS 2025
0
citations
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
CVPR 2024
0
citations
OneLLM: One Framework to Align All Modalities with Language
CVPR 2024
0
citations
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
CVPR 2024
0
citations
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
CVPR 2024
0
citations
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
CVPR 2024
0
citations
Towards Text-guided 3D Scene Composition
CVPR 2024
0
citations
Cinematic Behavior Transfer via NeRF-based Differentiable Filming
CVPR 2024
0
citations
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
CVPR 2024
0
citations
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving
ICML 2024
0
citations
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
ICML 2024
0
citations
PolyNet: A Pursuit of Structural Diversity in Very Deep Networks
CVPR 2017 (arXiv)
0
citations
Detecting Visual Relationships With Deep Relational Networks
CVPR 2017 (arXiv)
0
citations
Discover and Learn New Objects From Documentaries
CVPR 2017 (arXiv)
0
citations
UntrimmedNets for Weakly Supervised Action Recognition and Detection
CVPR 2017 (arXiv)
0
citations
Unifying Identification and Context Learning for Person Recognition
CVPR 2018 (arXiv)
0
citations
Unsupervised Feature Learning via Non-Parametric Instance Discrimination
CVPR 2018 (arXiv)
0
citations
Low-Latency Video Semantic Segmentation
CVPR 2018 (arXiv)
0
citations
Learning Globally Optimized Object Detector via Policy Gradient
CVPR 2018
0
citations
Recognize Actions by Disentangling Components of Dynamics
CVPR 2018
0
citations
Optimizing Video Object Detection via a Scale-Time Lattice
CVPR 2018 (arXiv)
0
citations
Trajectory Convolution for Action Recognition
NeurIPS 2018
0
citations
A Neural Compositional Paradigm for Image Captioning
NeurIPS 2018
0
citations
Policy Continuation with Hindsight Inverse Dynamics
NeurIPS 2019
0
citations
Few-Shot Object Detection via Association and DIscrimination
NeurIPS 2021
0
citations
Generative Occupancy Fields for 3D Surface-Aware Image Synthesis
NeurIPS 2021
0
citations
Balanced Chamfer Distance as a Comprehensive Metric for Point Cloud Completion
NeurIPS 2021
0
citations
Semi-Supervised Semantic Segmentation via Gentle Teaching Assistant
NeurIPS 2022
0
citations
Audio-Driven Co-Speech Gesture Video Generation
NeurIPS 2022
0
citations
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars
NeurIPS 2023
0
citations
POPQORN: Quantifying Robustness of Recurrent Neural Networks
ICML 2019
0
citations