Song-Chun Zhu

81 Papers
151 Total Citations

Papers (81)

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update

CVPR 2024 · 45 citations

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

ICLR 2025 · 37 citations

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior

CVPR 2025 · 18 citations

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

CVPR 2025 · 17 citations

Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World

ICLR 2024 · 17 citations

Neural-Symbolic Recursive Machine for Systematic Generalization

ICLR 2024 · 14 citations

Differentiable Information Enhanced Model-Based Reinforcement Learning

AAAI 2025 · 3 citations

Joint Action Recognition and Pose Estimation From Video

CVPR 2015 · 0 citations

Recognizing Car Fluents From Video

CVPR 2016 · 0 citations

Inferring Forces and Learning Human Utilities From Videos

CVPR 2016 · 0 citations

Multi-View People Tracking via Hierarchical Trajectory Composition

CVPR 2016 · 0 citations

Mining Object Parts From CNNs via Active Question-Answering

CVPR 2017 · arXiv · 0 citations

CERN: Confidence-Energy Recurrent Network for Group Activity Recognition

CVPR 2017 · arXiv · 0 citations

Synthesizing Dynamic Patterns by Spatial-Temporal Generative ConvNet

CVPR 2017 · arXiv · 0 citations

A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects

CVPR 2018 · arXiv · 0 citations

Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification

CVPR 2018 · 0 citations

Human-Centric Indoor Scene Synthesis Using Stochastic Grammar

CVPR 2018 · arXiv · 0 citations

Inferring Shared Attention in Social Scene Videos

CVPR 2018 · 0 citations

Where and Why Are They Looking? Jointly Inferring Human Attention and Intentions in Complex Tasks

CVPR 2018 · 0 citations

Learning Descriptor Networks for 3D Shape Synthesis and Analysis

CVPR 2018 · arXiv · 0 citations

Interpretable Convolutional Neural Networks

CVPR 2018 · arXiv · 0 citations

Learning Generative ConvNets via Multi-Grid Modeling and Sampling

CVPR 2018 · arXiv · 0 citations

RAVEN: A Dataset for Relational and Analogical Visual REasoNing

CVPR 2019 · 0 citations

Reasoning Visual Dialogs With Structural and Partial Observations

CVPR 2019 · 0 citations

Divergence Triangle for Joint Training of Generator Model, Energy-Based Model, and Inferential Model

CVPR 2019 · 0 citations

Unsupervised Disentangling of Appearance and Geometry by Deformable Generator Network

CVPR 2019 · 0 citations

Joint Training of Variational Auto-Encoder and Latent Energy-Based Model

CVPR 2020 · arXiv · 0 citations

Inducing Hierarchical Compositional Model by Sparsifying Generator Network

CVPR 2020 · arXiv · 0 citations

Generative PointNet: Deep Energy-Based Learning on Unordered Point Sets for 3D Generation, Reconstruction and Classification

CVPR 2021 · arXiv · 0 citations

ACRE: Abstract Causal REasoning Beyond Covariation

CVPR 2021 · arXiv · 0 citations

Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis

CVPR 2021 · arXiv · 0 citations

Learning Triadic Belief Dynamics in Nonverbal Communication From Videos

CVPR 2021 · arXiv · 0 citations

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

CVPR 2021 · arXiv · 0 citations

Diffusion-Based Generation, Optimization, and Planning in 3D Scenes

CVPR 2023 · arXiv · 0 citations

Mining And-Or Graphs for Graph Matching and Object Discovery

ICCV 2015 · 0 citations

Attributed Grammars for Joint Estimation of Human Attributes, Part and Pose

ICCV 2015 · 0 citations

Automated Facial Trait Judgment and Election Outcome Prediction: Social Dimensions of Face

ICCV 2015 · 0 citations

Predicting Human Activities Using Stochastic Grammar

ICCV 2017 · arXiv · 0 citations

Jointly Recognizing Object Fluents and Tasks in Egocentric Videos

ICCV 2017 · 0 citations

Monocular 3D Human Pose Estimation by Predicting Depth on Joints

ICCV 2017 · 0 citations

Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning

ICCV 2019 · 0 citations

DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare

ICCV 2019 · 0 citations

Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense

ICCV 2019 · 0 citations

YouRefIt: Embodied Reference Understanding With Language and Gesture

ICCV 2021 · arXiv · 0 citations

Spatio-Temporal Self-Supervised Representation Learning for 3D Point Clouds

ICCV 2021 · arXiv · 0 citations

VLGrammar: Grounded Grammar Induction of Vision and Language

ICCV 2021 · arXiv · 0 citations

X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events

ICCV 2023 · 0 citations

ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic 3D Scenes

ICCV 2023 · arXiv · 0 citations

A Competence-aware Curriculum for Visual Concepts Learning via Question Answering

ECCV 2020 · 0 citations

Learning Multi-layer Latent Variable Model via Variational Optimization of Short Run MCMC for Approximate Inference

ECCV 2020 · 0 citations

LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities

ECCV 2020 · 0 citations

Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning

ECCV 2022 · 0 citations

Generative Hierarchical Learning of Sparse FRAME Models

CVPR 2017 · 0 citations

METASCENES: Towards Automated Replica Creation for Real-world 3D Scans

CVPR 2025 · 0 citations

Social World Model-Augmented Mechanism Design Policy Learning

NeurIPS 2025 · 0 citations

ProAgent: Building Proactive Cooperative Agents with Large Language Models

AAAI 2024 · 0 citations

An Embodied Generalist Agent in 3D World

ICML 2024 · 0 citations

Fast Peer Adaptation with Context-aware Exploration

ICML 2024 · 0 citations

Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning

ICML 2024 · 0 citations

Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation

NeurIPS 2018 · 0 citations

Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model

NeurIPS 2019 · 0 citations

Learning Perceptual Inference by Contrasting

NeurIPS 2019 · 0 citations

PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points

NeurIPS 2019 · 0 citations

Learning Latent Space Energy-Based Prior Model

NeurIPS 2020 · 0 citations

Robust Visual Reasoning via Language Guided Neural Module Networks

NeurIPS 2021 · 0 citations

Unsupervised Foreground Extraction via Deep Region Competition

NeurIPS 2021 · 0 citations

On Path Integration of Grid Cells: Group Representation and Isotropic Scaling

NeurIPS 2021 · 0 citations

Iterative Teacher-Aware Learning

NeurIPS 2021 · arXiv · 0 citations

Learning Probabilistic Models from Generator Latent Spaces with Hat EBM

NeurIPS 2022 · 0 citations

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

NeurIPS 2022 · 0 citations

EgoTaskQA: Understanding Human Tasks in Egocentric Videos

NeurIPS 2022 · 0 citations

Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning

NeurIPS 2022 · 0 citations

Emergent Graphical Conventions in a Visual Communication Game

NeurIPS 2022 · 0 citations

MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control

NeurIPS 2022 · 0 citations

Learning non-Markovian Decision-Making from State-only Sequences

NeurIPS 2023 · 0 citations

Evaluating and Inducing Personality in Pre-trained Language Models

NeurIPS 2023 · 0 citations

Learning Energy-Based Prior Model with Diffusion-Amortized MCMC

NeurIPS 2023 · 0 citations

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

NeurIPS 2023 · 0 citations

Diplomat: A Dialogue Dataset for Situated PragMATic Reasoning

NeurIPS 2023 · 0 citations

A Theory of Generative ConvNet

ICML 2016 · 0 citations

Generalized Earley Parser: Bridging Symbolic Grammars and Sequence Data for Future Prediction

ICML 2018 · 0 citations