Song-Chun Zhu
81
Papers
151
Total Citations
Papers (81)
CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update
CVPR 2024
45
citations
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
ICLR 2025
37
citations
Decompositional Neural Scene Reconstruction with Generative Diffusion Prior
CVPR 2025
18
citations
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
CVPR 2025
17
citations
Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World
ICLR 2024
17
citations
Neural-Symbolic Recursive Machine for Systematic Generalization
ICLR 2024
14
citations
Differentiable Information Enhanced Model-Based Reinforcement Learning
AAAI 2025
3
citations
Joint Action Recognition and Pose Estimation From Video
CVPR 2015
0
citations
Recognizing Car Fluents From Video
CVPR 2016
0
citations
Inferring Forces and Learning Human Utilities From Videos
CVPR 2016
0
citations
Multi-View People Tracking via Hierarchical Trajectory Composition
CVPR 2016
0
citations
Mining Object Parts From CNNs via Active Question-Answering
CVPR 2017 (arXiv)
0
citations
CERN: Confidence-Energy Recurrent Network for Group Activity Recognition
CVPR 2017 (arXiv)
0
citations
Synthesizing Dynamic Patterns by Spatial-Temporal Generative ConvNet
CVPR 2017 (arXiv)
0
citations
A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects
CVPR 2018 (arXiv)
0
citations
Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification
CVPR 2018
0
citations
Human-Centric Indoor Scene Synthesis Using Stochastic Grammar
CVPR 2018 (arXiv)
0
citations
Inferring Shared Attention in Social Scene Videos
CVPR 2018
0
citations
Where and Why Are They Looking? Jointly Inferring Human Attention and Intentions in Complex Tasks
CVPR 2018
0
citations
Learning Descriptor Networks for 3D Shape Synthesis and Analysis
CVPR 2018 (arXiv)
0
citations
Interpretable Convolutional Neural Networks
CVPR 2018 (arXiv)
0
citations
Learning Generative ConvNets via Multi-Grid Modeling and Sampling
CVPR 2018 (arXiv)
0
citations
RAVEN: A Dataset for Relational and Analogical Visual REasoNing
CVPR 2019
0
citations
Reasoning Visual Dialogs With Structural and Partial Observations
CVPR 2019
0
citations
Divergence Triangle for Joint Training of Generator Model, Energy-Based Model, and Inferential Model
CVPR 2019
0
citations
Unsupervised Disentangling of Appearance and Geometry by Deformable Generator Network
CVPR 2019
0
citations
Joint Training of Variational Auto-Encoder and Latent Energy-Based Model
CVPR 2020 (arXiv)
0
citations
Inducing Hierarchical Compositional Model by Sparsifying Generator Network
CVPR 2020 (arXiv)
0
citations
Generative PointNet: Deep Energy-Based Learning on Unordered Point Sets for 3D Generation, Reconstruction and Classification
CVPR 2021 (arXiv)
0
citations
ACRE: Abstract Causal REasoning Beyond Covariation
CVPR 2021 (arXiv)
0
citations
Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis
CVPR 2021 (arXiv)
0
citations
Learning Triadic Belief Dynamics in Nonverbal Communication From Videos
CVPR 2021 (arXiv)
0
citations
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
CVPR 2021 (arXiv)
0
citations
Diffusion-Based Generation, Optimization, and Planning in 3D Scenes
CVPR 2023 (arXiv)
0
citations
Mining And-Or Graphs for Graph Matching and Object Discovery
ICCV 2015
0
citations
Attributed Grammars for Joint Estimation of Human Attributes, Part and Pose
ICCV 2015
0
citations
Automated Facial Trait Judgment and Election Outcome Prediction: Social Dimensions of Face
ICCV 2015
0
citations
Predicting Human Activities Using Stochastic Grammar
ICCV 2017 (arXiv)
0
citations
Jointly Recognizing Object Fluents and Tasks in Egocentric Videos
ICCV 2017
0
citations
Monocular 3D Human Pose Estimation by Predicting Depth on Joints
ICCV 2017
0
citations
Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning
ICCV 2019
0
citations
DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare
ICCV 2019
0
citations
Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense
ICCV 2019
0
citations
YouRefIt: Embodied Reference Understanding With Language and Gesture
ICCV 2021 (arXiv)
0
citations
Spatio-Temporal Self-Supervised Representation Learning for 3D Point Clouds
ICCV 2021 (arXiv)
0
citations
VLGrammar: Grounded Grammar Induction of Vision and Language
ICCV 2021 (arXiv)
0
citations
X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events
ICCV 2023
0
citations
ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic 3D Scenes
ICCV 2023 (arXiv)
0
citations
A Competence-aware Curriculum for Visual Concepts Learning via Question Answering
ECCV 2020
0
citations
Learning Multi-layer Latent Variable Model via Variational Optimization of Short Run MCMC for Approximate Inference
ECCV 2020
0
citations
LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities
ECCV 2020
0
citations
Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning
ECCV 2022
0
citations
Generative Hierarchical Learning of Sparse FRAME Models
CVPR 2017
0
citations
METASCENES: Towards Automated Replica Creation for Real-world 3D Scans
CVPR 2025
0
citations
Social World Model-Augmented Mechanism Design Policy Learning
NeurIPS 2025
0
citations
ProAgent: Building Proactive Cooperative Agents with Large Language Models
AAAI 2024
0
citations
An Embodied Generalist Agent in 3D World
ICML 2024
0
citations
Fast Peer Adaptation with Context-aware Exploration
ICML 2024
0
citations
Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning
ICML 2024
0
citations
Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation
NeurIPS 2018
0
citations
Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model
NeurIPS 2019
0
citations
Learning Perceptual Inference by Contrasting
NeurIPS 2019
0
citations
PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points
NeurIPS 2019
0
citations
Learning Latent Space Energy-Based Prior Model
NeurIPS 2020
0
citations
Robust Visual Reasoning via Language Guided Neural Module Networks
NeurIPS 2021
0
citations
Unsupervised Foreground Extraction via Deep Region Competition
NeurIPS 2021
0
citations
On Path Integration of Grid Cells: Group Representation and Isotropic Scaling
NeurIPS 2021
0
citations
Iterative Teacher-Aware Learning
NeurIPS 2021 (arXiv)
0
citations
Learning Probabilistic Models from Generator Latent Spaces with Hat EBM
NeurIPS 2022
0
citations
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
NeurIPS 2022
0
citations
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
NeurIPS 2022
0
citations
Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning
NeurIPS 2022
0
citations
Emergent Graphical Conventions in a Visual Communication Game
NeurIPS 2022
0
citations
MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control
NeurIPS 2022
0
citations
Learning non-Markovian Decision-Making from State-only Sequences
NeurIPS 2023
0
citations
Evaluating and Inducing Personality in Pre-trained Language Models
NeurIPS 2023
0
citations
Learning Energy-Based Prior Model with Diffusion-Amortized MCMC
NeurIPS 2023
0
citations
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
NeurIPS 2023
0
citations
Diplomat: A Dialogue Dataset for Situated PragMATic Reasoning
NeurIPS 2023
0
citations
A Theory of Generative ConvNet
ICML 2016
0
citations
Generalized Earley Parser: Bridging Symbolic Grammars and Sequence Data for Future Prediction
ICML 2018
0
citations