Jiajun Wu

106
Papers
3,708
Total Citations
1
Affiliations

Affiliations

Stanford University

Papers (106)

Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

NeurIPS 2016 (arXiv)
2,081
citations

MarrNet: 3D Shape Reconstruction via 2.5D Sketches

NeurIPS 2017 (arXiv)
435
citations

Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks

NeurIPS 2016 (arXiv)
417
citations

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

CVPR 2024
192
citations

Self-Supervised Intrinsic Image Decomposition

NeurIPS 2017 (arXiv)
141
citations

WonderWorld: Interactive 3D Scene Generation from a Single Image

CVPR 2025
120
citations

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image

CVPR 2024
85
citations

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models

CVPR 2025
44
citations

Learning the 3D Fauna of the Web

CVPR 2024
42
citations

Re-thinking Temporal Search for Long-Form Video Understanding

CVPR 2025
36
citations

Shape and Material from Sound

NeurIPS 2017
33
citations

The Scene Language: Representing Scenes with Programs, Words, and Embeddings

CVPR 2025
15
citations

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

CVPR 2024
14
citations

Language-Informed Visual Concept Learning

ICLR 2024
12
citations

FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video

CVPR 2025
11
citations

Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos

ECCV 2024
10
citations

Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners

CVPR 2024
9
citations

Birth and Death of a Rose

CVPR 2025
5
citations

PGC: Physics-Based Gaussian Cloth from a Single Pose

CVPR 2025
3
citations

Taming generative video models for zero-shot optical flow extraction

NeurIPS 2025
3
citations

Perspective Plane Program Induction From a Single Image

CVPR 2020 (arXiv)
0
citations

End-to-End Optimization of Scene Layout

CVPR 2020 (arXiv)
0
citations

Probabilistic Video Prediction From Noisy Data With a Posterior Confidence

CVPR 2020
0
citations

Hierarchical Motion Understanding via Motion Programs

CVPR 2021 (arXiv)
0
citations

Repopulating Street Scenes

CVPR 2021 (arXiv)
0
citations

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

CVPR 2021 (arXiv)
0
citations

Pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis

CVPR 2021
0
citations

De-Rendering the World's Revolutionary Artefacts

CVPR 2021
0
citations

Rotationally Equivariant 3D Object Detection

CVPR 2022 (arXiv)
0
citations

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

CVPR 2022
0
citations

Programmatic Concept Learning for Human Motion Description and Synthesis

CVPR 2022
0
citations

Revisiting the "Video" in Video-Language Understanding

CVPR 2022
0
citations

Ego-Body Pose Estimation via Ego-Head Pose Estimation

CVPR 2023 (arXiv)
0
citations

NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations

CVPR 2023 (arXiv)
0
citations

Multi-Object Manipulation via Object-Centric Neural Scattering Functions

CVPR 2023
0
citations

Seeing a Rose in Five Thousand Ways

CVPR 2023 (arXiv)
0
citations

Putting People in Their Place: Affordance-Aware Human Insertion Into Scenes

CVPR 2023 (arXiv)
0
citations

3D Neural Field Generation Using Triplane Diffusion

CVPR 2023 (arXiv)
0
citations

RealImpact: A Dataset of Impact Sound Fields for Real Objects

CVPR 2023
0
citations

Accidental Light Probes

CVPR 2023 (arXiv)
0
citations

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

CVPR 2023
0
citations

The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects

CVPR 2023
0
citations

CIRCLE: Capture in Rich Contextual Environments

CVPR 2023
0
citations

PyPose: A Library for Robot Learning With Physics-Based Optimization

CVPR 2023 (arXiv)
0
citations

Generative Modeling of Audible Shapes for Object Perception

ICCV 2017
0
citations

Raster-To-Vector: Revisiting Floorplan Transformation

ICCV 2017
0
citations

Program-Guided Image Manipulators

ICCV 2019
0
citations

Neural Radiance Flow for 4D View Synthesis and Video Processing

ICCV 2021 (arXiv)
0
citations

3D Shape Generation and Completion Through Point-Voxel Diffusion

ICCV 2021 (arXiv)
0
citations

Learning Temporal Dynamics From Cycles in Narrated Video

ICCV 2021 (arXiv)
0
citations

VQ3D: Learning a 3D-Aware Generative Model on ImageNet

ICCV 2023 (arXiv)
0
citations

Tree-Structured Shading Decomposition

ICCV 2023 (arXiv)
0
citations

Rendering Humans from Object-Occluded Monocular Videos

ICCV 2023 (arXiv)
0
citations

Video Extrapolation in Space and Time

ECCV 2022
0
citations

Unsupervised Segmentation in Real-World Images via Spelke Object Inference

ECCV 2022
0
citations

Translating a Visual LEGO Manual to a Machine-Executable Plan

ECCV 2022
0
citations

Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning

NeurIPS 2015
0
citations

Learning to See Physics via Visual De-animation

NeurIPS 2017
0
citations

WonderJourney: Going from Anywhere to Everywhere

CVPR 2024
0
citations

Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset

CVPR 2025
0
citations

Lifting Motion to the 3D World via 2D Diffusion

CVPR 2025
0
citations

Category-Agnostic Neural Object Rigging

CVPR 2025
0
citations

Diffusion Self-Distillation for Zero-Shot Customized Image Generation

CVPR 2025
0
citations

X-Capture: An Open-Source Portable Device for Multi-Sensory Learning

ICCV 2025
0
citations

Weakly-Supervised Learning of Dense Functional Correspondences

ICCV 2025
0
citations

WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions

ICCV 2025
0
citations

Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

ICCV 2025
0
citations

WorldScore: Unified Evaluation Benchmark for World Generation

ICCV 2025
0
citations

HVAdam: A Full-Dimension Adaptive Optimizer

AAAI 2025
0
citations

SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing

AAAI 2024
0
citations

Hearing Anything Anywhere

CVPR 2024
0
citations

Holodeck: Language Guided Generation of 3D Embodied AI Environments

CVPR 2024
0
citations

Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning

ICML 2024
0
citations

Deep Multiple Instance Learning for Image Classification and Auto-Annotation

CVPR 2015
0
citations

Neural Scene De-Rendering

CVPR 2017
0
citations

Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes With Deep Generative Networks

CVPR 2017
0
citations

Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling

CVPR 2018 (arXiv)
0
citations

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

CVPR 2018 (arXiv)
0
citations

Learning to Reconstruct Shapes from Unseen Classes

NeurIPS 2018
0
citations

Learning to Exploit Stability for 3D Scene Parsing

NeurIPS 2018
0
citations

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

NeurIPS 2018
0
citations

3D-Aware Scene Manipulation via Inverse Graphics

NeurIPS 2018
0
citations

Visual Object Networks: Image Generation with Disentangled 3D Representations

NeurIPS 2018
0
citations

Visual Concept-Metaconcept Learning

NeurIPS 2019
0
citations

Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations

NeurIPS 2019
0
citations

Learning Physical Graph Representations from Visual Scenes

NeurIPS 2020
0
citations

Multi-Plane Program Induction with 3D Box Priors

NeurIPS 2020
0
citations

Grammar-Based Grounded Lexicon Learning

NeurIPS 2021
0
citations

MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing

NeurIPS 2022
0
citations

CLEVRER-Humans: Describing Physical and Causal Events the Human Way

NeurIPS 2022
0
citations

E-MAPP: Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance

NeurIPS 2022
0
citations

Interaction Modeling with Multiplex Attention

NeurIPS 2022
0
citations

IKEA-Manual: Seeing Shape Assembly Step by Step

NeurIPS 2022
0
citations

Unsupervised Learning of Shape Programs with Repeatable Implicit Parts

NeurIPS 2022
0
citations

Geoclidean: Few-Shot Generalization in Euclidean Geometry

NeurIPS 2022
0
citations

Model-Based Control with Sparse Neural Dynamics

NeurIPS 2023
0
citations

3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection

NeurIPS 2023
0
citations

What’s Left? Concept Grounding with Logic-Enhanced Foundation Models

NeurIPS 2023
0
citations

Siamese Masked Autoencoders

NeurIPS 2023
0
citations

Are These the Same Apple? Comparing Images Based on Object Intrinsics

NeurIPS 2023
0
citations

Disentanglement via Latent Quantization

NeurIPS 2023
0
citations

Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark

NeurIPS 2023
0
citations

SoundCam: A Dataset for Finding Humans Using Room Acoustics

NeurIPS 2023
0
citations

Inferring Hybrid Neural Fluid Fields from Videos

NeurIPS 2023
0
citations

Holistic Evaluation of Text-to-Image Models

NeurIPS 2023
0
citations

Neurally-Guided Structure Inference

ICML 2019
0
citations