Jiajun Wu

106 Papers

3,708 Total Citations

1 Affiliation

Affiliations

Stanford University

Papers (106)

Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

NeurIPS 2016, arXiv

MarrNet: 3D Shape Reconstruction via 2.5D Sketches

NeurIPS 2017, arXiv

Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks

NeurIPS 2016, arXiv

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

Self-Supervised Intrinsic Image Decomposition

NeurIPS 2017, arXiv

WonderWorld: Interactive 3D Scene Generation from a Single Image

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models

Learning the 3D Fauna of the Web

Re-thinking Temporal Search for Long-Form Video Understanding

Shape and Material from Sound

The Scene Language: Representing Scenes with Programs, Words, and Embeddings

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Language-Informed Visual Concept Learning

FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video

Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos

Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners

Birth and Death of a Rose

PGC: Physics-Based Gaussian Cloth from a Single Pose

Taming generative video models for zero-shot optical flow extraction

Perspective Plane Program Induction From a Single Image

End-to-End Optimization of Scene Layout

Probabilistic Video Prediction From Noisy Data With a Posterior Confidence

Hierarchical Motion Understanding via Motion Programs

Repopulating Street Scenes

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

Pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis

De-Rendering the World's Revolutionary Artefacts

Rotationally Equivariant 3D Object Detection

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Programmatic Concept Learning for Human Motion Description and Synthesis

Revisiting the "Video" in Video-Language Understanding

Ego-Body Pose Estimation via Ego-Head Pose Estimation

NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations

Multi-Object Manipulation via Object-Centric Neural Scattering Functions

Seeing a Rose in Five Thousand Ways

Putting People in Their Place: Affordance-Aware Human Insertion Into Scenes

3D Neural Field Generation Using Triplane Diffusion

RealImpact: A Dataset of Impact Sound Fields for Real Objects

Accidental Light Probes

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects

CIRCLE: Capture in Rich Contextual Environments

PyPose: A Library for Robot Learning With Physics-Based Optimization

Generative Modeling of Audible Shapes for Object Perception

Raster-To-Vector: Revisiting Floorplan Transformation

Program-Guided Image Manipulators

Neural Radiance Flow for 4D View Synthesis and Video Processing

3D Shape Generation and Completion Through Point-Voxel Diffusion

Learning Temporal Dynamics From Cycles in Narrated Video

VQ3D: Learning a 3D-Aware Generative Model on ImageNet

Tree-Structured Shading Decomposition

Rendering Humans from Object-Occluded Monocular Videos

Video Extrapolation in Space and Time

Unsupervised Segmentation in Real-World Images via Spelke Object Inference

Translating a Visual LEGO Manual to a Machine-Executable Plan

Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning

Learning to See Physics via Visual De-animation

WonderJourney: Going from Anywhere to Everywhere

Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset

Lifting Motion to the 3D World via 2D Diffusion

Category-Agnostic Neural Object Rigging

Diffusion Self-Distillation for Zero-Shot Customized Image Generation

X-Capture: An Open-Source Portable Device for Multi-Sensory Learning

Weakly-Supervised Learning of Dense Functional Correspondences

WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions

Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

WorldScore: Unified Evaluation Benchmark for World Generation

HVAdam: A Full-Dimension Adaptive Optimizer

SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing

Hearing Anything Anywhere

Holodeck: Language Guided Generation of 3D Embodied AI Environments

Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning

Deep Multiple Instance Learning for Image Classification and Auto-Annotation

Neural Scene De-Rendering

Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes With Deep Generative Networks

Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

Learning to Reconstruct Shapes from Unseen Classes

Learning to Exploit Stability for 3D Scene Parsing

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

3D-Aware Scene Manipulation via Inverse Graphics

Visual Object Networks: Image Generation with Disentangled 3D Representations

Visual Concept-Metaconcept Learning

Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations

Learning Physical Graph Representations from Visual Scenes

Multi-Plane Program Induction with 3D Box Priors

Grammar-Based Grounded Lexicon Learning

MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing

CLEVRER-Humans: Describing Physical and Causal Events the Human Way

E-MAPP: Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance

Interaction Modeling with Multiplex Attention

IKEA-Manual: Seeing Shape Assembly Step by Step

Unsupervised Learning of Shape Programs with Repeatable Implicit Parts

Geoclidean: Few-Shot Generalization in Euclidean Geometry

Model-Based Control with Sparse Neural Dynamics

3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection

What’s Left? Concept Grounding with Logic-Enhanced Foundation Models

Siamese Masked Autoencoders

Are These the Same Apple? Comparing Images Based on Object Intrinsics

Disentanglement via Latent Quantization

Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark

SoundCam: A Dataset for Finding Humans Using Room Acoustics

Inferring Hybrid Neural Fluid Fields from Videos

Holistic Evaluation of Text-to-Image Models

Neurally-Guided Structure Inference