Jitendra Malik

94
Papers
3,257
Total Citations

Papers (94)

Hypercolumns for Object Segmentation and Fine-Grained Localization

CVPR 2015
1,630
citations

Learning to Poke by Poking: Experiential Learning of Intuitive Physics

NeurIPS 2016
595
citations

Learning a Multi-View Stereo Machine

NeurIPS 2017
572
citations

Sequential Modeling Enables Scalable Learning for Large Vision Models

CVPR 2024
230
citations

Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild

ECCV 2020
163
citations

Estimating Body and Hand Motion in an Ego-sensed World

CVPR 2025
27
citations

An Empirical Study of Autoregressive Pre-training from Videos

ICCV 2025
15
citations

Scaling Properties of Diffusion Models For Perceptual Tasks

CVPR 2025
15
citations

Reconstructing People, Places, and Cameras

CVPR 2025
10
citations

Depth From Shading, Defocus, and Correspondence Using Light-Field Angular Coherence

CVPR 2015
0
citations

Category-Specific Object Reconstruction From a Single Image

CVPR 2015
0
citations

Virtual View Networks for Object Reconstruction

CVPR 2015
0
citations

Learning to Segment Moving Objects in Videos

CVPR 2015
0
citations

Aligning 3D Models to RGB-D Images of Cluttered Scenes

CVPR 2015
0
citations

Cross Modal Distillation for Supervision Transfer

CVPR 2016
0
citations

Iterative Instance Segmentation

CVPR 2016
0
citations

Human Pose Estimation With Iterative Error Feedback

CVPR 2016
0
citations

Feedback Networks

CVPR 2017
0
citations

Cognitive Mapping and Planning for Visual Navigation

CVPR 2017
0
citations

Multi-View Supervision for Single-View Reconstruction via Differentiable Ray Consistency

CVPR 2017
0
citations

Learning Shape Abstractions by Assembling Volumetric Primitives

CVPR 2017
0
citations

Factoring Shape, Pose, and Layout From the 2D Image of a 3D Scene

CVPR 2018
0
citations

Multi-View Consistency as Supervisory Signal for Learning Shape and Pose Prediction

CVPR 2018
0
citations

Taskonomy: Disentangling Task Transfer Learning

CVPR 2018
0
citations

From Lifestyle Vlogs to Everyday Interactions

CVPR 2018
0
citations

AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions

CVPR 2018
0
citations

End-to-End Recovery of Human Shape and Pose

CVPR 2018
0
citations

Gibson Env: Real-World Perception for Embodied Agents

CVPR 2018
0
citations

Learning Individual Styles of Conversational Gesture

CVPR 2019
0
citations

Learning Independent Object Motion From Unlabelled Stereoscopic Videos

CVPR 2019
0
citations

Learning 3D Human Dynamics From Video

CVPR 2019
0
citations

Non-Adversarial Image Synthesis With Generative Latent Nearest Neighbors

CVPR 2019
0
citations

Robust Learning Through Cross-Task Consistency

CVPR 2020
0
citations

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

CVPR 2022
0
citations

ABO: Dataset and Benchmarks for Real-World 3D Object Understanding

CVPR 2022
0
citations

Human Mesh Recovery From Multiple Shots

CVPR 2022
0
citations

Tracking People by Predicting 3D Appearance, Location and Pose

CVPR 2022
0
citations

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

CVPR 2022
0
citations

Reversible Vision Transformers

CVPR 2022
0
citations

PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning

CVPR 2022
0
citations

Coupling Vision and Proprioception for Navigation of Legged Robots

CVPR 2022
0
citations

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

CVPR 2022
0
citations

Differentiable Stereopsis: Meshes From Multiple Views Using Differentiable Rendering

CVPR 2022
0
citations

Ego4D: Around the World in 3,000 Hours of Egocentric Video

CVPR 2022
0
citations

On the Benefits of 3D Pose and Tracking for Human Action Recognition

CVPR 2023
0
citations

Decoupling Human and Camera Motion From Videos in the Wild

CVPR 2023
0
citations

Multiview Compressive Coding for 3D Reconstruction

CVPR 2023
0
citations

Learning to See by Moving

ICCV 2015
0
citations

Pose Induction for Novel Object Categories

ICCV 2015
0
citations

Amodal Completion and Size Constancy in Natural Scenes

ICCV 2015
0
citations

Contextual Action Recognition With R*CNN

ICCV 2015
0
citations

Actions and Attributes From Wholes and Parts

ICCV 2015
0
citations

DeepBox: Learning Objectness With Convolutional Networks

ICCV 2015
0
citations

What Will Happen Next? Forecasting Player Moves in Sports Videos

ICCV 2017
0
citations

Diverse Image Synthesis From Semantic Layouts via Conditional IMLE

ICCV 2019
0
citations

3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera

ICCV 2019
0
citations

SlowFast Networks for Video Recognition

ICCV 2019
0
citations

Predicting 3D Human Dynamics From Video

ICCV 2019
0
citations

ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors

ICCV 2019
0
citations

Habitat: A Platform for Embodied AI Research

ICCV 2019
0
citations

Mesh R-CNN

ICCV 2019
0
citations

From Goals, Waypoints & Paths to Long Term Human Trajectory Forecasting

ICCV 2021
0
citations

Multiscale Vision Transformers

ICCV 2021
0
citations

Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans

ICCV 2021
0
citations

Reconstructing Hand-Object Interactions in the Wild

ICCV 2021
0
citations

Humans in 4D: Reconstructing and Tracking Humans with Transformers

ICCV 2023
0
citations

Navigating to Objects Specified by Images

ICCV 2023
0
citations

Long-term Human Motion Prediction with Scene Context

ECCV 2020
0
citations

It is not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction

ECCV 2020
0
citations

Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks

ECCV 2020
0
citations

Shape and Viewpoint without Keypoints

ECCV 2020
0
citations

Recurrent Network Models for Human Dynamics

ICCV 2015
0
citations

Poly-Autoregressive Prediction for Modeling Interactions

CVPR 2025
0
citations

Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

CVPR 2024
0
citations

Reconstructing Hands in 3D with Transformers

CVPR 2024
0
citations

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

CVPR 2024
0
citations

xT: Nested Tokenization for Larger Context in Large Images

ICML 2024
0
citations

Deformable Part Models are Convolutional Neural Networks

CVPR 2015
0
citations

Finding Action Tubes

CVPR 2015
0
citations

Viewpoints and Keypoints

CVPR 2015
0
citations

Visual Memory for Robust Path Following

NeurIPS 2018
0
citations

Approximate Feature Collisions in Neural Nets

NeurIPS 2019
0
citations

3D Shape Reconstruction from Vision and Touch

NeurIPS 2020
0
citations

Habitat 2.0: Training Home Assistants to Rearrange their Habitat

NeurIPS 2021
0
citations

SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency

NeurIPS 2021
0
citations

Active 3D Shape Reconstruction from Vision and Touch

NeurIPS 2021
0
citations

Tracking People with 3D Representations

NeurIPS 2021
0
citations

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

NeurIPS 2022
0
citations

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

NeurIPS 2023
0
citations

MAViL: Masked Audio-Video Learners

NeurIPS 2023
0
citations

Speculative Decoding with Big Little Decoder

NeurIPS 2023
0
citations

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

NeurIPS 2023
0
citations

Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing

ICML 2016
0
citations

Fast k-Nearest Neighbour Search via Prioritized DCI

ICML 2017
0
citations