Jitendra Malik
94
Papers
3,257
Total Citations
Papers (94)
Hypercolumns for Object Segmentation and Fine-Grained Localization
CVPR 2015
1,630
citations
Learning to Poke by Poking: Experiential Learning of Intuitive Physics
NeurIPS 2016
595
citations
Learning a Multi-View Stereo Machine
NeurIPS 2017
572
citations
Sequential Modeling Enables Scalable Learning for Large Vision Models
CVPR 2024
230
citations
Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild
ECCV 2020
163
citations
Estimating Body and Hand Motion in an Ego-sensed World
CVPR 2025
27
citations
An Empirical Study of Autoregressive Pre-training from Videos
ICCV 2025
15
citations
Scaling Properties of Diffusion Models For Perceptual Tasks
CVPR 2025
15
citations
Reconstructing People, Places, and Cameras
CVPR 2025
10
citations
Depth From Shading, Defocus, and Correspondence Using Light-Field Angular Coherence
CVPR 2015
0
citations
Category-Specific Object Reconstruction From a Single Image
CVPR 2015
0
citations
Virtual View Networks for Object Reconstruction
CVPR 2015
0
citations
Learning to Segment Moving Objects in Videos
CVPR 2015
0
citations
Aligning 3D Models to RGB-D Images of Cluttered Scenes
CVPR 2015
0
citations
Cross Modal Distillation for Supervision Transfer
CVPR 2016
0
citations
Iterative Instance Segmentation
CVPR 2016
0
citations
Human Pose Estimation With Iterative Error Feedback
CVPR 2016
0
citations
Feedback Networks
CVPR 2017
0
citations
Cognitive Mapping and Planning for Visual Navigation
CVPR 2017
0
citations
Multi-View Supervision for Single-View Reconstruction via Differentiable Ray Consistency
CVPR 2017
0
citations
Learning Shape Abstractions by Assembling Volumetric Primitives
CVPR 2017
0
citations
Factoring Shape, Pose, and Layout From the 2D Image of a 3D Scene
CVPR 2018
0
citations
Multi-View Consistency as Supervisory Signal for Learning Shape and Pose Prediction
CVPR 2018
0
citations
Taskonomy: Disentangling Task Transfer Learning
CVPR 2018
0
citations
From Lifestyle Vlogs to Everyday Interactions
CVPR 2018
0
citations
AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions
CVPR 2018
0
citations
End-to-End Recovery of Human Shape and Pose
CVPR 2018
0
citations
Gibson Env: Real-World Perception for Embodied Agents
CVPR 2018
0
citations
Learning Individual Styles of Conversational Gesture
CVPR 2019
0
citations
Learning Independent Object Motion From Unlabelled Stereoscopic Videos
CVPR 2019
0
citations
Learning 3D Human Dynamics From Video
CVPR 2019
0
citations
Non-Adversarial Image Synthesis With Generative Latent Nearest Neighbors
CVPR 2019
0
citations
Robust Learning Through Cross-Task Consistency
CVPR 2020
0
citations
Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
CVPR 2022
0
citations
ABO: Dataset and Benchmarks for Real-World 3D Object Understanding
CVPR 2022
0
citations
Human Mesh Recovery From Multiple Shots
CVPR 2022
0
citations
Tracking People by Predicting 3D Appearance, Location and Pose
CVPR 2022
0
citations
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
CVPR 2022
0
citations
Reversible Vision Transformers
CVPR 2022
0
citations
PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning
CVPR 2022
0
citations
Coupling Vision and Proprioception for Navigation of Legged Robots
CVPR 2022
0
citations
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
CVPR 2022
0
citations
Differentiable Stereopsis: Meshes From Multiple Views Using Differentiable Rendering
CVPR 2022
0
citations
Ego4D: Around the World in 3,000 Hours of Egocentric Video
CVPR 2022
0
citations
On the Benefits of 3D Pose and Tracking for Human Action Recognition
CVPR 2023
0
citations
Decoupling Human and Camera Motion From Videos in the Wild
CVPR 2023
0
citations
Multiview Compressive Coding for 3D Reconstruction
CVPR 2023
0
citations
Learning to See by Moving
ICCV 2015
0
citations
Pose Induction for Novel Object Categories
ICCV 2015
0
citations
Amodal Completion and Size Constancy in Natural Scenes
ICCV 2015
0
citations
Contextual Action Recognition With R*CNN
ICCV 2015
0
citations
Actions and Attributes From Wholes and Parts
ICCV 2015
0
citations
DeepBox: Learning Objectness With Convolutional Networks
ICCV 2015
0
citations
What Will Happen Next? Forecasting Player Moves in Sports Videos
ICCV 2017
0
citations
Diverse Image Synthesis From Semantic Layouts via Conditional IMLE
ICCV 2019
0
citations
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera
ICCV 2019
0
citations
SlowFast Networks for Video Recognition
ICCV 2019
0
citations
Predicting 3D Human Dynamics From Video
ICCV 2019
0
citations
ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors
ICCV 2019
0
citations
Habitat: A Platform for Embodied AI Research
ICCV 2019
0
citations
Mesh R-CNN
ICCV 2019
0
citations
From Goals, Waypoints & Paths to Long Term Human Trajectory Forecasting
ICCV 2021
0
citations
Multiscale Vision Transformers
ICCV 2021
0
citations
Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans
ICCV 2021
0
citations
Reconstructing Hand-Object Interactions in the Wild
ICCV 2021
0
citations
Humans in 4D: Reconstructing and Tracking Humans with Transformers
ICCV 2023
0
citations
Navigating to Objects Specified by Images
ICCV 2023
0
citations
Long-term Human Motion Prediction with Scene Context
ECCV 2020
0
citations
It is not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction
ECCV 2020
0
citations
Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks
ECCV 2020
0
citations
Shape and Viewpoint without Keypoints
ECCV 2020
0
citations
Recurrent Network Models for Human Dynamics
ICCV 2015
0
citations
Poly-Autoregressive Prediction for Modeling Interactions
CVPR 2025
0
citations
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
CVPR 2024
0
citations
Reconstructing Hands in 3D with Transformers
CVPR 2024
0
citations
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
CVPR 2024
0
citations
xT: Nested Tokenization for Larger Context in Large Images
ICML 2024
0
citations
Deformable Part Models are Convolutional Neural Networks
CVPR 2015
0
citations
Finding Action Tubes
CVPR 2015
0
citations
Viewpoints and Keypoints
CVPR 2015
0
citations
Visual Memory for Robust Path Following
NeurIPS 2018
0
citations
Approximate Feature Collisions in Neural Nets
NeurIPS 2019
0
citations
3D Shape Reconstruction from Vision and Touch
NeurIPS 2020
0
citations
Habitat 2.0: Training Home Assistants to Rearrange their Habitat
NeurIPS 2021
0
citations
SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency
NeurIPS 2021
0
citations
Active 3D Shape Reconstruction from Vision and Touch
NeurIPS 2021
0
citations
Tracking People with 3D Representations
NeurIPS 2021
0
citations
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
NeurIPS 2022
0
citations
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
NeurIPS 2023
0
citations
MAViL: Masked Audio-Video Learners
NeurIPS 2023
0
citations
Speculative Decoding with Big Little Decoder
NeurIPS 2023
0
citations
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
NeurIPS 2023
0
citations
Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing
ICML 2016
0
citations
Fast k-Nearest Neighbour Search via Prioritized DCI
ICML 2017
0
citations