Dinesh Manocha

38
Papers
399
Total Citations

Papers (38)

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

CVPR 2024
354
citations

Does Thinking More Always Help? Mirage of Test-Time Scaling in Reasoning Models

NeurIPS 2025
19
citations

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding

NeurIPS 2025
13
citations

AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs

ICCV 2025
6
citations

MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

NeurIPS 2025
5
citations

EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception

ICCV 2025
2
citations

AV-RIR: Audio-Visual Room Impulse Response Estimation

CVPR 2024
0
citations

MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

CVPR 2024
0
citations

Position: On the Possibilities of AI-Generated Text Detection

ICML 2024
0
citations

MaxMin-RLHF: Alignment with Diverse Human Preferences

ICML 2024
0
citations

Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles

ICML 2024
0
citations

A Closer Look at the Limitations of Instruction Tuning

ICML 2024
0
citations

3D Reconstruction in the Presence of Glasses by Acoustic and Stereo Fusion

CVPR 2015
0
citations

TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions

CVPR 2019
0
citations

EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege's Principle

CVPR 2020
0
citations

Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality

CVPR 2021
0
citations

STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes

CVPR 2022
0
citations

3MASSIV: Multilingual, Multimodal and Multi-Aspect Dataset of Social Media Short Videos

CVPR 2022
0
citations

TMO: Textured Mesh Acquisition of Objects With a Mobile Device by Using Differentiable Rendering

CVPR 2023
0
citations

VV-Net: Voxel VAE Net With Group Convolutions for Point Cloud Segmentation

ICCV 2019
0
citations

HighlightMe: Detecting Highlights From Human-Centric Videos

ICCV 2021
0
citations

DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes

ICCV 2021
0
citations

Robust 2D/3D Vehicle Parsing in Arbitrary Camera Views for CVIS

ICCV 2021
0
citations

LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference

ICCV 2023
0
citations

CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition

ICCV 2023
0
citations

AdVerb: Visually Guided Audio Dereverberation

ICCV 2023
0
citations

Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping

ECCV 2020
0
citations

AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points

ECCV 2020
0
citations

A Repulsive Force Unit for Garment Collision Handling in Neural Networks

ECCV 2022
0
citations

D2-TPred: Discontinuous Dependency for Trajectory Prediction under Traffic Lights

ECCV 2022
0
citations

Human Trajectory Prediction via Neural Social Physics

ECCV 2022
0
citations

EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching

CVPR 2025
0
citations

FAR: Fourier Aerial Video Recognition

ECCV 2022
0
citations

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

CVPR 2025
0
citations

IM360: Large-scale Indoor Mapping with 360 Cameras

ICCV 2025
0
citations

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

ICCV 2025
0
citations

RPG360: Robust 360 Depth Estimation with Perspective Foundation Models and Graph Optimization

NeurIPS 2025
0
citations

LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering

CVPR 2024
0
citations