Dinesh Manocha
38
Papers
399
Total Citations
Papers (38)
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
CVPR 2024
354
citations
Does Thinking More Always Help? Mirage of Test-Time Scaling in Reasoning Models
NeurIPS 2025
19
citations
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
NeurIPS 2025
13
citations
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs
ICCV 2025
6
citations
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
NeurIPS 2025
5
citations
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
ICCV 2025
2
citations
AV-RIR: Audio-Visual Room Impulse Response Estimation
CVPR 2024
0
citations
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
CVPR 2024
0
citations
Position: On the Possibilities of AI-Generated Text Detection
ICML 2024
0
citations
MaxMin-RLHF: Alignment with Diverse Human Preferences
ICML 2024
0
citations
Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
ICML 2024
0
citations
A Closer Look at the Limitations of Instruction Tuning
ICML 2024
0
citations
3D Reconstruction in the Presence of Glasses by Acoustic and Stereo Fusion
CVPR 2015
0
citations
TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions
CVPR 2019
0
citations
EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege's Principle
CVPR 2020
0
citations
Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality
CVPR 2021
0
citations
STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes
CVPR 2022
0
citations
3MASSIV: Multilingual, Multimodal and Multi-Aspect Dataset of Social Media Short Videos
CVPR 2022
0
citations
TMO: Textured Mesh Acquisition of Objects With a Mobile Device by Using Differentiable Rendering
CVPR 2023
0
citations
VV-Net: Voxel VAE Net With Group Convolutions for Point Cloud Segmentation
ICCV 2019
0
citations
HighlightMe: Detecting Highlights From Human-Centric Videos
ICCV 2021
0
citations
DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes
ICCV 2021
0
citations
Robust 2D/3D Vehicle Parsing in Arbitrary Camera Views for CVIS
ICCV 2021
0
citations
LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference
ICCV 2023
0
citations
CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition
ICCV 2023
0
citations
AdVerb: Visually Guided Audio Dereverberation
ICCV 2023
0
citations
Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping
ECCV 2020
0
citations
AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points
ECCV 2020
0
citations
A Repulsive Force Unit for Garment Collision Handling in Neural Networks
ECCV 2022
0
citations
D2-TPred: Discontinuous Dependency for Trajectory Prediction under Traffic Lights
ECCV 2022
0
citations
Human Trajectory Prediction via Neural Social Physics
ECCV 2022
0
citations
EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching
CVPR 2025
0
citations
FAR: Fourier Aerial Video Recognition
ECCV 2022
0
citations
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
CVPR 2025
0
citations
IM360: Large-scale Indoor Mapping with 360 Cameras
ICCV 2025
0
citations
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
ICCV 2025
0
citations
RPG360: Robust 360 Depth Estimation with Perspective Foundation Models and Graph Optimization
NeurIPS 2025
0
citations
LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering
CVPR 2024
0
citations