Mohit Bansal
22
Papers
167
Total Citations
Papers (22)
ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding
NeurIPS 2025
28
citations
Self-Consistency Preference Optimization
ICML 2025
23
citations
ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models
ICLR 2024
20
citations
CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
ICCV 2025
19
citations
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
ICLR 2025
15
citations
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
ICLR 2024
13
citations
Unbounded: A Generative Infinite Game of Character Life Simulation
ICLR 2025
12
citations
VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation
AAAI 2024
10
citations
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
ICLR 2025
9
citations
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
ICLR 2025
7
citations
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
NeurIPS 2025
6
citations
LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits
NeurIPS 2025
5
citations
Position: TrustLLM: Trustworthiness in Large Language Models
ICML 2024
0
citations
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
CVPR 2025
0
citations
VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation
ICCV 2025
0
citations
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
ICCV 2025
0
citations
Multimodal Representation Learning by Alternating Unimodal Adaptation
CVPR 2024
0
citations
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation
CVPR 2024
0
citations
Rethinking Interactive Image Segmentation with Low Latency High Quality and Diverse Prompts
CVPR 2024
0
citations
MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models
ICML 2024
0
citations
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
ICML 2024
0
citations
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
CVPR 2025
0
citations