Mohamed Elhoseiny
18
Papers
100
Total Citations
Papers (18)
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
ICCV 2025
28
citations
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
ICLR 2025
23
citations
CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding
ICLR 2024
16
citations
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
ECCV 2024
7
citations
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs
ICCV 2025
6
citations
Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding
NeurIPS 2025
6
citations
ShapeWalk: Compositional Shape Editing Through Language-Guided Chains
CVPR 2024
5
citations
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
NeurIPS 2025
5
citations
Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description
ICCV 2025
3
citations
ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge
ICLR 2025
1
citation
Adversarial Text to Continuous Image Generation
CVPR 2024
0
citations
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
ICCV 2025
0
citations
Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents
CVPR 2025
0
citations
WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation
ICCV 2025
0
citations
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
ICCV 2025
0
citations
StoryGPT-V: Large Language Models as Consistent Story Visualizers
CVPR 2025
0
citations
Diffusion-Based Imaginative Coordination for Bimanual Manipulation
ICCV 2025
0
citations
Overcoming Generic Knowledge Loss with Selective Parameter Update
CVPR 2024
0
citations