Mohamed Elhoseiny

18
Papers
100
Total Citations

Papers (18)

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

ICCV 2025
28
citations

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

ICLR 2025
23
citations

CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding

ICLR 2024
16
citations

Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations

ECCV 2024
7
citations

AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs

ICCV 2025
6
citations

Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding

NeurIPS 2025
6
citations

ShapeWalk: Compositional Shape Editing Through Language-Guided Chains

CVPR 2024
5
citations

MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

NeurIPS 2025
5
citations

Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description

ICCV 2025
3
citations

ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge

ICLR 2025
1
citation

Adversarial Text to Continuous Image Generation

CVPR 2024
0
citations

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

ICCV 2025
0
citations

Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents

CVPR 2025
0
citations

WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation

ICCV 2025
0
citations

4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding

ICCV 2025
0
citations

StoryGPT-V: Large Language Models as Consistent Story Visualizers

CVPR 2025
0
citations

Diffusion-Based Imaginative Coordination for Bimanual Manipulation

ICCV 2025
0
citations

Overcoming Generic Knowledge Loss with Selective Parameter Update

CVPR 2024
0
citations