Yi Yang

55
Papers
359
Total Citations

Papers (55)

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

CVPR 2024
109
citations

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

CVPR 2024
45
citations

Deep SE(3)-Equivariant Geometric Reasoning for Precise Placement Tasks

ICLR 2024
22
citations

Clustering Propagation for Universal Medical Image Segmentation

CVPR 2024
21
citations

LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels

CVPR 2024
19
citations

Controllable Navigation Instruction Generation with Chain of Thought Prompting

ECCV 2024 (arXiv)
16
citations

DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models

ICCV 2025
15
citations

BVINet: Unlocking Blind Video Inpainting with Zero Annotations

ICCV 2025
12
citations

EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space

CVPR 2025
10
citations

Learning from One Continuous Video Stream

CVPR 2024
10
citations

VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

NeurIPS 2025
10
citations

Clustering for Protein Representation Learning

CVPR 2024
8
citations

Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy

CVPR 2025
8
citations

Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion

AAAI 2025
8
citations

Scene Map-based Prompt Tuning for Navigation Instruction Generation

CVPR 2025
7
citations

DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery

CVPR 2025
7
citations

DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation

CVPR 2025 (arXiv)
5
citations

NeRF Is a Valuable Assistant for 3D Gaussian Splatting

ICCV 2025 (arXiv)
3
citations

MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

ICCV 2025
3
citations

From Image to Video: An Empirical Study of Diffusion Representations

ICCV 2025
3
citations

Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models

AAAI 2025
3
citations

GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning

CVPR 2025
3
citations

SparseDiT: Token Sparsification for Efficient Diffusion Transformer

NeurIPS 2025
2
citations

Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation

CVPR 2025
2
citations

ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning

AAAI 2025
2
citations

LLM Agents Can Be Choice-Supportive Biased Evaluators: An Empirical Study

AAAI 2025
2
citations

TDDBench: A Benchmark for Training data detection

ICLR 2025
1
citation

Dual Reciprocal Learning of Language-based Human Motion Understanding and Generation

ICCV 2025
1
citation

Adapting Text-to-Image Generation with Feature Difference Instruction for Generic Image Restoration

CVPR 2025
1
citation

Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion

ECCV 2024
1
citation

UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis

ICCV 2025
0
citations

BrainGuard: Privacy-Preserving Multisubject Image Reconstructions from Brain Activities

AAAI 2025
0
citations

Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning

ICML 2024
0
citations

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

AAAI 2024 (arXiv)
0
citations

Stitching Segments and Sentences towards Generalization in Video-Text Pre-training

AAAI 2024
0
citations

Interpretable3D: An Ad

AAAI 2024
0
citations

MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs

ICCV 2025
0
citations

Volumetric Environment Representation for Vision-Language Navigation

CVPR 2024
0
citations

TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation

ICCV 2025
0
citations

TAPNext: Tracking Any Point (TAP) as Next Token Prediction

ICCV 2025
0
citations

Neural Clustering based Visual Representation Learning

CVPR 2024
0
citations

CapHuman: Capture Your Moments in Parallel Universes

CVPR 2024
0
citations

MaGS: Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting

ICCV 2025
0
citations

Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity

CVPR 2024
0
citations

Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields

CVPR 2024
0
citations

SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons

CVPR 2025
0
citations

MS-DETR: Efficient DETR Training with Mixed Supervision

CVPR 2024
0
citations

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

CVPR 2024
0
citations

VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens

CVPR 2024
0
citations

Underwater Visual SLAM with Depth Uncertainty and Medium Modeling

ICCV 2025
0
citations

Towards Human-like Virtual Beings: Simulating Human Behavior in 3D Scenes

ICCV 2025
0
citations

Gaussian-based World Model: Gaussian Priors for Voxel-Based Occupancy Prediction and Future Motion Prediction

ICCV 2025
0
citations

From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment

ICCV 2025
0
citations

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)

ICML 2024
0
citations

Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding

ICCV 2025 (arXiv)
0
citations