Yi Yang
55
Papers
359
Total Citations
Papers (55)
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
CVPR 2024
109
citations
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
CVPR 2024
45
citations
Deep SE(3)-Equivariant Geometric Reasoning for Precise Placement Tasks
ICLR 2024
22
citations
Clustering Propagation for Universal Medical Image Segmentation
CVPR 2024
21
citations
LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels
CVPR 2024
19
citations
Controllable Navigation Instruction Generation with Chain of Thought Prompting
ECCV 2024
16
citations
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
ICCV 2025
15
citations
BVINet: Unlocking Blind Video Inpainting with Zero Annotations
ICCV 2025
12
citations
EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space
CVPR 2025
10
citations
Learning from One Continuous Video Stream
CVPR 2024
10
citations
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
NeurIPS 2025
10
citations
Clustering for Protein Representation Learning
CVPR 2024
8
citations
Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy
CVPR 2025
8
citations
Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion
AAAI 2025
8
citations
Scene Map-based Prompt Tuning for Navigation Instruction Generation
CVPR 2025
7
citations
DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery
CVPR 2025
7
citations
DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation
CVPR 2025
5
citations
NeRF Is a Valuable Assistant for 3D Gaussian Splatting
ICCV 2025
3
citations
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
ICCV 2025
3
citations
From Image to Video: An Empirical Study of Diffusion Representations
ICCV 2025
3
citations
Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models
AAAI 2025
3
citations
GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning
CVPR 2025
3
citations
SparseDiT: Token Sparsification for Efficient Diffusion Transformer
NeurIPS 2025
2
citations
Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation
CVPR 2025
2
citations
ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning
AAAI 2025
2
citations
LLM Agents Can Be Choice-Supportive Biased Evaluators: An Empirical Study
AAAI 2025
2
citations
TDDBench: A Benchmark for Training data detection
ICLR 2025
1
citation
Dual Reciprocal Learning of Language-based Human Motion Understanding and Generation
ICCV 2025
1
citation
Adapting Text-to-Image Generation with Feature Difference Instruction for Generic Image Restoration
CVPR 2025
1
citation
Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
ECCV 2024
1
citation
UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis
ICCV 2025
0
citations
BrainGuard: Privacy-Preserving Multisubject Image Reconstructions from Brain Activities
AAAI 2025
0
citations
Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning
ICML 2024
0
citations
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval
AAAI 2024
0
citations
Stitching Segments and Sentences towards Generalization in Video-Text Pre-training
AAAI 2024
0
citations
Interpretable3D: An Ad
AAAI 2024
0
citations
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
ICCV 2025
0
citations
Volumetric Environment Representation for Vision-Language Navigation
CVPR 2024
0
citations
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
ICCV 2025
0
citations
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
ICCV 2025
0
citations
Neural Clustering based Visual Representation Learning
CVPR 2024
0
citations
CapHuman: Capture Your Moments in Parallel Universes
CVPR 2024
0
citations
MaGS: Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting
ICCV 2025
0
citations
Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity
CVPR 2024
0
citations
Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields
CVPR 2024
0
citations
SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons
CVPR 2025
0
citations
MS-DETR: Efficient DETR Training with Mixed Supervision
CVPR 2024
0
citations
SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction
CVPR 2024
0
citations
VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
CVPR 2024
0
citations
Underwater Visual SLAM with Depth Uncertainty and Medium Modeling
ICCV 2025
0
citations
Towards Human-like Virtual Beings: Simulating Human Behavior in 3D Scenes
ICCV 2025
0
citations
Gaussian-based World Model: Gaussian Priors for Voxel-Based Occupancy Prediction and Future Motion Prediction
ICCV 2025
0
citations
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
ICCV 2025
0
citations
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
ICML 2024
0
citations
Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding
ICCV 2025
0
citations