Most Cited ICCV "arbitrary domain shapes" Papers
2,701 papers found • Page 14 of 14
Highlight What You Want: Weakly-Supervised Instance-Level Controllable Infrared-Visible Image Fusion
Zeyu Wang, Jizheng Zhang, Haiyu Song et al.
FaceLift: Learning Generalizable Single Image 3D Face Reconstruction from Synthetic Heads
Weijie Lyu, Yi Zhou, Ming-Hsuan Yang et al.
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Boyang Deng, Kyle Genova, Songyou Peng et al.
Blind2Sound: Self-Supervised Image Denoising without Residual Noise
Jiazheng Liu, Zejin Wang, Bohao Chen et al.
IMoRe: Implicit Program-Guided Reasoning for Human Motion Q&A
Chen Li, Chinthani Sugandhika, Ee Yeo Keat et al.
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Sungwoo Cho, Jeongsoo Choi, Sungnyun Kim et al.
Privacy-centric Deep Motion Retargeting for Anonymization of Skeleton-Based Motion Visualization
Thomas Carr, Depeng Xu, Shuhan Yuan et al.
UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control
Yan Wu, Korrawe Karunratanakul, Zhengyi Luo et al.
UniRes: Universal Image Restoration for Complex Degradations
Mo Zhou, Keren Ye, Mauricio Delbracio et al.
SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation
Chun-Han Yao, Yiming Xie, Vikram Voleti et al.
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
Yujie Zhou, Jiazi Bu, Pengyang Ling et al.
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data
Ke Fan, Shunlin Lu, Minyue Dai et al.
Group-wise Scaling and Orthogonal Decomposition for Domain-Invariant Feature Extraction in Face Anti-Spoofing
Seungjin Jung, Kanghee Lee, Yonghyun Jeong et al.
DynamicFace: High-Quality and Consistent Face Swapping for Image and Video using Composable 3D Facial Priors
Runqi Wang, Yang Chen, Sijie Xu et al.
DisenQ: Disentangling Q-Former for Activity-Biometrics
Shehreen Azad, Yogesh Rawat
T2Bs: Text-to-Character Blendshapes via Video Generation
Jiahao Luo, Chaoyang Wang, Michael Vasilkovsky et al.
LOMM: Latest Object Memory Management for Temporally Consistent Video Instance Segmentation
Seunghun Lee, Jiwan Seo, Minwoo Choi et al.
MeshAnything V2: Artist-Created Mesh Generation with Adjacent Mesh Tokenization
Yiwen Chen, Yikai Wang, Yihao Luo et al.
π-AVAS: Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis?
Susan Liang, Chao Huang, Yolo Yunlong Tang et al.
SemGes: Semantics-aware Co-Speech Gesture Generation using Semantic Coherence and Relevance Learning
Lanmiao Liu, Esam Ghaleb, Asli Ozyurek et al.
I2VControl: Disentangled and Unified Video Motion Synthesis Control
Wanquan Feng, Tianhao Qi, Jiawei Liu et al.
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
Shuangkang Fang, I-Chao Shen, Yufeng Wang et al.
LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables
Xunpeng Yi, Yibing Zhang, Xinyu Xiang et al.
MixANT: Observation-dependent Memory Propagation for Stochastic Dense Action Anticipation
Syed Talal Wasim, Hamid Suleman, Olga Zatsarynna et al.
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
Taekyung Ki, Dongchan Min, Gyeongsu Chae
2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos
Marvin Heidinger, Snehal Jauhri, Vignesh Prasad et al.
RayZer: A Self-supervised Large View Synthesis Model
Hanwen Jiang, Hao Tan, Peng Wang et al.
MatchDiffusion: Training-free Generation of Match-Cuts
Alejandro Pardo, Fabio Pizzati, Tong Zhang et al.
Scalable Dual Fingerprinting for Hierarchical Attribution of Text-to-Image Models
Jianwei Fei, Yunshu Dai, Peipeng Yu et al.
QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation
Junyi Wu, Zhiteng Li, Zheng Hui et al.
Tree-NeRV: Efficient Non-Uniform Sampling for Neural Video Representation via Tree-Structured Feature Grids
Jiancheng Zhao, Yifan Zhan, Qingtian Zhu et al.
MaTe: Images Are All You Need for Material Transfer via Diffusion Transformer
Nisha Huang, Henglin Liu, Yizhou Lin et al.
Adaptive Caching for Faster Video Generation with Diffusion Transformers
Kumara Kahatapitiya, Haozhe Liu, Sen He et al.
FlowChef: Steering of Rectified Flow Models for Controlled Generations
Maitreya Patel, Song Wen, Dimitris Metaxas et al.
SynTag: Enhancing the Geometric Robustness of Inversion-based Generative Image Watermarking
Han Fang, Kejiang Chen, Zehua Ma et al.
WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation
Zhongyu Yang, Jun Chen, Dannong Xu et al.
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
Haoxuan Wang, Yuzhang Shang, Zhihang Yuan et al.
Split-and-Combine: Enhancing Style Augmentation for Single Domain Generalization
Zhen Zhang, Zhen Zhang, Qianlong Dang et al.
Zero-Shot Depth Aware Image Editing with Diffusion Models
Rishubh Parihar, Sachidanand VS, Venkatesh Babu Radhakrishnan
Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images
Yuran Dong, Mang Ye
Who Controls the Authorization? Invertible Networks for Copyright Protection in Text-to-Image Synthesis
Baoyue Hu, Yang Wei, Junhao Xiao et al.
FontAnimate: High Quality Few-shot Font Generation via Animating Font Transfer Process
Bin Fu, Zixuan Wang, Kainan Yan et al.
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
Jiahao Wang, Ning Kang, Lewei Yao et al.
TextMaster: A Unified Framework for Realistic Text Editing via Glyph-Style Dual-Control
Zhenyu Yan, Jian Wang, Aoqiang Wang et al.
MCID: Multi-aspect Copyright Infringement Detection for Generated Images
Chuanwei Huang, Zexi Jia, Hongyan Fei et al.
Text2Outfit: Controllable Outfit Generation with Multimodal Language Models
Yuanhao Zhai, Yen-Liang Lin, Minxu Peng et al.
DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models
Revant Teotia, Candace Ross, Karen Ullrich et al.
Cross-Granularity Online Optimization with Masked Compensated Information for Learned Image Compression
Haowei Kuang, Wenhan Yang, Zongming Guo et al.
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance
Minghao Fu, Guo-Hua Wang, Xiaohao Chen et al.
CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation
Zixin Zhu, Kevin Duarte, Mamshad Nayeem Rizve et al.
PLA: Prompt Learning Attack against Text-to-Image Generative Models
Xinqi Lyu, Yihao Liu, Yanjie Li et al.
Holistic Tokenizer for Autoregressive Image Generation
Anlin Zheng, Haochen Wang, Yucheng Zhao et al.
DanceEditor: Towards Iterative Editable Music-driven Dance Generation with Open-Vocabulary Descriptions
Hengyuan Zhang, Zhe Li, Xingqun Qi et al.
Toward Better Out-painting: Improving the Image Composition with Initialization Policy Model
Xuan Han, Yihao Zhao, Yanhao Ge et al.
Versatile Transition Generation with Image-to-Video Diffusion
Zuhao Yang, Jiahui Zhang, Yingchen Yu et al.
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Shengbang Tong, David Fan, Jiachen Zhu et al.
DiffIP: Representation Fingerprints for Robust IP Protection of Diffusion Models
Zhuoling Li, Haoxuan Qu, Jason Kuen et al.
Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
Ryan Ramos, Vladan Stojnić, Giorgos Kordopatis-Zilos et al.
AM-Adapter: Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis in-the-Wild
Siyoon Jin, Jisu Nam, Jiyoung Kim et al.
Diffusion Epistemic Uncertainty with Asymmetric Learning for Diffusion-Generated Image Detection
Yingsong Huang, Hui Guo, Jing Huang et al.
Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
Hyungjin Kim, Seokho Ahn, Young-Duk Seo
V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models
Jisoo Kim, Wooseok Seo, Junwan Kim et al.
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting
Zeyi Sun, Ziyang Chu, Pan Zhang et al.
AnyI2V: Animating Any Conditional Image with Motion Control
Ziye Li, Xincheng Shuai, Hao Luo et al.
EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing
Zexuan Yan, Yue Ma, Chang Zou et al.
RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation
Yuhan Li, Xianfeng Tan, Wenxiang Shang et al.
Instruction-based Image Editing with Planning, Reasoning, and Generation
Liya Ji, Chenyang Qi, Qifeng Chen
HDR Image Generation via Gain Map Decomposed Diffusion
Yuanshen Guan, Ruikang Xu, Yinuo Liao et al.
ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning
Jongseo Lee, Kyungho Bae, Kyle Min et al.
Accelerating Diffusion Transformer via Gradient-Optimized Cache
Junxiang Qiu, Lin Liu, Shuo Wang et al.
The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation
Ruoyu Wang, Huayang Huang, Ye Zhu et al.
Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces
Aniruddha Mahapatra, Long Mai, David Bourgin et al.
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu, Linchao Zhu, Yi Yang
HyTIP: Hybrid Temporal Information Propagation for Masked Conditional Residual Video Coding
Yi-Hsin Chen, Yi-Chen Yao, Kuan-Wei Ho et al.
DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images
Kazuma Nagata, Naoshi Kaneko
Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models
Haoming Cai, Tsung-Wei Huang, Shiv Gehlot et al.
UniversalBooth: Model-Agnostic Personalized Text-to-Image Generation
Songhua Liu, Ruonan Yu, Xinchao Wang
CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching
Zizhuo Li, Yifan Lu, Linfeng Tang et al.
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
Achint Soni, Meet Soni, Sirisha Rambhatla
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo, Yawei Li, Taolin Zhang et al.
Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation
Gang Dai, Yifan Zhang, Yutao Qin et al.
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
Ruotong Wang, Mingli Zhu, Jiarong Ou et al.
Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection
Yichen Lu, Siwei Nie, Minlong Lu et al.
PixTalk: Controlling Photorealistic Image Processing and Editing with Language
Marcos Conde, Zihao Lu, Radu Timofte
A Unified Framework for Industrial Cel-Animation Colorization with Temporal-Structural Awareness
Xiaoyi Feng, Tao Huang, Peng Wang et al.
T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation
Chieh-Yun Chen, Min Shi, Gong Zhang et al.
LayerLock: Non-collapsing Representation Learning with Progressive Freezing
Goker Erdogan, Nikhil Parthasarathy, Catalin Ionescu et al.
Function-centric Bayesian Network for Zero-Shot Object Goal Navigation
Sixian Zhang, Xinyao Yu, Xinhang Song et al.
Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive Segmentation
You Huang, Lichao Chen, Jiayi Ji et al.
CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
Kuniaki Saito, Donghyun Kim, Kwanyong Park et al.
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
Tianming Liang, Kun-Yu Lin, Chaolei Tan et al.
Test-time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates
Kecheng Chen, Xinyu Luo, Tiexin Qin et al.
Learn2Synth: Learning Optimal Data Synthesis Using Hypergradients for Brain Image Segmentation
Xiaoling Hu, Xiangrui Zeng, Oula Puonti et al.
Representation Shift: Unifying Token Compression with FlashAttention
Joonmyung Choi, Sanghyeok Lee, Byungoh Ko et al.
ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity
Yefei He, Feng Chen, Jing Liu et al.
FastJSMA: Accelerating Jacobian-based Saliency Map Attacks through Gradient Decoupling
Zhenghao Gao, Shengjie Xu, Zijing Li et al.
Federated Continuous Category Discovery and Learning
Lixu Wang, Chenxi Liu, Junfeng Guo et al.
ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts
Xiaoqi Wang, Clint Sebastian, Wenbin He et al.
Zero-Shot Compositional Video Learning with Coding Rate Reduction
Heeseok Jung, Jun-Hyeon Bak, Yujin Jeong et al.
Fuzzy Contrastive Decoding to Alleviate Object Hallucination in Large Vision-Language Models
Jieun Kim, Jinmyeong Kim, Yoonji Kim et al.
Superpowering Open-Vocabulary Object Detectors for X-ray Vision
Pablo Garcia-Fernandez, Lorenzo Vaquero, Mingxuan Liu et al.