Most Cited CVPR "hierarchical coordination" Papers
5,589 papers found • Page 4 of 28
Conference
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
seil kang, Jinyeong Kim, Junhyeok Kim et al.
Complexity Experts are Task-Discriminative Learners for Any Image Restoration
Eduard Zamfir, Zongwei Wu, Nancy Mehta et al.
ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More
Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
Hyeonho Jeong, Chun-Hao P. Huang, Jong Chul Ye et al.
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Juan Rodriguez, Abhay Puri, Shubham Agarwal et al.
Modular Blind Video Quality Assessment
Wen Wen, Mu Li, Yabin ZHANG et al.
Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation
Bingfeng Zhang, Siyue Yu, Yunchao Wei et al.
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
xin zhang, Jiawei Du, Weiying Xie et al.
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation
Yang Chen, Yingwei Pan, haibo yang et al.
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing "Jed" Yang, Xuweiyi Chen, Nikhil Madaan et al.
VideoGLaMM : A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Shehan Munasinghe, Hanan Gani, Wenqi Zhu et al.
Universal Segmentation at Arbitrary Granularity with Language Instruction
Yong Liu, Cairong Zhang, Yitong Wang et al.
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
Jianzong Wu, Xiangtai Li, Chenyang Si et al.
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang, Jingdi Lei, Junxian Li et al.
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Haoyi Jiang, Liu Liu, Tianheng Cheng et al.
Training Like a Medical Resident: Context-Prior Learning Toward Universal Medical Image Segmentation
Yunhe Gao
SHAP-EDITOR: Instruction-Guided Latent 3D Editing in Seconds
Minghao Chen, Junyu Xie, Iro Laina et al.
LEOD: Label-Efficient Object Detection for Event Cameras
Ziyi Wu, Mathias Gehrig, Qing Lyu et al.
Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation
Xiaoyang Wang, Huihui Bai, Limin Yu et al.
Visual Agentic AI for Spatial Reasoning with a Dynamic API
Damiano Marsili, Rohun Agrawal, Yisong Yue et al.
Empowering LLMs to Understand and Generate Complex Vector Graphics
XiMing Xing, Juncheng Hu, Guotao Liang et al.
EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation
Chanyoung Kim, Woojung Han, Dayun Ju et al.
Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning
Woo-Jin Ahn, Geun-Yeong Yang, Hyunduck Choi et al.
Multi-Space Alignments Towards Universal LiDAR Segmentation
Youquan Liu, Lingdong Kong, Xiaoyang Wu et al.
InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion
Jihyun Lee, Shunsuke Saito, Giljoo Nam et al.
Localization Is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix It
Adam Lilja, Junsheng Fu, Erik Stenborg et al.
Seeing Motion at Nighttime with an Event Camera
Haoyue Liu, Shihan Peng, Lin Zhu et al.
I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions
Chengfeng Zhao, Juze Zhang, Jiashen Du et al.
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Navve Wasserman, Noam Rotstein, Roy Ganz et al.
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
Hanzhi Chen, Boyang Sun, Anran Zhang et al.
Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation
feilong tang, Zhongxing Xu, Zhaojun QU et al.
Single Domain Generalization for Crowd Counting
Zhuoxuan Peng, S.-H. Gary Chan
Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
Zhihe Yang, Xufang Luo, Dongqi Han et al.
Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection
Wei Luo, Yunkang Cao, Haiming Yao et al.
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
Sheng Zhou, Junbin Xiao, Qingyun Li et al.
Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework
Ziyao Huang, Fan Tang, Yong Zhang et al.
Self-Supervised Facial Representation Learning with Facial Region Awareness
Zheng Gao, Ioannis Patras
A Simple Baseline for Efficient Hand Mesh Reconstruction
zhishan zhou, shihao zhou, Zhi Lv et al.
UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence
Ruihai Wu, Haoran Lu, Yiyan Wang et al.
DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes
Chensheng Peng, Chengwei Zhang, Yixiao Wang et al.
Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
Junxi Chen, Liang Li, Li Su et al.
WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments
Jianhao Zheng, Zihan Zhu, Valentin Bieri et al.
Unified Language-driven Zero-shot Domain Adaptation
Senqiao Yang, Zhuotao Tian, Li Jiang et al.
Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer
Wenqiao Zhang, Zheqi Lv
DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks
Jiaxin Zhang, Dezhi Peng, Chongyu Liu et al.
OmniViD: A Generative Framework for Universal Video Understanding
Junke Wang, Dongdong Chen, Chong Luo et al.
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge
Vishwesh Nath, Wenqi Li, Dong Yang et al.
Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding
Hoang-Quan Nguyen, Thanh-Dat Truong, Xuan-Bac Nguyen et al.
Unifying Correspondence Pose and NeRF for Generalized Pose-Free Novel View Synthesis
Sunghwan Hong, Jaewoo Jung, Heeseong Shin et al.
It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri et al.
GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
Hao Li, Dingwen Zhang, Yalun Dai et al.
Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.
Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
Jinseok Kim, Tae-Kyun Kim
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
Xu He, Qiaochu Huang, Zhensong Zhang et al.
TextCraftor: Your Text Encoder Can be Image Quality Controller
Yanyu Li, Xian Liu, Anil Kag et al.
Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis
Zanlin Ni, Yulin Wang, Renping Zhou et al.
Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors
Lihe Ding, Shaocong Dong, Zhanpeng Huang et al.
LaneCPP: Continuous 3D Lane Detection using Physical Priors
Maximilian Pittner, Joel Janai, Alexandru Paul Condurache
Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
Xiangheng Shan, Dongyue Wu, Guilin Zhu et al.
DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion
Jinyuan Liu, Bowei Zhang, Qingyun Mei et al.
Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection
Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker et al.
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal, Yonatan Bitton, Idan Szpektor et al.
Distilling Multi-modal Large Language Models for Autonomous Driving
Deepti Hegde, Rajeev Yasarla, Hong Cai et al.
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding
Geng Li, Jinglin Xu, Yunzhen Zhao et al.
DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
Gianluca Scarpellini, Stefano Fiorini, Francesco Giuliari et al.
Retrieval-Augmented Embodied Agents
Yichen Zhu, Zhicai Ou, Xiaofeng Mou et al.
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Hanxun Yu, Wentong Li, Song Wang et al.
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
Shuming Liu, Chen Zhao, Tianqi Xu et al.
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Zhongwei Ren, Yunchao Wei, Xun Guo et al.
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
Jun Wang, Yuzhe Qin, Kaiming Kuang et al.
Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models
Zhejun Zhang, Peter Karkus, Maximilian Igl et al.
Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts
Jiayi Chen, Benteng Ma, Hengfei Cui et al.
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Hongjie Wang, Difan Liu, Yan Kang et al.
Blind Image Quality Assessment Based on Geometric Order Learning
Nyeong-Ho Shin, Seon-Ho Lee, Chang-Su Kim
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
Haiwen Diao, Bo Wan, Ying Zhang et al.
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu, Yikun Liu, Ferenas et al.
AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion
Mingzhen Sun, Weining Wang, Li et al.
3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation
Dale Decatur, Itai Lang, Kfir Aberman et al.
DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting
Hyunwoo Park, Gun Ryu, Wonjun Kim
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
Le Zhang, Rabiul Awal, Aishwarya Agrawal
Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles
Vanessa Sklyarova, Egor Zakharov, Otmar Hilliges et al.
Light3R-SfM: Towards Feed-forward Structure-from-Motion
Sven Elflein, Qunjie Zhou, Laura Leal-Taixe
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
Xiangyang Zhu, Renrui Zhang, Bowei He et al.
Erasing Undesirable Influence in Diffusion Models
Jing Wu, Trung Le, Munawar Hayat et al.
Sparse Global Matching for Video Frame Interpolation with Large Motion
Chunxu Liu, Guozhen Zhang, Rui Zhao et al.
FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring
Geunhyuk Youk, Jihyong Oh, Munchurl Kim
NARUTO: Neural Active Reconstruction from Uncertain Target Observations
Ziyue Feng, Huangying Zhan, Zheng Chen et al.
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding
Ying Chen, Guoan Wang, Yuanfeng Ji et al.
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
Trong-Thuan Nguyen, Pha Nguyen, Khoa Luu
Multi-modal Learning for Geospatial Vegetation Forecasting
Vitus Benson, Claire Robin, Christian Requena-Mesa et al.
Estimating Body and Hand Motion in an Ego‑sensed World
Brent Yi, Vickie Ye, Maya Zheng et al.
Dispel Darkness for Better Fusion: A Controllable Visual Enhancer based on Cross-modal Conditional Adversarial Learning
HAO ZHANG, Linfeng Tang, Xinyu Xiang et al.
ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning
Beomyoung Kim, Joonsang Yu, Sung Ju Hwang
Distribution-aware Knowledge Prototyping for Non-exemplar Lifelong Person Re-identification
Kunlun Xu, Xu Zou, Yuxin Peng et al.
Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering
Cheng Sun, Jaesung Choe, Charles Loop et al.
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline
Junlong Cheng, Bin Fu, Jin Ye et al.
StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation
Sidi Wu, Yizi Chen, Loic Landrieu et al.
Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion
Zuoyue Li, Zhenqiang Li, Zhaopeng Cui et al.
ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining
Ruoxi Shi, Xinyue Wei, Cheng Wang et al.
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
Hongyan Zhi, Peihao Chen, Junyan Li et al.
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
Jinjin Zhang, qiuyu Huang, Junjie Liu et al.
DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis
Yuming Gu, Hongyi Xu, You Xie et al.
TEA: Test-time Energy Adaptation
Yige Yuan, Bingbing Xu, Liang Hou et al.
VideoGigaGAN: Towards Detail-rich Video Super-Resolution
Yiran Xu, Taesung Park, Richard Zhang et al.
Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning
Xinshun Wang, Zhongbin Fang, Xia Li et al.
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
Shengqu Cai, Duygu Ceylan, Matheus Gadelha et al.
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.
Frequency Dynamic Convolution for Dense Image Prediction
Linwei Chen, Lin Gu, Liang Li et al.
Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations
Rui Zhao, Ruiqin Xiong, Jing Zhao et al.
Generative Multi-modal Models are Good Class Incremental Learners
Xusheng Cao, Haori Lu, Linlan Huang et al.
PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation
Ardian Umam, Cheng-Kun Yang, Min-Hung Chen et al.
PhysGen3D: Crafting a Miniature Interactive World from a Single Image
Boyuan Chen, Hanxiao Jiang, Shaowei Liu et al.
CPR: Retrieval Augmented Generation for Copyright Protection
Aditya Golatkar, Alessandro Achille, Luca Zancato et al.
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li et al.
HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation
Ce Zhang, Simon Stepputtis, Joseph Campbell et al.
The Devil is in the Fine-Grained Details: Evaluating Open-Vocabulary Object Detectors for Fine-Grained Understanding
Lorenzo Bianchi, Fabio Carrara, Nicola Messina et al.
M&M VTO: Multi-Garment Virtual Try-On and Editing
Luyang Zhu, Yingwei Li, Nan Liu et al.
MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior
Honghua Chen, Chen Change Loy, Xingang Pan
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond
Guanyao Wu, Haoyu Liu, Hongming Fu et al.
Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment
Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir et al.
Your ViT is Secretly an Image Segmentation Model
Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim
MoDE: CLIP Data Experts via Clustering
Jiawei Ma, Po-Yao Huang, Saining Xie et al.
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
Lijun Li, Zhelun Shi, Xuhao Hu et al.
SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation
Junyan Ye, Qiyan Luo, Jinhua Yu et al.
Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion
Linzhan Mou, Jun-Kun Chen, Yu-Xiong Wang
CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement
Yun Liu, Chengwen Zhang, Ruofan Xing et al.
Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking
Wei Cao, Chang Luo, Biao Zhang et al.
Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift
Siyuan Liang, Jiawei Liang, Tianyu Pang et al.
AffordDP: Generalizable Diffusion Policy with Transferable Affordance
Shijie Wu, Yihang Zhu, Yunao Huang et al.
Doubly Abductive Counterfactual Inference for Text-based Image Editing
Xue Song, Jiequan Cui, Hanwang Zhang et al.
Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation
Qiyuan Dai, Sibei Yang
Adversarial Diffusion Compression for Real-World Image Super-Resolution
Bin Chen, Gehui Li, Rongyuan Wu et al.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
Hanlin Wang, Hao Ouyang, Qiuyu Wang et al.
Multi-Object Tracking in the Dark
Xinzhe Wang, Kang Ma, Qiankun Liu et al.
GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
Chengyao Wang, Li Jiang, Xiaoyang Wu et al.
AutoPresent: Designing Structured Visuals from Scratch
Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou et al.
Interleaved-Modal Chain-of-Thought
Jun Gao, Yongqi Li, Ziqiang Cao et al.
Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation
Ziyang Xie, Zhizheng Liu, Zhenghao Peng et al.
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes
Lue Fan, Hao ZHANG, Qitai Wang et al.
Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment
Ziyu Shan, Yujie Zhang, Qi Yang et al.
Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities
AJ Piergiovanni, Isaac Noble, Dahun Kim et al.
WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
Youngdong Jang, Dong In Lee, MinHyuk Jang et al.
Synthesize Step-by-Step: Tools Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
Zhuowan Li, Bhavan Jasani, Peng Tang et al.
FineVQ: Fine-Grained User Generated Content Video Quality Assessment
Huiyu Duan, Qiang Hu, Wang Jiarui et al.
Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation
Xianghui Xie, Bharat Lal Bhatnagar, Jan Lenssen et al.
MagicQuill: An Intelligent Interactive Image Editing System
Zichen Liu, Yue Yu, Hao Ouyang et al.
Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts
Fei Ni, Jianye Hao, Shiguang Wu et al.
Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation
Yuan Wang, Rui Sun, Naisong Luo et al.
Federated Generalized Category Discovery
Nan Pu, Wenjing Li, Xinyuan Ji et al.
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Xiao Guo, Xiufeng Song, Yue Zhang et al.
Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation
Jonas Herzog
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
Ryota Tanaka, Taichi Iki, Taku Hasegawa et al.
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Xinhao Liu, Jintong Li, Yicheng Jiang et al.
Calibrated Multi-Preference Optimization for Aligning Diffusion Models
Kyungmin Lee, Xiaohang Li, Qifei Wang et al.
Learning Correlation Structures for Vision Transformers
Manjin Kim, Paul Hongsuck Seo, Cordelia Schmid et al.
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
Fengxiang Wang, hongzhen wang, Zonghao Guo et al.
Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection
Zhen Qu, Xian Tao, Xinyi Gong et al.
Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation
Jihyun Kim, Changjae Oh, Hoseok Do et al.
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
Wang Jiarui, Huiyu Duan, Guangtao Zhai et al.
360+x: A Panoptic Multi-modal Scene Understanding Dataset
Hao Chen, Yuqi Hou, Chenyuan Qu et al.
Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes
Hmrishav Bandyopadhyay, Subhadeep Koley, Ayan Das et al.
Contrastive Learning for DeepFake Classification and Localization via Multi-Label Ranking
Cheng-Yao Hong, Yen-Chi Hsu, Tyng-Luh Liu
AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
Duojun Huang, Xinyu Xiong, Jie Ma et al.
CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution
Xin Liu, Jie Liu, Jie Tang et al.
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
Le Yang, Ziwei Zheng, Boxu Chen et al.
Training-Free Pretrained Model Merging
Zhengqi Xu, Ke Yuan, Huiqiong Wang et al.
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.
EditAR: Unified Conditional Generation with Autoregressive Models
Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang
VkD: Improving Knowledge Distillation using Orthogonal Projections
Roy Miles, Ismail Elezi, Jiankang Deng
Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting
Zijie Chen, Lichao Zhang, Fangsheng Weng et al.
Face2Diffusion for Fast and Editable Face Personalization
Kaede Shiohara, Toshihiko Yamasaki
Supervised Anomaly Detection for Complex Industrial Images
Aimira Baitieva, David Hurych, Victor Besnier et al.
AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning
Yuwei Tang, ZhenYi Lin, Qilong Wang et al.
Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
Kai Wang, Mingjia Shi, YuKun Zhou et al.
Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering
Zaid Khan, Yun Fu
Unknown Prompt the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization
Mainak Singha, Ankit Jha, Shirsha Bose et al.
MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models
Yasiru Ranasinghe, Deepti Hegde, Vishal M. Patel
SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
Hui Liu, Chen Jia, Fan Shi et al.
Garment Recovery with Shape and Deformation Priors
Ren Li, Corentin Dumery, Benoît Guillard et al.
Attention Distillation: A Unified Approach to Visual Characteristics Transfer
Yang Zhou, Xu Gao, Zichong Chen et al.
EgoGen: An Egocentric Synthetic Data Generator
Gen Li, Kaifeng Zhao, Siwei Zhang et al.
Tyche: Stochastic In-Context Learning for Medical Image Segmentation
Marianne Rakic, Hallee Wong, Jose Javier Gonzalez Ortiz et al.
Diffusion Time-step Curriculum for One Image to 3D Generation
YI Xuanyu, Zike Wu, Qingshan Xu et al.
ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model
Shunlin Lu, Jingbo Wang, Zeyu Lu et al.
AnimateAnything: Consistent and Controllable Animation for Video Generation
guojun lei, Chi Wang, Rong Zhang et al.
MANUS: Markerless Grasp Capture using Articulated 3D Gaussians
Chandradeep Pokhariya, Ishaan Shah, Angela Xing et al.
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
Jiahao Li, Weijian Ma, Xueyang Li et al.
FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance Head-pose and Facial Expression Features
Andre Rochow, Max Schwarz, Sven Behnke
Would Deep Generative Models Amplify Bias in Future Models?
Tianwei Chen, Yusuke Hirota, Mayu Otani et al.
Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation
Siteng Huang, Biao Gong, Yutong Feng et al.
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting
Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.
OSV: One Step is Enough for High-Quality Image to Video Generation
Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.
Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh
Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang et al.
Towards Robust 3D Object Detection with LiDAR and 4D Radar Fusion in Various Weather Conditions
Yujeong Chae, Hyeonseong Kim, Kuk-Jin Yoon
Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning
Yixiong Zou, Yicong Liu, Yiman Hu et al.
CleanDIFT: Diffusion Features without Noise
Nick Stracke, Stefan Andreas Baumann, Kolja Bauer et al.
Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation
Sangyun Shin, Kaichen Zhou, Madhu Vankadari et al.