Most Cited CVPR "topological feature localization" Papers
5,589 papers found • Page 4 of 28
Conference
InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion
Jihyun Lee, Shunsuke Saito, Giljoo Nam et al.
EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation
Chanyoung Kim, Woojung Han, Dayun Ju et al.
Multi-Space Alignments Towards Universal LiDAR Segmentation
Youquan Liu, Lingdong Kong, Xiaoyang Wu et al.
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing "Jed" Yang, Xuweiyi Chen, Nikhil Madaan et al.
VideoGLaMM : A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Shehan Munasinghe, Hanan Gani, Wenqi Zhu et al.
Modular Blind Video Quality Assessment
Wen Wen, Mu Li, Yabin ZHANG et al.
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation
Yang Chen, Yingwei Pan, haibo yang et al.
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang, Jingdi Lei, Junxian Li et al.
Universal Segmentation at Arbitrary Granularity with Language Instruction
Yong Liu, Cairong Zhang, Yitong Wang et al.
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Haoyi Jiang, Liu Liu, Tianheng Cheng et al.
Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation
Xiaoyang Wang, Huihui Bai, Limin Yu et al.
Visual Agentic AI for Spatial Reasoning with a Dynamic API
Damiano Marsili, Rohun Agrawal, Yisong Yue et al.
Empowering LLMs to Understand and Generate Complex Vector Graphics
XiMing Xing, Juncheng Hu, Guotao Liang et al.
Training Like a Medical Resident: Context-Prior Learning Toward Universal Medical Image Segmentation
Yunhe Gao
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
Jianzong Wu, Xiangtai Li, Chenyang Si et al.
LEOD: Label-Efficient Object Detection for Event Cameras
Ziyi Wu, Mathias Gehrig, Qing Lyu et al.
SHAP-EDITOR: Instruction-Guided Latent 3D Editing in Seconds
Minghao Chen, Junyu Xie, Iro Laina et al.
Seeing Motion at Nighttime with an Event Camera
Haoyue Liu, Shihan Peng, Lin Zhu et al.
Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning
Woo-Jin Ahn, Geun-Yeong Yang, Hyunduck Choi et al.
Localization Is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix It
Adam Lilja, Junsheng Fu, Erik Stenborg et al.
DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks
Jiaxin Zhang, Dezhi Peng, Chongyu Liu et al.
OmniViD: A Generative Framework for Universal Video Understanding
Junke Wang, Dongdong Chen, Chong Luo et al.
I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions
Chengfeng Zhao, Juze Zhang, Jiashen Du et al.
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Navve Wasserman, Noam Rotstein, Roy Ganz et al.
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
Hanzhi Chen, Boyang Sun, Anran Zhang et al.
Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation
feilong tang, Zhongxing Xu, Zhaojun QU et al.
Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
Zhihe Yang, Xufang Luo, Dongqi Han et al.
Single Domain Generalization for Crowd Counting
Zhuoxuan Peng, S.-H. Gary Chan
Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection
Wei Luo, Yunkang Cao, Haiming Yao et al.
Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework
Ziyao Huang, Fan Tang, Yong Zhang et al.
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
Sheng Zhou, Junbin Xiao, Qingyun Li et al.
Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation
Bingfeng Zhang, Siyue Yu, Yunchao Wei et al.
A Simple Baseline for Efficient Hand Mesh Reconstruction
zhishan zhou, shihao zhou, Zhi Lv et al.
DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes
Chensheng Peng, Chengwei Zhang, Yixiao Wang et al.
UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence
Ruihai Wu, Haoran Lu, Yiyan Wang et al.
WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments
Jianhao Zheng, Zihan Zhu, Valentin Bieri et al.
Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
Junxi Chen, Liang Li, Li Su et al.
Unified Language-driven Zero-shot Domain Adaptation
Senqiao Yang, Zhuotao Tian, Li Jiang et al.
VOODOO 3D: Volumetric Portrait Disentanglement For One-Shot 3D Head Reenactment
Phong Tran, Egor Zakharov, Long Nhat Ho et al.
Self-Supervised Facial Representation Learning with Facial Region Awareness
Zheng Gao, Ioannis Patras
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge
Vishwesh Nath, Wenqi Li, Dong Yang et al.
Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding
Hoang-Quan Nguyen, Thanh-Dat Truong, Xuan-Bac Nguyen et al.
Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer
Wenqiao Zhang, Zheqi Lv
TextCraftor: Your Text Encoder Can be Image Quality Controller
Yanyu Li, Xian Liu, Anil Kag et al.
It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri et al.
Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.
Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
Jinseok Kim, Tae-Kyun Kim
Retrieval-Augmented Embodied Agents
Yichen Zhu, Zhicai Ou, Xiaofeng Mou et al.
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
Xu He, Qiaochu Huang, Zhensong Zhang et al.
Unifying Correspondence Pose and NeRF for Generalized Pose-Free Novel View Synthesis
Sunghwan Hong, Jaewoo Jung, Heeseong Shin et al.
GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
Hao Li, Dingwen Zhang, Yalun Dai et al.
Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors
Lihe Ding, Shaocong Dong, Zhanpeng Huang et al.
Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis
Zanlin Ni, Yulin Wang, Renping Zhou et al.
DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion
Jinyuan Liu, Bowei Zhang, Qingyun Mei et al.
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal, Yonatan Bitton, Idan Szpektor et al.
Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
Xiangheng Shan, Dongyue Wu, Guilin Zhu et al.
Distilling Multi-modal Large Language Models for Autonomous Driving
Deepti Hegde, Rajeev Yasarla, Hong Cai et al.
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Hongjie Wang, Difan Liu, Yan Kang et al.
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding
Geng Li, Jinglin Xu, Yunzhen Zhao et al.
Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection
Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker et al.
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Hanxun Yu, Wentong Li, Song Wang et al.
Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts
Jiayi Chen, Benteng Ma, Hengfei Cui et al.
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
Shuming Liu, Chen Zhao, Tianqi Xu et al.
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Zhongwei Ren, Yunchao Wei, Xun Guo et al.
Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models
Zhejun Zhang, Peter Karkus, Maximilian Igl et al.
LaneCPP: Continuous 3D Lane Detection using Physical Priors
Maximilian Pittner, Joel Janai, Alexandru Paul Condurache
DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
Gianluca Scarpellini, Stefano Fiorini, Francesco Giuliari et al.
Blind Image Quality Assessment Based on Geometric Order Learning
Nyeong-Ho Shin, Seon-Ho Lee, Chang-Su Kim
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
Haiwen Diao, Bo Wan, Ying Zhang et al.
AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion
Mingzhen Sun, Weining Wang, Li et al.
3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation
Dale Decatur, Itai Lang, Kfir Aberman et al.
DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting
Hyunwoo Park, Gun Ryu, Wonjun Kim
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
Le Zhang, Rabiul Awal, Aishwarya Agrawal
Light3R-SfM: Towards Feed-forward Structure-from-Motion
Sven Elflein, Qunjie Zhou, Laura Leal-Taixe
Erasing Undesirable Influence in Diffusion Models
Jing Wu, Trung Le, Munawar Hayat et al.
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding
Ying Chen, Guoan Wang, Yuanfeng Ji et al.
FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring
Geunhyuk Youk, Jihyong Oh, Munchurl Kim
Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles
Vanessa Sklyarova, Egor Zakharov, Otmar Hilliges et al.
Sparse Global Matching for Video Frame Interpolation with Large Motion
Chunxu Liu, Guozhen Zhang, Rui Zhao et al.
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
Xiangyang Zhu, Renrui Zhang, Bowei He et al.
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu, Yikun Liu, Ferenas et al.
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
Trong-Thuan Nguyen, Pha Nguyen, Khoa Luu
ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning
Beomyoung Kim, Joonsang Yu, Sung Ju Hwang
NARUTO: Neural Active Reconstruction from Uncertain Target Observations
Ziyue Feng, Huangying Zhan, Zheng Chen et al.
Estimating Body and Hand Motion in an Ego‑sensed World
Brent Yi, Vickie Ye, Maya Zheng et al.
Dispel Darkness for Better Fusion: A Controllable Visual Enhancer based on Cross-modal Conditional Adversarial Learning
HAO ZHANG, Linfeng Tang, Xinyu Xiang et al.
Distribution-aware Knowledge Prototyping for Non-exemplar Lifelong Person Re-identification
Kunlun Xu, Xu Zou, Yuxin Peng et al.
Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering
Cheng Sun, Jaesung Choe, Charles Loop et al.
Multi-modal Learning for Geospatial Vegetation Forecasting
Vitus Benson, Claire Robin, Christian Requena-Mesa et al.
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline
Junlong Cheng, Bin Fu, Jin Ye et al.
HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation
Ce Zhang, Simon Stepputtis, Joseph Campbell et al.
Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion
Zuoyue Li, Zhenqiang Li, Zhaopeng Cui et al.
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
Hongyan Zhi, Peihao Chen, Junyan Li et al.
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
Jinjin Zhang, qiuyu Huang, Junjie Liu et al.
ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining
Ruoxi Shi, Xinyue Wei, Cheng Wang et al.
DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis
Yuming Gu, Hongyi Xu, You Xie et al.
VideoGigaGAN: Towards Detail-rich Video Super-Resolution
Yiran Xu, Taesung Park, Richard Zhang et al.
Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning
Xinshun Wang, Zhongbin Fang, Xia Li et al.
TEA: Test-time Energy Adaptation
Yige Yuan, Bingbing Xu, Liang Hou et al.
PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation
Ardian Umam, Cheng-Kun Yang, Min-Hung Chen et al.
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.
Frequency Dynamic Convolution for Dense Image Prediction
Linwei Chen, Lin Gu, Liang Li et al.
CPR: Retrieval Augmented Generation for Copyright Protection
Aditya Golatkar, Alessandro Achille, Luca Zancato et al.
Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations
Rui Zhao, Ruiqin Xiong, Jing Zhao et al.
PhysGen3D: Crafting a Miniature Interactive World from a Single Image
Boyuan Chen, Hanxiao Jiang, Shaowei Liu et al.
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li et al.
MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior
Honghua Chen, Chen Change Loy, Xingang Pan
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
Shengqu Cai, Duygu Ceylan, Matheus Gadelha et al.
The Devil is in the Fine-Grained Details: Evaluating Open-Vocabulary Object Detectors for Fine-Grained Understanding
Lorenzo Bianchi, Fabio Carrara, Nicola Messina et al.
M&M VTO: Multi-Garment Virtual Try-On and Editing
Luyang Zhu, Yingwei Li, Nan Liu et al.
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond
Guanyao Wu, Haoyu Liu, Hongming Fu et al.
Your ViT is Secretly an Image Segmentation Model
Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim
Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation
Jonas Herzog
Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation
Qiyuan Dai, Sibei Yang
Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment
Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir et al.
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
Lijun Li, Zhelun Shi, Xuhao Hu et al.
CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement
Yun Liu, Chengwen Zhang, Ruofan Xing et al.
Doubly Abductive Counterfactual Inference for Text-based Image Editing
Xue Song, Jiequan Cui, Hanwang Zhang et al.
Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift
Siyuan Liang, Jiawei Liang, Tianyu Pang et al.
AffordDP: Generalizable Diffusion Policy with Transferable Affordance
Shijie Wu, Yihang Zhu, Yunao Huang et al.
Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion
Linzhan Mou, Jun-Kun Chen, Yu-Xiong Wang
Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking
Wei Cao, Chang Luo, Biao Zhang et al.
MoDE: CLIP Data Experts via Clustering
Jiawei Ma, Po-Yao Huang, Saining Xie et al.
Multi-Object Tracking in the Dark
Xinzhe Wang, Kang Ma, Qiankun Liu et al.
Adversarial Diffusion Compression for Real-World Image Super-Resolution
Bin Chen, Gehui Li, Rongyuan Wu et al.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
Hanlin Wang, Hao Ouyang, Qiuyu Wang et al.
AutoPresent: Designing Structured Visuals from Scratch
Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou et al.
Interleaved-Modal Chain-of-Thought
Jun Gao, Yongqi Li, Ziqiang Cao et al.
GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
Chengyao Wang, Li Jiang, Xiaoyang Wu et al.
Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation
Ziyang Xie, Zhizheng Liu, Zhenghao Peng et al.
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes
Lue Fan, Hao ZHANG, Qitai Wang et al.
Synthesize Step-by-Step: Tools Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
Zhuowan Li, Bhavan Jasani, Peng Tang et al.
FineVQ: Fine-Grained User Generated Content Video Quality Assessment
Huiyu Duan, Qiang Hu, Wang Jiarui et al.
MagicQuill: An Intelligent Interactive Image Editing System
Zichen Liu, Yue Yu, Hao Ouyang et al.
Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment
Ziyu Shan, Yujie Zhang, Qi Yang et al.
WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
Youngdong Jang, Dong In Lee, MinHyuk Jang et al.
Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities
AJ Piergiovanni, Isaac Noble, Dahun Kim et al.
Learning Correlation Structures for Vision Transformers
Manjin Kim, Paul Hongsuck Seo, Cordelia Schmid et al.
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Xiao Guo, Xiufeng Song, Yue Zhang et al.
Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation
Yuan Wang, Rui Sun, Naisong Luo et al.
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
Ryota Tanaka, Taichi Iki, Taku Hasegawa et al.
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Xinhao Liu, Jintong Li, Yicheng Jiang et al.
Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts
Fei Ni, Jianye Hao, Shiguang Wu et al.
Calibrated Multi-Preference Optimization for Aligning Diffusion Models
Kyungmin Lee, Xiaohang Li, Qifei Wang et al.
SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation
Junyan Ye, Qiyan Luo, Jinhua Yu et al.
Federated Generalized Category Discovery
Nan Pu, Wenjing Li, Xinyuan Ji et al.
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
Fengxiang Wang, hongzhen wang, Zonghao Guo et al.
Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection
Zhen Qu, Xian Tao, Xinyi Gong et al.
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
Wang Jiarui, Huiyu Duan, Guangtao Zhai et al.
360+x: A Panoptic Multi-modal Scene Understanding Dataset
Hao Chen, Yuqi Hou, Chenyuan Qu et al.
Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation
Jihyun Kim, Changjae Oh, Hoseok Do et al.
Contrastive Learning for DeepFake Classification and Localization via Multi-Label Ranking
Cheng-Yao Hong, Yen-Chi Hsu, Tyng-Luh Liu
Garment Recovery with Shape and Deformation Priors
Ren Li, Corentin Dumery, Benoît Guillard et al.
Tyche: Stochastic In-Context Learning for Medical Image Segmentation
Marianne Rakic, Hallee Wong, Jose Javier Gonzalez Ortiz et al.
CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution
Xin Liu, Jie Liu, Jie Tang et al.
Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes
Hmrishav Bandyopadhyay, Subhadeep Koley, Ayan Das et al.
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
Le Yang, Ziwei Zheng, Boxu Chen et al.
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.
EditAR: Unified Conditional Generation with Autoregressive Models
Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang
Diffusion Time-step Curriculum for One Image to 3D Generation
YI Xuanyu, Zike Wu, Qingshan Xu et al.
AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
Duojun Huang, Xinyu Xiong, Jie Ma et al.
Training-Free Pretrained Model Merging
Zhengqi Xu, Ke Yuan, Huiqiong Wang et al.
VkD: Improving Knowledge Distillation using Orthogonal Projections
Roy Miles, Ismail Elezi, Jiankang Deng
AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning
Yuwei Tang, ZhenYi Lin, Qilong Wang et al.
Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting
Zijie Chen, Lichao Zhang, Fangsheng Weng et al.
Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong
Supervised Anomaly Detection for Complex Industrial Images
Aimira Baitieva, David Hurych, Victor Besnier et al.
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
Kai Wang, Mingjia Shi, YuKun Zhou et al.
Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation
Xianghui Xie, Bharat Lal Bhatnagar, Jan Lenssen et al.
Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering
Zaid Khan, Yun Fu
EgoGen: An Egocentric Synthetic Data Generator
Gen Li, Kaifeng Zhao, Siwei Zhang et al.
SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
Hui Liu, Chen Jia, Fan Shi et al.
Unknown Prompt the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization
Mainak Singha, Ankit Jha, Shirsha Bose et al.
Attention Distillation: A Unified Approach to Visual Characteristics Transfer
Yang Zhou, Xu Gao, Zichong Chen et al.
MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models
Yasiru Ranasinghe, Deepti Hegde, Vishal M. Patel
ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model
Shunlin Lu, Jingbo Wang, Zeyu Lu et al.
AnimateAnything: Consistent and Controllable Animation for Video Generation
guojun lei, Chi Wang, Rong Zhang et al.
VAREN: Very Accurate and Realistic Equine Network
Silvia Zuffi, Ylva Mellbin, Ci Li et al.
Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning
Yixiong Zou, Yicong Liu, Yiman Hu et al.
Would Deep Generative Models Amplify Bias in Future Models?
Tianwei Chen, Yusuke Hirota, Mayu Otani et al.
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
Jiahao Li, Weijian Ma, Xueyang Li et al.
Deep Equilibrium Diffusion Restoration with Parallel Sampling
Jiezhang Cao, Yue Shi, Kai Zhang et al.
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting
Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.
OSV: One Step is Enough for High-Quality Image to Video Generation
Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.
Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation
Sangyun Shin, Kaichen Zhou, Madhu Vankadari et al.
Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh
Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang et al.
Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation
Siteng Huang, Biao Gong, Yutong Feng et al.
CleanDIFT: Diffusion Features without Noise
Nick Stracke, Stefan Andreas Baumann, Kolja Bauer et al.
FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance Head-pose and Facial Expression Features
Andre Rochow, Max Schwarz, Sven Behnke
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang et al.
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
Hao Wen, Zehuan Huang, Yaohui Wang et al.
6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation
Li Xu, Haoxuan Qu, Yujun Cai et al.
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
Andrew Szot, Bogdan Mazoure, Omar Attia et al.
POSTA: A Go-to Framework for Customized Artistic Poster Generation
Haoyu Chen, Xiaojie Xu, Wenbo Li et al.
Towards Robust 3D Object Detection with LiDAR and 4D Radar Fusion in Various Weather Conditions
Yujeong Chae, Hyeonseong Kim, Kuk-Jin Yoon
Face2Diffusion for Fast and Editable Face Personalization
Kaede Shiohara, Toshihiko Yamasaki
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Yuhui Zhang, Yuchang Su, Yiming Liu et al.
SANeRF-HQ: Segment Anything for NeRF in High Quality
Yichen Liu, Benran Hu, Chi-Keung Tang et al.