Most Cited CVPR "false positive rate control" Papers
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
Felix Taubner, Ruihang Zhang, Mathieu Tuli et al.
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim
Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts
Fei Ni, Jianye Hao, Shiguang Wu et al.
Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment
Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir et al.
Small Scale Data-Free Knowledge Distillation
He Liu, Yikai Wang, Huaping Liu et al.
Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation
Yuan Wang, Rui Sun, Naisong Luo et al.
Perception-Oriented Video Frame Interpolation via Asymmetric Blending
Guangyang Wu, Xin Tao, Changlin Li et al.
Permutation Equivariance of Transformers and Its Applications
Hengyuan Xu, Liyao Xiang, Hangyu Ye et al.
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes
Lue Fan, Hao Zhang, Qitai Wang et al.
Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion
Linzhan Mou, Jun-Kun Chen, Yu-Xiong Wang
Federated Generalized Category Discovery
Nan Pu, Wenjing Li, Xinyuan Ji et al.
Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels
Zhuohong Li, Wei He, Jiepan Li et al.
360+x: A Panoptic Multi-modal Scene Understanding Dataset
Hao Chen, Yuqi Hou, Chenyuan Qu et al.
MoDE: CLIP Data Experts via Clustering
Jiawei Ma, Po-Yao Huang, Saining Xie et al.
DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion
Tom Van Wouwe, Seunghwan Lee, Antoine Falisse et al.
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou, Dan Guo, Ruohao Guo et al.
Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting
Zijie Chen, Lichao Zhang, Fangsheng Weng et al.
FastMAC: Stochastic Spectral Sampling of Correspondence Graph
Yifei Zhang, Hao Zhao, Hongyang Li et al.
Doubly Abductive Counterfactual Inference for Text-based Image Editing
Xue Song, Jiequan Cui, Hanwang Zhang et al.
SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
Hui Liu, Chen Jia, Fan Shi et al.
CPP-Net: Embracing Multi-Scale Feature Fusion into Deep Unfolding CP-PPA Network for Compressive Sensing
Zhen Guo, Hongping Gan
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
Jiahao Li, Weijian Ma, Xueyang Li et al.
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
Wang Jiarui, Huiyu Duan, Guangtao Zhai et al.
MMRL: Multi-Modal Representation Learning for Vision-Language Models
Yuncheng Guo, Xiaodong Gu
PhD: A ChatGPT-Prompted Visual Hallucination Evaluation Dataset
Jiazhen Liu, Yuhan Fu, Ruobing Xie et al.
Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences
Minyoung Hwang, Luca Weihs, Chanwoo Park et al.
SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation
Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu et al.
Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity
Ruijie Quan, Wenguan Wang, Zhibo Tian et al.
Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation
Ying Jin, Jinlong Peng, Qingdong He et al.
MANUS: Markerless Grasp Capture using Articulated 3D Gaussians
Chandradeep Pokhariya, Ishaan Shah, Angela Xing et al.
Diffusion Time-step Curriculum for One Image to 3D Generation
YI Xuanyu, Zike Wu, Qingshan Xu et al.
Learning Equi-angular Representations for Online Continual Learning
Minhyuk Seo, Hyunseo Koh, Wonje Jeung et al.
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang, Yang Sui, Jinqi Xiao et al.
ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation
Jia-Hao Wu, Fu-Jen Tsai, Yan-Tsung Peng et al.
VkD: Improving Knowledge Distillation using Orthogonal Projections
Roy Miles, Ismail Elezi, Jiankang Deng
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
Fengyuan Shi, Jiaxi Gu, Hang Xu et al.
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration
Junseong Kim, GeonU Kim, Kim Yu-Ji et al.
Scaling Up Video Summarization Pretraining with Large Language Models
Dawit Argaw Argaw, Seunghyun Yoon, Fabian Caba Heilbron et al.
Open-Vocabulary Object 6D Pose Estimation
Jaime Corsetti, Davide Boscaini, Changjae Oh et al.
ODCR: Orthogonal Decoupling Contrastive Regularization for Unpaired Image Dehazing
Zhongze Wang, Haitao Zhao, Jingchao Peng et al.
Face2Diffusion for Fast and Editable Face Personalization
Kaede Shiohara, Toshihiko Yamasaki
CleanDIFT: Diffusion Features without Noise
Nick Stracke, Stefan Andreas Baumann, Kolja Bauer et al.
EditAR: Unified Conditional Generation with Autoregressive Models
Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang
PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering
Yifan Gao, Zihang Lin, Chuanbin Liu et al.
GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion
Jiapeng Tang, Davide Davoli, Tobias Kirschstein et al.
AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning
Yuwei Tang, ZhenYi Lin, Qilong Wang et al.
Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model
Yue-Hua Han, Tai-Ming Huang, Kailung Hua et al.
Tyche: Stochastic In-Context Learning for Medical Image Segmentation
Marianne Rakic, Hallee Wong, Jose Javier Gonzalez Ortiz et al.
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
Maitreya Patel, Changhoon Kim, Sheng Cheng et al.
Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models
Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
Hao Wen, Zehuan Huang, Yaohui Wang et al.
Test-Time Adaptation for Depth Completion
Hyoungseob Park, Anjali W Gupta, Alex Wong
Unknown Prompt the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization
Mainak Singha, Ankit Jha, Shirsha Bose et al.
SANeRF-HQ: Segment Anything for NeRF in High Quality
Yichen Liu, Benran Hu, Chi-Keung Tang et al.
Garment Recovery with Shape and Deformation Priors
Ren Li, Corentin Dumery, Benoît Guillard et al.
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma, Luoxin Ye, Nessa McWeeney et al.
NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation
Minh-Tuan Tran, Trung Le, Xuan-May Le et al.
MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models
Yasiru Ranasinghe, Deepti Hegde, Vishal M. Patel
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
Andrew Szot, Bogdan Mazoure, Omar Attia et al.
HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative
Cong Ma, Qiao Lei, Chengkai Zhu et al.
Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition
Anqi Zhu, Qiuhong Ke, Mingming Gong et al.
Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong
POSTA: A Go-to Framework for Customized Artistic Poster Generation
Haoyu Chen, Xiaojie Xu, Wenbo Li et al.
Deep Equilibrium Diffusion Restoration with Parallel Sampling
Jiezhang Cao, Yue Shi, Kai Zhang et al.
Contrastive Learning for DeepFake Classification and Localization via Multi-Label Ranking
Cheng-Yao Hong, Yen-Chi Hsu, Tyng-Luh Liu
EgoGen: An Egocentric Synthetic Data Generator
Gen Li, Kaifeng Zhao, Siwei Zhang et al.
Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation
Jihyun Kim, Changjae Oh, Hoseok Do et al.
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Narges Norouzi, Svetlana Orlova, Daan de Geus et al.
Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering
Yutao Feng, Xiang Feng, Yintong Shang et al.
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
Linfeng Yuan, Miaojing Shi, Zijie Yue et al.
AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
Duojun Huang, Xinyu Xiong, Jie Ma et al.
MC^2: Multi-concept Guidance for Customized Multi-concept Generation
Jiaxiu Jiang, Yabo Zhang, Kailai Feng et al.
GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection
Xiaotian Li, Baojie Fan, Jiandong Tian et al.
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.
Training-Free Pretrained Model Merging
Zhengqi Xu, Ke Yuan, Huiqiong Wang et al.
Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes
Hmrishav Bandyopadhyay, Subhadeep Koley, Ayan Das et al.
AnimateAnything: Consistent and Controllable Animation for Video Generation
Guojun Lei, Chi Wang, Rong Zhang et al.
Rethinking Multi-view Representation Learning via Distilled Disentangling
Guanzhou Ke, Bo Wang, Xiao-Li Wang et al.
HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery
Jingtao Li, Yingyi Liu, Xinyu Wang et al.
APISR: Anime Production Inspired Real-World Anime Super-Resolution
Boyang Wang, Fengyu Yang, Xihang Yu et al.
Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation
Sangyun Shin, Kaichen Zhou, Madhu Vankadari et al.
PhysAnimator: Physics-Guided Generative Cartoon Animation
Tianyi Xie, Yiwei Zhao, Ying Jiang et al.
PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion
Ying-Tian Liu, Yuan-Chen Guo, Guan Luo et al.
Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning
Jaewoo Jeong, Daehee Park, Kuk-Jin Yoon
Large Language Models are Good Prompt Learners for Low-Shot Image Classification
Zhaoheng Zheng, Jingmin Wei, Xuefeng Hu et al.
Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
Gihyun Kwon, Simon Jenni, Ding Li et al.
Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features
Thomas Wimmer, Peter Wonka, Maks Ovsjanikov
Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
Xun Lin, Shuai Wang, Rizhao Cai et al.
MeshArt: Generating Articulated Meshes with Structure-Guided Transformers
Daoyi Gao, Mohd Yawar Nihal Siddiqui, Lei Li et al.
PanoContext-Former: Panoramic Total Scene Understanding with a Transformer
Yuan Dong, Chuan Fang, Liefeng Bo et al.
Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation
Hongwei Yan, Liyuan Wang, Kaisheng Ma et al.
PromptHMR: Promptable Human Mesh Recovery
Yufu Wang, Yu Sun, Priyanka Patel et al.
Masked and Shuffled Blind Spot Denoising for Real-World Images
Hamadi Chihaoui, Paolo Favaro
MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation
Zhenyu Wu, Yuheng Zhou, Xiuwei Xu et al.
Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation
Kang Liu, Zhuoqi Ma, Xiaolu Kang et al.
KeyPoint Relative Position Encoding for Face Recognition
Minchul Kim, Feng Liu, Yiyang Su et al.
Bayesian Diffusion Models for 3D Shape Reconstruction
Haiyang Xu, Yu Lei, Zeyuan Chen et al.
PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios
Jingbo Wang, Zhengyi Luo, Ye Yuan et al.
ModaVerse: Efficiently Transforming Modalities with LLMs
Xinyu Wang, Bohan Zhuang, Qi Wu
RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses
Bedrettin Cetinkaya, Sinan Kalkan, Emre Akbas
Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning
Yixiong Zou, Yicong Liu, Yiman Hu et al.
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
Yongwei Chen, Yushi Lan, Shangchen Zhou et al.
Domain Prompt Learning with Quaternion Networks
Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.
FLAIR: VLM with Fine-grained Language-informed Image Representations
Rui Xiao, Sanghwan Kim, Iuliana Georgescu et al.
VAREN: Very Accurate and Realistic Equine Network
Silvia Zuffi, Ylva Mellbin, Ci Li et al.
Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content
Zicheng Zhang, Tengchuan Kou, Chunyi Li et al.
EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams
Christen Millerdurai, Hiroyasu Akada, Jian Wang et al.
AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction
Lingteng Qiu, Shenhao Zhu, Qi Zuo et al.
OSV: One Step is Enough for High-Quality Image to Video Generation
Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.
FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis
Wonjoon Jin, Qi Dai, Chong Luo et al.
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
Han Wang, Gang Wang, Huan Zhang
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Yuhui Zhang, Yuchang Su, Yiming Liu et al.
Composing Object Relations and Attributes for Image-Text Matching
Khoi Pham, Chuong Huynh, Ser-Nam Lim et al.
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
Yudi Shi, Shangzhe Di, Qirui Chen et al.
Dynamic LiDAR Re-simulation using Compositional Neural Fields
Hanfeng Wu, Xingxing Zuo, Stefan Leutenegger et al.
2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification
Jingwei Zhang, Anh Tien Nguyen, Xi Han et al.
Towards Robust 3D Object Detection with LiDAR and 4D Radar Fusion in Various Weather Conditions
Yujeong Chae, Hyeonseong Kim, Kuk-Jin Yoon
The Manga Whisperer: Automatically Generating Transcriptions for Comics
Ragav Sachdeva, Andrew Zisserman
CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow
Chenbin Pan, Burhan Yaman, Senem Velipasalar et al.
Overcoming Generic Knowledge Loss with Selective Parameter Update
Wenxuan Zhang, Paul Janson, Rahaf Aljundi et al.
UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection
Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.
FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features
Andre Rochow, Max Schwarz, Sven Behnke
Continual Forgetting for Pre-trained Vision Models
Hongbo Zhao, Bolin Ni, Junsong Fan et al.
Solving the Catastrophic Forgetting Problem in Generalized Category Discovery
Xinzi Cao, Xiawu Zheng, Guanhong Wang et al.
Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation
Siteng Huang, Biao Gong, Yutong Feng et al.
Desigen: A Pipeline for Controllable Design Template Generation
Haohan Weng, Danqing Huang, Yu Qiao et al.
Taming Self-Training for Open-Vocabulary Object Detection
Shiyu Zhao, Samuel Schulter, Long Zhao et al.
DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF
Jie Long Lee, Chen Li, Gim Hee Lee
TinyFusion: Diffusion Transformers Learned Shallow
Gongfan Fang, Kunjun Li, Xinyin Ma et al.
POPDG: Popular 3D Dance Generation with PopDanceSet
Zhenye Luo, Min Ren, Xuecai Hu et al.
DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors
Biwen Lei, Kai Yu, Mengyang Feng et al.
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models
Zhendong Wang, Jianmin Bao, Shuyang Gu et al.
GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation
Weiming Zhang, Yexin Liu, Xu Zheng et al.
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.
Dual-View Visual Contextualization for Web Navigation
Jihyung Kil, Chan Hee Song, Boyuan Zheng et al.
PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis
Zhengyao Lv, Yuxiang Wei, Wangmeng Zuo et al.
Latent Modulated Function for Computational Optimal Continuous Image Representation
Zongyao He, Zhi Jin
Language-Driven Anchors for Zero-Shot Adversarial Robustness
Xiao Li, Wei Zhang, Yining Liu et al.
HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations
Peng Dai, Yang Zhang, Tao Liu et al.
Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning
Bozhou Zhang, Nan Song, Xin Jin et al.
Spatio-Temporal Turbulence Mitigation: A Translational Perspective
Xingguang Zhang, Nicholas M Chimitt, Yiheng Chi et al.
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Young Kyun Jang, Donghyun Kim, Zihang Meng et al.
Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing
Pengcheng Xu, Boyuan Jiang, Xiaobin Hu et al.
3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation
Songchun Zhang, Yibo Zhang, Quan Zheng et al.
Targeted Representation Alignment for Open-World Semi-Supervised Learning
Ruixuan Xiao, Lei Feng, Kai Tang et al.
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
Zaijing Li, Yuquan Xie, Rui Shao et al.
Semantic-aware SAM for Point-Prompted Instance Segmentation
Zhaoyang Wei, Pengfei Chen, Xuehui Yu et al.
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu, Sheng Jin, Wenwei Zhang et al.
Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach
Wei Dong, Xing Zhang, Bihui Chen et al.
LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment
Yiming Ren, Xiao Han, Chengfeng Zhao et al.
Mind Marginal Non-Crack Regions: Clustering-Inspired Representation Learning for Crack Segmentation
Zhuangzhuang Chen, Zhuonan Lai, Jie Chen et al.
VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams
Liao Wang, Kaixin Yao, Chengcheng Guo et al.
ASAM: Boosting Segment Anything Model with Adversarial Tuning
Bo Li, Haoke Xiao, Lv Tang
CSTA: CNN-based Spatiotemporal Attention for Video Summarization
Jaewon Son, Jaehun Park, Kwangsu Kim
Detecting Out-of-Distribution Through the Lens of Neural Collapse
Litian Liu, Yao Qin
Rethinking Few-shot 3D Point Cloud Semantic Segmentation
Zhaochong An, Guolei Sun, Yun Liu et al.
PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization
Zining Chen, Weiqiu Wang, Zhicheng Zhao et al.
Generative Video Propagation
Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.
Category-Level Multi-Part Multi-Joint 3D Shape Assembly
Yichen Li, Kaichun Mo, Yueqi Duan et al.
End-to-End Spatio-Temporal Action Localisation with Video Transformers
Alexey Gritsenko, Xuehan Xiong, Josip Djolonga et al.
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning
Desai Xie, Jiahao Li, Hao Tan et al.
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation
Lang Lin, Xueyang Yu, Ziqi Pang et al.
NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
Yannan He, Garvita Tiwari, Tolga Birdal et al.
Time-, Memory- and Parameter-Efficient Visual Adaptation
Otniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid et al.
COCONut: Modernizing COCO Segmentation
Xueqing Deng, Qihang Yu, Peng Wang et al.
BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training
Xuanpu Zhang, Dan Song, Pengxin Zhan et al.
Material Anything: Generating Materials for Any 3D Object via Diffusion
Xin Huang, Tengfei Wang, Ziwei Liu et al.
CrowdDiff: Multi-hypothesis Crowd Density Estimation using Diffusion Models
Yasiru Ranasinghe, Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara et al.
Retrieval-Augmented Open-Vocabulary Object Detection
Jooyeon Kim, Eulrang Cho, Sehyung Kim et al.
Utility-Fairness Trade-Offs and How to Find Them
Sepehr Dehdashtian, Bashir Sadeghi, Vishnu Naresh Boddeti
Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh
Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang et al.
Language-guided Image Reflection Separation
Haofeng Zhong, Yuchen Hong, Shuchen Weng et al.
HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion
Jingbo Zhang, Xiaoyu Li, Qi Zhang et al.
VecFusion: Vector Font Generation with Diffusion
Vikas Thamizharasan, Difan Liu, Shantanu Agarwal et al.
Depth Prompting for Sensor-Agnostic Depth Estimation
Jin-Hwi Park, Chanhwi Jeong, Junoh Lee et al.
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
Shiyi Zhang, Sule Bai, Guangyi Chen et al.
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
Zuopeng Yang, Jiluan Fan, Anli Yan et al.
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
Razvan Pasca, Alexey Gavryushin, Muhammad Hamza et al.
Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
Ke Fan, Zechen Bai, Tianjun Xiao et al.
One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion
Chunyang Cheng, Tianyang Xu, Zhenhua Feng et al.
Matrix3D: Large Photogrammetry Model All-in-One
Yuanxun Lu, Jingyang Zhang, Tian Fang et al.
TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation
Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.
Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks
Junying Wang, Hongyuan Zhang, Yuan Yuan
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
Tanvir Mahmud, Yapeng Tian, Diana Marculescu
Implicit Event-RGBD Neural SLAM
Delin Qu, Chi Yan, Dong Wang et al.
Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension
Quan Liu, Hongzi Zhu, Zhenxi Wang et al.
Mind the Time: Temporally-Controlled Multi-Event Video Generation
Ziyi Wu, Aliaksandr Siarohin, Willi Menapace et al.
Community Forensics: Using Thousands of Generators to Train Fake Image Detectors
Jeongsoo Park, Andrew Owens
Active Prompt Learning in Vision Language Models
Jihwan Bang, Sumyeong Ahn, Jae-Gil Lee
LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
Chenjie Cao, Yunuo Cai, Qiaole Dong et al.
MonoHair: High-Fidelity Hair Modeling from a Monocular Video
Keyu Wu, Lingchen Yang, Zhiyi Kuang et al.
Any-Resolution AI-Generated Image Detection by Spectral Learning
Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris et al.
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Haomiao Ni, Bernhard Egger, Suhas Lohit et al.
Neural Spline Fields for Burst Image Fusion and Layer Separation
Ilya Chugunov, David Shustin, Ruyu Yan et al.
An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning
Jianqing Zhang, Yang Liu, Yang Hua et al.
ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting
Yankai Jiang, Zhongzhen Huang, Rongzhao Zhang et al.
Real-time 3D-aware Portrait Video Relighting
Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen et al.
DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer
Wei-Ting Chen, Gurunandan Krishnan, Qiang Gao et al.
Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline
Anas Al-lahham, Muhammad Zaigham Zaheer, Nurbek Tastan et al.
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach
Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal et al.