Most Cited ICCV "representation spaces" Papers
Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models
Vahid Balazadeh, Mohammadmehdi Ataei, Hyunmin Cheong et al.
Sim-DETR: Unlock DETR for Temporal Sentence Grounding
Jiajin Tang, Zhengxuan Wei, Yuchen Zhu et al.
DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing
Aniruddha Bala, Rohit Chowdhury, Rohan Jaiswal et al.
Color Matching Using Hypernetwork-Based Kolmogorov-Arnold Networks
Artem Nikonorov, Georgy Perevozchikov, Andrei Korepanov et al.
ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
Duong T. Tran, Trung-Kien Tran, Manfred Hauswirth et al.
SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications
Yana Hasson, Pauline Luc, Liliane Momeni et al.
TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
Xingsong Ye, Yongkun Du, Yunbo Tao et al.
CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation
Jianyu Wu, Yizhou Wang, Xiangyu Yue et al.
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng et al.
ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis
Onkar Susladkar, Gayatri Deshmukh, Yalcin Tur et al.
Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion
Enyu Liu, En Yu, Sijia Chen et al.
GT-Loc: Unifying When and Where in Images through a Joint Embedding Space
David G. Shatwell, Ishan Rajendrakumar Dave, Swetha Sirnam et al.
Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
Yong Liu, Song-Li Wu, Sule Bai et al.
ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition
Sanjoy Kundu, Shanmukha Vellamcheti, Sathyanarayanan Aakur
Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation
Wenyao Zhang, Hongsi Liu, Bohan Li et al.
PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior
Seunggwan Lee, Hwanhee Jung, ByoungSoo Koh et al.
Trust but Verify: Programmatic VLM Evaluation in the Wild
Viraj Prabhu, Senthil Purushwalkam, An Yan et al.
AnyPortal: Zero-Shot Consistent Video Background Replacement
Wenshuo Gao, Xicheng Lan, Shuai Yang
Supercharged One-step Text-to-Image Diffusion Models with Negative Prompts
Viet Nguyen, Anh Nguyen, Trung Dao et al.
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Tao Han, Wanghan Xu, Junchao Gong et al.
Differentiable Room Acoustic Rendering with Multi-View Vision Priors
Derong Jin, Ruohan Gao
Identity-aware Language Gaussian Splatting for Open-vocabulary 3D Semantic Segmentation
SungMin Jang, Wonjun Kim
FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models
Mainak Singha, Subhankar Roy, Sarthak Mehrotra et al.
DictAS: A Framework for Class-Generalizable Few-Shot Anomaly Segmentation via Dictionary Lookup
Zhen Qu, Xian Tao, Xinyi Gong et al.
Denoising Token Prediction in Masked Autoregressive Models
Ting Yao, Yehao Li, Yingwei Pan et al.
ViLU: Learning Vision-Language Uncertainties for Failure Prediction
Marc Lafon, Yannis Karmim, Julio Silva-Rodríguez et al.
EVT: Efficient View Transformation for Multi-Modal 3D Object Detection
Yongjin Lee, Hyeon-Mun Jeong, Yurim Jeon et al.
Trade-offs in Image Generation: How Do Different Dimensions Interact?
Sicheng Zhang, Binzhu Xie, Zhonghao Yan et al.
SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting
Shengjie Lin, Jiading Fang, Muhammad Zubair Irshad et al.
DASH: 4D Hash Encoding with Self-Supervised Decomposition for Real-Time Dynamic Scene Rendering
Jie Chen, Zhangchi Hu, Peixi Wu et al.
RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
Bimsara Pathiraja, Maitreya Patel, Shivam Singh et al.
LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders
Ilan Naiman, Emanuel Baruch Baruch, Oron Anschel et al.
MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion
Fei Peng, Junqiang Wu, Yan Li et al.
Learning Deblurring Texture Prior from Unpaired Data with Diffusion Model
Chengxu Liu, Lu Qi, Jinshan Pan et al.
SDMatte: Grafting Diffusion Models for Interactive Matting
Longfei Huang, Yu Liang, Hao Zhang et al.
SVG-Head: Hybrid Surface-Volumetric Gaussians for High-Fidelity Head Reconstruction and Real-Time Editing
Heyi Sun, Cong Wang, Tian-Xing Xu et al.
Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation
Hiroyasu Akada, Jian Wang, Vladislav Golyanik et al.
Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent
En Ci, Shanyan Guan, Yanhao Ge et al.
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
Lorenzo Baraldi, Davide Bucciarelli, Federico Betti et al.
Online Language Splatting
Saimouli Katragadda, Cho-Ying Wu, Yuliang Guo et al.
Consensus-Driven Active Model Selection
Justin Kay, Grant Horn, Subhransu Maji et al.
StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance
Jaeseok Jeong, Junho Kim, Youngjung Uh et al.
Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning
Zihua Zhao, Feng Hong, Mengxi Chen et al.
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
Zeyi Sun, Tong Wu, Pan Zhang et al.
Sequential keypoint density estimator: an overlooked baseline of skeleton-based video anomaly detection
Anja Delić, Matej Grcic, Siniša Šegvić
Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection
Taehoon Kim, Jongwook Choi, Yonghyun Jeong et al.
Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling
Chao Zhou, Tianyi Wei, Nenghai Yu
Demeter: A Parametric Model of Crop Plant Morphology from the Real World
Tianhang Cheng, Albert Zhai, Evan Chen et al.
Dark-ISP: Enhancing RAW Image Processing for Low-Light Object Detection
Jiasheng Guo, Xin Gao, Yuxiang Yan et al.
A Unified Framework for Motion Reasoning and Generation in Human Interaction
Jeongeun Park, Sungjoon Choi, Sangdoo Yun
Multi-modal Multi-platform Person Re-Identification: Benchmark and Method
Ruiyang Ha, Songyi Jiang, Bin Li et al.
Stable Score Distillation
Haiming Zhu, Yangyang Xu, Chenshu Xu et al.
PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement
Tewodros W. Ayalew, Xiao Zhang, Kevin Y Wu et al.
Multimodal Prompt Alignment for Facial Expression Recognition
Fuyan Ma, Yiran He, Bin Sun et al.
AdsQA: Towards Advertisement Video Understanding
Xinwei Long, Kai Tian, Peng Xu et al.
IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features
Anand Kumar, Jiteng Mu, Nuno Vasconcelos
AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model
Wenlun Zhang, Yunshan Zhong, Shimpei Ando et al.
Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation
Yihong Cao, Jiaming Zhang, Xu Zheng et al.
Leveraging Local Patch Alignment to Seam-cutting for Large Parallax Image Stitching
Tianli Liao, Chenyang Zhao, Lei Li et al.
Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
Li Xiaojie, Ronghui Li, Shukai Fang et al.
Global Regulation and Excitation via Attention Tuning for Stereo Matching
Jiahao Li, Xinhong Chen, Zhengmin Jiang et al.
Towards Explicit Exoskeleton for the Reconstruction of Complicated 3D Human Avatars
Yifan Zhan, Qingtian Zhu, Muyao Niu et al.
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization
Xavier Thomas, Deepti Ghadiyaram
GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion
Li-Heng Chen, Zi-Xin Zou, Chang Liu et al.
SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions
Jessica Bader, Leander Girrbach, Stephan Alaniz et al.
Noise2Score3D: Tweedie's Approach for Unsupervised Point Cloud Denoising
Xiangbin Wei, Yuanfeng Wang, Ao Xu et al.
SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark
Alex Costanzino, Pierluigi Zama Ramirez, Luigi Lella et al.
ChartCap: Mitigating Hallucination of Dense Chart Captioning
Junyoung Lim, Jaewoo Ahn, Gunhee Kim
FG-OrIU: Towards Better Forgetting via Feature-Gradient Orthogonality for Incremental Unlearning
Qian Feng, Jiahang Tu, Mintong Kang et al.
Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation
Sebastian Schmidt, Julius Koerner, Dominik Fuchsgruber et al.
Multi-Modal Few-Shot Temporal Action Segmentation
Zijia Lu, Ehsan Elhamifar
Object-level Correlation for Few-Shot Segmentation
Chunlin Wen, Yu Zhang, Jie Fan et al.
Self-Reinforcing Prototype Evolution with Dual-Knowledge Cooperation for Semi-Supervised Lifelong Person Re-Identification
Kunlun Xu, Fan Zhuo, Jiangmeng Li et al.
DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image
Jijun Xiang, Xuan Zhu, Xianqi Wang et al.
Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration
Darshan Thaker, Abhishek Goyal, Rene Vidal
ETA: Energy-based Test-time Adaptation for Depth Completion
Younjoon Chung, Hyoungseob Park, Patrick Rim et al.
ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction
Han Yu, Kehan Li, Dongbai Li et al.
Diffusion Image Prior
Hamadi Chihaoui, Paolo Favaro
RePoseD: Efficient Relative Pose Estimation With Known Depth Information
Yaqing Ding, Viktor Kocur, Vaclav Vavra et al.
Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition
Jeonghyeok Do, Munchurl Kim
Synchronization of Multiple Videos
Avihai Naaman, Ron Shapira Weber, Oren Freifeld
A Structure-aware and Motion-adaptive Framework for 3D Human Pose Estimation with Mamba
Ye Lu, Jie Wang, Jianjun Gao et al.
MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments
Zhixuan Liu, Haokun Zhu, Rui Chen et al.
HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes
Mai Su, Zhongtao Wang, Huishan Au et al.
SKALD: Learning-Based Shot Assembly for Coherent Multi-Shot Video Creation
Chen Yi Lu, Mehrab Tanjim, Ishita Dasgupta et al.
Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models
In Cho, Youngbeom Yoo, Subin Jeon et al.
Improving Large Vision and Language Models by Learning from a Panel of Peers
Jefferson Hernandez, Jing Shi, Simon Jenni et al.
Occlusion-robust Stylization for Drawing-based 3D Animation
Sunjae Yoon, Gwanhyeong Koo, Younghwan Lee et al.
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
Shengao Wang, Arjun Chandra, Aoming Liu et al.
S⁴M: Boosting Semi-Supervised Instance Segmentation with SAM
Heeji Yoon, Heeseong Shin, Eunbeen Hong et al.
Blended Point Cloud Diffusion for Localized Text-guided Shape Editing
Etai Sella, Noam Atia, Ron Mokady et al.
How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes
Mahnoor Saad, Ziad Al-Halah
Purge-Gate: Efficient Backpropagation-Free Test-Time Adaptation for Point Clouds via Token Purging
Moslem Yazdanpanah, Ali Bahri, Mehrdad Noori et al.
UIP2P: Unsupervised Instruction-based Image Editing via Edit Reversibility Constraint
Enis Simsar, Alessio Tonioni, Yongqin Xian et al.
StealthAttack: Robust 3D Gaussian Splatting Poisoning via Density-Guided Illusions
Bo-Hsu Ke, You-Zhe Xie, Yu-Lun Liu et al.
ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers
Hanwen Cao, Haobo Lu, Xiaosen Wang et al.
Boosting Domain Generalized and Adaptive Detection with Diffusion Models: Fitness, Generalization, and Transferability
Boyong He, Yuxiang Ji, Zhuoyue Tan et al.
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Kwanyoung Kim, Byeongsu Sim
D3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
Yanran Zhang, Bingyao Yu, Yu Zheng et al.
Evading Data Provenance in Deep Neural Networks
Hongyu Zhu, Sichu Liang, Wenwen Wang et al.
SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning
Xin Hu, Ke Qin, Guiduo Duan et al.
SegmentDreamer: Towards High-fidelity Text-to-3D Synthesis with Segmented Consistency Trajectory Distillation
Jiahao Zhu, Zixuan Chen, Guangcong Wang et al.
AnimalClue: Recognizing Animals by their Traces
Risa Shinoda, Nakamasa Inoue, Iro Laina et al.
VSRM: A Robust Mamba-Based Framework for Video Super-Resolution
Phu Tran Dinh, Hung Dao, Daeyoung Kim
Text Embedding Knows How to Quantize Text-Guided Diffusion Models
Hongjae Lee, Myungjun Son, Dongjea Kang et al.
Enhancing Image Restoration Transformer via Adaptive Translation Equivariance
JiaKui Hu, Zhengjian Yao, Lujia Jin et al.
Adversarial Exploitation of Data Diversity Improves Visual Localization
Sihang Li, Siqi Tan, Bowen Chang et al.
StrandHead: Text to Hair-Disentangled 3D Head Avatars Using Human-Centric Priors
Xiaokun Sun, Zeyu Cai, Ying Tai et al.
ASCENT: Annotation-free Self-supervised Contrastive Embeddings for 3D Neuron Tracking in Fluorescence Microscopy
Haejun Han, Hang Lu
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
Jiayi Guo, Chuanhao Yan, Xingqian Xu et al.
PrimHOI: Compositional Human-Object Interaction via Reusable Primitives
Kai Jia, Tengyu Liu, Mingtao Pei et al.
Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control
Seongmin Park, Hyungmin Kim, Sangwoo Kim et al.
PINO: Person-Interaction Noise Optimization for Long-Duration and Customizable Motion Generation of Arbitrary-Sized Groups
Sakuya Ota, Qing Yu, Kent Fujiwara et al.
Augmented and Softened Matching for Unsupervised Visible-Infrared Person Re-Identification
Zhiqi Pang, Chunyu Wang, Lingling Zhao et al.
PolarAnything: Diffusion-based Polarimetric Image Synthesis
Kailong Zhang, Youwei Lyu, Heng Guo et al.
Auxiliary Prompt Tuning of Vision-Language Models for Few-Shot Out-of-Distribution Detection
Wenjun Miao, Guansong Pang, Zihan Wang et al.
PatchScaler: An Efficient Patch-Independent Diffusion Model for Image Super-Resolution
Yong Liu, Hang Dong, Jinshan Pan et al.
Efficient Spiking Point Mamba for Point Cloud Analysis
Peixi Wu, Bosong Chai, Menghua Zheng et al.
Probabilistic Prototype Calibration of Vision-language Models for Generalized Few-shot Semantic Segmentation
Jie Liu, Jiayi Shen, Pan Zhou et al.
Latent Swap Joint Diffusion for 2D Long-Form Latent Generation
Yusheng Dai, Chenxi Wang, Chang Li et al.
Discontinuity-aware Normal Integration for Generic Central Camera Models
Francesco Milano, Manuel Lopez-Antequera, Naina Dhingra et al.
A Conditional Probability Framework for Compositional Zero-shot Learning
Peng Wu, Qiuxia Lai, Hao Fang et al.
After the Party: Navigating the Mapping From Color to Ambient Lighting
Florin-Alexandru Vasluianu, Tim Seizinger, Zongwei Wu et al.
Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem, Piotr Teterwak, Kate Saenko et al.
Revisiting Point Cloud Completion: Are We Ready For The Real-World?
Stuti Pathak, Prashant Kumar, Dheeraj Baiju et al.
Scene Coordinate Reconstruction Priors
Wenjing Bian, Axel Barroso-Laguna, Tommaso Cavallari et al.
AFUNet: Cross-Iterative Alignment-Fusion Synergy for HDR Reconstruction via Deep Unfolding Paradigm
Xinyue Li, Zhangkai Ni, Wenhan Yang
AutoScape: Geometry-Consistent Long-Horizon Scene Generation
Jiacheng Chen, Ziyu Jiang, Mingfu Liang et al.
Dual Reciprocal Learning of Language-based Human Motion Understanding and Generation
Chen Liang, Zhicheng Shi, Wenguan Wang et al.
DAMap: Distance-aware MapNet for High Quality HD Map Construction
Jinpeng Dong, Chen Li, Yutong Lin et al.
Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning
Tianjiao Jiang, Zhen Zhang, Yuhang Liu et al.
ShortFT: Diffusion Model Alignment via Shortcut-based Fine-Tuning
Xiefan Guo, Miaomiao Cui, Liefeng Bo et al.
Fast Globally Optimal and Geometrically Consistent 3D Shape Matching
Paul Roetzer, Florian Bernard
A Real-world Display Inverse Rendering Dataset
Seokjun Choi, Hoon-Gyu Chung, Yujin Jeon et al.
DLFR-Gen: Diffusion-based Video Generation with Dynamic Latent Frame Rate
Zhihang Yuan, Rui Xie, Yuzhang Shang et al.
Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images
Jinsol Song, Jiamu Wang, Anh Nguyen et al.
VITAL: More Understandable Feature Visualization through Distribution Alignment and Relevant Information Flow
Ada Görgün, Bernt Schiele, Jonas Fischer
Progressive Artwork Outpainting via Latent Diffusion Models
Dae-Young Song, Jung-Jae Yu, Donghyeon Cho
MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics
Bowei Guo, Shengkun Tang, Cong Zeng et al.
FakeRadar: Probing Forgery Outliers to Detect Unknown Deepfake Videos
Zhaolun Li, Jichang Li, Yinqi Cai et al.
Sparse-Dense Side-Tuner for efficient Video Temporal Grounding
David Pujol-Perich, Sergio Escalera, Albert Clapés
Generalizable Non-Line-of-Sight Imaging with Learnable Physical Priors
Shida Sun, Yue Li, Yueyi Zhang et al.
Benchmarking Burst Super-Resolution for Polarization Images: Noise Dataset and Analysis
Inseung Hwang, Kiseok Choi, Hyunho Ha et al.
TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes
Yan Xia, Yunxiang Lu, Rui Song et al.
All in One: Visual-Description-Guided Unified Point Cloud Segmentation
Zongyan Han, Mohamed El Amine Boudjoghra, Jiahua Dong et al.
EvRT-DETR: Latent Space Adaptation of Image Detectors for Event-based Vision
Dmitrii Torbunov, Yihui Ren, Animesh Ghose et al.
Auto-Regressively Generating Multi-View Consistent Images
JiaKui Hu, Yuxiao Yang, Jialun Liu et al.
MoSiC: Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning
Mohammadreza Salehi, Shashanka Venkataramanan, Ioana Simion et al.
Stylized-Face: A Million-level Stylized Face Dataset for Face Recognition
Zhengyuan Peng, Jianqing Xu, Yuge Huang et al.
Correspondence-Free Fast and Robust Spherical Point Pattern Registration
Anik Sarker, Alan Asbeck
Aligning Moments in Time using Video Queries
Yogesh Kumar, Uday Agarwal, Manish Gupta et al.
Granular Concept Circuits: Toward a Fine-Grained Circuit Discovery for Concept Representations
Dahee Kwon, Sehyun Lee, Jaesik Choi
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Junho Kim, Hyungjin Chung, Byung-Hoon Kim
CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning
Marco P. Apolinario, Sakshi Choudhary, Kaushik Roy
Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions
Liang Xu, Chengqun Yang, Zili Lin et al.
Learning Pixel-adaptive Multi-layer Perceptrons for Real-time Image Enhancement
Junyu Lou, Xiaorui Zhao, Kexuan Shi et al.
MVQA: Mamba with Unified Sampling for Efficient Video Quality Assessment
Yachun Mi, Yu Li, Weicheng Meng et al.
Visual Intention Grounding for Egocentric Assistants
Pengzhan Sun, Junbin Xiao, Tze Ho Elden Tse et al.
Scheduling Weight Transitions for Quantization-Aware Training
Junghyup Lee, Jeimin Jeon, Dohyung Kim et al.
Seal Your Backdoor with Variational Defense
Ivan Sabolic, Matej Grcic, Siniša Šegvić
A Plug-and-Play Physical Motion Restoration Approach for In-the-Wild High-Difficulty Motions
Youliang Zhang, Ronghui Li, Yachao Zhang et al.
MotionDiff: Training-free Zero-shot Interactive Motion Editing via Flow-assisted Multi-view Diffusion
Yikun Ma, Yiqing Li, Jiawei Wu et al.
FPEM: Face Prior Enhanced Facial Attractiveness Prediction for Live Videos with Face Retouching
Hui Li, Xiaoyu Ren, Hongjiu Yu et al.
MCOP: Multi-UAV Collaborative Occupancy Prediction
Zefu Lin, Wenbo Chen, Xiaojuan Jin et al.
PoseAnchor: Robust Root Position Estimation for 3D Human Pose Estimation
Jun-Hee Kim, Jumin Han, Seong-Whan Lee
DICE: Staleness-Centric Optimizations for Parallel Diffusion MoE Inference
Jiajun Luo, Lizhuo Luo, Jianru Xu et al.
Guiding Noisy Label Conditional Diffusion Models with Score-based Discriminator Correction
Dat Cong, Hieu Tran, Hoang Thanh-Tung
Mind the Cost of Scaffold! Benign Clients May Even Become Accomplices of Backdoor Attack
Xingshuo Han, Xuanye Zhang, Xiang Lan et al.
TrackAny3D: Transferring Pretrained 3D Models for Category-unified 3D Point Cloud Tracking
Mengmeng Wang, Haonan Wang, Yulong Li et al.
MeshMamba: State Space Models for Articulated 3D Mesh Generation and Reconstruction
Yusuke Yoshiyasu, Leyuan Sun, Ryusuke Sagawa
Learning Robust Image Watermarking with Lossless Cover Recovery
Jiale Chen, Wei Wang, Chongyang Shi et al.
Multi-View 3D Point Tracking
Frano Rajič, Haofei Xu, Marko Mihajlovic et al.
Decoding Correlation-Induced Misalignment in the Stable Diffusion Workflow for Text-to-Image Generation
Yunze Tong, Fengda Zhang, Didi Zhu et al.
DAViD: Data-efficient and Accurate Vision Models from Synthetic Data
Fatemeh Saleh, Sadegh Aliakbarian, Charlie Hewitt et al.
Enhancing Numerical Prediction of MLLMs with Soft Labeling
Pei Wang, Zhaowei Cai, Hao Yang et al.
ProGait: A Multi-Purpose Video Dataset and Benchmark for Transfemoral Prosthesis Users
Xiangyu Yin, Boyuan Yang, Weichen Liu et al.
Plug-in Feedback Self-adaptive Attention in CLIP for Training-free Open-Vocabulary Segmentation
Zhixiang Chi, Yanan Wu, Li Gu et al.
ARMO: Autoregressive Rigging for Multi-Category Objects
Mingze Sun, Shiwei Mao, Keyi Chen et al.
FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection
Xinhua Lu, Runhe Lai, Yanqi Wu et al.
Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation
Siyu Chen, Ting Han, Changshe Zhang et al.
HumorDB: Can AI understand graphical humor?
Vedaant V Jain, Gabriel Kreiman, Felipe Feitosa
Membership Inference Attacks with False Discovery Rate Control
Chenxu Zhao, Wei Qian, Aobo Chen et al.
Streaming VideoLLMs for Real-Time Procedural Video Understanding
Dibyadip Chatterjee, Edoardo Remelli, Yale Song et al.
TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models
Pooyan Rahmanzadehgervi, Hung Nguyen, Rosanne Liu et al.
OmniVTON: Training-Free Universal Virtual Try-On
Zhaotong Yang, Yuhui Li, Shengfeng He et al.
Towards Fine-grained Interactive Segmentation in Images and Videos
Yuan Yao, Qiushi Yang, Miaomiao Cui et al.
SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition
Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.
Free-MoRef: Instantly Multiplexing Context Perception Capabilities of Video-MLLMs within Single Inference
Kuo Wang, Quanlong Zheng, Junlin Xie et al.
BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation
Yuanhong Yu, Xingyi He, Chen Zhao et al.
FreeDance: Towards Harmonic Free-Number Group Dance Generation via a Unified Framework
Yiwen Zhao, Yang Wang, Liting Wen et al.
CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection
Zhixin Cheng, Jiacheng Deng, Xinjun Li et al.
Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering
Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière et al.
MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing
Langyu Wang, Yingying Chen et al.
Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process
Yuanze Li, Shihao Yuan, Haolin Wang et al.
IAP: Invisible Adversarial Patch Attack through Perceptibility-Aware Localization and Perturbation Optimization
Subrat Kishore Dutta, Xiao Zhang
Bridging Diffusion Models and 3D Representations: A 3D Consistent Super-Resolution Framework
Yi-Ting Chen, Ting-Hsuan Liao, Pengsheng Guo et al.
On the Robustness Tradeoff in Fine-Tuning
Kunyang Li, Jean-Charles Noirot Ferrand, Ryan Sheatsley et al.
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning
Tianhong Gao, Yannian Fu, Weiqun Wu et al.
Meta-Learning Dynamic Center Distance: Hard Sample Mining for Learning with Noisy Labels
Chenyu Mu, Yijun Qu, Jiexi Yan et al.
Passing the Driving Knowledge Test
Maolin Wei, Wanzhou Liu, Eshed Ohn-Bar