Most Cited 2025 Poster Papers
22,274 papers found • Page 101 of 112
Conference
Cassic: Towards Content-Adaptive State-Space Models for Learned Image Compression
Shiyu Qin, Jinpeng Wang, Yimin Zhou et al.
SpectralAR: Spectral Autoregressive Visual Generation
Yuanhui Huang, Weiliang Chen, Wenzhao Zheng et al.
Boosting Adversarial Transferability via Negative Hessian Trace Regularization
Yunfei Long, Zilin Tian, Liguo Zhang et al.
OneGT: One-Shot Geometry-Texture Neural Rendering for Head Avatars
Jinshu Chen, Bingchuan Li, Fan Zhang et al.
Unsupervised Visible-Infrared Person Re-identification under Unpaired Settings
Haoyu Yao, Bin Yang, Wenke Huang et al.
Adaptive Prompt Learning via Gaussian Outlier Synthesis for Out-of-distribution Detection
Yongkang Zhang, Dongyu She, Zhong Zhou
A Differentiable Wave Optics Model for End-to-End Computational Imaging System Optimization
Chi-Jui Ho, Yash Belhe, Steve Rotenberg et al.
OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics
YeonJi Song, Jaein Kim, Suhyung Choi et al.
Intra-view and Inter-view Correlation Guided Multi-view Novel Class Discovery
Xinhang Wan, Jiyuan Liu, Qian Qu et al.
HUST: High-Fidelity Unbiased Skin Tone Estimation via Texture Quantization
Zimin Ran, Xingyu Ren, Xiang An et al.
ProbMED: A Probabilistic Framework for Medical Multimodal Binding
Yuan Gao, Sangwook Kim, Jianzhong You et al.
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu, Jinghe Wang, Yuan Meng et al.
Dynamic Group Detection using VLM-augmented Temporal Groupness Graph
Kaname Yokoyama, Chihiro Nakatani, Norimichi Ukita
CountSE: Soft Exemplar Open-set Object Counting
Shuai Liu, Peng Zhang, Shiwei Zhang et al.
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong LU, Yinghao Chen, Renshou Wu et al.
MedVSR: Medical Video Super-Resolution with Cross State-Space Propagation
Xinyu Liu, Guolei Sun, Cheng Wang et al.
Trans-Adapter: A Plug-and-Play Framework for Transparent Image Inpainting
Yuekun Dai, Haitian Li, Shangchen Zhou et al.
Generalization-Preserved Learning: Closing the Backdoor to Catastrophic Forgetting in Continual Deepfake Detection
Xueyi Zhang, Peiyin Zhu, Chengwei Zhang et al.
IGD: Instructional Graphic Design with Multimodal Layer Generation
Yadong Qu, Shancheng Fang, Yuxin Wang et al.
Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection
Romain Thoreau, Valerio Marsocci, Dawa Derksen
CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction
Yuanyuan Gao, Hao Li, Jiaqi Chen et al.
AIRA: Activation-Informed Low-Rank Adaptation for Large Models
Lujun Li, Dezhi Li, Cheng Lin et al.
Face Retouching with Diffusion Data Generation and Spectral Restorement
Zhidan Xu, Xiaoqin Zhang, Shijian Lu
Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder
Wonwoong Cho, Yan-Ying Chen, Matthew Klenk et al.
Class Token as Proxy: Optimal Transport-assisted Proxy Learning for Weakly Supervised Semantic Segmentation
Jian Wang, Tianhong Dai, Bingfeng Zhang et al.
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt
Lukas Höllein, Aljaz Bozic, Michael Zollhöfer et al.
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scene
Xiao Chen, Tai Wang, Quanyi Li et al.
CA2C: A Prior-Knowledge-Free Approach for Robust Label Noise Learning via Asymmetric Co-learning and Co-training
Mengmeng Sheng, Zeren Sun, Tianfei Zhou et al.
Point Cloud Self-supervised Learning via 3D to Multi-view Masked Learner
Zhimin Chen, Xuewei Chen, Xiao Guo et al.
MSA2: Multi-task Framework with Structure-aware and Style-adaptive Character Representation for Open-set Chinese Text Recognition
Yangfu Li, Hongjian Zhan, Qi Liu et al.
MultiModal Action Conditioned Video Simulation
Yichen Li, Antonio Torralba
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
Hao Chen, Shell Xu Hu, Wayne Luk et al.
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen, Xufang Luo, Dongsheng Li
ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring
Xiaopeng LIN, Yulong Huang, Hongwei Ren et al.
LEGO-Maker: A Semantic-Driven Algorithm for Text-to-3D Generation
Yifei Zhang, Lei Chen
Dense Policy: Bidirectional Autoregressive Learning of Actions
Yue Su, Xinyu Zhan, Hongjie Fang et al.
DOGR: Towards Versatile Visual Document Grounding and Referring
Yinan Zhou, Yuxin Chen, Haokun Lin et al.
ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation
Xiwei Xuan, Ziquan Deng, Kwan-Liu Ma
MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos
Hongyi Zhou, Xiaogang Wang, Yulan Guo et al.
Performing Defocus Deblurring by Modeling its Formation Process
Zhengbo Zhang, Lin Geng Foo, Hossein Rahmani et al.
Supervised Exploratory Learning for Long-Tailed Visual Recognition
Zhongquan Jian, Yanhao Chen, Wangyancheng Wangyancheng et al.
OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning
Yuan Liu, Saihui Hou, Saijie Hou et al.
TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
Zuhao Yang, Yingchen Yu, Yunqing Zhao et al.
Active Perception Meets Rule-Guided RL: A Two-Phase Approach for Precise Object Navigation in Complex Environments
Liang Qin, Min Wang, Peiwei Li et al.
GaussianReg: Rapid 2D/3D Registration for Emergency Surgery via Explicit 3D Modeling with Gaussian Primitives
Weihao Yu, Xiaoqing Guo, Xinyu Liu et al.
ArgoTweak: Towards Self-Updating HD Maps through Structured Priors
Lena Wild, Rafael Valencia, Patric Jensfelt
SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations
Songchun Zhang, Huiyao Xu, Sitong Guo et al.
Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics
Keming Wu, Junwen Chen, Zhanhao Liang et al.
FedXDS: Leveraging Model Attribution Methods to counteract Data Heterogeneity in Federated Learning
Maximilian Hoefler, Karsten Mueller, Wojciech Samek
Visual Textualization for Image Prompted Object Detection
Yongjian Wu, Yang Zhou, Jiya Saiyin et al.
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
Haoran Lou, Chunxiao Fan, Ziyan Liu et al.
GMMamba: Group Masking Mamba for Whole Slide Image Classification
Tingting Zheng, Hongxun Yao, Kui Jiang et al.
RareCLIP: Rarity-aware Online Zero-shot Industrial Anomaly Detection
Jianfang He, Min Cao, Silong Peng et al.
Temporal Rate Reduction Clustering for Human Motion Segmentation
Xianghan Meng, Zhengyu Tong, Zhiyuan Huang et al.
Separation for Better Integration: Disentangling Edge and Motion in Event-based Deblurring
Yufei Zhu, Hao Chen, Yongjian Deng et al.
Diversity-Enhanced Distribution Alignment for Dataset Distillation
Hongcheng Li, Yucan Zhou, Xiaoyan Gu et al.
Adapt Foundational Segmentation Models with Heterogeneous Searching Space
Li Yi, Jie Hu, Songan Zhang et al.
Think Twice: Test-Time Reasoning for Robust CLIP Zero-Shot Classification
Shenyu Lu, Zhaoying Pan, Xiaoqian Wang
Counting Stacked Objects
Corentin Dumery, Noa Ette, Aoxiang Fan et al.
RankMatch: A Novel Approach to Semi-Supervised Label Distribution Learning Leveraging Rank Correlation between Labels
Zhiqiang Kou, Yucheng Xie, Hailin Wang et al.
GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling
Pinxin Liu, Luchuan Song, Junhua Huang et al.
SDFormer: Vision-based 3D Semantic Scene Completion via SAM-assisted Dual-channel Voxel Transformer
Yujie Xue, Huilong Pi, Jiapeng Zhang et al.
TopoTTA: Topology-Enhanced Test-Time Adaptation for Tubular Structure Segmentation
Jiale Zhou, Wenhan Wang, Shikun Li et al.
MagShield: Towards Better Robustness in Sparse Inertial Motion Capture Under Magnetic Disturbances
Yunzhe Shao, Xinyu Yi, Lu Yin et al.
DeFSS: Image-to-Mask Denoising Learning for Few-shot Segmentation
Zishu Qin, Junhao Xu, Weifeng Ge
TAD-E2E: A Large-scale End-to-end Autonomous Driving Dataset
Chang Liu, mingxuzhu mingxuzhu, Zheyuan Zhang et al.
Photolithography Overlay Map Generation with Implicit Knowledge Distillation Diffusion Transformer
YuanFu Yang, Hsiu-Hui Hsiao
VehicleMAE: View-asymmetry Mutual Learning for Vehicle Re-identification Pre-training via Masked AutoEncoders
Qi Wang, Zeyu Zhang, Dong Wang et al.
Multi-scenario Overlapping Text Segmentation with Depth Awareness
Yang Liu, Xudong Xie, Yuliang Liu et al.
FullDiT: Video Generative Foundation Models with Multimodal Control via Full Attention
Xuan Ju, Weicai Ye, Quande Liu et al.
Learning Hierarchical Line Buffer for Image Processing
Jiacheng Li, Feiran Li, Daisuke Iso
Humans as Checkerboards: Calibrating Camera Motion Scale for World-Coordinate Human Mesh Recovery
Fengyuan Yang, Kerui Gu, Ha Linh Nguyen et al.
GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion
Gwanghyun Kim, Xueting Li, Ye Yuan et al.
Stereo Any Video: Temporally Consistent Stereo Matching
Junpeng Jing, Weixun Luo, Ye Mao et al.
ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads
Yifan Li, Xin Li, Tianqin Li et al.
Cycle-Consistent Learning for Joint Layout-to-Image Generation and Object Detection
Xinhao Cai, Qiuxia Lai, Gensheng Pei et al.
CarGait: Cross-Attention based Re-ranking for Gait recognition
Gavriel Habib, Noa Barzilay, Or Shimshi et al.
StyleSRN: Scene Text Image Super-Resolution with Text Style Embedding
Shengrong Yuan, Runmin Wang, Ke Hao et al.
Frequency-Guided Diffusion for Training-Free Text-Driven Image Translation
Zheng Gao, Jifei Song, Zhensong Zhang et al.
Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition
Wenhan Wu, Zhishuai Guo, Chen Chen et al.
Cross-Category Subjectivity Generalization for Style-Adaptive Sketch Re-ID
Zechao Hu, Zhengwei Yang, Hao Li et al.
Learnable Fractional Reaction-Diffusion Dynamics for Under-Display ToF Imaging and Beyond
Xin Qiao, Matteo Poggi, Xing Wei et al.
Discretized Gaussian Representation for Tomographic Reconstruction
Shaokai Wu, Yuxiang Lu, Yapan Guo et al.
3D Test-time Adaptation via Graph Spectral Driven Point Shift
Xin Wei, Qin Yang, Yijie Fang et al.
EMoTive: Event-guided Trajectory Modeling for 3D Motion Estimation
Zengyu Wan, Wei Zhai, Yang Cao et al.
KDA: Knowledge Diffusion Alignment with Enhanced Context for Video Temporal Grounding
Ran Ran, Jiwei Wei, Shiyuan He et al.
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models
Tengjin Weng, Jingyi Wang, Wenhao Jiang et al.
STEP-DETR: Advancing DETR-based Semi-Supervised Object Detection with Super Teacher and Pseudo-Label Guided Text Queries
Tahira Shehzadi, Khurram Azeem Hashmi, Shalini Sarode et al.
Completing 3D Partial Assemblies with View-Consistent 2D-3D Correspondence
Weihao Wang, Yu Lan, Mingyu You et al.
Aligning Global Semantics and Local Textures in Generative Video Enhancement
Zhikai Chen, Fuchen Long, Zhaofan Qiu et al.
Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling
Hayeon Kim, Ji Ha Jang, Se Young Chun
Structure Matters: Revisiting Boundary Refinement in Video Object Segmentation
Guanyi Qin, Ziyue Wang, Daiyun Shen et al.
AIM: Amending Inherent Interpretability via Self-Supervised Masking
Eyad Alshami, Shashank Agnihotri, Bernt Schiele et al.
One Last Attention for Your Vision-Language Model
Liang Chen, Ghazi Shazan Ahmad, Tianjun Yao et al.
RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS
Chuanyu Fu, Yuqi Zhang, Kunbin Yao et al.
High-Resolution Spatiotemporal Modeling with Global-Local State Space Models for Video-Based Human Pose Estimation
Runyang Feng, Hyung Jin Chang, Tze Ho Elden Tse et al.
Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information
Junbo Zhao, Ting Zhang, Jiayu Sun et al.
Mitigating Catastrophic Overfitting in Fast Adversarial Training via Label Information Elimination
Chao Pan, Ke Tang, Li Qing et al.
Consistency Trajectory Matching for One-Step Generative Super-Resolution
Weiyi You, Mingyang Zhang, Leheng Zhang et al.
Amodal Depth Anything: Amodal Depth Estimation in the Wild
Zhenyu Li, Mykola Lavreniuk, Jian Shi et al.
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang, Jiawei Kong, Wenbo Yu et al.
CVPT: Cross Visual Prompt Tuning
Lingyun Huang, Jianxu Mao, Junfei YI et al.
DDB: Diffusion Driven Balancing to Address Spurious Correlations
Aryan Yazdan Parast, Basim Azam, Naveed Akhtar
Geometric Alignment and Prior Modulation for View-Guided Point Cloud Completion on Unseen Categories
Jingqiao Xiu, Yicong Li, Na Zhao et al.
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
Guanxing Lu, Tengbo Yu, Haoyuan Deng et al.
FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation
Wenbin Teng, Gonglin Chen, Haiwei Chen et al.
CoralSRT: Revisiting Coral Reef Semantic Segmentation by Feature Rectifying via Self-supervised Guidance
Zheng Ziqiang, Wong Kwan, Binh-Son Hua et al.
Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space
Yingping Liang, Yutao Hu, Wenqi Shao et al.
Diagnosing Pretrained Models for Out-of-distribution Detection
Haipeng Xiong, Kai Xu, Angela Yao
What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?
Jinhong Ni, Chang-Bin Zhang, Qiang Zhang et al.
Learning Normals of Noisy Points by Local Gradient-Aware Surface Filtering
Qing Li, Huifang Feng, Xun Gong et al.
Bayesian-Inspired Space-Time Superpixels
Kent Gauen, Stanley Chan
INSTINCT: Instance-Level Interaction Architecture for Query-Based Collaborative Perception
yunjiang xu, Yupeng Ouyang, Lingzhi Li et al.
Debiased Curriculum Adaptation for Safe Transfer Learning in Chest X-ray Classification
Mingyang Liu, Xinyang Chen, Yang Shu et al.
PHATNet: A Physics-guided Haze Transfer Network for Domain-adaptive Real-world Image Dehazing
Fu-Jen Tsai, Yan-Tsung Peng, Yen-Yu Lin et al.
Forensic-MoE: Exploring Comprehensive Synthetic Image Detection Traces with Mixture of Experts
Mingqi Fang, Ziguang Li, Lingyun Yu et al.
Information-Bottleneck Driven Binary Neural Network for Change Detection
Kaijie Yin, Zhiyuan Zhang, Shu Kong et al.
Entropy-Adaptive Diffusion Policy Optimization with Dynamic Step Alignment
Renye Yan, Jikang Cheng, Yaozhong Gan et al.
Time-Aware Auto White Balance in Mobile Photography
Mahmoud Afifi, Luxi Zhao, Abhijith Punnappurath et al.
Leveraging Panoptic Scene Graph for Evaluating Fine-Grained Text-to-Image Generation
Xueqing Deng, Linjie Yang, Qihang Yu et al.
ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition
Ronggang Huang, Haoxin Yang, Yan Cai et al.
Physical Degradation Model-Guided Interferometric Hyperspectral Reconstruction with Unfolding Transformer
Yuansheng Li, Yunhao Zou, Linwei Chen et al.
VPR-Cloak: A First Look at Privacy Cloak Against Visual Place Recognition
Shuting Dong, Mingzhi Chen, Feng Lu et al.
GUAVA: Generalizable Upper Body 3D Gaussian Avatar
Dongbin Zhang, Yunfei Liu, Lijian Lin et al.
HOMO-Feature: Cross-Arbitrary-Modal Image Matching with Homomorphism of Organized Major Orientation
Chenzhong Gao, Wei Li, Desheng Weng
GSOT3D: Towards Generic 3D Single Object Tracking in the Wild
Yifan Jiao, Yunhao Li, Junhua Ding et al.
Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection
Yehao Lu, Minghe Weng, Zekang Xiao et al.
WAVE: Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image
Jiwoo Park, Tae Choi, Youngjun Jun et al.
Lightweight and Fast Real-time Image Enhancement via Decomposition of the Spatial-aware Lookup Tables
Wontae Kim, Keuntek Lee, Nam Ik Cho
Neural Multi-View Self-Calibrated Photometric Stereo without Photometric Stereo Cues
Xu Cao, Takafumi Taketomi
EMatch: A Unified Framework for Event-based Optical Flow and Stereo Matching
Pengjie Zhang, Lin Zhu, Xiao Wang et al.
CounterPC: Counterfactual Feature Realignment for Unsupervised Domain Adaptation on Point Clouds
Feng Yang, Yichao Cao, Xiu Su et al.
Liberated-GS: 3D Gaussian Splatting Independent from SfM Point Clouds
Weihong Pan, Xiaoyu Zhang, Hongjia Zhai et al.
Unlocking the Potential of Diffusion Priors in Blind Face Restoration
Yunqi Miao, Zhiyu Qu, Mingqi Gao et al.
Implicit Counterfactual Learning for Audio-Visual Segmentation
Mingfeng Zha, Tianyu Li, Guoqing Wang et al.
STaR: Seamless Spatial-Temporal Aware Motion Retargeting with Penetration and Consistency Constraints
Xiaohang Yang, Qing Wang, Jiahao Yang et al.
MRGen: Segmentation Data Engine For Underrepresented MRI Modalities
Haoning Wu, Ziheng Zhao, Ya Zhang et al.
Rethink Sparse Signals for Pose-guided Text-to-image Generation
Wenjie Xuan, Jing Zhang, Juhua Liu et al.
Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras
Petr Hruby, Marc Pollefeys
Enhancing Transferability of Targeted Adversarial Examples via Inverse Target Gradient Competition and Spatial Distance Stretching
Zhankai Li, Weiping Wang, jie li et al.
LDPose: Towards Inclusive Human Pose Estimation for Limb-Deficient Individuals in the Wild
Jiaying Ying, Heming Du, Kaihao Zhang et al.
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Ahmed Nassar, Matteo Omenetti, Maksym Lysak et al.
Images as Noisy Labels: Unleashing the Potential of the Diffusion Model for Open-Vocabulary Semantic Segmentation
Fan Li, Xuanbin Wang, Xuan Wang et al.
ContextFace: Generating Facial Expressions from Emotional Contexts
minjung kim, Minsang Kim, Seung Jun Baek
SMP-Attack: Boosting the Transferability of Feature Importance-based Adversarial Attack with Semantics-aware Multi-granularity Patchout
Wen Yang, Guodong Liu, Di Ming
Spatial-Temporal Forgery Trace based Forgery Image Identification
Yilin Wang, Zunlei Feng, Jiachi Wang et al.
Towards Annotation-Free Evaluation: KPAScore for Human Keypoint Detection
Xiaoxiao Wang, Chunxiao Li, Peng Sun et al.
Ultra High-Resolution Image Inpainting with Patch-Based Content Consistency Adapter
JianHui Zhang, Shen Cheng, Qirui Sun et al.
Agreement aware and dissimilarity oriented GLOM
Ru Zeng, Yan Song, Yang ZHANG et al.
MeasureXpert: Automatic Anthropometric Measurement Extraction from Two Unregistered, Partial, Posed, and Dressed Body Scans
Ran Zhao, Xinxin Dai, Pengpeng Hu et al.
DiMPLe - Disentangled Multi-Modal Prompt Learning: Enhancing Out-Of-Distribution Alignment with Invariant and Spurious Feature Separation
Umaima Rahman, Mohammad Yaqub, Dwarikanath Mahapatra
ResidualViT for Efficient Temporally Dense Video Encoding
Mattia Soldan, Fabian Caba Heilbron, Bernard Ghanem et al.
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.
Randomized Autoregressive Visual Generation
Qihang Yu, Ju He, Xueqing Deng et al.
Unsupervised RGB-D Point Cloud Registration for Scenes with Low Overlap and Photometric Inconsistency
yejun Shou, Haocheng Wang, Lingfeng Shen et al.
TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision
Ayush Gupta, Anirban Roy, Rama Chellappa et al.
Training-free Geometric Image Editing on Diffusion Models
Hanshen Zhu, Zhen Zhu, Kaile Zhang et al.
Monocular Facial Appearance Capture in the Wild
Yingyan Xu, Kate Gadola, Prashanth Chandran et al.
Growing a Twig to Accelerate Large Vision-Language Models
Zhenwei Shao, Mingyang Wang, Zhou Yu et al.
SignRep: Enhancing Self-Supervised Sign Representations
Ryan Wong, Necati Cihan Camgoz, Richard Bowden
MixA: A Mixed Attention approach with Stable Lightweight Linear Attention to enhance Efficiency of Vision Transformers at the Edge
Sabbir Ahmed, Jingtao Li, Weiming Zhuang et al.
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
Junyuan Zhang, Qintong Zhang, Bin Wang et al.
Efficient Event Camera Data Pretraining with Adaptive Prompt Fusion
Quanmin Liang, Qiang Li, Shuai Liu et al.
Head2Body: Body Pose Generation from Multi-sensory Head-mounted Inputs
Minh Tran, Hongda Mao, Qingshuang Chen et al.
Looking in the Mirror: A Faithful Counterfactual Explanation Method for Interpreting Deep Image Classification Models
Townim Chowdhury, Vu Phan, Kewen Liao et al.
FLSeg: Enhancing Privacy and Robustness in Federated Learning under Heterogeneous Data via Model Segmentation
Zichun Su, Zhi Lu, Yutong Wu et al.
Self-Calibrating Gaussian Splatting for Large Field-of-View Reconstruction
Youming Deng, Wenqi Xian, Guandao Yang et al.
Gradient Decomposition and Alignment for Incremental Object Detection
Wenlong Luo, Shizhou Zhang, De Cheng et al.
MSQ: Memory-Efficient Bit Sparsification Quantization
Seokho Han, Seoyeon Yoon, Jinhee Kim et al.
When and Where do Data Poisons Attack Textual Inversion?
Jeremy Styborski, Mingzhi Lyu, Jiayou Lu et al.
SRefiner: Soft-Braid Attention for Multi-Agent Trajectory Refinement
Liwen Xiao, Zhiyu Pan, Zhicheng Wang et al.
Rethinking Few Shot CLIP Benchmarks: A Critical Analysis in the Inductive Setting
Alexey Kravets, Da Chen, Vinay Namboodiri
AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation
Hao Li, Ju Dai, Feng Zhou et al.
BokehDiff: Neural Lens Blur with One-Step Diffusion
Chengxuan Zhu, Qingnan Fan, Qi Zhang et al.
Trial-Oriented Visual Rearrangement
Yuyi Liu, Xinhang Song, Tianliang Qi et al.
Debiased Teacher for Day-to-Night Domain Adaptive Object Detection
Yiming Cui, Liang Li, Haibing YIN et al.
SpikePack: Enhanced Information Flow in Spiking Neural Networks with High Hardware Compatibility
Guobin Shen, Jindong Li, Tenglong Li et al.
Social Debiasing for Fair Multi-modal LLMs
Harry Cheng, Yangyang Guo, Qingpei Guo et al.
Hierarchy-Aware Pseudo Word Learning with Text Adaptation for Zero-Shot Composed Image Retrieval
Zhe Li, Lei Zhang, Zheren Fu et al.
UPP: Unified Point-Level Prompting for Robust Point Cloud Analysis
Zixiang Ai, Zhenyu Cui, Yuxin Peng et al.
AV-Flow: Transforming Text to Audio-Visual Human-like Interactions
Aggelina Chatziagapi, Louis-Philippe Morency, Hongyu Gong et al.
Probabilistic Inertial Poser (ProbIP): Uncertainty-aware Human Motion Modeling from Sparse Inertial Sensors
Min Kim, Younho Jeon, Sungho Jo
SFUOD: Source-Free Unknown Object Detection
Keon-Hee Park, Seun-An Choe, Gyeong-Moon Park
Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal
Jinpei Guo, Zheng Chen, Wenbo Li et al.
ConstStyle: Robust Domain Generalization with Unified Style Transformation
Nam Duong Tran, Nam Nguyen Phuong, Hieu Pham et al.
Golden Noise for Diffusion Models: A Learning Framework
zikai zhou, Shitong Shao, Lichen Bai et al.
Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation
Yukuan Min, Muli Yang, Jinhao Zhang et al.
OrderChain: Towards General Instruct-Tuning for Stimulating the Ordinal Understanding Ability of MLLM
Jinhong Wang, Shuo Tong, Jintai CHEN et al.
Unified Open-World Segmentation with Multi-Modal Prompts
Yang Liu, Yufei Yin, Chenchen Jing et al.
LayerAnimate: Layer-level Control for Animation
Yuxue Yang, Lue Fan, Zuzeng Lin et al.
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
shengyuan zhang, An Zhao, Ling Yang et al.
SuperEvent: Cross-Modal Learning of Event-based Keypoint Detection for SLAM
Yannick Burkhardt, Simon Schaefer, Stefan Leutenegger
FedAGC: Federated Continual Learning with Asymmetric Gradient Correction
Chengchao Zhang, Fanhua Shang, Hongying Liu et al.
Joint Learning of Pose Regression and Denoising Diffusion with Score Scaling Sampling for Category-level 6D Pose Estimation
Seunghyun Lee, Tae-Kyun Kim
Intra-modal and Cross-modal Synchronization for Audio-visual Deepfake Detection and Temporal Localization
Ashutosh Anshul, Shreyas Gopal, Deepu Rajan et al.
CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval
Zelong Sun, Dong Jing, Zhiwu Lu
The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation
Ho Kei Cheng, Alex Schwing
DepthSync: Diffusion Guidance-Based Depth Synchronization for Scale- and Geometry-Consistent Video Depth Estimation
Yue-Jiang Dong, Wang Zhao, Jiale Xu et al.
InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation
Wenjie Zhuo, Fan Ma, Hehe Fan
Client2Vec: Improving Federated Learning by Distribution Shifts Aware Client Indexing
Yongxin Guo, Lin Wang, Xiaoying Tang et al.