Most Cited 2025 "uncertainty-aware exploration" Papers
22,274 papers found • Page 80 of 112
Conference
Uni-RL: Unifying Online and Offline RL via Implicit Value Regularization
Haoran Xu, Liyuan Mao, Hui Jin et al.
Vertical Federated Feature Screening
Huajun Yin, Liyuan Wang, Yingqiu Zhu et al.
GenColor: Generative and Expressive Color Enhancement with Pixel-Perfect Texture Preservation
Yi Dong, Yuxi Wang, Xianhui Lin et al.
Low-degree evidence for computational transition of recovery rate in stochastic block model
Jingqiu Ding, Yiding Hua, Lucas Slot et al.
Memorization in Graph Neural Networks
Adarsh Jamadandi, Jing Xu, Adam Dziedzic et al.
ThermalGen: Style-Disentangled Flow-Based Generative Models for RGB-to-Thermal Image Translation
Jiuhong Xiao, Roshan Nayak, Ning Zhang et al.
Continuous Q-Score Matching: Diffusion Guided Reinforcement Learning for Continuous-Time Control
Chengxiu HUA, Jiawen Gu, Yushun Tang
From Kolmogorov to Cauchy: Shallow XNet Surpasses KANs
Xin Li, Xiaotao Zheng, Zhihong Xia
Learning Spatial-Aware Manipulation Ordering
Yuxiang Yan, Zhiyuan Zhou, Xin Gao et al.
MURKA: Multi-Reward Reinforcement Learning with Knowledge Alignment for Optimization Tasks
WANTONG XIE, Yi-Xiang Hu, Jieyang Xu et al.
Transfer Learning on Edge Connecting Probability Estimation Under Graphon Model
Yuyao Wang, Yu-Hung Cheng, Debarghya Mukherjee et al.
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
Pedro Hermosilla, Christian Stippel, Leon Sick
Streaming Federated Learning with Markovian Data
Khiem HUYNH, Malcolm Egan, Giovanni Neglia et al.
Efficient Verified Unlearning For Distillation
Yijun Quan, Zushu Li, Giovanni Montana
Image Intrinsic Scale Assessment: Bridging the Gap Between Quality and Resolution
Vlad Hosu, Lorenzo Agnolucci, Daisuke Iso et al.
Neuromanifold-Regularized KANs for Shape-fair Feature Representations
Mazlum Arslan, Weihong Guo, Shuo Li
Less Static, More Private: Towards Transferable Privacy-Preserving Action Recognition by Generative Decoupled Learning
Zhi-Wei Xia, Kun-Yu Lin, Yuan-Ming Li et al.
AdaDCP: Learning an Adapter with Discrete Cosine Prior for Clear-to-Adverse Domain Generalization
Qi Bi, Yixian Shen, Jingjun Yi et al.
MorphoGen: Efficient Unconditional Generation of Long-Range Projection Neuronal Morphology via a Global-to-Local Framework
Tianfang Zhu, Hongyang Zhou, Anan LI
GaussianSpeech: Audio-Driven Personalized 3D Gaussian Avatars
Shivangi Aneja, Artem Sevastopolsky, Tobias Kirschstein et al.
Capturing head avatar with hand contacts from a monocular video
Haonan He, Yufeng Zheng, Jie Song
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Sungwoo Cho, Jeongsoo Choi, Sungnyun Kim et al.
Privacy-centric Deep Motion Retargeting for Anonymization of Skeleton-Based Motion Visualization
Thomas Carr, Depeng Xu, Shuhan Yuan et al.
Highlight What You Want: Weakly-Supervised Instance-Level Controllable Infrared-Visible Image Fusion
Zeyu Wang, Jizheng Zhang, Haiyu Song et al.
GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation
Quanwei Yang, Luying Huang, Kaisiyuan Wang et al.
KINDLE: Knowledge-Guided Distillation for Prior-Free Gene Regulatory Network Inference
Rui Peng, Yuchen Lu, Qichen Sun et al.
Learning A Unified Template for Gait Recognition
Panjian Huang, Saihui Hou, Junzhou Huang et al.
ZFusion: Efficient Deep Compositional Zero-shot Learning for Blind Image Super-Resolution with Generative Diffusion Prior
Alireza Esmaeilzehi, Hossein Zaredar, Yapeng Tian et al.
FlowDPS : Flow-Driven Posterior Sampling for Inverse Problems
Jeongsol Kim, Bryan Sangwoo Kim, Jong Ye
HADES: Human Avatar with Dynamic Explicit Hair Strands
Zhanfeng Liao, Hanzhang Tu, Cheng Peng et al.
Unfolding-Associative Encoder-Decoder Network with Progressive Alignment for Pansharpening
Shijie Fang, Hongping Gan
MOERL: When Mixture-of-Experts Meet Reinforcement Learning for Adverse Weather Image Restoration
Tao Wang, Peiwen Xia, Bo Li et al.
NAPPure: Adversarial Purification for Robust Image Classification under Non-Additive Perturbations
Junjie Nan, Jianing Li, Wei Chen et al.
LOMM: Latest Object Memory Management for Temporally Consistent Video Instance Segmentation
Seunghun Lee, Jiwan Seo, Minwoo Choi et al.
EVDM: Event-based Real-world Video Deblurring with Mamba
Zhijing Sun, Senyan Xu, Kean Liu et al.
Q-Norm: Robust Representation Learning via Quality-Adaptive Normalization
Lanning Zhang, Ying Zhou, Fei Gao et al.
Proxy-Bridged Game Transformer for Interactive Extreme Motion Prediction
Yanwen Fang, Wenqi Jia, Xu Cao et al.
π-AVAS: Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis?
Susan Liang, Chao Huang, Yolo Yunlong Tang et al.
RobAVA: A Large-scale Dataset and Baseline Towards Video based Robotic Arm Action Understanding
Baoli Sun, Ning Wang, Xinzhu Ma et al.
VideoSetDiff: Identifying and Reasoning Similarities and Differences in Similar Videos
YUE QIU, Yanjun Sun, Takuma Yagi et al.
Generic Event Boundary Detection via Denoising Diffusion
Jaejun Hwang, Dayoung Gong, Manjin Kim et al.
Not All Degradations Are Equal: A Targeted Feature Denoising Framework for Generalizable Image Super-Resolution
hongjun wang, Jiyuan Chen, Zhengwei Yin et al.
Scaling Action Detection: AdaTAD++ with Transformer-Enhanced Temporal-Spatial Adaptation
Tanay Agrawal, Abid Ali, Antitza Dantcheva et al.
Fine-Grained 3D Gaussian Head Avatars Modeling from Static Captures via Joint Reconstruction and Registration
Yuan Sun, Xuan Wang, Cong Wang et al.
Attention to Trajectory: Trajectory-Aware Open-Vocabulary Tracking
Yunhao Li, Yifan Jiao, Dan Meng et al.
MistSense: Versatile Online Detection of Procedural and Execution Mistakes
Constantin Patsch, Yuankai Wu, Marsil Zakour et al.
SEREP: Semantic Facial Expression Representation for Robust In-the-Wild Capture and Retargeting
Arthur Josi, Luiz Gustavo Hafemann, Abdallah Dib et al.
MixANT: Observation-dependent Memory Propagation for Stochastic Dense Action Anticipation
Syed Talal Wasim, Hamid Suleman, Olga Zatsarynna et al.
GDKVM: Echocardiography Video Segmentation via Spatiotemporal Key-Value Memory with Gated Delta Rule
Rui Wang, Yimu Sun, Jingxing Guo et al.
DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding
Thomas Kreutz, Max Mühlhäuser, Alejandro Sanchez Guinea
Efficient Concertormer for Image Deblurring and Beyond
Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien et al.
TimeBooth: Disentangled Facial Invariant Representation for Diverse and Personalized Face Aging
Zepeng Su, zhulin liu, Zongyan Zhang et al.
Penalizing Boundary Activation for Object Completeness in Diffusion Models
Haoyang Xu, Tianhao Zhao, Sibei Yang et al.
Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
Zhengyao Lyu, Chenyang Si, Tianlin Pan et al.
Straighten Viscous Rectified Flow via Noise Optimization
Jimin Dai, Jiexi Yan, Jian Yang et al.
Scalable Dual Fingerprinting for Hierarchical Attribution of Text-to-Image Models
Jianwei Fei, Yunshu Dai, Peipeng Yu et al.
CRAM: Large Scale Video Continual Learning with Bootstrapped Compression
Shivani Mall, Joao F. Henriques
Tree-NeRV: Efficient Non-Uniform Sampling for Neural Video Representation via Tree-Structured Feature Grids
Jiancheng Zhao, Yifan Zhan, Qingtian Zhu et al.
MaTe: Images Are All You Need for Material Transfer via Diffusion Transformer
Nisha Huang, Henglin Liu, Yizhou Lin et al.
Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos
Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah et al.
Robust Adverse Weather Removal via Spectral-based Spatial Grouping
Yuhwan Jeong, Yunseo Yang, Youngho Yoon et al.
LUSD: Localized Update Score Distillation for Text-Guided Image Editing
Worameth Chinchuthakun, Tossaporn Saengja, Nontawat Tritrong et al.
FlowChef: Steering of Rectified Flow Models for Controlled Generations
Maitreya Patel, Song Wen, Dimitris Metaxas et al.
Learning Efficient and Generalizable Human Representation with Human Gaussian Model
Yifan Liu, Shengjun Zhang, Chensheng Dai et al.
Translation of Text Embedding via Delta Vector to Suppress Strongly Entangled Content in Text-to-Image Diffusion Models
Eunseo Koh, SeungHoo Hong, Tae-Young Kim et al.
SynTag: Enhancing the Geometric Robustness of Inversion-based Generative Image Watermarking
Han Fang, Kejiang Chen, Zehua Ma et al.
Event-guided HDR Reconstruction with Diffusion Priors
Yixin Yang, jiawei zhang, Yang Zhang et al.
Fast Image Super-Resolution via Consistency Rectified Flow
Jiaqi Xu, Wenbo Li, Haoze Sun et al.
IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models
Khaled Abud, Sergey Lavrushkin, Alexey Kirillov et al.
Dual Recursive Feedback on Generation and Appearance Latents for Pose-Robust Text-to-Image Diffusion
Jiwon Kim, Pureum Kim, SeonHwa Kim et al.
Anti-Tamper Protection for Unauthorized Individual Image Generation
Zelin Li, Ruohan Zong, Yifan Liu et al.
Continual Personalization for Diffusion Models
Yu-Chien Liao, Jr-Jen Chen, Chi-Pin Huang et al.
Split-and-Combine: Enhancing Style Augmentation for Single Domain Generalization
Zhen Zhang, Zhen Zhang, Qianlong Dang et al.
Predicting Functional Brain Connectivity with Context-Aware Deep Neural Networks
Alexander Ratzan, Sidharth Goel, Junhao Wen et al.
Zero-Shot Depth Aware Image Editing with Diffusion Models
Rishubh Parihar, Sachidanand VS, Venkatesh Babu Radhakrishnan
Global and Local Entailment Learning for Natural World Imagery
Srikumar Sastry, Aayush Dhakal, Eric Xing et al.
TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring
Zhu Xu, Ting Lei, Zhimin Li et al.
Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images
Yuran Dong, Mang Ye
Who Controls the Authorization? Invertible Networks for Copyright Protection in Text-to-Image Synthesis
Baoyue Hu, Yang Wei, Junhao Xiao et al.
MBTI: Masked Blending Transformers with Implicit Positional Encoding for Frame-rate Agnostic Motion Estimation
Jungwoo Huh, Yeseung Park, Seongjean Kim et al.
FontAnimate: High Quality Few-shot Font Generation via Animating Font Transfer Process
Bin Fu, Zixuan Wang, Kainan Yan et al.
LDIP: Long Distance Information Propagation for Video Super-Resolution
Michael Bernasconi, Abdelaziz Djelouah, Yang Zhang et al.
ISP2HRNet: Learning to Reconstruct High Resolution Image from Irregularly Sampled Pixels via Hierarchical Gradient Learning
Yuanlin Wang, Ruiqin Xiong, Rui Zhao et al.
TextMaster: A Unified Framework for Realistic Text Editing via Glyph-Style Dual-Control
Zhenyu Yan, Jian Wang, Aoqiang Wang et al.
Continuous-Time Human Motion Field from Event Cameras
Ziyun Wang, Ruijun Zhang, Zi-Yan Liu et al.
Beyond Perspective: Neural 360-Degree Video Compression
Andy Regensky, Marc Windsheimer, Fabian Brand et al.
MCID: Multi-aspect Copyright Infringement Detection for Generated Images
Chuanwei Huang, Zexi Jia, Hongyan Fei et al.
Text2Outfit: Controllable Outfit Generation with Multimodal Language Models
Yuanhao Zhai, Yen-Liang Lin, Minxu Peng et al.
WarpHE4D: Dense 4D Head Map toward Full Head Reconstruction
Jongseob Yun, Yong-Hoon Kwon, Min-Gyu Park et al.
One-Step Specular Highlight Removal with Adapted Diffusion Models
Mahir Atmis, LEVENT KARACAN, Mehmet SARIGÜL
DiGA3D: Coarse-to-Fine Diffusional Propagation of Geometry and Appearance for Versatile 3D Inpainting
Jingyi Pan, Dan Xu, Qiong Luo
Laboring on less labors: RPCA Paradigm for Pan-sharpening
honghui xu, Chuangjie Fang, Yibin Wang et al.
Punching Bag vs. Punching Person: Motion Transferability in Videos
Raiyaan Abdullah, Jared Claypoole, Michael Cogswell et al.
From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations
Anthony Bisulco, Rahul Ramesh, Randall Balestriero et al.
Reusing Computation in Text-to-Image Diffusion for Efficient Generation of Image Sets
Dale Decatur, Thibault Groueix, Wang Yifan et al.
MOF-BFN: Metal-Organic Frameworks Structure Prediction via Bayesian Flow Networks
Rui Jiao, Hanlin Wu, Wenbing Huang et al.
Cross-Granularity Online Optimization with Masked Compensated Information for Learned Image Compression
Haowei Kuang, Wenhan Yang, Zongming Guo et al.
Stroke2Sketch: Harnessing Stroke Attributes for Training-Free Sketch Generation
Rui Yang, Huining Li, Yiyi Long et al.
Robust Test-Time Adaptation for Single Image Denoising Using Deep Gaussian Prior
Qing Ma, Pengwei Liang, Xiong Zhou et al.
Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis
Lei-lei Li, Jianwu Fang, Junbin Xiao et al.
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance
Minghao Fu, Guo-Hua Wang, Xiaohao Chen et al.
FiVE-Bench: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models
Minghan LI, Chenxi Xie, Yichen Wu et al.
TrackVerse: A Large-Scale Object-Centric Video Dataset for Image-Level Representation Learning
Yibing Wei, Samuel Church, Victor Suciu et al.
DGTalker: Disentangled Generative Latent Space Learning for Audio-Driven Gaussian Talking Heads
Xiaoxi Liang, Yanbo Fan, Qiya Yang et al.
Co-Painter: Fine-Grained Controllable Image Stylization via Implicit Decoupling and Adaptive Injection
Bowen Fu, Wei Wei, Jiaqi Tang et al.
Less is More: Improving Motion Diffusion Models with Sparse Keyframes
Jinseok Bae, Inwoo Hwang, Young-Yoon Lee et al.
Drawing Developmental Trajectory from Cortical Surface Reconstruction
WENXUAN WU, ruowen qu, Zhongliang Liu et al.
Toward Better Out-painting: Improving the Image Composition with Initialization Policy Model
Xuan Han, Yihao Zhao, Yanhao Ge et al.
Blind Noisy Image Deblurring Using Residual Guidance Strategy
Heyan Liu, Jianing Sun, Jun Liu et al.
SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation
Runtao Liu, I Chen, Jindong Gu et al.
DiffIP: Representation Fingerprints for Robust IP Protection of Diffusion Models
Zhuoling Li, Haoxuan Qu, Jason Kuen et al.
Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
Ryan Ramos, Vladan Stojnić, Giorgos Kordopatis-Zilos et al.
AM-Adapter: Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis in-the-Wild
Siyoon Jin, Jisu Nam, Jiyoung Kim et al.
Diffusion Epistemic Uncertainty with Asymmetric Learning for Diffusion-Generated Image Detection
Yingsong Huang, Hui Guo, Jing Huang et al.
MonSTeR: a Unified Model for Motion, Scene, Text Retrieval
Luca Collorone, Matteo Gioia, Massimiliano Pappa et al.
E-NeMF: Event-based Neural Motion Field for Novel Space-time View Synthesis of Dynamic Scenes
Yan Liu, Zehao Chen, Haojie Yan et al.
Multi-modal Identity Extraction
Ryan Webster, Teddy Furon
Calibrating MLLM-as-a-judge via Multimodal Bayesian Prompt Ensembles
Eric Slyman, Mehrab Tanjim, Kushal Kafle et al.
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting
Zeyi Sun, Ziyang Chu, Pan Zhang et al.
Reference-based Super-Resolution via Image-based Retrieval-Augmented Generation Diffusion
Byeonghun Lee, Hyunmin Cho, Honggyu Choi et al.
Deep Adaptive Unfolded Network via Spatial Morphology Stripping and Spectral Filtration for Pan-sharpening
Hebaixu Wang, Jiayi Ma
Open-World Skill Discovery from Unsegmented Demonstration Videos
Jingwen Deng, Zihao Wang, Shaofei Cai et al.
Instruction-based Image Editing with Planning, Reasoning, and Generation
Liya Ji, Chenyang Qi, Qifeng Chen
HDR Image Generation via Gain Map Decomposed Diffusion
Yuanshen Guan, Ruikang Xu, Yinuo Liao et al.
ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning
Jongseo Lee, Kyungho Bae, Kyle Min et al.
ArtEditor: Learning Customized Instructional Image Editor from Few-Shot Examples
Shijie Huang, Yiren Song, Yuxuan Zhang et al.
InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians
Kefan Chen, Sergiu Oprea, Justin Theiss et al.
A3GS: Arbitrary Artistic Style into Arbitrary 3D Gaussian Splatting
Zhiyuan Fang, Rengan Xie, Xuancheng Jin et al.
Universal Few-shot Spatial Control for Diffusion Models
Kiet Nguyen, Chanhyuk Lee, Donggyun Kim et al.
Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer
Md Ashiqur Rahman, Chiao-An Yang, Michael N Cheng et al.
Free2Guide: Training-Free Text-to-Video Alignment using Image LVLM
Jaemin Kim, Bryan Sangwoo Kim, Jong Ye
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE
Yazhou Xing, Yang Fei, Yingqing He et al.
DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models
SeungHoo Hong, GeonHo Son, Juhun Lee et al.
GFPack++: Attention-Driven Gradient Fields for Optimizing 2D Irregular Packing
Tianyang Xue, Lin Lu, Yang Liu et al.
LACONIC: A 3D Layout Adapter for Controllable Image Creation
Léopold Maillard, Tom Durand, Adrien RAMANANA RAHARY et al.
Collaborative and Confidential Junction Trees for Hybrid Bayesian Networks
Roberto Gheda, Abele Mălan, Thiago Guzella et al.
Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models
Haoming Cai, Tsung-Wei Huang, Shiv Gehlot et al.
Task-Oriented Human Grasp Synthesis via Context- and Task-Aware Diffusers
An Lun Liu, Yu-Wei Chao, Yi-Ting Chen
RoboAnnotatorX: A Comprehensive and Universal Annotation Framework for Accurate Understanding of Long-horizon Robot Demonstration
Longxin Kou, Fei Ni, Jianye HAO et al.
EEGMirror: Leveraging EEG data in the wild via Montage-Agnostic Self-Supervision for EEG to Video Decoding
Xuan-Hao Liu, Bao-liang Lu, Wei-Long Zheng
Accelerating Diffusion Sampling via Exploiting Local Transition Coherence
shangwen zhu, Han Zhang, Zhantao Yang et al.
UniversalBooth: Model-Agnostic Personalized Text-to-Image Generation
Songhua Liu, Ruonan Yu, Xinchao Wang
Semantic Discrepancy-aware Detector for Image Forgery Identification
Wang Ziye, Minghang Yu, Chunyan Xu et al.
What If: Understanding Motion Through Sparse Interactions
Stefan A. Baumann, Nick Stracke, Timy Phan et al.
Fractional Langevin Dynamics for Combinatorial Optimization via Polynomial-Time Escape
Shiyue Wang, Ziao Guo, Changhong Lu et al.
Gain-MLP: Improving HDR Gain Map Encoding via a Lightweight MLP
Trevor Canham, SaiKiran Tedla, Michael Murdoch et al.
Error Recognition in Procedural Videos using Generalized Task Graph
Shih-Po Lee, Ehsan Elhamifar
WalkVLM: Aid Visually Impaired People Walking by Vision Language Model
Zhiqiang Yuan, Ting Zhang, Yeshuang Zhu et al.
Teleportraits: Training-Free People Insertion into Any Scene
Jialu Gao, Joseph K J, Fernando De la Torre
Unified Category-Level Object Detection and Pose Estimation from RGB Images using 3D Prototypes
Tom Fischer, Xiaojie Zhang, Eddy Ilg
Proactive Scene Decomposition and Reconstruction
Baicheng Li, Zike Yan, Dong Wu et al.
PASD: A Pixel-Adaptive Swarm Dynamics Approach for Unsupervised Low-Light Image Enhancement
Shuai Jin, Yuhua Qian, Feijiang Li et al.
QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing
Tiancheng SHEN, Jun Hao Liew, Zilong Huang et al.
Bridging the Sky and Ground: Towards View-Invariant Feature Learning for Aerial-Ground Person Re-Identification
Wajahat Khalid, Bin Liu, Xulin Li et al.
Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection
Yichen Lu, Siwei Nie, Minlong Lu et al.
Beyond Brain Decoding: Visual-Semantic Reconstructions to Mental Creation Extension Based on fMRI
Haodong Jing, Dongyao Jiang, Yongqiang Ma et al.
PixTalk: Controlling Photorealistic Image Processing and Editing with Language
Marcos Conde, Zihao Lu, Radu Timofte
ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement
KA WONG, Jicheng Zhou, Haiwei Wu et al.
Towards Robust Defense against Customization via Protective Perturbation Resistant to Diffusion-based Purification
Wenkui Yang, Jie Cao, Junxian Duan et al.
A Unified Framework for Industrial Cel-Animation Colorization with Temporal-Structural Awareness
Xiaoyi Feng, Tao Huang, Peng Wang et al.
Generative Video Bi-flow
Chen Liu, Tobias Ritschel
Combinative Matching for Geometric Shape Assembly
Nahyuk Lee, Juhong Min, Junhong Lee et al.
LayerLock: Non-collapsing Representation Learning with Progressive Freezing
Goker Erdogan, Nikhil Parthasarathy, Catalin Ionescu et al.
Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images
Philipp Wulff, Felix Wimbauer, Dominik Muhle et al.
JPEG Processing Neural Operator for Backward-Compatible Coding
Woo Kyoung Han, Yongjun Lee, Byeonghun Lee et al.
All Parts Matter: A Unified Mask-Free Virtual Try-On Framework
Chenghu Du, Shengwu Xiong, Yi Rong
Function-centric Bayesian Network for Zero-Shot Object Goal Navigation
Sixian Zhang, Xinyao Yu, Xinhang Song et al.
GECO: Geometrically Consistent Embedding with Lightspeed Inference
Regine Hartwig, Dominik Muhle, Riccardo Marin et al.
Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions
Yiting Qu, Ziqing Yang, Yihan Ma et al.
MH-LVC: Multi-Hypothesis Temporal Prediction for Learned Conditional Residual Video Coding
Gao Zong lin, Huu-Tai Phung, Yi-Chen Yao et al.
Fast Non-Log-Concave Sampling under Nonconvex Equality and Inequality Constraints with Landing
Kijung Jeon, Michael Muehlebach, Molei Tao
On the Provable Importance of Gradients for Autonomous Language-Assisted Image Clustering
Bo Peng, Jie Lu, Guangquan Zhang et al.
Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive Segmentation
You Huang, Lichao Chen, Jiayi Ji et al.
GenHaze: Pioneering Controllable One-Step Realistic Haze Generation for Real-World Dehazing
Sixiang Chen, Tian Ye, Yunlong Lin et al.
CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
Kuniaki Saito, Donghyun Kim, Kwanyong Park et al.
An Efficient Hybrid Vision Transformer for TinyML Applications
Fanhong Zeng, Huanan LI, Juntao Guan et al.
ReCoT: Reflective Self-Correction Training for Mitigating Confirmation Bias in Large Vision-Language Models
Mengxue Qu, Yibo Hu, Kunyang Han et al.
CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts
Olaf Dünkel, Artur Jesslen, Jiahao Xie et al.
Multi-Schema Proximity Network for Composed Image Retrieval
Jiangming Shi, Xiangbo Yin, yeyunchen yeyunchen et al.
MixRI: Mixing Features of Reference Images for Novel Object Pose Estimation
Xinhang Liu, Jiawei Shi, Zheng Dang et al.
Mixed-Sample SGD: an End-to-end Analysis of Supervised Transfer Learning
Yuyang Deng, Samory Kpotufe
ESCNet:Edge-Semantic Collaborative Network for Camouflaged Object Detection
Sheng Ye, Xin Chen, Yan Zhang et al.
Quasi-Self-Concordant Optimization with $\ell_{\infty}$ Lewis Weights
Alina Ene, Ta Duy Nguyen, Adrian Vladu
ROVI: A VLM-LLM Re-Captioned Dataset for Open-Vocabulary Instance-Grounded Text-to-Image Generation
Cihang Peng, Qiming HOU, Zhong Ren et al.
Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval
Constantin Venhoff, Ashkan Khakzar, Sonia Joseph et al.
Feature Purification Matters: Suppressing Outlier Propagation for Training-Free Open-Vocabulary Semantic Segmentation
Shuo Jin, Siyue Yu, Bingfeng Zhang et al.
DiffPS: Leveraging Prior Knowledge of Diffusion Model for Person Search
Giyeol Kim, Sooyoung Yang, Jihyong Oh et al.
Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching
Yuhan Liu, Jingwen Fu, Yang Wu et al.
Representation Shift: Unifying Token Compression with FlashAttention
Joonmyung Choi, Sanghyeok Lee, Byungoh Ko et al.
ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity
Yefei He, Feng Chen, Jing Liu et al.
ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts
Xiaoqi Wang, Clint Sebastian, Wenbin He et al.
LaCoOT: Layer Collapse through Optimal Transport
Victor Quétu, Zhu LIAO, Nour Hezbri et al.
Zero-Shot Compositional Video Learning with Coding Rate Reduction
Heeseok Jung, Jun-Hyeon Bak, Yujin Jeong et al.
Fuzzy Contrastive Decoding to Alleviate Object Hallucination in Large Vision-Language Models
Jieun Kim, Jinmyeong Kim, Yoonji Kim et al.
Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement
Xin Shen, Xinyu Wang, Lei Shen et al.
Superpowering Open-Vocabulary Object Detectors for X-ray Vision
Pablo Garcia-Fernandez, Lorenzo Vaquero, Mingxuan Liu et al.
RhythmGuassian: Repurposing Generalizable Gaussian Model For Remote Physiological Measurement
Hao LU, Yuting Zhang, Jiaqi Tang et al.
On the Recovery of Cameras from Fundamental Matrices
Rakshith Madhavan, Federica Arrigoni
RnGCam: High-speed video from rolling & global shutter measurements
Kevin Tandi, Xiang Dai, Chinmay Talegaonkar et al.
The Devil is in the Spurious Correlations: Boosting Moment Retrieval with Dynamic Learning
Xinyang Zhou, Fanyue Wei, Lixin Duan et al.