Most Cited 2025 "hardware robotic control" Papers
22,274 papers found • Page 83 of 112
Conference
DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations
Xiaohui Li, Yihao Liu, Shuo Cao et al.
Test-time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates
Kecheng Chen, Xinyu Luo, Tiexin Qin et al.
ESCNet:Edge-Semantic Collaborative Network for Camouflaged Object Detection
Sheng Ye, Xin Chen, Yan Zhang et al.
Power of Cooperative Supervision: Multiple Teachers Framework for Advanced 3D Semi-Supervised Object Detection
Jin-Hee Lee, Jae-keun Lee, Jeseok Kim et al.
Adapting In-Domain Few-Shot Segmentation to New Domains without Source Domain Retraining
Qi Fan, Kaiqi Liu, Nian Liu et al.
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
Tianming Liang, Kun-Yu Lin, Chaolei Tan et al.
ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching
Yuxuan Yuan, Luyao Tang, Chaoqi Chen et al.
DADet: Safeguarding Image Conditional Diffusion Models against Adversarial and Backdoor Attacks via Diffusion Anomaly Detection
Hongwei Yu, Xinlong Ding, Jiawei Li et al.
Multi-Schema Proximity Network for Composed Image Retrieval
Jiangming Shi, Xiangbo Yin, yeyunchen yeyunchen et al.
CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts
Olaf Dünkel, Artur Jesslen, Jiahao Xie et al.
LEGO-Maker: A Semantic-Driven Algorithm for Text-to-3D Generation
Yifei Zhang, Lei Chen
Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction
Yuntao Shou, Xiangyong Cao, PeiqiangYan PeiqiangYan et al.
COVTrack: Continuous Open-Vocabulary Tracking via Adaptive Multi-Cue Fusion
Zekun Qian, Ruize Han, Zhixiang Wang et al.
Dense Policy: Bidirectional Autoregressive Learning of Actions
Yue Su, Xinyu Zhan, Hongjie Fang et al.
monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation
Ren-Jie Lu, Yu Zhou, hao cheng et al.
An Efficient Hybrid Vision Transformer for TinyML Applications
Fanhong Zeng, Huanan LI, Juntao Guan et al.
DOGR: Towards Versatile Visual Document Grounding and Referring
Yinan Zhou, Yuxin Chen, Haokun Lin et al.
ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation
Xiwei Xuan, Ziquan Deng, Kwan-Liu Ma
MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos
Hongyi Zhou, Xiaogang Wang, Yulan Guo et al.
The Burden of Interactive Alignment with Inconsistent Preferences
Ali Shirali
Performing Defocus Deblurring by Modeling its Formation Process
Zhengbo Zhang, Lin Geng Foo, Hossein Rahmani et al.
CasP: Improving Semi-Dense Feature Matching Pipeline Leveraging Cascaded Correspondence Priors for Guidance
Peiqi Chen, Lei Yu, Yi Wan et al.
Supervised Exploratory Learning for Long-Tailed Visual Recognition
Zhongquan Jian, Yanhao Chen, Wangyancheng Wangyancheng et al.
Collective Counterfactual Explanations: Balancing Individual Goals and Collective Dynamics
Ahmad-Reza Ehyaei, Ali Shirali, Samira Samadi
CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
Kuniaki Saito, Donghyun Kim, Kwanyong Park et al.
MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance
Zihan Cao, Yu Zhong, Ziqi Wang et al.
Blind Video Super-Resolution based on Implicit Kernels
Qiang Zhu, Yuxuan Jiang, Shuyuan Zhu et al.
OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning
Yuan Liu, Saihui Hou, Saijie Hou et al.
Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts
Chiao-An Yang, Kuan-Chuan Peng, Raymond A. Yeh
GauCho: Gaussian Distributions with Cholesky Decomposition for Oriented Object Detection
Jeffri Erwin Murrugarra Llerena, José Henrique Marques, Claudio Jung
More Reliable Pseudo-labels, Better Performance: A Generalized Approach to Single Positive Multi-label Learning
Luong Tran, Thieu Vo, Anh Nguyen et al.
Self supervised learning for in vivo localization of microelectrode arrays using raw local field potential
Tianxiao He, Malhar Patel, Chenyi Li et al.
TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
Zuhao Yang, Yingchen Yu, Yunqing Zhao et al.
HiERO: Understanding the Hierarchy of Human Behavior Enhances Reasoning on Egocentric Videos
Simone Alberto Peirone, Francesca Pistilli, Giuseppe Averta
DCHM: Depth-Consistent Human Modeling for Multiview Detection
Jiahao Ma, Tianyu Wang, Miaomiao Liu et al.
Adversarial Robustness of Discriminative Self-Supervised Learning in Vision
Ömer Veysel Çağatan, Ömer TAL, M. Emre Gursoy
HPSv3: Towards Wide-Spectrum Human Preference Score
Yuhang Ma, Keqiang Sun, Xiaoshi Wu et al.
GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation
Tao Liu, Chongyu Wang, Rongjie Li et al.
Active Perception Meets Rule-Guided RL: A Two-Phase Approach for Precise Object Navigation in Complex Environments
Liang Qin, Min Wang, Peiwei Li et al.
Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive Segmentation
You Huang, Lichao Chen, Jiayi Ji et al.
UNIS: A Unified Framework for Achieving Unbiased Neural Implicit Surfaces in Volume Rendering
Junkai Deng, Hanting Niu, Jiaze Li et al.
On the Provable Importance of Gradients for Autonomous Language-Assisted Image Clustering
Bo Peng, Jie Lu, Guangquan Zhang et al.
MH-LVC: Multi-Hypothesis Temporal Prediction for Learned Conditional Residual Video Coding
Gao Zong lin, Huu-Tai Phung, Yi-Chen Yao et al.
IntrinsicControlNet: Cross-distribution Image Generation with Real and Unreal
Jiayuan Lu, Rengan Xie, Zixuan Xie et al.
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space
Junyu Chen, Dongyun Zou, Wenkun He et al.
Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions
Yiting Qu, Ziqing Yang, Yihan Ma et al.
Function-centric Bayesian Network for Zero-Shot Object Goal Navigation
Sixian Zhang, Xinyao Yu, Xinhang Song et al.
Loss Functions for Predictor-based Neural Architecture Search
Han Ji, Yuqi Feng, Jiahao Fan et al.
Advancing Text-to-3D Generation with Linearized Lookahead Variational Score Distillation
Yu Lei, Bingde Liu, Qingsong Xie et al.
Structural Information-based Hierarchical Diffusion for Offline Reinforcement Learning
Xianghua Zeng, Hao Peng, Yicheng Pan et al.
Steering Guidance for Personalized Text-to-Image Diffusion Models
Sunghyun Park, Seokeon Choi, Hyoungwoo Park et al.
ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
Zifu Wan, Ce Zhang, Silong Yong et al.
Single GPU Task Adaptation of Pathology Foundation Models for Whole Slide Image Analysis
Neeraj Kumar, Chad Vanderbilt
Precise Diffusion Inversion: Towards Novel Samples and Few-Step Models
Jing Zuo, Luoping Cui, Chuang Zhu et al.
Domain-aware Category-level Geometry Learning Segmentation for 3D Point Clouds
Pei He, Lingling Li, Licheng Jiao et al.
All Parts Matter: A Unified Mask-Free Virtual Try-On Framework
Chenghu Du, Shengwu Xiong, Yi Rong
GaussianReg: Rapid 2D/3D Registration for Emergency Surgery via Explicit 3D Modeling with Gaussian Primitives
Weihao Yu, Xiaoqing Guo, Xinyu Liu et al.
Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
Marwa Abdulhai, Ryan Cheng, Donovan Clay et al.
ArgoTweak: Towards Self-Updating HD Maps through Structured Priors
Lena Wild, Rafael Valencia, Patric Jensfelt
Event-aided Dense and Continuous Point Tracking: Everywhere and Anytime
Zhexiong Wan, Jianqin Luo, Yuchao Dai et al.
Context-Aware Academic Emotion Dataset and Benchmark
Luming Zhao, Jingwen Xuan, Jiamin Lou et al.
FlowSeek: Optical Flow Made Easier with Depth Foundation Models and Motion Bases
Matteo Poggi, Fabio Tosi
TPG-INR: Target Prior-Guided Implicit 3D CT Reconstruction for Enhanced Sparse-view Imaging
QingleiCao QingleiCao, Ziyao Tang, Xiaoqin Tang
SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations
Songchun Zhang, Huiyao Xu, Sitong Guo et al.
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration
JUNSEONG KIM, GeonU Kim, Kim Yu-Ji et al.
JPEG Processing Neural Operator for Backward-Compatible Coding
Woo Kyoung Han, Yongjun Lee, Byeonghun Lee et al.
Efficient Visual Place Recognition Through Multimodal Semantic Knowledge Integration
Sitao Zhang, Hongda Mao, Qingshuang Chen et al.
COME: Dual Structure-Semantic Learning with Collaborative MoE for Universal Lesion Detection Across Heterogeneous Ultrasound Datasets
Lingyu Chen, Yawen Zeng, Yue Wang et al.
NATRA: Noise-Agnostic Framework for Trajectory Prediction with Noisy Observations
Rongqing Li, Changsheng Li, Ruilin Lv et al.
MS3D: High-Quality 3D Generation via Multi-Scale Representation Modeling
Guan Luo, Jianfeng Zhang
LayerLock: Non-collapsing Representation Learning with Progressive Freezing
Goker Erdogan, Nikhil Parthasarathy, Catalin Ionescu et al.
UniDxMD: Towards Unified Representation for Cross-Modal Unsupervised Domain Adaptation in 3D Semantic Segmentation
Zhengyin Liang, Hui Yin, Min Liang et al.
Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics
Keming Wu, Junwen Chen, Zhanhao Liang et al.
PLAN: Proactive Low-Rank Allocation for Continual Learning
XIEQUN WANG, Zhan Zhuang, Yu Zhang
Leveraging Spatial Invariance to Boost Adversarial Transferability
Zihan Zhou, LI LI, Yanli Ren et al.
T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation
Chieh-Yun Chen, Min Shi, Gong Zhang et al.
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin et al.
FedXDS: Leveraging Model Attribution Methods to counteract Data Heterogeneity in Federated Learning
Maximilian Hoefler, Karsten Mueller, Wojciech Samek
Visual Textualization for Image Prompted Object Detection
Yongjian Wu, Yang Zhou, Jiya Saiyin et al.
TerraMind: Large-Scale Generative Multimodality for Earth Observation
Johannes Jakubik, Felix Yang, Benedikt Blumenstiel et al.
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
Haoran Lou, Chunxiao Fan, Ziyan Liu et al.
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision
Yiming Zhao, Taein Kwon, Paul Streli et al.
Generative Video Bi-flow
Chen Liu, Tobias Ritschel
FHGS: Feature-Homogenized Gaussian Splatting
qigeng duan, Benyun ZHAO, Mingqiao Han et al.
A Unified Framework for Industrial Cel-Animation Colorization with Temporal-Structural Awareness
Xiaoyi Feng, Tao Huang, Peng Wang et al.
Transformer-based Tooth Alignment Prediction with Occlusion and Collision Constraints
DongZhenXing DongZhenXing, Jiazhou Chen
Towards Robust Defense against Customization via Protective Perturbation Resistant to Diffusion-based Purification
Wenkui Yang, Jie Cao, Junxian Duan et al.
Memory-Integrated Reconfigurable Adapters: A Unified Framework for Settings with Multiple Tasks
Susmit Agrawal, Krishn Vishwas Kher, Saksham Mittal et al.
ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement
KA WONG, Jicheng Zhou, Haiwei Wu et al.
SD2Actor: Continuous State Decomposition via Diffusion Embeddings for Robotic Manipulation
lijiayi jiayi
PixTalk: Controlling Photorealistic Image Processing and Editing with Language
Marcos Conde, Zihao Lu, Radu Timofte
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
Xinyu Hou, Zongsheng Yue, Xiaoming Li et al.
Beyond Brain Decoding: Visual-Semantic Reconstructions to Mental Creation Extension Based on fMRI
Haodong Jing, Dongyao Jiang, Yongqiang Ma et al.
Scene Graph Guided Generation: Enable Accurate Relations Generation in Text-to-Image Models via Textural Rectification
Guibao SHEN, Luozhou Wang, Jiantao Lin et al.
ReMP-AD: Retrieval-enhanced Multi-modal Prompt Fusion for Few-Shot Industrial Visual Anomaly Detection
Hongchi Ma, Guanglei Yang, Debin Zhao et al.
GMMamba: Group Masking Mamba for Whole Slide Image Classification
Tingting Zheng, Hongxun Yao, Kui Jiang et al.
TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction
Dadong Jiang, Zhi Hou, Zhihui Ke et al.
Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection
Yichen Lu, Siwei Nie, Minlong Lu et al.
RareCLIP: Rarity-aware Online Zero-shot Industrial Anomaly Detection
Jianfang He, Min Cao, Silong Peng et al.
Pretrained Reversible Generation as Unsupervised Visual Representation Learning
Rongkun Xue, Jinouwen Zhang, Yazhe Niu et al.
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
Ruotong Wang, Mingli Zhu, Jiarong Ou et al.
Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation
Gang Dai, Yifan Zhang, Yutao Qin et al.
Temporal Rate Reduction Clustering for Human Motion Segmentation
Xianghan Meng, Zhengyu Tong, Zhiyuan Huang et al.
Hierarchy UGP: Hierarchy Unified Gaussian Primitive for Large-Scale Dynamic Scene Reconstruction
Hongyang Sun, Qinglin Yang, Jiawei Wang et al.
QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing
Tiancheng SHEN, Jun Hao Liew, Zilong Huang et al.
Backdoor Mitigation by Distance-Driven Detoxification
Shaokui Wei, Jiayin Liu, Hongyuan Zha
Democratizing High-Fidelity Co-Speech Gesture Video Generation
Xu Yang, Shaoli Huang, Shenbo Xie et al.
UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI
Fangwei Zhong, Kui Wu, Churan Wang et al.
HFD-Teacher: High-Frequency Depth Distillation from Depth Foundation Models for Enhanced Depth Completion
Zhiyuan Yang, Anqi Cheng, Haiyue Zhu et al.
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo, Yawei Li, Taolin Zhang et al.
DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
Zheng-Peng Duan, jiawei zhang, Xin Jin et al.
Separation for Better Integration: Disentangling Edge and Motion in Event-based Deblurring
Yufei Zhu, Hao Chen, Yongjian Deng et al.
Teleportraits: Training-Free People Insertion into Any Scene
Jialu Gao, Joseph K J, Fernando De la Torre
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
Achint Soni, Meet Soni, Sirisha Rambhatla
Diversity-Enhanced Distribution Alignment for Dataset Distillation
Hongcheng Li, Yucan Zhou, Xiaoyan Gu et al.
Height-Fidelity Dense Global Fusion for Multi-modal 3D Object Detection
Hanshi Wang, Jin Gao, Weiming Hu et al.
SMSTracker: Tri-path Score Mask Sigma Fusion for Multi-Modal Tracking
Sixian Chan, Zedong Li, Xiaoqin Zhang et al.
Two Losses, One Goal: Balancing Conflict Gradients for Semi-supervised Semantic Segmentation
Rui Sun, Huayu Mai, Wangkai Li et al.
Region-based Cluster Discrimination for Visual Representation Learning
Yin Xie, Kaicheng Yang, Xiang An et al.
CMB-ML: A Cosmic Microwave Background Dataset for the Oldest Possible Computer Vision Task
James Amato, Yunan Xie, Leonel Medina-Varela et al.
Adapt Foundational Segmentation Models with Heterogeneous Searching Space
Li Yi, Jie Hu, Songan Zhang et al.
Think Twice: Test-Time Reasoning for Robust CLIP Zero-Shot Classification
Shenyu Lu, Zhaoying Pan, Xiaoqian Wang
Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures
Nina Vesseron, Louis Bethune, Marco Cuturi
FlexGen: Flexible Multi-View Generation from Text and Image Inputs
Xinli Xu, Wenhang Ge, Jiantao Lin et al.
Dropout Regularization Versus l2-Penalization in the Linear Model
Gabriel Clara, Sophie Langer, Johannes Schmidt-Hieber
Shape of Motion: 4D Reconstruction from a Single Video
Qianqian Wang, Vickie Ye, Hang Gao et al.
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
shaojin wu, Mengqi Huang, wenxu wu et al.
EditCLIP: Representation Learning for Image Editing
Qian Wang, Aleksandar Cvejic, Abdelrahman Eldesokey et al.
Counting Stacked Objects
Corentin Dumery, Noa Ette, Aoxiang Fan et al.
Gain-MLP: Improving HDR Gain Map Encoding via a Lightweight MLP
Trevor Canham, SaiKiran Tedla, Michael Murdoch et al.
Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera
Zhengdi Yu, Stefanos Zafeiriou, Tolga Birdal
Allowing Oscillation Quantization: Overcoming Solution Space Limitation in Low Bit-Width Quantization
Weiying Xie, Zihan Meng, Jitao Ma et al.
MOVE: Motion-Guided Few-Shot Video Object Segmentation
Kaining Ying, Hengrui Hu, Henghui Ding
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
Dengke Zhang, Fagui Liu, Quan Tang
mmCooper: A Multi-agent Multi-stage Communication-efficient and Collaboration-robust Cooperative Perception Framework
Bingyi Liu, Jian Teng, Hongfei Xue et al.
FreqPDE: Rethinking Positional Depth Embedding for Multi-View 3D Object Detection Transformers
Junjie Zhang, Haisheng Su, Feixiang Song et al.
GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling
Pinxin Liu, Luchuan Song, Junhua Huang et al.
SDFormer: Vision-based 3D Semantic Scene Completion via SAM-assisted Dual-channel Voxel Transformer
Yujie Xue, Huilong Pi, Jiapeng Zhang et al.
ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts
Dmitrii M Petrov, Pradyumn Goyal, Divyansh Shivashok et al.
TopoTTA: Topology-Enhanced Test-Time Adaptation for Tubular Structure Segmentation
Jiale Zhou, Wenhan Wang, Shikun Li et al.
RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control
Teng Li, Guangcong Zheng, Rui Jiang et al.
CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching
Zizhuo Li, Yifan Lu, Linfeng Tang et al.
MagShield: Towards Better Robustness in Sparse Inertial Motion Capture Under Magnetic Disturbances
Yunzhe Shao, Xinyu Yi, Lu Yin et al.
Semantic Discrepancy-aware Detector for Image Forgery Identification
Wang Ziye, Minghang Yu, Chunyan Xu et al.
UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis
Yuanrui Wang, Cong Han, Yafei Li et al.
DeFSS: Image-to-Mask Denoising Learning for Few-shot Segmentation
Zishu Qin, Junhao Xu, Weifeng Ge
UniversalBooth: Model-Agnostic Personalized Text-to-Image Generation
Songhua Liu, Ruonan Yu, Xinchao Wang
SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer
Zerui Gong, Zhonghua Wu, Qingyi Tao et al.
TAD-E2E: A Large-scale End-to-end Autonomous Driving Dataset
Chang Liu, mingxuzhu mingxuzhu, Zheyuan Zhang et al.
Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning
Amir Rezaei Balef, Claire Vernade, Katharina Eggensperger
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang et al.
VAGUE: Visual Contexts Clarify Ambiguous Expressions
Heejeong Nam, Jinwoo Ahn, Keummin Ka et al.
Photolithography Overlay Map Generation with Implicit Knowledge Distillation Diffusion Transformer
YuanFu Yang, Hsiu-Hui Hsiao
Accelerating Diffusion Sampling via Exploiting Local Transition Coherence
shangwen zhu, Han Zhang, Zhantao Yang et al.
What's Making That Sound Right Now? Video-centric Audio-Visual Localization
hahyeon choi, Junhoo Lee, Nojun Kwak
Task-Specific Gradient Adaptation for Few-Shot One-Class Classification
Yunlong Li, Xiabi Liu, Liyuan Pan et al.
EEGMirror: Leveraging EEG data in the wild via Montage-Agnostic Self-Supervision for EEG to Video Decoding
Xuan-Hao Liu, Bao-liang Lu, Wei-Long Zheng
VehicleMAE: View-asymmetry Mutual Learning for Vehicle Re-identification Pre-training via Masked AutoEncoders
Qi Wang, Zeyu Zhang, Dong Wang et al.
Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models
Haoming Cai, Tsung-Wei Huang, Shiv Gehlot et al.
MagicCity: Geometry-Aware 3D City Generation from Satellite Imagery with Multi-View Consistency
Xingbo YAO, xuanmin Wang, Hao WU et al.
RARE: Refine Any Registration of Pairwise Point Clouds via Zero-Shot Learning
Chengyu Zheng, Honghua Chen, Jin Huang et al.
Multi-scenario Overlapping Text Segmentation with Depth Awareness
Yang Liu, Xudong Xie, Yuliang Liu et al.
Zero-Shot Vision Encoder Grafting via LLM Surrogates
Kaiyu Yue, Vasu Singla, Menglin Jia et al.
OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection
Adrian Chow, Evelien Riddell, Yimu Wang et al.
FullDiT: Video Generative Foundation Models with Multimodal Control via Full Attention
Xuan Ju, Weicai Ye, Quande Liu et al.
SC-Lane: Slope-aware and Consistent Road Height Estimation Framework for 3D Lane Detection
Chaesong Park, Eunbin Seo, JihyeonHwang JihyeonHwang et al.
Exploring the Visual Feature Space for Multimodal Neural Decoding
Weihao Xia, Cengiz Oztireli
Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing
Taihang Hu, Linxuan Li, Kai Wang et al.
ConceptSplit: Decoupled Multi-Concept Personalization of Diffusion Models via Token-wise Adaptation and Attention Disentanglement
Habin Lim, Youngseob Won, Juwon Seo et al.
ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources
Jason Wu, Yuyang Yuan, Kang Yang et al.
Backdoor Defense via Enhanced Splitting and Trap Isolation
Hongrui Yu, Lu Qi, Wanyu Lin et al.
Learning Hierarchical Line Buffer for Image Processing
Jiacheng Li, Feiran Li, Daisuke Iso
ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction
Soonwoo Cha, Jiwoo Song, Juan Yeo et al.
Preserve Anything: Controllable Image Synthesis with Object Preservation
Prasen Kumar Sharma, Neeraj Matiyali, Siddharth Srivastava et al.
Information Retrieval Induced Safety Degradation in AI Agents
Cheng Yu, Benedikt Stroebl, Diyi Yang et al.
Humans as Checkerboards: Calibrating Camera Motion Scale for World-Coordinate Human Mesh Recovery
Fengyuan Yang, Kerui Gu, Ha Linh Nguyen et al.
D3: Training-Free AI-Generated Video Detection Using Second-Order Features
Chende Zheng, Ruiqi suo, Chenhao Lin et al.
Understanding the Statistical Accuracy-Communication Trade-off in Personalized Federated Learning with Minimax Guarantees
Xin Yu, Zelin He, Ying Sun et al.
Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging
Pierre Ablin, Angelos Katharopoulos, Skyler Seto et al.
Core Context Aware Transformers for Long Context Language Modeling
Yaofo Chen, Zeng You, Shuhai Zhang et al.
Navigating Semantic Drift in Task-Agnostic Class-Incremental Learning
Fangwen Wu, Lechao Cheng, Shengeng Tang et al.
Does Data Scaling Lead to Visual Compositional Generalization?
Arnas Uselis, Andrea Dittadi, Seong Joon Oh
An Asymptotically Optimal Approximation Algorithm for Multiobjective Submodular Maximization at Scale
Fabian Spaeh, Atsushi Miyauchi
Learning curves theory for hierarchically compositional data with power-law distributed features
Francesco Cagnetta, Hyunmo Kang, Matthieu Wyart
Neurosymbolic World Models for Sequential Decision Making
Leonardo Hernandez Cano, Maxine Perroni-Scharf, Neil Dhir et al.
Self-Disentanglement and Re-Composition for Cross-Domain Few-Shot Segmentation
Jintao Tong, Yixiong Zou, Guangyao Chen et al.
Model-Based Exploration in Monitored Markov Decision Processes
Alireza Kazemipour, Matthew Taylor, Michael Bowling
Deep Bayesian Filter for Bayes-Faithful Data Assimilation
Yuta Tarumi, Keisuke Fukuda, Shin-ichi Maeda
KAN-AD: Time Series Anomaly Detection with Kolmogorov–Arnold Networks
Quan Zhou, Changhua Pei, Fei Sun et al.
Oscillation-Reduced MXFP4 Training for Vision Transformers
Yuxiang Chen, Haocheng Xi, Jun Zhu et al.
Policy Filtration for RLHF to Mitigate Noise in Reward Models
Chuheng Zhang, Wei Shen, Li Zhao et al.
On the Importance of Embedding Norms in Self-Supervised Learning
Andrew Draganov, Sharvaree Vadgama, Sebastian Damrich et al.
Robust Multi-bit Text Watermark with LLM-based Paraphrasers
Xiaojun Xu, jinghan jia, Yuanshun Yao et al.
Should Decision-Makers Reveal Classifiers in Online Strategic Classification?
Han Shao, Shuo Xie, Kunhe Yang
Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning
Changsheng Wang, Yihua Zhang, jinghan jia et al.
Measuring Representational Shifts in Continual Learning: A Linear Transformation Perspective
Joonkyu Kim, Yejin Kim, Jy-yong Sohn
Online Conformal Prediction via Online Optimization
Felipe Areces, Christopher Mohri, Tatsunori Hashimoto et al.
Integer Programming for Generalized Causal Bootstrap Designs
Jennifer Brennan, Sébastien Lahaie, Adel Javanmard et al.
Retraining with Predicted Hard Labels Provably Increases Model Accuracy
Rudrajit Das, Inderjit Dhillon, Alessandro Epasto et al.
Self-Supervised Learning of Intertwined Content and Positional Features for Object Detection
Kang-Jun Liu, Masanori Suganuma, Takayuki Okatani