Most Cited 2025 "image editing manipulation" Papers
22,274 papers found • Page 22 of 112
Conference
SteerConf: Steering LLMs for Confidence Elicitation
Ziang Zhou, Tianyuan Jin, Jieming Shi et al.
Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
Yaniv Nikankin, Dana Arad, Yossi Gandelsman et al.
From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting
Zhiwei Huang, Hailin Yu, Yichun Shentu et al.
No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization
Martino Bernasconi, Matteo Castiglioni, Andrea Celli
Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs
Hao Fang, Changle Zhou, Jiawei Kong et al.
LibriBrain: Over 50 Hours of Within-Subject MEG to Improve Speech Decoding Methods at Scale
Miran Özdogan, Gilad Landau, Gereon Elvers et al.
Auto-Regressive Diffusion for Generating 3D Human-Object Interactions
Zichen Geng, Zeeshan Hayder, Wei Liu et al.
On scalable and efficient training of diffusion samplers
Minkyu Kim, Kiyoung Seong, Dongyeop Woo et al.
T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks
Jiayang Liu, Siyuan Liang, Shiqian Zhao et al.
BrainACTIV: Identifying visuo-semantic properties driving cortical selectivity using diffusion-based image manipulation
Diego García Cerdas, Christina Sartzetaki, Magnus Petersen et al.
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Will Merrill, Shane Arora, Dirk Groeneveld et al.
MOS: Modeling Object-Scene Associations in Generalized Category Discovery
Zhengyuan Peng, Jinpeng Ma, Zhimin Sun et al.
EARL-BO: Reinforcement Learning for Multi-Step Lookahead, High-Dimensional Bayesian Optimization
Mujin Cheon, Jay Lee, Dong-Yeun Koh et al.
Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations
Brian Zheng, Alisa Liu, Orevaoghene Ahia et al.
Volume Optimality in Conformal Prediction with Structured Prediction Sets
Chao Gao, Liren Shan, Vaidehi Srinivas et al.
Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective
Yiming Liu, Kezhao Liu, Yao Xiao et al.
LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers
Yusuf Dalva, Hidir Yesiltepe, Pinar Yanardag
Probabilistic Learning to Defer: Handling Missing Expert Annotations and Controlling Workload Distribution
Cuong Nguyen, Thanh-Toan Do, Gustavo Carneiro
Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs
Rui Dai, Sile Hu, Xu Shen et al.
DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts
Zheng-Peng Duan, Jiawei Zhang, Zheng Lin et al.
Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models
Xingzhuo Guo, Yu Zhang, Baixu Chen et al.
Split Gibbs Discrete Diffusion Posterior Sampling
Wenda Chu, Zihui Wu, Yifan Chen et al.
Toward Efficient Kernel-Based Solvers for Nonlinear PDEs
Zhitong Xu, Da Long, Yiming Xu et al.
Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence
Shaopeng Fu, Liang Ding, Jingfeng ZHANG et al.
Uncertain Multimodal Intention and Emotion Understanding in the Wild
Qu Yang, QingHongYa Shi, Tongxin Wang et al.
WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
Yansong Guo, Jie Hu, Yansong Qu et al.
ELICIT: LLM Augmentation Via External In-context Capability
Futing Wang, Jianhao (Elliott) Yan, Yue Zhang et al.
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
Yucheng Shi, Quanzheng Li, Jin Sun et al.
Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers
Hang Zhou, Yuezhou Ma, Haixu Wu et al.
QT-DoG: Quantization-Aware Training for Domain Generalization
Saqib Javed, Hieu Le, Mathieu Salzmann
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
Siran Chen, Yuxiao Luo, Yue Ma et al.
Understanding and Mitigating Memorization in Diffusion Models for Tabular Data
Zhengyu Fang, Zhimeng Jiang, Huiyuan Chen et al.
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment
Darshana Saravanan, Varun Gupta, Darshan Singh S et al.
Active Task Disambiguation with LLMs
Katarzyna Kobalczyk, Nicolás Astorga, Tennison Liu et al.
Video Perception Models for 3D Scene Synthesis
Rui Huang, Guangyao Zhai, Zuria Bauer et al.
Mask in the Mirror: Implicit Sparsification
Tom Jacobs, Rebekka Burkholz
Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models
Benjamin Walker, Lingyi Yang, Nicola Muca Cirone et al.
Active Fine-Tuning of Multi-Task Policies
Marco Bagatella, Jonas Hübotter, Georg Martius et al.
Estimating Model Performance Under Covariate Shift Without Labels
Jakub Białek, Juhani Kivimäki, Wojciech Kuberski et al.
Task Generalization with Autoregressive Compositional Structure: Can Learning from $D$ Tasks Generalize to $D^T$ Tasks?
Amirhesam Abedsoltan, Huaqing Zhang, Kaiyue Wen et al.
WaterDiffusion: Learning a Prior-involved Unrolling Diffusion for Joint Underwater Saliency Detection and Visual Restoration
Laibin Chang, Yunke Wang, Longxiang Deng et al.
Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning
Ali Taghibakhshi, Sharath Turuvekere Sreenivas, Saurav Muralidharan et al.
ProtoArgNet: Interpretable Image Classification with Super-Prototypes and Argumentation
Hamed Ayoobi, Nico Potyka, Francesca Toni
Scaffolding Dexterous Manipulation with Vision-Language Models
Vincent de Bakker, Joey Hejna, Tyler Lum et al.
Bayesian Experimental Design Via Contrastive Diffusions
Jacopo Iollo, Christophe Heinkelé, Pierre Alliez et al.
Improving Gaussian Splatting with Localized Points Management
Haosen Yang, Chenhao Zhang, Wenqing Wang et al.
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
Jiankang Chen, Tianke Zhang, Changyi Liu et al.
Dissecting Generalized Category Discovery: Multiplex Consensus under Self-Deconstruction
Luyao Tang, Kunze Huang, Yuxuan Yuan et al.
PROXSPARSE: REGULARIZED LEARNING OF SEMI-STRUCTURED SPARSITY MASKS FOR PRETRAINED LLMS
Hongyi Liu, Rajarshi Saha, Zhen Jia et al.
Generating Multimodal Driving Scenes via Next-Scene Prediction
Yanhao Wu, Haoyang Zhang, Tianwei Lin et al.
Exploit Your Latents: Coarse-Grained Protein Backmapping with Latent Diffusion Models
Rongchao Zhang, Yu Huang, Yiwei Lou et al.
Learning a Neural Solver for Parametric PDEs to Enhance Physics-Informed Methods
Lise Le Boudec, Emmanuel de Bézenac, Louis Serrano et al.
Attention Mechanism, Max-Affine Partition, and Universal Approximation
Hude Liu, Jerry Yao-Chieh Hu, Zhao Song et al.
Audio Super-Resolution with Latent Bridge Models
Chang Li, Zehua Chen, Liyuan Wang et al.
Revisiting a Design Choice in Gradient Temporal Difference Learning
Xiaochi Qian, Shangtong Zhang
DEALing with Image Reconstruction: Deep Attentive Least Squares
Mehrsa Pourya, Erich Kobler, Michael Unser et al.
IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning
Jiawen Qin, Haonan Yuan, Qingyun Sun et al.
FreqDebias: Towards Generalizable Deepfake Detection via Consistency-Driven Frequency Debiasing
Hossein Kashiani, Niloufar Alipour Talemi, Fatemeh Afghah
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li, Luyuan Zhang, Zedong Wang et al.
Understanding the Limits of Deep Tabular Methods with Temporal Shift
Haorun Cai, Han-Jia Ye
Autocorrelation Matters: Understanding the Role of Initialization Schemes for State Space Models
Fusheng Liu, Qianxiao Li
MindSimulator: Exploring Brain Concept Localization via Synthetic fMRI
Qi Zhang, Qi Zhang, Zixuan Gong et al.
PhysAug: A Physical-guided and Frequency-based Data Augmentation for Single-Domain Generalized Object Detection
Xiaoran Xu, Jiangang Yang, Wenhui Shi et al.
Decompile-Bench: Million-Scale Binary-Source Function Pairs for Real-World Binary Decompilation
hanzhuo tan, Xiaolong Tian, Hanrui Qi et al.
IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning
Quan Zhang, Yuxin Qi, Xi Tang et al.
Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context
Ge Zheng, Jiaye Qian, Jiajin Tang et al.
Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning
Roger Creus Castanyer, Johan Obando Ceron, Lu Li et al.
Understanding Adam Requires Better Rotation Dependent Assumptions
Tianyue Zhang, Lucas Maes, Alan Milligan et al.
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Xianhang Li, Yanqing Liu, Haoqin Tu et al.
SUMI-IFL: An Information-Theoretic Framework for Image Forgery Localization with Sufficiency and Minimality Constraints
Ziqi Sheng, Wei Lu, Xiangyang Luo et al.
Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis
Hyunwoo Lee, Hayoung Choi, Hyunju Kim
DualCP: Rehearsal-Free Domain-Incremental Learning via Dual-Level Concept Prototype
Qiang Wang, Yuhang He, Songlin Dong et al.
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models
Yu Zhou, Xingyu Wu, Jibin Wu et al.
CSformer: Combining Channel Independence and Mixing for Robust Multivariate Time Series Forecasting
Haoxin Wang, Yipeng Mo, Kunlan Xiang et al.
SoPo: Text-to-Motion Generation Using Semi-Online Preference Optimization
Xiaofeng Tan, Hongsong Wang, Xin Geng et al.
Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis
Letian Zhang, Quan Cui, Bingchen Zhao et al.
Flowing Datasets with Wasserstein over Wasserstein Gradient Flows
Clément Bonet, Christophe Vauthier, Anna Korba
Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations
Pengcheng Jiang, Cao Xiao, Tianfan Fu et al.
MOL-Mamba: Enhancing Molecular Representation with Structural & Electronic Insights
Jingjing Hu, Dan Guo, Zhan Si et al.
Prediction-Powered Causal Inferences
Riccardo Cadei, Ilker Demirel, Piersilvio De Bartolomeis et al.
Motion Modes: What Could Happen Next?
Karran Pandey, Yannick Hold-Geoffroy, Matheus Gadelha et al.
Logits DeConfusion with CLIP for Few-Shot Learning
Shuo Li, Fang Liu, Zehua Hao et al.
Runtime Analysis for Multi-Objective Evolutionary Algorithms in Unbounded Integer Spaces
Benjamin Doerr, Martin S. Krejca, Günter Rudolph
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
Yuqian Yuan, Ronghao Dang, long li et al.
PICD: Versatile Perceptual Image Compression with Diffusion Rendering
Tongda Xu, Jiahao Li, Bin Li et al.
Hypergraph Attacks via Injecting Homogeneous Nodes into Elite Hyperedges
Meixia He, Peican Zhu, Keke Tang et al.
Locally Convex Global Loss Network for Decision-Focused Learning
Haeun Jeon, Hyunglip Bae, Minsu Park et al.
Provable Scaling Laws for the Test-Time Compute of Large Language Models
Yanxi Chen, Xuchen Pan, Yaliang Li et al.
FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation
Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.
ConStellaration: A dataset of QI-like stellarator plasma boundaries and optimization benchmarks
Santiago Cadena, Andrea Merlo, Emanuel Laude et al.
Golden Cudgel Network for Real-Time Semantic Segmentation
Guoyu Yang, Yuan Wang, Daming Shi et al.
Large Language Models Think Too Fast To Explore Effectively
Lan Pan, Hanbo Xie, Robert Wilson
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke, Vijay Kumar b g, Xingjian Leng et al.
SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training
Yehonathan Refael, Guy Smorodinsky, Tom Tirer et al.
6D Object Pose Tracking in Internet Videos for Robotic Manipulation
Georgy Ponimatkin, Martin Cífka, Tomas Soucek et al.
Momentum Multi-Marginal Schrödinger Bridge Matching
Panagiotis Theodoropoulos, Augustinos Saravanos, Evangelos Theodorou et al.
Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling
Xiao Li, Zekai Zhang, Xiang Li et al.
End-to-end Learning of Gaussian Mixture Priors for Diffusion Sampler
Denis Blessing, Xiaogang Jia, Gerhard Neumann
GNNs Getting ComFy: Community and Feature Similarity Guided Rewiring
Celia Rubio-Madrigal, Adarsh Jamadandi, Rebekka Burkholz
Aligning Language Models Using Follow-up Likelihood as Reward Signal
Chen Zhang, Dading Chong, Feng Jiang et al.
Multimodal Tabular Reasoning with Privileged Structured Information
Jun-Peng Jiang, Yu Xia, Hai-Long Sun et al.
EVOS: Efficient Implicit Neural Training via EVOlutionary Selector
Weixiang Zhang, Shuzhao Xie, Chengwei Ren et al.
Diversity as a Reward: Fine-Tuning LLMs on a Mixture of Domain-Undetermined Data
Zhenqing Ling, Daoyuan Chen, Liuyi Yao et al.
Where, What, Why: Towards Explainable Driver Attention Prediction
Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao et al.
Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning
Runchuan Zhu, Zhipeng Ma, Jiang Wu et al.
HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting
Jingyu Lin, Jiaqi Gu, Lubin Fan et al.
Cached Multi-Lora Composition for Multi-Concept Image Generation
Xiandong Zou, Mingzhu Shen, Christos-Savvas Bouganis et al.
BrainOOD: Out-of-distribution Generalizable Brain Network Analysis
Jiaxing Xu, Yongqiang Chen, Xia Dong et al.
CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing
Ziqi Jiang, Zhen Wang, Long Chen
Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models
Itay Benou, Tammy Riklin Raviv
VALLR: Visual ASR Language Model for Lip Reading
Marshall Thomas, Edward Fish, Richard Bowden
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
Jun Zhang, Jue Wang, Huan Li et al.
Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection
Marc-Antoine Lavoie, Anas Mahmoud, Steven L. Waslander
Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks
Steffen Schotthöfer, Lexie Yang, Stefan Schnake
MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception
Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao et al.
The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet, Francesco D'Angelo, Andrew Lampinen et al.
Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis
Arpita Chowdhury, Dipanjyoti Paul, Zheda Mai et al.
Improving Energy Natural Gradient Descent through Woodbury, Momentum, and Randomization
Andrés Guzmán-Cordero, Felix Dangel, Gil Goldshlager et al.
Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory
Aymane El Firdoussi, Mohamed El Amine Seddik, Soufiane Hayou et al.
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien GOMES, Yanlei Zhang, Eugene Belilovsky et al.
RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance
Yuheng Jiang, Zhehao Shen, Chengcheng Guo et al.
Expressivity of Neural Networks with Random Weights and Learned Biases
Ezekiel Williams, Alexandre Payeur, Avery Ryoo et al.
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
Shivam Duggal, Yushi Hu, Oscar Michel et al.
Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding
Xiaoqian Shen, Wenxuan Zhang, Jun Chen et al.
IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement
Zhihao Shi, Dong Huo, Yuhongze Zhou et al.
Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval
Guangyuan Ma, Yongliang Ma, Xing Wu et al.
T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data
Hugo Thimonier, José Lucas De Melo Costa, Fabrice Popineau et al.
Scene-Centric Unsupervised Panoptic Segmentation
Oliver Hahn, Christoph Reich, Nikita Araslanov et al.
Boosting Short Text Classification with Multi-Source Information Exploration and Dual-Level Contrastive Learning
Yonghao Liu, Mengyu Li, Wei Pang et al.
Truth over Tricks: Measuring and Mitigating Shortcut Learning in Misinformation Detection
Herun Wan, Jiaying Wu, Minnan Luo et al.
UV-Attack: Physical-World Adversarial Attacks on Person Detection via Dynamic-NeRF-based UV Mapping
Yanjie Li, Kaisheng Liang, Bin Xiao
Feature Responsiveness Scores: Model-Agnostic Explanations for Recourse
Seung Hyun Cheon, Anneke Wernerfelt, Sorelle Friedler et al.
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning
Hung Le, Dung Nguyen, Kien Do et al.
Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment
Yang Bai, Yucheng Ji, Min Cao et al.
SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer
Hongda Liu, Longguang Wang, Ye Zhang et al.
Seeking and Updating with Live Visual Knowledge
Mingyang Fu, Yuyang Peng, Dongping Chen et al.
FedAWA: Adaptive Optimization of Aggregation Weights in Federated Learning Using Client Vectors
Changlong Shi, He Zhao, Bingjie Zhang et al.
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
Reno Kriz, Kate Sanders, David Etter et al.
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
Liliang Ren, Congcong Chen, Haoran Xu et al.
Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling
Xinyue Fang, Zhen Huang, Zhiliang Tian et al.
Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment
Yankai Jiang, Wenhui Lei, Xiaofan Zhang et al.
Advantage Alignment Algorithms
Juan Duque, Milad Aghajohari, Timotheus Cooijmans et al.
Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification
S P Sharan, Minkyu Choi, Sahil Shah et al.
MIRE: Matched Implicit Neural Representations
Dhananjaya Jayasundara, Heng Zhao, Demetrio Labate et al.
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam, Soowon Son, Zhan Xu et al.
GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping
Jinfeng Liu, Lingtong Kong, Bo Li et al.
KAC: Kolmogorov-Arnold Classifier for Continual Learning
Yusong Hu, Zichen Liang, Fei Yang et al.
3D-MVP: 3D Multiview Pretraining for Manipulation
Shengyi Qian, Kaichun Mo, Valts Blukis et al.
POT: Prototypical Optimal Transport for Weakly Supervised Semantic Segmentation
Jian Wang, Tianhong Dai, Bingfeng Zhang et al.
Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models
Jianqun Zhou, Yuanlei Zheng, Wei Chen et al.
Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification
Hsun-Yu Kuo, Yin-Hsiang Liao, Yu-Chieh Chao et al.
Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling
Yuejiang Liu, Jubayer Hamid, Annie Xie et al.
Enhancing Uncertainty Modeling with Semantic Graph for Hallucination Detection
Kedi Chen, Qin Chen, Jie Zhou et al.
Keyframe-Guided Creative Video Inpainting
Yuwei Guo, Ceyuan Yang, Anyi Rao et al.
Bridging Training and Execution via Dynamic Directed Graph-Based Communication in Cooperative Multi-Agent Systems
Zhuohui Zhang, Bin He, Bin Cheng et al.
SparseAlign: a Fully Sparse Framework for Cooperative Object Detection
Yunshuang Yuan, Yan Xia, Daniel Cremers et al.
ReDit: Reward Dithering for Improved LLM Policy Optimization
Chenxing Wei, Jiarui Yu, Ying He et al.
Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets
Benjamin Dupuis, Paul Viallard, George Deligiannidis et al.
Multi-Agent Motion Planning for Differential Drive Robots Through Stationary State Search
Jingtian Yan, Jiaoyang Li
GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation
Weihang Li, Hongli XU, Junwen Huang et al.
Towards In-the-wild 3D Plane Reconstruction from a Single Image
Jiachen Liu, Rui Yu, Sili Chen et al.
Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation
Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos et al.
Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning
Xiaolei Wang, Xinyu Tang, Junyi Li et al.
REVECA: Adaptive Planning and Trajectory-Based Validation in Cooperative Language Agents Using Information Relevance and Relative Proximity
SeungWon Seo, SeongRae Noh, Junhyeok Lee et al.
BadRobot: Jailbreaking Embodied LLM Agents in the Physical World
Hangtao Zhang, Chenyu Zhu, Xianlong Wang et al.
FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models
Jintao Tong, Wenwei Jin, Pengda Qin et al.
Stealthy Shield Defense: A Conditional Mutual Information-Based Approach against Black-Box Model Inversion Attacks
Tianqu Zhuang, Hongyao Yu, Yixiang Qiu et al.
Enhancing Text-to-Image Diffusion Transformer via Split-Text Conditioning
Yu Zhang, Jialei Zhou, Xinchen Li et al.
Token Perturbation Guidance for Diffusion Models
Javad Rajabi, Soroush Mehraban, Seyedmorteza Sadat et al.
ScaleOT: Privacy-utility-scalable Offsite-tuning with Dynamic LayerReplace and Selective Rank Compression
Kai Yao, Zhaorui Tan, Tiandi Ye et al.
Circuit Representation Learning with Masked Gate Modeling and Verilog-AIG Alignment
Haoyuan Wu, Haisheng Zheng, Yuan Pu et al.
EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering
Toshiya Yura, Ashkan Mirzaei, Igor Gilitschenski
DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition
Caoshuo Li, Tanzhe Li, Xiaobin Hu et al.
Real-IAD D³: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection
wenbing zhu, Lidong Wang, Ziqing Zhou et al.
TCFG: Tangential Damping Classifier-free Guidance
Mingi Kwon, Shin seong Kim, Jaeseok Jeong et al.
MLC-NC: Long-Tailed Multi-Label Image Classification Through the Lens of Neural Collapse
Zijian Tao, Shao-Yuan Li, Wenhai Wan et al.
Towards Realistic Example-based Modeling via 3D Gaussian Stitching
Xinyu Gao, Ziyi Yang, Bingchen Gong et al.
Robust Federated Finetuning of LLMs via Alternating Optimization of LoRA
Shuangyi Chen, Yuanxin Guo, Yue Ju et al.
VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis
Zhifeng Wang, Renjiao Yi, Xin Wen et al.
CAMEx: Curvature-aware Merging of Experts
Dung Viet Nguyen, Minh Nguyen, Luc Nguyen et al.
Tight Clusters Make Specialized Experts
Stefan Nielsen, Rachel Teo, Laziz Abdullaev et al.
Manipulating Feature Visualizations with Gradient Slingshots
Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.
Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models
Quan Zhang, Jinwei Fang, Rui Yuan et al.
Walking the Tightrope: Autonomous Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning
Xiaoyu Yang, Jie Lu, En Yu
Learning with Calibration: Exploring Test-Time Computing of Spatio-Temporal Forecasting
Wei Chen, Yuxuan Liang
IDInit: A Universal and Stable Initialization Method for Neural Network Training
Yu Pan, Chaozheng Wang, Zekai Wu et al.
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Mingzhe Du, Anh Tuan Luu, Yue Liu et al.
DyCON: Dynamic Uncertainty-aware Consistency and Contrastive Learning for Semi-supervised Medical Image Segmentation
Maregu Assefa, Muzammal Naseer, IYYAKUTTI IYAPPAN GANAPATHI et al.
SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystal Symmetry Classification Benchmark
Bin Cao, Yang Liu, Zinan Zheng et al.
Rethinking Spiking Self-Attention Mechanism: Implementing α-XNOR Similarity Calculation in Spiking Transformers
Yichen Xiao, Shuai Wang, Dehao Zhang et al.
Bridging the Gap between Database Search and \emph{De Novo} Peptide Sequencing with SearchNovo
Jun Xia, Sizhe Liu, Jingbo Zhou et al.
Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception
Zihan Ding, Jiahui Fu, Si Liu et al.
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
Leying Zhang, Yao Qian, Xiaofei Wang et al.
TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation
Abduljalil Radman, Jorma Laaksonen
Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild
Junhyeong Cho, Kim Youwang, Hunmin Yang et al.
Interpretable Image Classification via Non-parametric Part Prototype Learning
Zhijie Zhu, Lei Fan, Maurice Pagnucco et al.
Constrained Generative Modeling with Manually Bridged Diffusion Models
Saeid Naderiparizi, Xiaoxuan Liang, Berend Zwartsenberg et al.
It’s a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
Dominik Schnaus, Nikita Araslanov, Daniel Cremers
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
Yuheng Yuan, Qiuhong Shen, Xingyi Yang et al.
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
Junli Liu, Qizhi Chen, Zhigang Wang et al.