Most Cited 2025 "open-ended benchmarks" Papers
22,274 papers found • Page 21 of 112
SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing
Yingying Zhang, Lixiang Ru, Kang Wu et al.
Noise Modeling in One Hour: Minimizing Preparation Efforts for Self-supervised Low-Light RAW Image Denoising
Feiran Li, Haiyang Jiang, Daisuke Iso
SEAL: Semantic Attention Learning for Long Video Representation
Lan Wang, Yujia Chen, Wen-Sheng Chu et al.
POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality
Joey Wilson, Marcelino M. de Almeida, Sachit Mahajan et al.
Overcoming the Curse of Dimensionality in Reinforcement Learning Through Approximate Factorization
Chenbei Lu, Laixi Shi, Zaiwei Chen et al.
EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events
Shuoyan Wei, Feng Li, Shengeng Tang et al.
COLUMBUS: Evaluating COgnitive Lateral Understanding Through Multiple-Choice reBUSes
Koen Kraaijveld, Yifan Jiang, Kaixin Ma et al.
PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation
Pablo Lemos, Sammy Sharief, Nikolay Malkin et al.
Ultra-Resolution Adaptation with Ease
Ruonan Yu, Songhua Liu, Zhenxiong Tan et al.
Generating Freeform Endoskeletal Robots
Muhan Li, Lingji Kong, Sam Kriegman
Value-Guided Search for Efficient Chain-of-Thought Reasoning
Kaiwen Wang, Jin Zhou, Jonathan Chang et al.
NOVA: A Benchmark for Rare Anomaly Localization and Clinical Reasoning in Brain MRI
Cosmin Bercea, Jun Li, Philipp Raffler et al.
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang, BIN CHEN, Yulin Li et al.
CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension
Rui Li, Zeyu Zhang, Xiaohe Bo et al.
CausalRivers - Scaling up benchmarking of causal discovery for real-world time-series
Gideon Stein, Maha Shadaydeh, Jan Blunk et al.
DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers
Li Ren, Chen Chen, Liqiang Wang et al.
Doubly Contrastive Learning for Source-Free Domain Adaptive Person Search
Yizhen Jia, Rong Quan, Yue Feng et al.
Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
Zichen Liu, Yihao Meng, Hao Ouyang et al.
Robustness Auditing for Linear Regression: To Singularity and Beyond
Ittai Rubinstein, Samuel Hopkins
3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations
yating wang, Xuan Wang, Ran Yi et al.
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
Chi-Pin Huang, Yen-Siang Wu, Hung-Kai Chung et al.
PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer
Pierre-David Letourneau, Manish Singh, Hsin-Pai Cheng et al.
HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
Jianing Chen, Zehao Li, Yujun Cai et al.
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners
Michal Nauman, Marek Cygan, Carmelo Sferrazza et al.
Extrapolated Urban View Synthesis Benchmark
Xiangyu Han, Zhen Jia, Boyi Li et al.
NeSyC: A Neuro-symbolic Continual Learner For Complex Embodied Tasks in Open Domains
Wonje Choi, Jinwoo Park, Sanghyun Ahn et al.
GauSTAR: Gaussian Surface Tracking and Reconstruction
Chengwei Zheng, Lixin Xue, Juan Jose Zarate et al.
Directional Gradient Projection for Robust Fine-Tuning of Foundation Models
Chengyue Huang, Junjiao Tian, Brisa Maneechotesuwan et al.
Multi-Perspective Data Augmentation for Few-shot Object Detection
Anh-Khoa Nguyen Vu, Quoc Truong Truong, Vinh-Tiep Nguyen et al.
SEMU: Singular Value Decomposition for Efficient Machine Unlearning
Marcin Sendera, Łukasz Struski, Kamil Książek et al.
CTSyn: A Foundation Model for Cross Tabular Data Generation
Xiaofeng Lin, Chenheng Xu, Matthew Yang et al.
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Jinnan Chen, Lingting Zhu, Zeyu HU et al.
Triples as the Key: Structuring Makes Decomposition and Verification Easier in LLM-based TableQA
Zhen Yang, Ziwei Du, Minghan Zhang et al.
Exploring Historical Information for RGBE Visual Tracking with Mamba
Chuanyu Sun, Jiqing Zhang, Yang Wang et al.
FlowR: Flowing from Sparse to Dense 3D Reconstructions
Tobias Fischer, Samuel Rota Bulò, Yung-Hsu Yang et al.
FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification
Zhengrui Guo, Conghao Xiong, Jiabo MA et al.
GaussRender: Learning 3D Occupancy with Gaussian Rendering
Loick Chambon, Eloi Zablocki, Alexandre Boulch et al.
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
Yaxin Luo, Zhaoyi Li, Jiacheng Liu et al.
VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
Hyojun Go, Byeongjun Park, Hyelin Nam et al.
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
Hyojin Bahng, Caroline Chan, Fredo Durand et al.
DuMo: Dual Encoder Modulation Network for Precise Concept Erasure
Feng Han, Kai Chen, Chao Gong et al.
Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
Max Wilcoxson, Qiyang Li, Kevin Frans et al.
Latent Policy Barrier: Learning Robust Visuomotor Policies by Staying In-Distribution
Zhanyi Sun, Shuran Song
Lifelong Safety Alignment for Language Models
Haoyu Wang, Yifei Zhao, Zeyu Qin et al.
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations
Xiang Xu, Lingdong Kong, Song Wang et al.
Out of Length Text Recognition with Sub-String Matching
Yongkun Du, Zhineng Chen, Caiyan Jia et al.
AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling
Alexander Capstick, Rahul G. Krishnan, Payam Barnaghi
Beyond Verifiable Rewards: Scaling Reinforcement Learning in Language Models to Unverifiable Data
Yunhao Tang, Sid Wang, Lovish Madaan et al.
RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors
Avinash Paliwal, xilong zhou, Wei Ye et al.
Generative RLHF-V: Learning Principles from Multi-modal Human Preference
Jiayi Zhou, Jiaming Ji, Boyuan Chen et al.
Straight-Line Diffusion Model for Efficient 3D Molecular Generation
Yuyan Ni, Shikun Feng, Haohan Chi et al.
UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation
Lunhao Duan, Shanshan Zhao, Wenjun Yan et al.
Safe Planner: Empowering Safety Awareness in Large Pre-Trained Models for Robot Task Planning
Siyuan Li, Feifan Liu, Lingfei Cui et al.
Vision-Language Models Can't See the Obvious
YASSER ABDELAZIZ DAHOU DJILALI, Ngoc Huynh, Phúc Lê Khắc et al.
ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints
Divij Handa, Pavel Dolin, Shrinidhi Kumbhar et al.
Training-Free Constrained Generation With Stable Diffusion Models
Stefano Zampini, Jacob K Christopher, Luca Oneto et al.
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.
T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning
Yanjun Fu, Faisal Hamman, Sanghamitra Dutta
MATCHA: Towards Matching Anything
Fei Xue, Sven Elflein, Laura Leal-Taixe et al.
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
Siran Chen, Yuxiao Luo, Yue Ma et al.
On the Relation between Rectified Flows and Optimal Transport
Johannes Hertrich, Antonin Chambolle, Julie Delon
MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking
Xinqi Liu, Li Zhou, Zikun Zhou et al.
WaterDiffusion: Learning a Prior-involved Unrolling Diffusion for Joint Underwater Saliency Detection and Visual Restoration
Laibin Chang, Yunke Wang, Longxiang Deng et al.
Toward Efficient Kernel-Based Solvers for Nonlinear PDEs
Zhitong Xu, Da Long, Yiming Xu et al.
DMWM: Dual-Mind World Model with Long-Term Imagination
Lingyi Wang, Rashed Shelim, Walid Saad et al.
ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary
Zeqi Gu, Yin Cui, Max Li et al.
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models
Cameron Tice, Philipp Kreer, Nathan Helm-Burger et al.
Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations
Lorenzo Basile, Santiago Acevedo, Luca Bortolussi et al.
BrainACTIV: Identifying visuo-semantic properties driving cortical selectivity using diffusion-based image manipulation
Diego García Cerdas, Christina Sartzetaki, Magnus Petersen et al.
Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers
Hang Zhou, Yuezhou Ma, Haixu Wu et al.
QT-DoG: Quantization-Aware Training for Domain Generalization
Saqib Javed, Hieu Le, Mathieu Salzmann
DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy
Yuran Wang, Ruihai Wu, Yue Chen et al.
Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models
Xingzhuo Guo, Yu Zhang, Baixu Chen et al.
ELICIT: LLM Augmentation Via External In-context Capability
Futing Wang, Jianhao (Elliott) Yan, Yue Zhang et al.
Understanding and Mitigating Memorization in Diffusion Models for Tabular Data
Zhengyu Fang, Zhimeng Jiang, Huiyuan Chen et al.
Exploit Your Latents: Coarse-Grained Protein Backmapping with Latent Diffusion Models
Rongchao Zhang, Yu Huang, Yiwei Lou et al.
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting
Ziyi Wang, Yanran Zhang, Jie Zhou et al.
Task Generalization with Autoregressive Compositional Structure: Can Learning from $D$ Tasks Generalize to $D^T$ Tasks?
Amirhesam Abedsoltan, Huaqing Zhang, Kaiyue Wen et al.
Flowing Datasets with Wasserstein over Wasserstein Gradient Flows
Clément Bonet, Christophe Vauthier, Anna Korba
Always Skip Attention
Yiping Ji, Hemanth Saratchandran, Peyman Moghadam et al.
ProxSparse: Regularized Learning of Semi-Structured Sparsity Masks for Pretrained LLMs
Hongyi Liu, Rajarshi Saha, Zhen Jia et al.
Tracing the Representation Geometry of Language Models from Pretraining to Post-training
Melody Li, Kumar Krishna Agrawal, Arna Ghosh et al.
Mask in the Mirror: Implicit Sparsification
Tom Jacobs, Rebekka Burkholz
ProtoArgNet: Interpretable Image Classification with Super-Prototypes and Argumentation
Hamed Ayoobi, Nico Potyka, Francesca Toni
The Persistence of Neural Collapse Despite Low-Rank Bias
Connall Garrod, Jonathan Keating
Linear combinations of latents in generative models: subspaces and beyond
Erik Bodin, Alexandru Stere, Dragos Margineantu et al.
Understanding the Limits of Deep Tabular Methods with Temporal Shift
Haorun Cai, Han-Jia Ye
DEALing with Image Reconstruction: Deep Attentive Least Squares
Mehrsa Pourya, Erich Kobler, Michael Unser et al.
Point Clouds Meets Physics: Dynamic Acoustic Field Fitting Network for Point Cloud Understanding
Changshuo Wang, Shuting He, Xiang Fang et al.
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
Xuanming Zhang, Yuxuan Chen, Samuel (Min-Hsuan) Yeh et al.
Týr-the-Pruner: Structural Pruning LLMs via Global Sparsity Distribution Optimization
Guanchen Li, Yixing Xu, Zeping Li et al.
PhysAug: A Physical-guided and Frequency-based Data Augmentation for Single-Domain Generalized Object Detection
Xiaoran Xu, Jiangang Yang, Wenhui Shi et al.
LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending
Jian Jin, Zhenbo Yu, Yang Shen et al.
HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning
Zhi Jing, Siyuan Yang, Jicong Ao et al.
A Unified Model for Compressed Sensing MRI Across Undersampling Patterns
Armeet Singh Jatyani, Jiayun Wang, Aditi Chandrashekar et al.
Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations
Pengcheng Jiang, Cao Xiao, Tianfan Fu et al.
Are Expressive Models Truly Necessary for Offline RL?
Guan Wang, Haoyi Niu, Jianxiong Li et al.
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
Yunheng Li, Jing Cheng, Shaoyong Jia et al.
GIViC: Generative Implicit Video Compression
Ge Gao, Siyue Teng, Tianhao Peng et al.
SUMI-IFL: An Information-Theoretic Framework for Image Forgery Localization with Sufficiency and Minimality Constraints
Ziqi Sheng, Wei Lu, Xiangyang Luo et al.
CSformer: Combining Channel Independence and Mixing for Robust Multivariate Time Series Forecasting
Haoxin Wang, Yipeng Mo, Kunlan Xiang et al.
DualCP: Rehearsal-Free Domain-Incremental Learning via Dual-Level Concept Prototype
Qiang Wang, Yuhang He, Songlin Dong et al.
Hypergraph Attacks via Injecting Homogeneous Nodes into Elite Hyperedges
Meixia He, Peican Zhu, Keke Tang et al.
Dense SAE Latents Are Features, Not Bugs
Xiaoqing Sun, Alessandro Stolfo, Joshua Engels et al.
Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models
Xuran Ma, Yexin Liu, Yaofu LIU et al.
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
Han Lin, Jaemin Cho, Amir Zadeh et al.
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
Jiankang Chen, Tianke Zhang, Changyi Liu et al.
MOL-Mamba: Enhancing Molecular Representation with Structural & Electronic Insights
Jingjing Hu, Dan Guo, Zhan Si et al.
Reconstructing Humans with a Biomechanically Accurate Skeleton
Yan Xia, Xiaowei Zhou, Etienne Vouga et al.
Runtime Analysis for Multi-Objective Evolutionary Algorithms in Unbounded Integer Spaces
Benjamin Doerr, Martin S. Krejca, Günter Rudolph
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
Guanning Zeng, Xiang Zhang, Zirui Wang et al.
Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning
Yang You, Yixin Li, Congyue Deng et al.
Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models
Hao Cheng, Erjia Xiao, Jing Shao et al.
Locally Convex Global Loss Network for Decision-Focused Learning
Haeun Jeon, Hyunglip Bae, Minsu Park et al.
OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models
Huanpeng Chu, Wei Wu, Guanyu Feng et al.
Parameter Efficient Fine-tuning via Explained Variance Adaptation
Fabian Paischer, Lukas Hauzenberger, Thomas Schmied et al.
Revisiting a Design Choice in Gradient Temporal Difference Learning
Xiaochi Qian, Shangtong Zhang
PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask
Jeongho Kim, Hoiyeong Jin, Sunghyun Park et al.
Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness
Rongzhe Wei, Peizhi Niu, Hans Hao-Hsun Hsu et al.
Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs
Rui Dai, Sile Hu, Xu Shen et al.
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
Runze Zhang, Guoguang Du, Xiaochuan Li et al.
MindSimulator: Exploring Brain Concept Localization via Synthetic fMRI
Qi Zhang, Qi Zhang, Zixuan Gong et al.
Aligning Language Models Using Follow-up Likelihood as Reward Signal
Chen Zhang, Dading Chong, Feng Jiang et al.
Probabilistic Learning to Defer: Handling Missing Expert Annotations and Controlling Workload Distribution
Cuong Nguyen, Thanh-Toan Do, Gustavo Carneiro
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models
Yongqi Huang, Peng Ye, Chenyu Huang et al.
End-to-end Learning of Gaussian Mixture Priors for Diffusion Sampler
Denis Blessing, Xiaogang Jia, Gerhard Neumann
SteerConf: Steering LLMs for Confidence Elicitation
Ziang Zhou, Tianyuan Jin, Jieming Shi et al.
IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning
Quan Zhang, Yuxin Qi, Xi Tang et al.
Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
Yaniv Nikankin, Dana Arad, Yossi Gandelsman et al.
Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective
Yiming Liu, Kezhao Liu, Yao Xiao et al.
Autocorrelation Matters: Understanding the Role of Initialization Schemes for State Space Models
Fusheng Liu, Qianxiao Li
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
Yucheng Shi, Quanzheng Li, Jin Sun et al.
Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs
Hao Fang, Changle Zhou, Jiawei Kong et al.
LibriBrain: Over 50 Hours of Within-Subject MEG to Improve Speech Decoding Methods at Scale
Miran Özdogan, Gilad Landau, Gereon Elvers et al.
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
Kim Sung-Bin, Jeongsoo Choi, Puyuan Peng et al.
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
Jun Zhang, Jue Wang, Huan Li et al.
Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations
Brian Zheng, Alisa Liu, Orevaoghene Ahia et al.
CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing
Ziqi Jiang, Zhen Wang, Long Chen
Probing Equivariance and Symmetry Breaking in Convolutional Networks
Sharvaree Vadgama, Mohammad Islam, Domas Buracas et al.
Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative Search
Haoran Sun, Yankai Jiang, Wenjie Lou et al.
Reference-Based 3D-Aware Image Editing with Triplanes
Bahri Batuhan Bilecen, Yiğit Yalın, Ning Yu et al.
Active Task Disambiguation with LLMs
Katarzyna Kobalczyk, Nicolás Astorga, Tennison Liu et al.
Learning a Neural Solver for Parametric PDEs to Enhance Physics-Informed Methods
Lise Le Boudec, Emmanuel de Bézenac, Louis Serrano et al.
Bayesian Experimental Design Via Contrastive Diffusions
Jacopo Iollo, Christophe Heinkelé, Pierre Alliez et al.
Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning
Runchuan Zhu, Zhipeng Ma, Jiang Wu et al.
Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence
Shaopeng Fu, Liang Ding, Jingfeng ZHANG et al.
Hearing Anywhere in Any Environment
Xiulong Liu, Anurag Kumar, Paul Calamia et al.
IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning
Jiawen Qin, Haonan Yuan, Qingyun Sun et al.
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
Guangda Ji, Silvan Weder, Francis Engelmann et al.
Boosting Short Text Classification with Multi-Source Information Exploration and Dual-Level Contrastive Learning
Yonghao Liu, Mengyu Li, Wei Pang et al.
Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval
Guangyuan Ma, Yongliang Ma, Xing Wu et al.
Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework
Jian-Jian Jiang, Xiao-Ming Wu, Yi-Xiang He et al.
6D Object Pose Tracking in Internet Videos for Robotic Manipulation
Georgy Ponimatkin, Martin Cífka, Tomas Soucek et al.
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
Tomas Soucek, Prajwal Gatti, Michael Wray et al.
Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models
Benjamin Walker, Lingyi Yang, Nicola Muca Cirone et al.
Video Perception Models for 3D Scene Synthesis
Rui Huang, Guangyao Zhai, Zuria Bauer et al.
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment
Huangbiao Xu, Xiao Ke, Huanqi Wu et al.
Towards Doctor-Like Reasoning: Medical RAG Fusing Knowledge with Patient Analogy through Textual Gradients
Yuxing Lu, Gecheng Fu, Wei Wu et al.
GNNs Getting ComFy: Community and Feature Similarity Guided Rewiring
Celia Rubio-Madrigal, Adarsh Jamadandi, Rebekka Burkholz
Denoising Functional Maps: Diffusion Models for Shape Correspondence
Aleksei Zhuravlev, Zorah Lähner, Vladislav Golyanik
Binarized Mamba-Transformer for Lightweight Quad Bayer HybridEVS Demosaicing
Shiyang Zhou, Haijin Zeng, Yunfan Lu et al.
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
Lingen Li, Zhaoyang Zhang, Yaowei Li et al.
Functionality Understanding and Segmentation in 3D Scenes
Jaime Corsetti, Francesco Giuliari, Alice Fasoli et al.
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
Yudong Jin, Sida Peng, Xuan Wang et al.
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers For Motion Transfer
Qingyu Shi, Jianzong Wu, Jinbin Bai et al.
Decompile-Bench: Million-Scale Binary-Source Function Pairs for Real-World Binary Decompilation
hanzhuo tan, Xiaolong Tian, Hanrui Qi et al.
Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning
Roger Creus Castanyer, Johan Obando Ceron, Lu Li et al.
T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks
Jiayang Liu, Siyuan Liang, Shiqian Zhao et al.
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Will Merrill, Shane Arora, Dirk Groeneveld et al.
SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction
Zhengyuan Li, Kai Cheng, Anindita Ghosh et al.
AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios
Ziming Huang, Xurui Li, Haotian Liu et al.
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
Yuqian Yuan, Ronghao Dang, long li et al.
On scalable and efficient training of diffusion samplers
Minkyu Kim, Kiyoung Seong, Dongyeop Woo et al.
Expressivity of Neural Networks with Random Weights and Learned Biases
Ezekiel Williams, Alexandre Payeur, Avery Ryoo et al.
Enhancing Uncertainty Modeling with Semantic Graph for Hallucination Detection
Kedi Chen, Qin Chen, Jie Zhou et al.
Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models
Shuyang Hao, Bryan Hooi, Jun Liu et al.
Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models
Jianqun Zhou, Yuanlei Zheng, Wei Chen et al.
Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling
Xinyue Fang, Zhen Huang, Zhiliang Tian et al.
T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data
Hugo Thimonier, José Lucas De Melo Costa, Fabrice Popineau et al.
FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution
Gene Chou, Wenqi Xian, Guandao Yang et al.
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding
Andong Deng, Zhongpai Gao, Anwesa Choudhuri et al.
Bridging Training and Execution via Dynamic Directed Graph-Based Communication in Cooperative Multi-Agent Systems
Zhuohui Zhang, Bin He, Bin Cheng et al.
Exploring Simple Open-Vocabulary Semantic Segmentation
Zihang Lai
REVECA: Adaptive Planning and Trajectory-Based Validation in Cooperative Language Agents Using Information Relevance and Relative Proximity
SeungWon Seo, SeongRae Noh, Junhyeok Lee et al.
LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers
Yusuf Dalva, Hidir Yesiltepe, Pinar Yanardag
Multi-Agent Motion Planning for Differential Drive Robots Through Stationary State Search
Jingtian Yan, Jiaoyang Li
Cached Multi-Lora Composition for Multi-Concept Image Generation
Xiandong Zou, Mingzhu Shen, Christos-Savvas Bouganis et al.
UV-Attack: Physical-World Adversarial Attacks on Person Detection via Dynamic-NeRF-based UV Mapping
Yanjie Li, Kaisheng Liang, Bin Xiao
BrainOOD: Out-of-distribution Generalizable Brain Network Analysis
Jiaxing Xu, Yongqiang Chen, Xia Dong et al.
Split Gibbs Discrete Diffusion Posterior Sampling
Wenda Chu, Zihui Wu, Yifan Chen et al.
BadRobot: Jailbreaking Embodied LLM Agents in the Physical World
Hangtao Zhang, Chenyu Zhu, Xianlong Wang et al.
ScaleOT: Privacy-utility-scalable Offsite-tuning with Dynamic LayerReplace and Selective Rank Compression
Kai Yao, Zhaorui Tan, Tiandi Ye et al.
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien GOMES, Yanlei Zhang, Eugene Belilovsky et al.
Large Language Models Think Too Fast To Explore Effectively
Lan Pan, Hanbo Xie, Robert Wilson
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning
Hung Le, Dung Nguyen, Kien Do et al.
Stealthy Shield Defense: A Conditional Mutual Information-Based Approach against Black-Box Model Inversion Attacks
Tianqu Zhuang, Hongyao Yu, Yixiang Qiu et al.
NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors
Yanrui Bin, Wenbo Hu, Haoyuan Wang et al.
Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation
Yueru Jia, Jiaming Liu, Sixiang Chen et al.
AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration
Javier Tirado-Garín, Javier Civera
Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory
Aymane El Firdoussi, Mohamed El Amine Seddik, Soufiane Hayou et al.