Most Cited 2025 "human-ai interaction" Papers
22,274 papers found • Page 32 of 112
Conference
Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences
Yunhong Lu, Qichao Wang, Hengyuan Cao et al.
FactorGCL: A Hypergraph-Based Factor Model with Temporal Residual Contrastive Learning for Stock Returns Prediction
Yitong Duan, Weiran Wang, Jian Li
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
Reno Kriz, Kate Sanders, David Etter et al.
MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent
Xinyao Liao, Xianfang Zeng, Liao Wang et al.
Reasoning Models Hallucinate More: Factuality-Aware Reinforcement Learning for Large Reasoning Models
Junyi Li, Hwee Tou Ng
OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
Ziyue Huang, Yongchao Feng, Ziqi Liu et al.
Variational Regularized Unbalanced Optimal Transport: Single Network, Least Action
Yuhao Sun, Zhenyi Zhang, Zihan Wang et al.
Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models
Yan Xie, Zequn Zeng, Hao Zhang et al.
Is Complex Query Answering Really Complex?
Cosimo Gregucci, Bo Xiong, Daniel Hernández et al.
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
Junli Liu, Qizhi Chen, Zhigang Wang et al.
TVNet: A Novel Time Series Analysis Method Based on Dynamic Convolution and 3D-Variation
Chenghan Li, Mingchen LI, Ruisheng Diao
Physics-Informed Generative Modeling of Wireless Channels
Benedikt Böck, Andreas Oeldemann, Timo Mayer et al.
Provable and Practical Online Learning Rate Adaptation with Hypergradient Descent
Ya-Chi Chu, Wenzhi Gao, Yinyu Ye et al.
CWNet: Causal Wavelet Network for Low-Light Image Enhancement
Tongshun Zhang, Pingping Liu, Yubing Lu et al.
SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model
Shuhan Tan, John Wheatley Lambert, Hong Jeon et al.
Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities
Liuyi Wang, Xinyuan Xia, Hui Zhao et al.
Enhancing Uncertainty Modeling with Semantic Graph for Hallucination Detection
Kedi Chen, Qin Chen, Jie Zhou et al.
Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
Hongyao Tang, Johan Obando-Ceron, Pablo Samuel Castro et al.
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
Jieming Cui, Tengyu Liu, Ziyu Meng et al.
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
Chen Chen, Daochang Liu, Mubarak Shah et al.
Efficient Active Imitation Learning with Random Network Distillation
Emilien Biré, Anthony Kobanda, Ludovic Denoyer et al.
Attributing Culture-Conditioned Generations to Pretraining Corpora
Huihan Li, Arnav Goel, Keyu He et al.
ROPO: Robust Preference Optimization for Large Language Models
Xize Liang, Chao Chen, Shuang Qiu et al.
Privacy Attacks on Image AutoRegressive Models
Antoni Kowalczuk, Jan Dubiński, Franziska Boenisch et al.
Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback
Michelle Zhao, Henny Admoni, Reid Simmons et al.
The Bandit Whisperer: Communication Learning for Restless Bandits
Yunfan Zhao, Tonghan Wang, Dheeraj Mysore Nagaraj et al.
TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided Pre-Intervention
Jinhao Duan, Fei Kong, Hao Cheng et al.
Multi-Agent Motion Planning for Differential Drive Robots Through Stationary State Search
Jingtian Yan, Jiaoyang Li
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
Kaidong Zhang, Rongtao Xu, Ren Pengzhen et al.
Fully Test-time Adaptation for Tabular Data
Zhi Zhou, Kun-Yang Yu, Lan-Zhe Guo et al.
Factor Augmented Tensor-on-Tensor Neural Networks
Guanhao Zhou, Yuefeng Han, Xiufan Yu
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs
Andreas Opedal, Haruki Shirakami, Bernhard Schölkopf et al.
Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach
Yanming Xiu, Maria Gorlatova
CO-MOT: Boosting End-to-end Transformer-based Multi-Object Tracking via Coopetition Label Assignment and Shadow Sets
feng yan, Weixin Luo, Yujie Zhong et al.
On the Expressiveness and Length Generalization of Selective State Space Models on Regular Languages
Aleksandar Terzic, Michael Hersche, Giacomo Camposampiero et al.
CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing
Ziqi Jiang, Zhen Wang, Long Chen
Scene Map-based Prompt Tuning for Navigation Instruction Generation
Sheng Fan, Rui Liu, Wenguan Wang et al.
AgentRecBench: Benchmarking LLM Agent-based Personalized Recommender Systems
Yu Shang, Peijie Liu, Yuwei Yan et al.
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
Jun Zhang, Jue Wang, Huan Li et al.
Depth-Bounds for Neural Networks via the Braid Arrangement
Moritz Grillo, Christoph Hertrich, Georg Loho
Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting
Zhining Liu, Ze Yang, Xiao Lin et al.
GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation
Weihang Li, Hongli XU, Junwen Huang et al.
TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference
Jack Min Ong, Matthew Di Ferrante, Aaron Pazdera et al.
Error Bounds for Gaussian Process Regression Under Bounded Support Noise with Applications to Safety Certification
Robert Reed, Luca Laurenti, Morteza Lahijanian
Improving Complex Reasoning with Dynamic Prompt Corruption: A Soft Prompt Optimization Approach
Sinan Fan, Liang Xie, Chen Shen et al.
Doubly Robust Conformalized Survival Analysis with Right-Censored Data
Matteo Sesia, vladimir svetnik
Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections
Xiaomeng Xu, Yifan Hou, Zeyi Liu et al.
BrainOOD: Out-of-distribution Generalizable Brain Network Analysis
Jiaxing Xu, Yongqiang Chen, Xia Dong et al.
DebGCD: Debiased Learning with Distribution Guidance for Generalized Category Discovery
Yuanpei Liu, Kai Han
Salvaging the Overlooked: Leveraging Class-Aware Contrastive Learning for Multi-Class Anomaly Detection
Lei Fan, Junjie Huang, Donglin Di et al.
GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting
Yusen XIE, Zhenmin Huang, Jin Wu et al.
HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location
Ting Sun, Penghan Wang, Fan Lai
Can Students Beyond the Teacher? Distilling Knowledge from Teacher’s Bias
Jianhua Zhang, Yi Gao, Ruyu Liu et al.
Adaptive Gradient Clipping for Robust Federated Learning
Youssef Allouah, Rachid Guerraoui, Nirupam Gupta et al.
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
Bingjie Gao, Xinyu Gao, Xiaoxue Wu et al.
BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models
Xingyu Zheng, Xianglong Liu, Haotong Qin et al.
Question-Aware Gaussian Experts for Audio-Visual Question Answering
Hongyeob Kim, Inyoung Jung, Dayoon Suh et al.
Expressivity of Neural Networks with Random Weights and Learned Biases
Ezekiel Williams, Alexandre Payeur, Avery Ryoo et al.
Continual Learning Using a Kernel-Based Method Over Foundation Models
Saleh Momeni, Sahisnu Mazumder, Bing Liu
What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?
Jinhong Ni, Chang-Bin Zhang, Qiang Zhang et al.
AutoSGNN: Automatic Propagation Mechanism Discovery for Spectral Graph Neural Networks
Shibing Mo, Kai Wu, Qixuan Gao et al.
Sequential Conditional Transport on Probabilistic Graphs for Interpretable Counterfactual Fairness
Agathe Fernandes Machado, Arthur Charpentier, Ewen Gallic
LEDiff: Latent Exposure Diffusion for HDR Generation
Chao Wang, Zhihao Xia, Thomas Leimkuehler et al.
UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation
Huimin LU, Masaru Isonuma, Junichiro Mori et al.
Kinetic Langevin Diffusion for Crystalline Materials Generation
François Cornet, Federico Bergamin, Arghya Bhowmik et al.
Training Consistent Mixture-of-Experts-Based Prompt Generator for Continual Learning
Yue Lu, Shizhou Zhang, De Cheng et al.
Distance-Based Tree-Sliced Wasserstein Distance
Viet-Hoang Tran, Minh-Khoi Nguyen-Nhat, Trang Pham et al.
Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation
Itamar Zimerman, ameen ali ali, Lior Wolf
MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices
HAILONG YAN, Ao Li, Xiangtao Zhang et al.
RelationField: Relate Anything in Radiance Fields
Sebastian Koch, Johanna Wald, Mirco Colosi et al.
AnoLLM: Large Language Models for Tabular Anomaly Detection
Che-Ping Tsai, Ganyu Teng, Phillip Wallis et al.
Causal Representation Learning from Multimodal Biomedical Observations
Yuewen Sun, Lingjing Kong, Guangyi Chen et al.
HotSpot: Signed Distance Function Optimization with an Asymptotically Sufficient Condition
Zimo Wang, Cheng Wang, Taiki Yoshino et al.
The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet, Francesco D'Angelo, Andrew Lampinen et al.
Assessing Pre-Trained Models for Transfer Learning Through Distribution of Spectral Components
Tengxue Zhang, Yang Shu, Xinyang Chen et al.
Feature Responsiveness Scores: Model-Agnostic Explanations for Recourse
Seung Hyun Cheon, Anneke Wernerfelt, Sorelle Friedler et al.
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Yi Ding, Ruqi Zhang
Balancing Multimodal Training Through Game-Theoretic Regularization
Konstantinos Kontras, Thomas Strypsteen, Christos Chatzichristos et al.
AtomSurf: Surface Representation for Learning on Protein Structures
Vincent Mallet, Yangyang Miao, Souhaib Attaiki et al.
CREIMBO: Cross-Regional Ensemble Interactions in Multi-view Brain Observations
Noga Mudrik, Ryan Ly, Oliver Ruebel et al.
Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization
Hao Ju, Shaofei Huang, Si Liu et al.
ARIG: Autoregressive Interactive Head Generation for Real-time Conversations
Ying Guo, Xi Liu, Cheng Zhen et al.
LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields
Zhengqin Li, Dilin Wang, Ka chen et al.
Object-centric binding in Contrastive Language-Image Pretraining
Rim Assouel, Pietro Astolfi, Florian Bordes et al.
Controllable Satellite-to-Street-View Synthesis with Precise Pose Alignment and Zero-Shot Environmental Control
Xianghui Ze, Zhenbo Song, Qiwei Wang et al.
Modeling Cell Dynamics and Interactions with Unbalanced Mean Field Schrödinger Bridge
Zhenyi Zhang, Zihan Wang, Yuhao Sun et al.
Emergent Response Planning in LLMs
Zhichen Dong, Zhanhui Zhou, Zhixuan Liu et al.
EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark
Ming Li, Jike Zhong, Tianle Chen et al.
Small Singular Values Matter: A Random Matrix Analysis of Transformer Models
Max Staats, Matthias Thamm, Bernd Rosenow
Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment
Jinhao Jiang, Junyi Li, Xin Zhao et al.
VCT: Training Consistency Models with Variational Noise Coupling
Gianluigi Silvestri, Luca Ambrogioni, Chieh-Hsin Lai et al.
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing
Yudong Liu, Jingwei Sun, Yueqian Lin et al.
Enhancing 3D Reconstruction for Dynamic Scenes
Jisang Han, Honggyu An, Jaewoo Jung et al.
Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement
Qiyuan Dai, Hanzhuo Huang, Yu Wu et al.
On the Transfer of Object-Centric Representation Learning
Aniket Rajiv Didolkar, Andrii Zadaianchuk, Anirudh Goyal et al.
Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors
Tianchun Wang, Yuanzhou Chen, Zichuan Liu et al.
Improving Language Model Distillation through Hidden State Matching
Sayantan Dasgupta, Trevor Cohn
On the Completeness of Invariant Geometric Deep Learning Models
Zian Li, Xiyuan Wang, Shijia Kang et al.
Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning
Liang CHEN, Xueting Han, Li Shen et al.
Provable Maximum Entropy Manifold Exploration via Diffusion Models
Riccardo De Santi, Marin Vlastelica, Ya-Ping Hsieh et al.
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
Yuze He, Yanning Zhou, Wang Zhao et al.
It’s a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
Dominik Schnaus, Nikita Araslanov, Daniel Cremers
Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models
Hao Cheng, Erjia Xiao, Jiayan Yang et al.
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
Ghada Sokar, Johan S Obando Ceron, Aaron Courville et al.
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
Hmrishav Bandyopadhyay, Yi-Zhe Song
Aligning Protein Conformation Ensemble Generation with Physical Feedback
Jiarui Lu, Xiaoyin Chen, Stephen Lu et al.
Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation
Chen Dun, Mirian Del Carmen Hipolito Garcia, Guoqing Zheng et al.
Do Computer Vision Foundation Models Learn the Low-level Characteristics of the Human Visual System?
Yancheng Cai, Fei Yin, Dounia Hammou et al.
Image Quality Assessment: From Human to Machine Preference
Chunyi Li, Yuan Tian, Xiaoyue Ling et al.
Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models
Zekai Zhao, Qi Liu, Kun Zhou et al.
Classifier-Free Guidance Inside the Attraction Basin May Cause Memorization
Anubhav Jain, Yuya Kobayashi, Takashi Shibuya et al.
Training Language Models to Generate Quality Code with Program Analysis Feedback
Feng Yao, Zilong Wang, Liyuan Liu et al.
Semantic Watermarking Reinvented: Enhancing Robustness and Generation Quality with Fourier Integrity
Sung Ju Lee, Nam Ik Cho
Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction
Seungtae Nam, Xiangyu Sun, Gyeongjin Kang et al.
Pose Priors from Language Models
Sanjay Subramanian, Evonne Ng, Lea Müller et al.
InstaSHAP: Interpretable Additive Models Explain Shapley Values Instantly
James Enouen, Yan Liu
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Zhaoqing Zhu, Chuwei Luo, Zirui Shao et al.
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
Han Lin, Tushar Nagarajan, Nicolas Ballas et al.
Stochastic Online Instrumental Variable Regression: Regrets for Endogeneity and Bandit Feedback
Riccardo Della Vecchia, Debabrota Basu
Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling
Yuejiang Liu, Jubayer Hamid, Annie Xie et al.
Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos
Rundong Luo, Matthew Wallingford, Ali Farhadi et al.
Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning
Jaehyeon Son, Soochan Lee, Gunhee Kim
MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi, Hyeyoon Lee, Dain Kwon et al.
Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification
Yanghao Wang, Long Chen
Enhancing Target-unspecific Tasks through a Features Matrix
Fangming Cui, Yonggang Zhang, Xuan Wang et al.
Graph Neural Ricci Flow: Evolving Feature from a Curvature Perspective
Jialong Chen, Bowen Deng, Zhen WANG et al.
Generalized Dimension Reduction Using Semi-Relaxed Gromov-Wasserstein Distance
Ranthony A. Clark, Tom Needham, Thomas Weighill
DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows
Mashrur M. Morshed, Vishnu Naresh Boddeti
Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games
Yang Cai, Gabriele Farina, Julien Grand-Clément et al.
PWM: Policy Learning with Multi-Task World Models
Ignat Georgiev, Varun Giridhar, Nick Hansen et al.
CAX: Cellular Automata Accelerated in JAX
Maxence Faldor, Antoine Cully
SVasP: Self-Versatility Adversarial Style Perturbation for Cross-Domain Few-Shot Learning
Wenqian Li, Pengfei Fang, Hui Xue
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents
Rui Tian, Qi Dai, Jianmin Bao et al.
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
Hongwei Zheng, Han Li, Wenrui Dai et al.
Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection
Romain Thoreau, Valerio Marsocci, Dawa Derksen
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
Hongbo Liu, Jingwen He, Yi Jin et al.
Circuit Representation Learning with Masked Gate Modeling and Verilog-AIG Alignment
Haoyuan Wu, Haisheng Zheng, Yuan Pu et al.
Loosely Synchronized Rule-Based Planning for Multi-Agent Path Finding with Asynchronous Actions
Shuai Zhou, Shizhe Zhao, Zhongqiang Ren
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
Lee Chae-Yeon, Oh Hyun-Bin, Han EunGi et al.
Visual Lexicon: Rich Image Features in Language Space
XuDong Wang, Xingyi Zhou, Alireza Fathi et al.
BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments
Xinghao Wang, Pengyu Wang, Bo Wang et al.
Adjoint Schrödinger Bridge Sampler
Guan-Horng Liu, Jaemoo Choi, Yongxin Chen et al.
Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models
Qiong Wu, Zhaoxi Ke, Yiyi Zhou et al.
DoF: A Diffusion Factorization Framework for Offline Multi-Agent Reinforcement Learning
Chao Li, Ziwei Deng, Chenxing Lin et al.
Uncertainty Quantification with the Empirical Neural Tangent Kernel
Joseph Wilson, Chris van der Heide, Liam Hodgkinson et al.
Time-o1: Time-Series Forecasting Needs Transformed Label Alignment
Hao Wang, Licheng Pan, Zhichao Chen et al.
Instruction-Augmented Long-Horizon Planning: Embedding Grounding Mechanisms in Embodied Mobile Manipulation
Fangyuan Wang, Shipeng Lyu, Peng Zhou et al.
Safe Planner: Empowering Safety Awareness in Large Pre-Trained Models for Robot Task Planning
Siyuan Li, Feifan Liu, Lingfei Cui et al.
Can Generative Video Models Help Pose Estimation?
Ruojin Cai, Jason Y. Zhang, Philipp Henzler et al.
Understanding Individual Agent Importance in Multi-Agent System via Counterfactual Reasoning
Jianming Chen, Yawen Wang, Junjie Wang et al.
Enhancing Text-to-Image Diffusion Transformer via Split-Text Conditioning
Yu Zhang, Jialei Zhou, Xinchen Li et al.
Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model
Shengjun Zhang, Jinzhao Li, Xin Fei et al.
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective
Ruichen Shao, Bei Li, Gangao Liu et al.
Dense Video Object Captioning from Disjoint Supervision
Xingyi Zhou, Anurag Arnab, Chen Sun et al.
Int2Planner: An Intention-based Multi-modal Motion Planner for Integrated Prediction and Planning
Xiaolei Chen, Junchi Yan, Wenlong Liao et al.
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Bardia Safaei, Faizan Siddiqui, Jiacong Xu et al.
RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations
Peter Sushko, Ayana Bharadwaj, Zhi Yang Lim et al.
ChatHuman: Chatting about 3D Humans with Tools
Jing Lin, Yao Feng, Weiyang Liu et al.
ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting
Chengyou Jia, Changliang Xia, Zhuohang Dang et al.
Equivariant Symmetry Breaking Sets
YuQing Xie, Tess Smidt
Long-Term EEG Partitioning for Seizure Onset Detection
Zheng Chen, Yasuko Matsubara, Yasushi Sakurai et al.
Relative Pose Estimation through Affine Corrections of Monocular Depth Priors
Yifan Yu, Shaohui Liu, Rémi Pautrat et al.
GPS: A Probabilistic Distributional Similarity with Gumbel Priors for Set-to-Set Matching
Ziming Zhang, Fangzhou Lin, Haotian Liu et al.
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
Xinyuan Chang, Maixuan Xue, Xinran Liu et al.
SITE: towards Spatial Intelligence Thorough Evaluation
Wenqi Wang, Reuben Tan, Pengyue Zhu et al.
Enhancing Language Model Agents using Diversity of Thoughts
Vijay Chandra Lingam, Behrooz Tehrani, sujay sanghavi et al.
Actions Speak Louder Than Words: Rate-Reward Trade-off in Markov Decision Processes
Haotian Wu, Gongpu Chen, Deniz Gunduz
Learned Image Transmission with Hierarchical Variational Autoencoder
Guangyi Zhang, Hanlei Li, Yunlong Cai et al.
Learning Distances from Data with Normalizing Flows and Score Matching
Peter Sorrenson, Daniel Behrend-Uriarte, Christoph Schnörr et al.
TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning
Siqi Luo, Haoran Yang, Yi Xin et al.
Better autoregressive regression with LLMs via regression-aware fine-tuning
Michal Lukasik, Zhao Meng, Harikrishna Narasimhan et al.
SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing
Yingying Zhang, Lixiang Ru, Kang Wu et al.
TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting
Liangbin Xie, Daniil Pakhomov, Zhonghao Wang et al.
Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation
Chenyu Zhang, Xu Chen, Xuan Di
AdaDPCC: Adaptive Rate Control and Rate-Distortion-Complexity Optimization for Dynamic Point Cloud Compression
Chenhao Zhang, Wei Gao
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.
Event-based Tiny Object Detection: A Benchmark Dataset and Baselines
Nuo Chen, Chao Xiao, Yimian Dai et al.
Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion
Vitor Guizilini, Muhammad Zubair Irshad, Dian Chen et al.
Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild
Damien Teney, Liangze Jiang, Florin Gogianu et al.
Stochastic Process Learning via Operator Flow Matching
Yaozhong Shi, Zachary Ross, Domniki Asimaki et al.
Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents
Qizheng Zhang, Michael Wornow, Kunle Olukotun
Exact Expressive Power of Transformers with Padding
Will Merrill, Ashish Sabharwal
Object-centric Video Question Answering with Visual Grounding and Referring
Haochen Wang, Qirui Chen, Cilin Yan et al.
StableCodec: Taming One-Step Diffusion for Extreme Image Compression
Tianyu Zhang, Xin Luo, Li Li et al.
Implicit Neural Surface Deformation with Explicit Velocity Fields
Lu Sang, Zehranaz Canfes, Dongliang Cao et al.
Reasoning Elicitation in Language Models via Counterfactual Feedback
Alihan Hüyük, Xinnuo Xu, Jacqueline Maasch et al.
Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control
Songyuan Zhang, Oswin So, Mitchell Black et al.
On the Robustness of Reward Models for Language Model Alignment
Jiwoo Hong, Noah Lee, Eunki Kim et al.
Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency
Michael Kirchhof, James Thornton, Louis Béthune et al.
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
Leying Zhang, Yao Qian, Xiaofei Wang et al.
Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding
Ziyao Wang, Muneeza Azmat, Ang Li et al.
Rethinking Fair Representation Learning for Performance-Sensitive Tasks
Charles Jones, Fabio De Sousa Ribeiro, Mélanie Roschewitz et al.
GDiffRetro: Retrosynthesis Prediction with Dual Graph Enhanced Molecular Representation and Diffusion Generation
Shengyin Sun, Wenhao Yu, Yuxiang Ren et al.
Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
Zichen Liu, Yihao Meng, Hao Ouyang et al.
Let Your Features Tell The Differences: Understanding Graph Convolution By Feature Splitting
Yilun Zheng, Xiang Li, Sitao Luan et al.
Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs
Xiaqiang Tang, Jian Li, Nan Du et al.
SEAL: Semantic Attention Learning for Long Video Representation
Lan Wang, Yujia Chen, Wen-Sheng Chu et al.
A Theory for Token-Level Harmonization in Retrieval-Augmented Generation
Shicheng Xu, Liang Pang, Huawei Shen et al.
MotiF: Making Text Count in Image Animation with Motion Focal Loss
Shijie Wang, Samaneh Azadi, Rohit Girdhar et al.
Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream
Abdulkadir Gokce, Martin Schrimpf