Most Cited 2025 "parameterized environment configurations" Papers
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Zheyang Xiong, Vasilis Papageorgiou, Kangwook Lee et al.
CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG
Boyi Deng, Wenjie Wang, Fengbin Zhu et al.
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu, Tim Xiao, Weiyang Liu et al.
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Harma, Ayan Chakraborty, Elizaveta Kostenok et al.
EmoEdit: Evoking Emotions through Image Manipulation
Jingyuan Yang, Jiawei Feng, Weibin Luo et al.
OccMamba: Semantic Occupancy Prediction with State Space Models
Heng Li, Yuenan Hou, Xiaohan Xing et al.
Self-Adapting Language Models
Adam Zweiger, Jyo Pari, Han Guo et al.
TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation
Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Yunze Man, De-An Huang, Guilin Liu et al.
TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets
Yuzhe Yang, Yifei Zhang, Minghao Wu et al.
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
Tian-Xing Xu, Xiangjun Gao, Wenbo Hu et al.
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Junmo Kang, Leonid Karlinsky, Hongyin Luo et al.
Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model
Jiarui Jin, Haoyu Wang, Hongyan Li et al.
Learning Long Range Dependencies on Graphs via Random Walks
Dexiong Chen, Till Schulz, Karsten Borgwardt
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Weiyu Huang, Yuezhou Hu, Guohao Jian et al.
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model
Yue Zhang, Zhiyang Xu, Ying Shen et al.
Influence-Guided Diffusion for Dataset Distillation
Mingyang Chen, Jiawei Du, Bo Huang et al.
Emergence of meta-stable clustering in mean-field transformer models
Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
Benchmarking Agentic Workflow Generation
Shuofei Qiao, Runnan Fang, Zhisong Qiu et al.
SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering
Zouying Cao, Yifei Yang, Hai Zhao
Cut Your Losses in Large-Vocabulary Language Models
Erik Wijmans, Brody Huval, Alexander Hertzberg et al.
Generative Image Layer Decomposition with Visual Effects
Jinrui Yang, Qing Liu, Yijun Li et al.
Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection
Kaiqing Lin, Yuzhen Lin, Weixiang Li et al.
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
Mohammad Shahab Sepehri, Zalan Fabian, Maryam Soltanolkotabi et al.
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly
Yexin Liu, Zhengyang Liang, Yueze Wang et al.
Does Thinking More Always Help? Mirage of Test-Time Scaling in Reasoning Models
Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy et al.
GameArena: Evaluating LLM Reasoning through Live Computer Games
Lanxiang Hu, Qiyu Li, Anze Xie et al.
QMambaBSR: Burst Image Super-Resolution with Query State Space Model
Xin Di, Long Peng, Peizhe Xia et al.
CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
Atin Pothiraj, Jaemin Cho, Elias Stengel-Eskin et al.
Nemotron-CLIMB: Clustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Shizhe Diao, Yu Yang, Yonggan Fu et al.
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
Zuopeng Yang, Jiluan Fan, Anli Yan et al.
WPMixer: Efficient Multi-Resolution Mixing for Long-Term Time Series Forecasting
Md Mahmuddun Nabi Murad, Mehmet Aktukmak, Yasin Yilmaz
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites
Div Garg, Diego Caples, Andis Draguns et al.
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
Wenhui Tan, Jiaze Li, Jianzhong Ju et al.
Reducing Tool Hallucination via Reliability Alignment
Hongshen Xu, Zichen Zhu, Lei Pan et al.
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
Gang Liu, Michael Sun, Wojciech Matusik et al.
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
Zijing Hu, Fengda Zhang, Long Chen et al.
A Rainbow in Deep Network Black Boxes
Florentin Guth, Brice Ménard, Gaspar Rochette et al.
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking
You Wu, Xucheng Wang, Xiangyang Yang et al.
SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data
Wenkai Fang, Shunyu Liu, Yang Zhou et al.
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
Yiming Zhong, Qi Jiang, Jingyi Yu et al.
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
Ziang Yan, Zhilin Li, Yinan He et al.
E(n) Equivariant Topological Neural Networks
Claudio Battiloro, Ege Karaismailoglu, Mauricio Tec et al.
AdaDiff: Adaptive Step Selection for Fast Diffusion Models
Hui Zhang, Zuxuan Wu, Zhen Xing et al.
On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity
Quentin Bertrand, Anne Gagneux, Mathurin Massias et al.
Occlusion-Embedded Hybrid Transformer for Light Field Super-Resolution
Zeyu Xiao, Zhuoyuan Li, Wei Jia
Task-driven Image Fusion with Learnable Fusion Loss
Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.
Efficiently Scaling LLM Reasoning Programs with Certaindex
Yichao Fu, Junda Chen, Siqi Zhu et al.
Online Preference Alignment for Language Models via Count-based Exploration
Chenjia Bai, Yang Zhang, Shuang Qiu et al.
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin et al.
Spectral Motion Alignment for Video Motion Transfer Using Diffusion Models
Geon Yeong Park, Hyeonho Jeong, Sang Wan Lee et al.
A Unified Approach to Routing and Cascading for LLMs
Jasper Dekoninck, Maximilian Baader, Martin Vechev
Textured Gaussians for Enhanced 3D Scene Appearance Modeling
Brian Chao, Hung-Yu Tseng, Lorenzo Porzi et al.
Understanding Optimization in Deep Learning with Central Flows
Jeremy Cohen, Alex Damian, Ameet Talwalkar et al.
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
Quanhao Li, Zhen Xing, Rui Wang et al.
HyperGraphRAG: Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation
Haoran Luo, Haihong E, Guanting Chen et al.
Universal Length Generalization with Turing Programs
Kaiying Hou, David Brandfonbrener, Sham Kakade et al.
First-Person Fairness in Chatbots
Tyna Eloundou, Alex Beutel, David Robinson et al.
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction
Siyu Jiao, Gengwei Zhang, Yinlong Qian et al.
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?
Seth Aycock, David Stap, Di Wu et al.
Generalization through variance: how noise shapes inductive biases in diffusion models
John Vastola
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion
Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.
Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)
Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.
Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset
Xiao Wang, Yu Jin, Wentao Wu et al.
Cubify Anything: Scaling Indoor 3D Object Detection
Justin Lazarow, David Griffiths, Gefen Kohavi et al.
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding
Weiyu Guo, Ziyang Chen, Shaoguang Wang et al.
Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models
Zeyu Yang, Zijie Pan, Chun Gu et al.
OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking
Xuanyu Zhang, Zecheng Tang, Zhipei Xu et al.
Text2PDE: Latent Diffusion Models for Accessible Physics Simulation
Anthony Zhou, Zijie Li, Michael Schneier et al.
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
Jongwoo Ko, Tianyi Chen, Sungnyun Kim et al.
BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics
Lukas Rauch, Raphael Schwinger, Moritz Wirth et al.
Do as We Do, Not as You Think: the Conformity of Large Language Models
Zhiyuan Weng, Guikun Chen, Wenguan Wang
Non-myopic Generation of Language Models for Reasoning and Planning
Chang Ma, Haiteng Zhao, Junlei Zhang et al.
Delta Decompression for MoE-based LLMs Compression
Hao Gu, Wei Li, Lujun Li et al.
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao, Lin Song, Yukang Chen et al.
MoonCast: High-Quality Zero-Shot Podcast Generation
Zeqian Ju, Dongchao Yang, Shen Kai et al.
Decompositional Neural Scene Reconstruction with Generative Diffusion Prior
Junfeng Ni, Yu Liu, Ruijie Lu et al.
UniK3D: Universal Camera Monocular 3D Estimation
Luigi Piccinelli, Christos Sakaridis, Mattia Segu et al.
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
Jing Bi, Lianggong Bruce Wen, Zhang Liu et al.
MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo
Zhenlong Yuan, Cong Liu, Fei Shen et al.
Discretization-invariance? On the Discretization Mismatch Errors in Neural Operators
Wenhan Gao, Ruichen Xu, Yuefan Deng et al.
Benchmarking Predictive Coding Networks -- Made Simple
Luca Pinchetti, Chang Qi, Oleh Lokshyn et al.
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
Haiwen Diao, Xiaotong Li, Yufeng Cui et al.
Video Motion Transfer with Diffusion Transformers
Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu, Ming Ma, Xiaomin Yu et al.
Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
Hao Liang, Zhiquan Luo
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan, Wang Lin, Zhongqi Yue et al.
FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction
Jiale Xu, Shenghua Gao, Ying Shan
Commit0: Library Generation from Scratch
Wenting Zhao, Nan Jiang, Celine Lee et al.
Diversity-Aware Policy Optimization for Large Language Model Reasoning
Jian Yao, Ran Cheng, Xingyu Wu et al.
Encryption-Friendly LLM Architecture
Donghwan Rho, Taeseong Kim, Minje Park et al.
Boosting Neural Combinatorial Optimization for Large-Scale Vehicle Routing Problems
Fu Luo, Xi Lin, Yaoxin Wu et al.
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Bhavya, Stelian Coros, Andreas Krause et al.
Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models
Rui Ye, Jingyi Chai, Xiangrui Liu et al.
SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models
Shuaijie Shen, Chao Wang, Renzhuo Huang et al.
VSP: Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for MLLMs
Qiucheng Wu, Handong Zhao, Michael Saxon et al.
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Jianhao Huang, Zixuan Wang, Jason Lee
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models
Uladzislau Sobal, Wancong Zhang, Kyunghyun Cho et al.
Forking Paths in Neural Text Generation
Eric Bigelow, Ari Holtzman, Hidenori Tanaka et al.
B2Opt: Learning to Optimize Black-box Optimization with Little Budget
Xiaobin Li, Kai Wu, Xiaoyu Zhang et al.
Cross-Embodiment Dexterous Grasping with Reinforcement Learning
Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.
Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior
Chen Guo, Junxuan Li, Yash Kant et al.
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
Eunice Yiu, Maan Qraitem, Anisa Majhi et al.
The Illusion of Empathy: How AI Chatbots Shape Conversation Perception
Tingting Liu, Salvatore Giorgi, Ankit Aich et al.
A Label-free Heterophily-guided Approach for Unsupervised Graph Fraud Detection
Junjun Pan, Yixin Liu, Xin Zheng et al.
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Zhenwei Wang, Tengfei Wang, Zexin He et al.
Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency
Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani et al.
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation
Yukang Cao, Liang Pan, Kai Han et al.
FatesGS: Fast and Accurate Sparse-View Surface Reconstruction Using Gaussian Splatting with Depth-Feature Consistency
Han Huang, Yulun Wu, Chao Deng et al.
Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting
Marcel Kollovieh, Marten Lienen, David Lüdke et al.
Self-Evolved Reward Learning for LLMs
Chenghua Huang, Zhizhen Fan, Lu Wang et al.
Merging on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging
Anke Tang, Enneng Yang, Li Shen et al.
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
Fiona Ryan, Ajay Bati, Sangmin Lee et al.
CrossMPT: Cross-attention Message-passing Transformer for Error Correcting Codes
Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim et al.
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models
Hulingxiao He, Geng Li, Zijun Geng et al.
VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
Chaoya Jiang, Yongrui Heng, Wei Ye et al.
Ref-GS: Directional Factorization for 2D Gaussian Splatting
Youjia Zhang, Anpei Chen, Yumin Wan et al.
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search
Yuichi Inoue, Kou Misaki, Yuki Imajuku et al.
Scaling Optimal LR Across Token Horizons
Johan Bjorck, Alon Benhaim, Vishrav Chaudhary et al.
Towards Adversarially Robust Dataset Distillation by Curvature Regularization
Eric Xue, Yijiang Li, Haoyang Liu et al.
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
Simran Kaur, Simon Park, Anirudh Goyal et al.
SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning
Wanjia Zhao, Mert Yuksekgonul, Shirley Wu et al.
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning
Fan Lu, Wei Wu, Kecheng Zheng et al.
Block Verification Accelerates Speculative Decoding
Ziteng Sun, Uri Mendlovic, Yaniv Leviathan et al.
TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing
Stefan Lionar, Jiabin Liang, Gim Hee Lee
TIME-FS: Joint Learning of Tensorial Incomplete Multi-View Unsupervised Feature Selection and Missing-View Imputation
Yanyong Huang, Minghui Lu, Wei Huang et al.
Perm: A Parametric Representation for Multi-Style 3D Hair Modeling
Chengan He, Xin Sun, Zhixin Shu et al.
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
Thomas Kuntz, Agatha Duzan, Hao Zhao et al.
Progress or Regress? Self-Improvement Reversal in Post-training
Ting Wu, Xuefeng Li, Pengfei Liu
Learning to Discretize Denoising Diffusion ODEs
Vinh Tong, Trung-Dung Hoang, Anji Liu et al.
TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting
Bojun Xiong, Jialun Liu, JiaKui Hu et al.
Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization
Zhitong Xu, Haitao Wang, Jeff Phillips et al.
You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs
Yihong Luo, Xiaolong Chen, Xinghua Qu et al.
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
Kaifeng Gao, Jiaxin Shi, Hanwang Zhang et al.
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation
Boyuan Wang, Xiaofeng Wang, Chaojun Ni et al.
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.
Hash3D: Training-free Acceleration for 3D Generation
Xingyi Yang, Songhua Liu, Xinchao Wang
What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph
Yutao Jiang, Qiong Wu, Wenhao Lin et al.
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao, Keda Tao, Can Qin et al.
Air Quality Prediction with Physics-Guided Dual Neural ODEs in Open Systems
Jindong Tian, Yuxuan Liang, Ronghui Xu et al.
UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions
Xue Zhucun, Jiangning Zhang, Teng Hu et al.
Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration
Ziheng Zhou, Jinxing Zhou, Wei Qian et al.
SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints
Miruna Cretu, Charles Harris, Ilia Igashov et al.
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
Qizhe Zhang, Mengzhen Liu, Lichen Li et al.
CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale
ZeMing Gong, Austin Wang, Xiaoliang Huo et al.
EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition
Issar Tzachor, Boaz Lerner, Matan Levy et al.
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
Yue Liu, Shengfang Zhai, Mingzhe Du et al.
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
Donggon Jang, Yucheol Cho, Suin Lee et al.
Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters
Yuan Wang, Ouxiang Li, Tingting Mu et al.
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
Xiangxiang Chu, Renda Li, Yong Wang
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
Yan Li, Yifei Xing, Xiangyuan Lan et al.
Perturbation-Restrained Sequential Model Editing
Jun-Yu Ma, Hong Wang, Hao-Xiang Xu et al.
Zero-shot forecasting of chaotic systems
Yuanzhao Zhang, William Gilpin
VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model
Zuwei Long, Yunhang Shen, Chaoyou Fu et al.
Reasoning of Large Language Models over Knowledge Graphs with Super-Relations
Song Wang, Junhong Lin, Xiaojie Guo et al.
Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures
Junxuan Wang, Xuyang Ge, Wentao Shu et al.
Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment
Pritam Sarkar, Sayna Ebrahimi, Ali Etemad et al.
Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark
Yili Wang, Yixin Liu, Xu Shen et al.
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou, Teli Ma, Kun-Yu Lin et al.
Learning Clustering-based Prototypes for Compositional Zero-Shot Learning
Hongyu Qu, Jianan Wei, Xiangbo Shu et al.
3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views
Xiaobiao Du, Yida Wang, Haiyang Sun et al.
SLMRec: Distilling Large Language Models into Small for Sequential Recommendation
Wujiang Xu, Qitian Wu, Zujie Liang et al.
Controllable Context Sensitivity and the Knob Behind It
Julian Minder, Kevin Du, Niklas Stoehr et al.
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Yue Chen, Xingyu Chen, Anpei Chen et al.
Detecting Out-of-Distribution Through the Lens of Neural Collapse
Litian Liu, Yao Qin
DarkBench: Benchmarking Dark Patterns in Large Language Models
Esben Kran, Hieu Minh Nguyen, Akash Kundu et al.
Learning 3D Persistent Embodied World Models
Siyuan Zhou, Yilun Du, Yuncong Yang et al.
MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.
Graph Sparsification via Mixture of Graphs
Guibin Zhang, Xiangguo Sun, Yanwei Yue et al.
A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning
Chen-Yu Liu, Chao-Han Huck Yang, Hsi-Sheng Goan et al.
X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention
XiaoChen Zhao, Hongyi Xu, Guoxian Song et al.
Gradient-Free Generation for Hard-Constrained Systems
Chaoran Cheng, Boran Han, Danielle Maddix et al.
VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior
Xindi Yang, Baolu Li, Yiming Zhang et al.
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
Yining Hong, Beide Liu, Maxine Wu et al.
HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos
Jinglei Zhang, Jiankang Deng, Chao Ma et al.
MetaMetrics: Calibrating Metrics for Generation Tasks Using Human Preferences
Genta Winata, David Anugraha, Lucky Susanto et al.
Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving
Yuhang Lu, Yichen Yao, Jiadong Tu et al.
Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation
Xin Zhang, Robby T. Tan
CAD-Recode: Reverse Engineering CAD Code from Point Clouds
Danila Rukhovich, Elona Dupont, Dimitrios Mallis et al.
Design Principle Transfer in Neural Architecture Search via Large Language Models
Xun Zhou, Xingyu Wu, Liang Feng et al.
Optimal Transport for Time Series Imputation
Hao Wang, Zhengnan Li, Haoxuan Li et al.
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
Jiangyong Huang, Baoxiong Jia, Yan Wang et al.
LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement
Jieming Bian, Lei Wang, Letian Zhang et al.
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Yuqing Wang, Zhijie Lin, Yao Teng et al.
From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation
Nikita Kotelevskii, Vladimir Kondratyev, Martin Takáč et al.
INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
Yongming Zhu, Longhao Zhang, Zhengkun Rong et al.
Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach
Zhiwei Li, Guodong Long, Tianyi Zhou et al.
Learning Precise Affordances from Egocentric Videos for Robotic Manipulation
Li, Nikolaos Tsagkas, Jifei Song et al.
Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling
Wei Guo, Molei Tao, Yongxin Chen
DeLLMa: Decision Making Under Uncertainty with Large Language Models
Ollie Liu, Deqing Fu, Dani Yogatama et al.
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
Yongshuo Zong, Ondrej Bohdal, Timothy Hospedales
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
Haotian Sun, Tao Lei, Bowen Zhang et al.
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Hanyang Zhao, Genta Winata, Anirban Das et al.
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Nikola Zubic, Federico Soldà, Aurelio Sulser et al.
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
Kim Sung-Bin, Oh Hyun-Bin, Lee Jung-Mok et al.
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization
Charles Blake, Constantin Eichenberg, Josef Dean et al.