Most Cited 2025 "dialogue state tracking" Papers
22,274 papers found • Page 7 of 112
Conference
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
Yoad Tewel, Rinon Gal, Dvir Samuel et al.
Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting
Yifan Hu, Peiyuan Liu, Peng Zhu et al.
LLM-Powered User Simulator for Recommender System
Zijian Zhang, Shuchang Liu, Ziru Liu et al.
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
Yifan Lu, Xuanchi Ren, Jiawei Yang et al.
A Closer Look at Machine Unlearning for Large Language Models
Xiaojian Yuan, Tianyu Pang, Chao Du et al.
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Ruowen Zhao, James Jun Liang Chen Ye, Zhengyi Wang et al.
A New Perspective on Shampoo's Preconditioner
Depen Morwani, Itai Shapira, Nikhil Vyas et al.
Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling
Zhihao Li, Yufei Wang, Heliang Zheng et al.
AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities
Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.
Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors
Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.
SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
Haoquan Fang, Markus Grotz, Wilbert Pumacay et al.
WISA: World simulator assistant for physics-aware text-to-video generation
Jing Wang, Ao Ma, Ke Cao et al.
SCALM: Detecting Bad Practices in Smart Contracts Through LLMs
Zongwei Li, Xiaoqi Li, Wenkai Li et al.
Harnessing the Universal Geometry of Embeddings
Rishi Jha, Collin Zhang, Vitaly Shmatikov et al.
MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots
Tianchen Deng, Guole Shen, Chen Xun et al.
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
Enshen Zhou, Qi Su, Cheng Chi et al.
Scaling Laws of Synthetic Data for Language Model
Zeyu Qin, Qingxiu Dong, Xingxing Zhang et al.
Reconstructive Visual Instruction Tuning
Haochen Wang, Anlin Zheng, Yucheng Zhao et al.
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank
Tianhe Wu, Jian Zou, Jie Liang et al.
Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking
chaocan xue, Bineng Zhong, Qihua Liang et al.
Modular Duality in Deep Learning
Jeremy Bernstein, Laker Newhouse
Improving Retrieval Augmented Language Model with Self-Reasoning
Yuan Xia, Jingbo Zhou, Zhenhui Shi et al.
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
Hao Li, Changyao TIAN, Jie Shao et al.
Can LLMs Understand Time Series Anomalies?
Zihao Zhou, Rose Yu
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
Tao Liu, Kai Wang, Senmao Li et al.
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
Robert Hönig, Javier Rando, Nicholas Carlini et al.
Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.
HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction
Shengji Tang, Weicai Ye, Peng Ye et al.
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda et al.
DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input
Qijian Tian, Xin Tan, Yuan Xie et al.
EmbodiedSAM: Online Segment Any 3D Thing in Real Time
Xiuwei Xu, Huangxing Chen, Linqing Zhao et al.
DeFoG: Discrete Flow Matching for Graph Generation
Yiming Qin, Manuel Madeira, Dorina Thanou et al.
PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion
Sophia Tang, Yinuo Zhang, Pranam Chatterjee, PhD
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
Ilya Loshchilov, Cheng-Ping Hsieh, Simeng Sun et al.
Training-free and Adaptive Sparse Attention for Efficient Long Video Generation
yifei xia, Suhan Ling, Fangcheng Fu et al.
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Ruikang Liu, Yuxuan Sun, Manyi Zhang et al.
RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing
Jinyao Guo, Chengpeng Wang, Xiangzhe Xu et al.
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs
Yuhao Wu, Ming Shan Hee, Zhiqiang Hu et al.
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
Yafu Li, Xuyang Hu, Xiaoye Qu et al.
Multi-Agent Collaboration via Evolving Orchestration
Yufan Dang, Chen Qian, Xueheng Luo et al.
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
Mark YU, Wenbo Hu, Jinbo Xing et al.
TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation
Juntong Shi, Minkai Xu, Harper Hua et al.
Detecting Strategic Deception with Linear Probes
Nicholas Goldowsky-Dill, Bilal Chughtai, Stefan Heimersheim et al.
3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning
Yuncong Yang, Han Yang, Jiachen Zhou et al.
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering
Ziyu Zhao, tao shen, Didi Zhu et al.
ToolGen: Unified Tool Retrieval and Calling via Generation
Renxi Wang, Xudong Han, Lei Ji et al.
Sufficient Context: A New Lens on Retrieval Augmented Generation Systems
Hailey Joren, Jianyi Zhang, Chun-Sung Ferng et al.
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces
Daniel Marczak, Simone Magistri, Sebastian Cygert et al.
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Juan Rodriguez, Abhay Puri, Shubham Agarwal et al.
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Ahmed Nassar, Matteo Omenetti, Maksym Lysak et al.
DEEM: Diffusion models serve as the eyes of large language models for image perception
Run Luo, Yunshui Li, Longze Chen et al.
Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation
Siwei Wen, junyan ye, Peilin Feng et al.
Text4Seg: Reimagining Image Segmentation as Text Generation
Mengcheng Lan, Chaofeng Chen, Yue Zhou et al.
VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
Yichao Liang, Nishanth Kumar, Hao Tang et al.
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
Xi Chen, Kaituo Feng, Changsheng Li et al.
Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation
Zhe Kong, Feng Gao, Yong Zhang et al.
GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents
Yuqi Zhou, Sunhao Dai, Shuai Wang et al.
Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding
Mingyu Jin, Kai Mei, Wujiang Xu et al.
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
Jonas Hübotter, Sascha Bongni, Ido Hakimi et al.
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Zongjian Li, Bin Lin, Yang Ye et al.
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Perampalli Shravan Nayak, Xiangru Jian, Kevin Qinghong Lin et al.
A₀ : An Affordance-Aware Hierarchical Model for General Robotic Manipulation
Rongtao Xu, Jian Zhang, Minghao Guo et al.
ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
Ezra Karger, Houtan Bastani, Chen Yueh-Han et al.
Generative Gaussian Splatting for Unbounded 3D City Generation
Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.
One Diffusion to Generate Them All
Duong H. Le, Tuan Pham, Sangho Lee et al.
EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models
Yantai Yang, Yuhao Wang, Zichen Wen et al.
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
Qi Wang, Yanrui Yu, Ye Yuan et al.
PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis
Yan Wu, Esther Wershof, Sebastian Schmon et al.
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
Tiantian Geng, Jinrui Zhang, Qingni Wang et al.
What Makes Large Language Models Reason in (Multi-Turn) Code Generation?
Kunhao Zheng, Juliette Decugis, Jonas Gehring et al.
O(d/T) Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions
Gen Li, Yuling Yan
An Analysis of Quantile Temporal-Difference Learning
Mark Rowland, Remi Munos, Mohammad Gheshlaghi Azar et al.
Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail
Luca Bartolomei, Fabio Tosi, Matteo Poggi et al.
SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning
Yichen Wu, Hongming Piao, Long-Kai Huang et al.
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt
Lukas Höllein, Aljaz Bozic, Michael Zollhöfer et al.
Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step
Mingyuan Zhou, Huangjie Zheng, Yi Gu et al.
Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training
Maximillian Chen, Ruoxi Sun, Tomas Pfister et al.
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation
Derong Xu, Xinhang Li, Ziheng Zhang et al.
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Ailin Deng, Tri Cao, Zhirui Chen et al.
SuperBPE: Space Travel for Language Models
Alisa Liu, Jonathan Hayase, Valentin Hofmann et al.
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
Zhongxing Xu, Chengzhi Liu, Qingyue Wei et al.
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang et al.
Robust Autonomy Emerges from Self-Play
Marco Cusumano-Towner, David Hafner, Alexander Hertzberg et al.
Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming
Yilun Hao, Yang Zhang, Chuchu Fan
X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale
Haoran Xu, Kenton Murray, Philipp Koehn et al.
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
Sai Sumedh R. Hindupur, Ekdeep S Lubana, Thomas Fel et al.
Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models
Andrea Tirinzoni, Ahmed Touati, Jesse Farebrother et al.
Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Emily Cheng, Diego Doimo, Corentin Kervadec et al.
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Yuheng Zhang, Dian Yu, Baolin Peng et al.
Tensor Product Attention Is All You Need
Yifan Zhang, Yifeng Liu, Huizhuo Yuan et al.
RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards
Xinze Li, Sen Mei, Zhenghao Liu et al.
Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models
Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo et al.
On Vanishing Gradients, Over-Smoothing, and Over-Squashing in GNNs: Bridging Recurrent and Graph Learning
Alvaro Arroyo, Alessio Gravina, Benjamin Gutteridge et al.
What to align in multimodal contrastive learning?
Benoit Dufumier, Javiera Castillo Navarro, Devis Tuia et al.
The Leaderboard Illusion
Shivalika Singh, Yiyang Nan, Alex Wang et al.
On the Emergence of Position Bias in Transformers
Xinyi Wu, Yifei Wang, Stefanie Jegelka et al.
$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
Mintong Kang, Bo Li
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola et al.
AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models
Zheng Lian, Haoyu Chen, Lan Chen et al.
Interpreting the Second-Order Effects of Neurons in CLIP
Yossi Gandelsman, Alexei Efros, Jacob Steinhardt
TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents
Geon Lee, Wenchao Yu, Kijung Shin et al.
Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
Ge Wu, Shen Zhang, Ruijing Shi et al.
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
Wenda Xu, Rujun Han, Zifeng Wang et al.
Guided Real Image Dehazing Using YCbCr Color Space
Wenxuan Fang, Junkai Fan, Yu Zheng et al.
Optimizing Large Language Model Training Using FP4 Quantization
Ruizhe Wang, Yeyun Gong, Xiao Liu et al.
On the Relation between Trainability and Dequantization of Variational Quantum Learning Models
Elies Gil-Fuster, Casper Gyurik, Adrian Perez-Salinas et al.
MAT-Agent: Adaptive Multi-Agent Training Optimization
jusheng zhang, Kaitong Cai, Yijia Fan et al.
AgentRefine: Enhancing Agent Generalization through Refinement Tuning
Dayuan Fu, Keqing He, Yejie Wang et al.
World-consistent Video Diffusion with Explicit 3D Modeling
Qihang Zhang, Shuangfei Zhai, Miguel Ángel Bautista et al.
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
Hyeonho Jeong, Chun-Hao P. Huang, Jong Chul Ye et al.
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
Will Merrill, Ashish Sabharwal
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang, Jingdi Lei, Junxian Li et al.
AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
Ximing Lu, Melanie Sclar, Skyler Hallinan et al.
VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing
Xiangpeng Yang, Linchao Zhu, Hehe Fan et al.
EgoLife: Towards Egocentric Life Assistant
Jingkang Yang, Shuai Liu, Hongming Guo et al.
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich et al.
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
Peijie Wang, Zhong-Zhi Li, Fei Yin et al.
Fair Text-to-Image Diffusion via Fair Mapping
Jia Li, Lijie Hu, Jingfeng Zhang et al.
3D-HGS: 3D Half-Gaussian Splatting
Haolin Li, Jinyang Liu, Mario Sznaier et al.
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
Wei Li, Bing Hu, Rui Shao et al.
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Zhongwei Ren, Yunchao Wei, Xun Guo et al.
Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning
Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye et al.
gRNAde: Geometric Deep Learning for 3D RNA inverse design
Chaitanya Joshi, Arian Jamasb, Ramon Viñas et al.
Revisiting text-to-image evaluation with Gecko: on metrics, prompts, and human rating
Olivia Wiles, Chuhan Zhang, Isabela Albuquerque et al.
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics
Taowen Wang, Cheng Han, James Liang et al.
Gramian Multimodal Representation Learning and Alignment
Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo et al.
Hyper-Connections
Defa Zhu, Hongzhi Huang, Zihao Huang et al.
Reasoning Models Better Express Their Confidence
Dongkeun Yoon, Seungone Kim, Sohee Yang et al.
Diverging Preferences: When do Annotators Disagree and do Models Know?
Michael Zhang, Zhilin Wang, Jena Hwang et al.
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images
Zixuan Huang, Mark Boss, Aaryaman Vasishta et al.
DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion
Jinyuan Liu, Bowei Zhang, Qingyun Mei et al.
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Zixuan Ye, Huijuan Huang, Xintao Wang et al.
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Sangmin Bae, Yujin Kim, Reza Bayat et al.
Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference
Barys Liskavets, Maxim Ushakov, Shuvendu Roy et al.
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
Haochen Wang, Yucheng Zhao, Tiancai Wang et al.
Scaling Wearable Foundation Models
Girish Narayanswamy, Xin Liu, Kumar Ayush et al.
VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding
Chaoyu Li, Eun Woo Im, Pooyan Fazli
Improved Noise Schedule for Diffusion Training
Tiankai Hang, Shuyang Gu, Jianmin Bao et al.
Empowering LLMs to Understand and Generate Complex Vector Graphics
XiMing Xing, Juncheng Hu, Guotao Liang et al.
Make It Count: Text-to-Image Generation with an Accurate Number of Objects
Lital Binyamin, Yoad Tewel, Hilit Segev et al.
The dark side of the forces: assessing non-conservative force models for atomistic machine learning
Filippo Bigi, Marcel Langer, Michele Ceriotti
Steering Large Language Model Activations in Sparse Spaces
Reza Bayat, Ali Rahimi-Kalahroudi, Mohammad Pezeshki et al.
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
Junkang Wu, Yuexiang Xie, Zhengyi Yang et al.
PartGen: Part-level 3D Generation and Reconstruction with Multi-view Diffusion Models
Minghao Chen, Roman Shapovalov, Iro Laina et al.
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey, Bin Zhang, Lorenzo Noci et al.
LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation
Mushui Liu, Yuhang Ma, Zhen Yang et al.
Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
Tobias Braun, Mark Rothermel, Marcus Rohrbach et al.
High-Dimensional Prediction for Sequential Decision Making
Georgy Noarov, Ramya Ramalingam, Aaron Roth et al.
Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation
Qihan Huang, Siming Fu, Jinlong Liu et al.
Stable-Hair: Real-World Hair Transfer via Diffusion Model
Yuxuan Zhang, Qing Zhang, Yiren Song et al.
PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning
Qingdong He, Jiangning Zhang, Jinlong Peng et al.
Evolutionary Large Language Model for Automated Feature Transformation
Nanxu Gong, Chandan K Reddy, Wangyang Ying et al.
Context-Alignment: Activating and Enhancing LLMs Capabilities in Time Series
Yuxiao Hu, Qian Li, Dongxiao Zhang et al.
Harnessing Diversity for Important Data Selection in Pretraining Large Language Models
Chi Zhang, Huaping Zhong, Kuan Zhang et al.
Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
Thomas Fel, Ekdeep Singh Lubana, Jacob Prince et al.
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning
Zhongwei Wan, Zhihao Dou, Che Liu et al.
Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents
Yifei Zhou, Qianlan Yang, Kaixiang Lin et al.
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
Wenhui Tan, Jiaze Li, Jianzhong Ju et al.
Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models
Zhejun Zhang, Peter Karkus, Maximilian Igl et al.
Cross-modal Information Flow in Multimodal Large Language Models
Zhi Zhang, Srishti Yadav, Fengze Han et al.
WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments
Jianhao Zheng, Zihan Zhu, Valentin Bieri et al.
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
Katie Matton, Robert Ness, John Guttag et al.
Diffusion Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models
Ruofan Liang, Žan Gojčič, Huan Ling et al.
OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization
Yiyou Sun, Shawn Hu, Georgia Zhou et al.
Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms
Yinuo Ren, Haoxuan Chen, Yuchen Zhu et al.
CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences
Ziran Qin, Yuchen Cao, Mingbao Lin et al.
TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights
Aiwei Liu, Haoping Bai, Zhiyun Lu et al.
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Peijie Dong, Lujun Li, Yuedong Zhong et al.
ACPBench: Reasoning About Action, Change, and Planning
Harsha Kokel, Michael Katz, Kavitha Srinivas et al.
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models
Guo Chen, Zhiqi Li, Shihao Wang et al.
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang, Wentao Yang, Songxuan Lai et al.
Self-Boosting Large Language Models with Synthetic Preference Data
Qingxiu Dong, Li Dong, Xingxing Zhang et al.
ICLR: In-Context Learning of Representations
Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana et al.
Complexity Experts are Task-Discriminative Learners for Any Image Restoration
Eduard Zamfir, Zongwei Wu, Nancy Mehta et al.
DarkIR: Robust Low-Light Image Restoration
Daniel Feijoo, Juan C. Benito, Alvaro Garcia et al.
Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks
Hailong Guo, Bohan Zeng, Yiren Song et al.
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
Hanlin Zhu, Shibo Hao, Zhiting Hu et al.
Number Cookbook: Number Understanding of Language Models and How to Improve It
Haotong Yang, Yi Hu, Shijia Kang et al.
Checklists Are Better Than Reward Models For Aligning Language Models
Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.
The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence
Tom Wollschläger, Jannes Elstner, Simon Geisler et al.
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
Weifeng Lin, Xinyu Wei, Ruichuan An et al.
InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences
Hongkai Zheng, Wenda Chu, Bingliang Zhang et al.
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.
Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
Huiyi Wang, Haodong Lu, Lina Yao et al.
CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
Yang Liu, Chuanchen Luo, Zhongkai Mao et al.
OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation
Yuchen Lin, Chenguo Lin, Jianjin Xu et al.
Retrieval-Augmented Generation with Conflicting Evidence
Han Wang, Archiki Prasad, Elias Stengel-Eskin et al.
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Le Zhuo, Liangbing Zhao, Sayak Paul et al.
Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
Yingyu Liang, Jiangxuan Long, Zhenmei Shi et al.
Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models Trained on Corrupted Data
Asad Aali, Giannis Daras, Brett Levac et al.
Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model
Xiu Yuan, Tongzhou Mu, Stone Tao et al.
MoDeGPT: Modular Decomposition for Large Language Model Compression
Chi-Heng Lin, Shangqian Gao, James Smith et al.
Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
Maya Pavlova, Erik Brinkman, Krithika Iyer et al.
MonSter: Marry Monodepth to Stereo Unleashes Power
JunDa Cheng, Longliang Liu, Gangwei Xu et al.
LT3SD: Latent Trees for 3D Scene Diffusion
Quan Meng, Lei Li, Matthias Nießner et al.
From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs
Alireza Rezazadeh, Zichao Li, Wei Wei et al.
From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan, Matanel Oren, Yuval Reif et al.
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
Ruilin Luo, Zhuofan Zheng, Lei Wang et al.
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
Jintao Zhang, Jia wei, Haoxu Wang et al.