Most Cited 2025 "lossless language model inference" Papers
PersonalLLM: Tailoring LLMs to Individual Preferences
Thomas Zollo, Andrew Siah, Naimeng Ye et al.
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification
Yunzhen Feng, Elvis Dohmatob, Pu Yang et al.
Erasing Undesirable Influence in Diffusion Models
Jing Wu, Trung Le, Munawar Hayat et al.
Simple ReFlow: Improved Techniques for Fast Flow Models
Beomsu Kim, Yu-Guan Hsieh, Michal Klein et al.
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
Christian Walder, Deep Tejas Karkhanis
Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation
Ziyang Xie, Zhizheng Liu, Zhenghao Peng et al.
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
Shuming Liu, Chen Zhao, Tianqi Xu et al.
NoveltyBench: Evaluating Language Models for Humanlike Diversity
Yiming Zhang, Harshita Diddee, Susan Holm et al.
When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs
Xiaomin Li, Zhou Yu, Zhiwei Zhang et al.
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
Zimu Lu, Aojun Zhou, Ke Wang et al.
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Jinyoung Park, Jeehye Na, Jinyoung Kim et al.
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization
Liang Pan, Zeshi Yang, Zhiyang Dou et al.
Uncovering Overfitting in Large Language Model Editing
Mengqi Zhang, Xiaotian Ye, Qiang Liu et al.
ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
Zechun Liu, Changsheng Zhao, Hanxian Huang et al.
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
Jingjing Chang, Yixiao Fang, Peng Xing et al.
Star Attention: Efficient LLM Inference over Long Sequences
Shantanu Acharya, Fei Jia, Boris Ginsburg
Chain-of-Retrieval Augmented Generation
Liang Wang, Haonan Chen, Nan Yang et al.
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
Xin Yi, Shunfan Zheng, Linlin Wang et al.
Towards Understanding Camera Motions in Any Video
Zhiqiu Lin, Siyuan Cen, Daniel Jiang et al.
A Geometric Framework for Understanding Memorization in Generative Models
Brendan Ross, Hamidreza Kamkari, Tongzi Wu et al.
The Superposition of Diffusion Models Using the Itô Density Estimator
Marta Skreta, Lazar Atanackovic, Joey Bose et al.
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
Siqi Kou, Jiachun Jin, Zhihong Liu et al.
Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping
Zijian Liu, Zhengyuan Zhou
LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
Parshin Shojaee, Ngoc Hieu Nguyen, Kazem Meidani et al.
Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
Yuqi Wu, Wenzhao Zheng, Jie Zhou et al.
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors
Keon Lee, Dong Won Kim, Jaehyeon Kim et al.
KGGen: Extracting Knowledge Graphs from Plain Text with Language Models
Belinda Mo, Kyssen Yu, Joshua Kazdan et al.
MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing
Cong Wang, Di Kang, Heyi Sun et al.
CREAM: Consistency Regularized Self-Rewarding Language Models
Zhaoyang Wang, Weilei He, Zhiyuan Liang et al.
Subspace Optimization for Large Language Models with Convergence Guarantees
Yutong He, Pengrui Li, Yipeng Hu et al.
Improving Uncertainty Estimation through Semantically Diverse Language Generation
Lukas Aichberger, Kajetan Schweighofer, Mykyta Ielanskyi et al.
AutoPresent: Designing Structured Visuals from Scratch
Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou et al.
Diffusion-based Neural Network Weights Generation
Bedionita Soro, Bruno Andreis, Hayeon Lee et al.
From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models
Etowah Adams, Liam Bai, Minji Lee et al.
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents
Haiyang Shen, Yue Li, Desong Meng et al.
Scalable Equilibrium Sampling with Sequential Boltzmann Generators
Charlie Tan, Joey Bose, Chen Lin et al.
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim
Contrastive Localized Language-Image Pre-Training
Hong-You Chen, Zhengfeng Lai, Haotian Zhang et al.
Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-Based Decision-Making Systems
Ruochen Jiao, Shaoyuan Xie, Justin Yue et al.
Efficient Visual State Space Model for Image Deblurring
Lingshun Kong, Jiangxin Dong, Jinhui Tang et al.
Language-Image Models with 3D Understanding
Jang Hyun Cho, Boris Ivanovic, Yulong Cao et al.
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
Qirui Chen, Shangzhe Di, Weidi Xie
VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching
Siyu Xu, Yunke Wang, Chenghao Xia et al.
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
Xiangdong Zhang, Jiaqi Liao, Shaofeng Zhang et al.
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction
Jarrid Rector-Brooks, Mohsin Hasan, Zhangzhi Peng et al.
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
Jinjin Zhang, Qiuyu Huang, Junjie Liu et al.
MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors
Qingming Liu, Yuan Liu, Jiepeng Wang et al.
InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling
Muhammad Gohar Javed, Chuan Guo, Li Cheng et al.
Frequency Dynamic Convolution for Dense Image Prediction
Linwei Chen, Lin Gu, Liang Li et al.
Results of the Big ANN: NeurIPS’23 competition
Harsha Vardhan Simhadri, Martin Aumüller, Matthijs Douze et al.
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Xinhao Liu, Jintong Li, Yicheng Jiang et al.
Multi-Robot Motion Planning with Diffusion Models
Yorai Shaoul, Itamar Mishani, Shivam Vats et al.
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
Rui Xie, Yinhong Liu, Penghao Zhou et al.
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.
A Comprehensive Overhaul of Multimodal Assistant with Small Language Models
Minjie Zhu, Yichen Zhu, Ning Liu et al.
MuPT: A Generative Symbolic Music Pretrained Transformer
Xingwei Qu, Yuelin Bai, Yinghao Ma et al.
Generating CAD Code with Vision-Language Models for 3D Designs
Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Haider Zaidi et al.
When Hypergraph Meets Heterophily: New Benchmark Datasets and Baseline
Ming Li, Yongchun Gu, Yi Wang et al.
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Yiming Wang, Pei Zhang, Jialong Tang et al.
Steering Large Language Models between Code Execution and Textual Reasoning
Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma et al.
Rethinking Reward Modeling in Preference-based Large Language Model Alignment
Hao Sun, Yunyi Shen, Jean-Francois Ton
AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion
Mingzhen Sun, Weining Wang, Li et al.
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
Cong Lu, Shengran Hu, Jeff Clune
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
Muhammad Danish, Muhammad Akhtar Munir, Syed Shah et al.
TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting
Yifan Hu, Guibin Zhang, Peiyuan Liu et al.
CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs
Siyu Wang, Cailian Chen, Xinyi Le et al.
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Claudia Cuttano, Gabriele Trivigno, Gabriele Rosi et al.
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
Kyle Sargent, Kyle Hsu, Justin Johnson et al.
Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation within Slow Reasoning
Hao Chen, Jiaming Liu, Chenyang Gu et al.
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
Harrish Thasarathan, Julian Forsyth, Thomas Fel et al.
Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection
Yun Zhu, Jia-Chen Gu, Caitlin Sikora et al.
What Makes a Good Diffusion Planner for Decision Making?
Haofei Lu, Dongqi Han, Yifei Shen et al.
DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification
Yuhao Wang, Yang Liu, Aihua Zheng et al.
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Wenyi Hong, Yean Cheng, Zhuoyi Yang et al.
ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation
Jiawen Yu, Hairuo Liu, Qiaojun Yu et al.
Why Does Dropping Edges Usually Outperform Adding Edges in Graph Contrastive Learning?
Yanchen Xu, Siqi Huang, Hongyuan Zhang et al.
MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding
Rongchang Xie, Chen Du, Ping Song et al.
Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models
Jiacong Xu, Shao-Yuan Lo, Bardia Safaei et al.
VideoGigaGAN: Towards Detail-rich Video Super-Resolution
Yiran Xu, Taesung Park, Richard Zhang et al.
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Wei Shen, Guanlin Liu, Yu Yue et al.
Perception-Guided Jailbreak Against Text-to-Image Models
Yihao Huang, Le Liang, Tianlin Li et al.
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Yuxuan Cai, Jiangning Zhang, Haoyang He et al.
EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing
Zexuan Yan, Yue Ma, Chang Zou et al.
Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization Through Spare-Coding Transformer
Lei Su, Xiaochen Ma, Xuekang Zhu et al.
CityNav: A Large-Scale Dataset for Real-World Aerial Navigation
Jungdae Lee, Taiki Miyanishi, Shuhei Kurita et al.
The AdEMAMix Optimizer: Better, Faster, Older
Matteo Pagliardini, Pierre Ablin, David Grangier
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging
Ke Wang, Nikos Dimitriadis, Alessandro Favero et al.
Estimating Body and Hand Motion in an Ego-sensed World
Brent Yi, Vickie Ye, Maya Zheng et al.
Evaluating the Diversity and Quality of LLM Generated Content
Alexander Shypula, Shuo Li, Botong Zhang et al.
KV-Edit: Training-Free Image Editing for Precise Background Preservation
Tianrui Zhu, Shiyi Zhang, Jiawei Shao et al.
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
Shenghai Yuan, Xianyi He, Yufan Deng et al.
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
Ryota Tanaka, Taichi Iki, Taku Hasegawa et al.
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes
Hengwei Bian, Lingdong Kong, Haozhe Xie et al.
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
Lijun Li, Zhelun Shi, Xuhao Hu et al.
Reward Guided Latent Consistency Distillation
William Wang, Jiachen Li, Weixi Feng et al.
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
Sarah Wiegreffe, Oyvind Tafjord, Yonatan Belinkov et al.
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song, Muxi Diao, Guanting Dong et al.
Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists
Bojia Zi, Penghui Ruan, Marco Chen et al.
Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering
Cheng Sun, Jaesung Choe, Charles Loop et al.
Autoformulation of Mathematical Optimization Models Using LLMs
Nicolás Astorga, Tennison Liu, Yuanzhang Xiao et al.
EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration
Allen Nie, Yi Su, Bo Chang et al.
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline
Junlong Cheng, Bin Fu, Jin Ye et al.
Training a Scientific Reasoning Model for Chemistry
Siddharth Narayanan, James Braza, Ryan-Rhys Griffiths et al.
Your ViT is Secretly an Image Segmentation Model
Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.
Diffusion Beats Autoregressive in Data-Constrained Settings
Mihir Prabhudesai, Mengning Wu, Amir Zadeh et al.
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
Thomas Robert, Mher Safaryan, Ionut-Vlad Modoranu et al.
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
Fengxiang Wang, Hongzhen Wang, Zonghao Guo et al.
Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection
Zhen Qu, Xian Tao, Xinyi Gong et al.
Calibrated Multi-Preference Optimization for Aligning Diffusion Models
Kyungmin Lee, Xiaohang Li, Qifei Wang et al.
FineVQ: Fine-Grained User Generated Content Video Quality Assessment
Huiyu Duan, Qiang Hu, Wang Jiarui et al.
Concept Bottleneck Large Language Models
Chung-En Sun, Tuomas Oikarinen, Berk Ustun et al.
MagicQuill: An Intelligent Interactive Image Editing System
Zichen Liu, Yue Yu, Hao Ouyang et al.
Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing
Xinghe Fu, Zhiyuan Yan, Taiping Yao et al.
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
Zewei Zhang, Huan Liu, Jun Chen et al.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
Hanlin Wang, Hao Ouyang, Qiuyu Wang et al.
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Lawrence Jang, Yinheng Li, Dan Zhao et al.
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Xinpeng Wang, Chengzhi (Martin) Hu, Paul Röttger et al.
ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation
Guosheng Zhao, Xiaofeng Wang, Chaojun Ni et al.
ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model
Shunlin Lu, Jingbo Wang, Zeyu Lu et al.
Inverse Constitutional AI: Compressing Preferences into Principles
Arduin Findeis, Timo Kaufmann, Eyke Hüllermeier et al.
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao, Yige Yuan, Zhengyu Chen et al.
Moral Alignment for LLM Agents
Elizaveta Tennant, Stephen Hailes, Mirco Musolesi
Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models
Lucio La Cava, Andrea Tagarelli
CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution
Xin Liu, Jie Liu, Jie Tang et al.
EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities
Talor Abramovich, Meet Udeshi, Minghao Shao et al.
DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
Yuxuan Luo, Zhengkun Rong, Lizhen Wang et al.
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
Baichuan Zhou, Haote Yang, Dairong Chen et al.
The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models
Ke Ji, Jiahao Xu, Tian Liang et al.
REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment
Haonan Han, Rui Yang, Huan Liao et al.
DisCo: Graph-Based Disentangled Contrastive Learning for Cold-Start Cross-Domain Recommendation
Hourun Li, Yifan Wang, Zhiping Xiao et al.
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Xiao Guo, Xiufeng Song, Yue Zhang et al.
GenMol: A Drug Discovery Generalist with Discrete Diffusion
Seul Lee, Karsten Kreis, Srimukh Veccham et al.
R1-ShareVL: Incentivizing Reasoning Capabilities of Multimodal Large Language Models via Share-GRPO
Huanjin Yao, Qixiang Yin, Jingyi Zhang et al.
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan, Shurong Zheng, Yousong Zhu et al.
PolyNet: Learning Diverse Solution Strategies for Neural Combinatorial Optimization
André Hottung, Mridul Mahajan, Kevin Tierney
AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies
Jian Gao, Weidong Cao, Junyi Yang et al.
Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning
Jinlong Pang, Na Di, Zhaowei Zhu et al.
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
Alejandro Lozano, Min Woo Sun, James Burgess et al.
FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error
Beilin Chu, Xuan Xu, Xin Wang et al.
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
Guanyu Zhou, Yibo Yan, Xin Zou et al.
Light3R-SfM: Towards Feed-forward Structure-from-Motion
Sven Elflein, Qunjie Zhou, Laura Leal-Taixe
Discrete Copula Diffusion
Anji Liu, Oliver Broadrick, Mathias Niepert et al.
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
Hongxiang Li, Yaowei Li, Yuhang Yang et al.
Robust Tracking via Mamba-based Context-aware Token Learning
Jinxia Xie, Bineng Zhong, Qihua Liang et al.
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets
Guangqi Jiang, Yifei Sun, Tao Huang et al.
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
João Loula, Benjamin LeBrun, Li Du et al.
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
Mingyang Chen, sunhaoze, Tianpeng Li et al.
ResearchTown: Simulator of Human Research Community
Haofei Yu, Zhaochen Hong, Zirui Cheng et al.
Data Selection via Optimal Control for Language Models
Yuxian Gu, Li Dong, Hongning Wang et al.
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong, Jun Hao Liew, Zilong Huang et al.
Language Representations Can be What Recommenders Need: Findings and Potentials
Leheng Sheng, An Zhang, Yi Zhang et al.
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
Le Yang, Ziwei Zheng, Boxu Chen et al.
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Renqiu Xia, Mingsheng Li, Hancheng Ye et al.
Scaling Laws for Native Multimodal Models
Mustafa Shukor, Enrico Fini, Victor Guilherme Turrisi da Costa et al.
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
Canyu Zhao, Yanlong Sun, Mingyu Liu et al.
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li et al.
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond
Guanyao Wu, Haoyu Liu, Hongming Fu et al.
Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent
Yongxian Wei, Anke Tang, Li Shen et al.
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Pengxiang Li, Lu Yin, Shiwei Liu
Underdamped Diffusion Bridges with Applications to Sampling
Denis Blessing, Julius Berner, Lorenz Richter et al.
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
Jiaru Zou, Ling Yang, Jingwen Gu et al.
TAPIP3D: Tracking Any Point in Persistent 3D Geometry
Bowei Zhang, Lei Ke, Adam Harley et al.
BLADE: Enhancing Black-Box Large Language Models with Small Domain-Specific Models
Haitao Li, Qingyao Ai, Jia Chen et al.
Towards Neural Scaling Laws for Time Series Foundation Models
Qingren Yao, Chao-Han Huck Yang, Renhe Jiang et al.
Debate on Graph: A Flexible and Reliable Reasoning Framework for Large Language Models
Jie Ma, Zhitao Gao, Qi Chai et al.
Can LLMs Solve Longer Math Word Problems Better?
Xin Xu, Tong Xiao, Zitong Chao et al.
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation
Xiaofeng Wang, Kang Zhao, Feng Liu et al.
How Do Large Language Monkeys Get Their Power (Laws)?
Rylan Schaeffer, Joshua Kazdan, John Hughes et al.
CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement
Yun Liu, Chengwen Zhang, Ruofan Xing et al.
AffordDP: Generalizable Diffusion Policy with Transferable Affordance
Shijie Wu, Yihang Zhu, Yunao Huang et al.
Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift
Siyuan Liang, Jiawei Liang, Tianyu Pang et al.
DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products
Julien Siems, Timur Carstensen, Arber Zela et al.
DiffuseHigh: Training-Free Progressive High-Resolution Image Synthesis Through Structure Guidance
Younghyun Kim, Geunmin Hwang, Junyu Zhang et al.
MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems
Rui Ye, Shuo Tang, Rui Ge et al.
An Intelligent Agentic System for Complex Image Restoration Problems
Kaiwen Zhu, Jinjin Gu, Zhiyuan You et al.
CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification
Wei Li, Renshan Zhang, Rui Shao et al.
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin, Xinyu Wei, Renrui Zhang et al.
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding
Feilong Tang, Chengzhi Liu, Zhongxing Xu et al.
Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination
Leonardo Barcellona, Andrii Zadaianchuk, Davide Allegro et al.
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine
Xiaoshuang Huang, Lingdong Shen, Jia Liu et al.
A Unified Approach to Routing and Cascading for LLMs
Jasper Dekoninck, Maximilian Baader, Martin Vechev
Addressing Misspecification in Simulation-based Inference through Data-driven Calibration
Antoine Wehenkel, Juan L. Gamella, Ozan Sener et al.
FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model
Chongkai Gao, Haozhuo Zhang, Zhixuan Xu et al.
LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
Walid Bousselham, Angie Boggust, Sofian Chaybouti et al.
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting
Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models
Kazi Hasan Ibn Arif, JinYi Yoon, Dimitrios S. Nikolopoulos et al.
OmniSVG: A Unified Scalable Vector Graphics Generation Model
Yiying Yang, Wei Cheng, Sijin Chen et al.
Artificial Kuramoto Oscillatory Neurons
Takeru Miyato, Sindy Löwe, Andreas Geiger et al.
LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation
Qidong Liu, Xian Wu, Wanyu Wang et al.
MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction
Zixuan Gong, Qi Zhang, Guangyin Bao et al.
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
Thomas Kuntz, Agatha Duzan, Hao Zhao et al.
Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces I: the compact case
Iskander Azangulov, Andrei Smolensky, Alexander Terenin et al.
Attention Distillation: A Unified Approach to Visual Characteristics Transfer
Yang Zhou, Xu Gao, Zichong Chen et al.
Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
Qinsi Wang, Jinghan Ke, Masayoshi Tomizuka et al.
3D Vision-Language Gaussian Splatting
Qucheng Peng, Benjamin Planche, Zhongpai Gao et al.
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu, Meng Lou, Yizhou Yu
Self-Adapting Language Models
Adam Zweiger, Jyo Pari, Han Guo et al.
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
Kai Wang, Mingjia Shi, YuKun Zhou et al.