Most Cited 2025 Paper "parameterized environment configurations" Papers
21,856 papers found • Page 1 of 110
Conference
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Clemencia Siro, Guy Gur-Ari, Gaurav Mishra et al.
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu, Peixian Chen, Yunhang Shen et al.
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain, Han, Alex Gu et al.
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu, Yuhan Dai, Yongdong Luo et al.
WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin, Zhelun Shi, Jiwen Yu et al.
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Javier Rando, Tony Wang, Stewart Slocum et al.
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Jipeng Zhang, Hanze Dong, Tong Zhang et al.
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Haipeng Luo, Qingfeng Sun, Can Xu et al.
VGGT: Visual Geometry Grounded Transformer
Jianyuan Wang, Minghao Chen, Nikita Karaev et al.
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue, Zhiqi Chen, Rui Lu et al.
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie, Weijia Mao, Zechen Bai et al.
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo, Minh Chien Vu, Jenny Chim et al.
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
Chenhao Tan, Robert Ness, Amit Sharma et al.
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko, francesco croce, Nicolas Flammarion
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Songming Liu, Lingxuan Wu, Bangguo Li et al.
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
Xingyao Wang, Boxuan Li, Yufan Song et al.
Generative Verifiers: Reward Modeling as Next-Token Prediction
Lunjun Zhang, Arian Hosseini, Hritik Bansal et al.
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu, Zeyi Sun, Yuhang Zang et al.
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Jihan Yang, Shusheng Yang, Anjali W. Gupta et al.
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
Guowei Xu, Peng Jin, ZiangWu ZiangWu et al.
From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline
Tianle Li, Wei-Lin Chiang, Evan Frick et al.
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu, Sangkyung Kwak, Huiwon Jang et al.
Training Language Models to Self-Correct via Reinforcement Learning
Aviral Kumar, Vincent Zhuang, Rishabh Agarwal et al.
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
Alexey Bochkovskiy, Amaël Delaunoy, Hugo Germain et al.
Scaling and evaluating sparse autoencoders
Leo Gao, Tom Dupre la Tour, Henk Tillman et al.
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou, Lili Yu, Arun Babu et al.
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong, Delong Ran, Jinyuan Liu et al.
Safety Alignment Should be Made More Than Just a Few Tokens Deep
Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu et al.
Mixture-of-Agents Enhances Large Language Model Capabilities
Junlin Wang, Jue Wang, Ben Athiwaratkun et al.
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Chenglei Si, Diyi Yang, Tatsunori Hashimoto
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
Junyi Zhang, Charles Herrmann, Junhwa Hur et al.
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Zhangchen Xu, Fengqing Jiang, Luyao Niu et al.
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu, Xinggang Wang, Xinlong Wang
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh vahid et al.
OmniGen: Unified Image Generation
Shitao Xiao, Yueze Wang, Junjie Zhou et al.
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric Michaud et al.
SpinQuant: LLM Quantization with Learned Rotations
Zechun Liu, Changsheng Zhao, Igor Fedorov et al.
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
yi yang, Xiaoxuan He, Hongkun Pan et al.
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Zayne Sprague, Fangcong Yin, Juan Rodriguez et al.
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye, Haiyang Xu, Haowei Liu et al.
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng, Kaixiong Gong, Bohao Li et al.
Continuous 3D Perception Model with Persistent State
Qianqian Wang, Yifei Zhang, Aleksander Holynski et al.
LoRA Learns Less and Forgets Less
Jonathan Frankle, Jose Javier Gonzalez Ortiz, Cody Blakeney et al.
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
Jimeng Sun, Shubhendu Trivedi, Zhen Lin
Pyramidal Flow Matching for Efficient Video Generative Modeling
Yang Jin, Zhicheng Sun, Ningyuan Li et al.
Generative Representational Instruction Tuning
Niklas Muennighoff, Hongjin SU, Liang Wang et al.
OminiControl: Minimal and Universal Control for Diffusion Transformer
Zhenxiong Tan, Songhua Liu, Xingyi Yang et al.
CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos
Nikita Karaev, Iurii Makarov, Jianyuan Wang et al.
LVBench: An Extreme Long Video Understanding Benchmark
Weihan Wang, zehai he, Wenyi Hong et al.
Self-Play Preference Optimization for Language Model Alignment
Yue Wu, Zhiqing Sun, Rina Hughes et al.
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Jiayi Ye, Yanbo Wang, Yue Huang et al.
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang, Jiaxing Huang, Huanjin Yao et al.
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
Qingqing Zhao, Yao Lu, Moo Jin Kim et al.
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Kepan Nan, Rui Xie, Penghao Zhou et al.
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
Yuan Zhang, Chun-Kai Fan, Junpeng Ma et al.
Infinity∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Jian Han, Jinlai Liu, Yi Jiang et al.
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri, Melissa Z Pan, Shuyi Yang et al.
MambaOut: Do We Really Need Mamba for Vision?
Weihao Yu, Xinchao Wang
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Jingyang Ou, Shen Nie, Kaiwen Xue et al.
One Step Diffusion via Shortcut Models
Kevin Frans, Danijar Hafner, Sergey Levine et al.
Inverse Scaling: When Bigger Isn't Better
Joe Cavanagh, Andrew Gritsevskiy, Najoung Kim et al.
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Chris Rawles, Sarah Clinckemaillie, Yifan Chang et al.
Advancing LLM Reasoning Generalists with Preference Trees
Lifan Yuan, Ganqu Cui, Hanbin Wang et al.
Revisiting Feature Prediction for Learning Visual Representations from Video
Quentin Garrido, Yann LeCun, Michael Rabbat et al.
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Xiaoming Shi, Shiyu Wang, Yuqi Nie et al.
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
Xiaoxi Li, Jiajie Jin, Guanting Dong et al.
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang, Chao Qu, Zuming Huang et al.
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
Jiahui Gao, Renjie Pi, Jipeng Zhang et al.
VACE: All-in-One Video Creation and Editing
Zeyinzi Jiang, Zhen Han, Chaojie Mao et al.
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Marianne Arriola, Aaron Gokaslan, Justin Chiu et al.
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Yiheng Xu, Zekun Wang, Junli Wang et al.
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov, Kushal Tirumala, Hassan Shapourian et al.
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
Jingfeng Yao, Bin Yang, Xinggang Wang
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
Weijia Shi, Jaechan Lee, Yangsibo Huang et al.
Diffusion Models Are Real-Time Game Engines
Dani Valevski, Yaniv Leviathan, Moab Arar et al.
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang, Haoyue Zhan, Liwei Liu et al.
Training Language Models to Reason Efficiently
Daman Arora, Andrea Zanette
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Roberto Henschel, Levon Khachatryan, Hayk Poghosyan et al.
ToolRL: Reward is All Tool Learning Needs
Cheng Qian, Emre Can Acikgoz, Qi He et al.
JudgeBench: A Benchmark for Evaluating LLM-Based Judges
Sijun Tan, Siyuan Zhuang, Kyle Montgomery et al.
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving
Yangzhen Wu, Zhiqing Sun, Shanda Li et al.
Gated Delta Networks: Improving Mamba2 with Delta Rule
Songlin Yang, Jan Kautz, Ali Hatamizadeh
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu, Wilson Yan, Matei Zaharia et al.
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Yan Shu, Zheng Liu, Peitian Zhang et al.
Mean Flows for One-step Generative Modeling
Zhengyang Geng, Mingyang Deng, Xingjian Bai et al.
Physics of Language Models: Part 3.2, Knowledge Manipulation
Zeyuan Allen-Zhu, Yuanzhi Li
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu et al.
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
Tinghao Xie, Xiangyu Qi, Yi Zeng et al.
Retrieval Head Mechanistically Explains Long-Context Factuality
Wenhao Wu, Yizhong Wang, Guangxuan Xiao et al.
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models
Junfeng Fang, Houcheng Jiang, Kun Wang et al.
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
Xuanchi Ren, Tianchang Shen, Jiahui Huang et al.
SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery
Konstantin Klemmer, Esther Rolf, Caleb Robinson et al.
Diffusion Policy Policy Optimization
Allen Ren, Justin Lidard, Lars Ankile et al.
Navigation World Models
Amir Bar, Gaoyue Zhou, Danny Tran et al.
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Shansan Gong, Shivam Agarwal, Yizhe Zhang et al.
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models
Bofei Gao, Feifan Song, Zhe Yang et al.
AFlow: Automating Agentic Workflow Generation
Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu et al.
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Jonas Geiping, Sean McLeish, Neel Jain et al.
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Huajian Xin, Z.Z. Ren, Junxiao Song et al.
Training Software Engineering Agents and Verifiers with SWE-Gym
Jiayi Pan, Xingyao Wang, Graham Neubig et al.
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Chaoyou Fu, Haojia Lin, Xiong Wang et al.
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities
CHENMING ZHU, Tai Wang, Wenwei Zhang et al.
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian et al.
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Jun Shern Chan, Neil Chowdhury, Oliver Jaffe et al.
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang, Shoutao Guo, Yan Zhou et al.
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solver
Zhenting Qi, Mingyuan MA, Jiahang Xu et al.
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji, Ziyue Jiang, Wen Wang et al.
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Thomas Bush, Stephen Chung, Usman Anwar et al.
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Kevin Qinghong Lin, Linjie Li, Difei Gao et al.
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Anselm Paulus, Arman Zharmagambetov, Chuan Guo et al.
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
Xun Huang, Zhengqi Li, Guande He et al.
IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities
Ziyang Li, Saikat Dutta, Mayur Naik
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo, Kaiyan Zhang, Li Sheng et al.
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Shi Yu, Chaoyue Tang, Bokai Xu et al.
MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds
Jiahui Lei, Yijia Weng, Adam W Harley et al.
WonderWorld: Interactive 3D Scene Generation from a Single Image
Hong-Xing Yu, Haoyi Duan, Charles Herrmann et al.
Automated Design of Agentic Systems
Shengran Hu, Cong Lu, Jeff Clune
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Jing He, Haodong Li, Wei Yin et al.
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
Liao Qu, Huichao Zhang, Yiheng Liu et al.
From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Tianwei Yin, Qiang Zhang, Richard Zhang et al.
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Xeron Du, Yifan Yao, Kaijing Ma et al.
Layer by Layer: Uncovering Hidden Representations in Language Models
Oscar Skean, Md Rifat Arefin, Dan Zhao et al.
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Hadas Orgad, Michael Toker, Zorik Gekhman et al.
Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control
Carles Domingo i Enrich, Michal Drozdzal, Brian Karrer et al.
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi, Fuxiao Liu, Shihao Wang et al.
CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL
Mohammadreza Pourreza, Hailong Li, Ruoxi Sun et al.
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
Rui-Jie Zhu, Qihang Zhao, Jason Eshraghian et al.
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren, Yang Liu, Yadong Lu et al.
Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection
Jiangnan Yang, Shuangli Liu, Jingjun Wu et al.
C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness
Yu Kang, Xianghui Sun, Liangyu Chen et al.
Imagine While Reasoning in Space: Multimodal Visualization-of-Thought
Chengzu Li, Wenshan Wu, Huanyu Zhang et al.
Rethinking Joint Maximum Mean Discrepancy for Visual Domain Adaptation
Wei Wang, Haifeng Xia, Chao Huang et al.
Data Scaling Laws in Imitation Learning for Robotic Manipulation
Fanqi Lin, Yingdong Hu, Pingyue Sheng et al.
ToolACE: Winning the Points of LLM Function Calling
Weiwen Liu, Xu Huang, Xingshan Zeng et al.
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin et al.
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
Di Wu, Hongwei Wang, Wenhao Yu et al.
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Fei Wang, XINGYU FU, James Y. Huang et al.
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Zehan Qi, Xiao Liu, Iat Long Iong et al.
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Lijie Fan, Tianhong Li, Siyang Qin et al.
Improving Video Generation with Human Feedback
Jie Liu, Gongye Liu, Jiajun Liang et al.
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
Zeyuan Allen-Zhu, Yuanzhi Li
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jan Betley, Daniel Tan, Niels Warncke et al.
Taming Rectified Flow for Inversion and Editing
Jiangshan Wang, Junfu Pu, Zhongang Qi et al.
Scaling up Masked Diffusion Models on Text
Shen Nie, Fengqi Zhu, Chao Du et al.
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa, Bhrugu Bharathi, Long Phan et al.
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Han Zhao, Min Zhang, Wei Zhao et al.
EIA: ENVIRONMENTAL INJECTION ATTACK ON GENERALIST WEB AGENTS FOR PRIVACY LEAKAGE
Zeyi Liao, Lingbo Mo, Chejian Xu et al.
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang, Qingkai Fang, Yang et al.
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
Xiaogeng Liu, Peiran Li, G. Edward Suh et al.
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
Rundi Wu, Ruiqi Gao, Ben Poole et al.
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
Xierui Wang, Siming Fu, Qihan Huang et al.
Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling
Kaiwen Zheng, Yongxin Chen, Hanzi Mao et al.
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Hanrong Zhang, Jingyuan Huang, Kai Mei et al.
OmniRe: Omni Urban Scene Reconstruction
Ziyu Chen, Jiawei Yang, Jiahui Huang et al.
A General Framework for Inference-time Scaling and Steering of Diffusion Models
Raghav Singhal, Zachary Horvitz, Ryan Teehan et al.
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Xiong Wang, Yangze Li, Chaoyou Fu et al.
HelpSteer2-Preference: Complementing Ratings with Preferences
Zhilin Wang, Alexander Bukharin, Olivier Delalleau et al.
VideoPhy: Evaluating Physical Commonsense for Video Generation
Hritik Bansal, Zongyu Lin, Tianyi Xie et al.
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Huanjin Yao, Jiaxing Huang, Wenhao Wu et al.
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li, Fangyun Wei, Chao Zhang et al.
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai, Enxin Song, Yilun Du et al.
Autoregressive Video Generation without Vector Quantization
Haoge Deng, Ting Pan, Haiwen Diao et al.
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang, Carlos E Jimenez, Alex Zhang et al.
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
Yiwen Chen, Tong He, Di Huang et al.
Agent S: An Open Agentic Framework that Uses Computers Like a Human
Saaket Agashe, Jiuzhou Han, Shuyu Gan et al.
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
Zhengxuan Wu, Aryaman Arora, Atticus Geiger et al.
On the self-verification limitations of large language models on reasoning and planning tasks
Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Yushi Bai, Jiajie Zhang, Xin Lv et al.
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Xianjie Wu, Jian Yang, Linzheng Chai et al.
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu, Xiaosen Zheng, Niklas Muennighoff et al.
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Mingjie Liu, Shizhe Diao, Ximing Lu et al.
FoundationStereo: Zero-Shot Stereo Matching
Bowen Wen, Matthew Trepte, Oluwaseun Joseph Aribido et al.
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
Yuxin Zuo, Shang Qu, Yifei Li et al.
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
Tian Ye, Zicheng Xu, Yuanzhi Li et al.
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Gen Luo, Yiyi Zhou, Yuxin Zhang et al.
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
Litu Rout, Yujia Chen, Nataniel Ruiz et al.
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Jiaxing Cui, Wei-Lin Chiang, Ion Stoica et al.
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
Yantao Liu, Zijun Yao, Rui Min et al.
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
Matt Deitke, Christopher Clark, Sangho Lee et al.
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Quanfeng Lu, Wenqi Shao, Zitao Liu et al.
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Wenkai Yang, Shuming Ma, Yankai Lin et al.
Transformers without Normalization
Jiachen Zhu, Xinlei Chen, Kaiming He et al.
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
Yuan Feng, Junlin Lv, Yukun Cao et al.
AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Wei Fu, Jiaxuan Gao, Xujie Shen et al.
LLaVA-Critic: Learning to Evaluate Multimodal Models
Tianyi Xiong, Xiyao Wang, Dong Guo et al.
MoBA: Mixture of Block Attention for Long-Context LLMs
Enzhe Lu, Zhejun Jiang, Jingyuan Liu et al.
DEIM: DETR with Improved Matching for Fast Convergence
Shihua Huang, Zhichao Lu, Xiaodong Cun et al.
MLVU: Benchmarking Multi-task Long Video Understanding
Junjie Zhou, Yan Shu, Bo Zhao et al.
FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views
Shangzhan Zhang, Jianyuan Wang, Yinghao Xu et al.
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Xinlei Chen, Zhuang Liu, Saining Xie et al.
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng, Yuxin Cui, Haomiao Tang et al.
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
Dongzhi JIANG, Ziyu Guo, Renrui Zhang et al.
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse, Hugues Sibille, Tony Wu et al.
Show-o2: Improved Native Unified Multimodal Models
Jinheng Xie, Zhenheng Yang, Mike Zheng Shou
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
Muyang Li, Yujun Lin, Zhekai Zhang et al.
When Attention Sink Emerges in Language Models: An Empirical View
Xiangming Gu, Tianyu Pang, Chao Du et al.
LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
Haian Jin, Hanwen Jiang, Hao Tan et al.
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Yuheng Ji, Huajie Tan, Jiayu Shi et al.