Most Cited 2025 "neurosymbolic method" Papers

22,274 papers found • Page 3 of 112

#401

Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation

Mufei Li, Siqi Miao, Pan Li

ICLR 2025posterarXiv:2410.20724
57
citations
#402

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Zhaorun Chen, Zichen Wen, Yichao Du et al.

NEURIPS 2025posterarXiv:2407.04842
57
citations
#403

Scaling Transformers for Low-Bitrate High-Quality Speech Coding

Julian Parker, Anton Smirnov, Jordi Pons et al.

ICLR 2025posterarXiv:2411.19842
57
citations
#404

Long Context Tuning for Video Generation

Yuwei Guo, Ceyuan Yang, Ziyan Yang et al.

ICCV 2025posterarXiv:2503.10589
57
citations
#405

Align3R: Aligned Monocular Depth Estimation for Dynamic Videos

Edward LOO, Tianyu HUANG, Peng Li et al.

CVPR 2025highlightarXiv:2412.03079
57
citations
#406

MUSt3R: Multi-view Network for Stereo 3D Reconstruction

Yohann Cabon, Lucas Stoffl, Leonid Antsfeld et al.

CVPR 2025highlightarXiv:2503.01661
57
citations
#407

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Hanshi Sun, Li-Wen Chang, Wenlei Bao et al.

ICML 2025spotlightarXiv:2410.21465
56
citations
#408

RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents against Human Experts

Hjalmar Wijk, Tao Lin, Joel Becker et al.

ICML 2025spotlightarXiv:2411.15114
56
citations
#409

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

Chen Ziwen, Hao Tan, Kai Zhang et al.

ICCV 2025highlightarXiv:2410.12781
56
citations
#410

What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?

Guangkai Xu, Yongtao Ge, Mingyu Liu et al.

ICLR 2025posterarXiv:2403.06090
56
citations
#411

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Mingyang Chen, Linzhuang Sun, Tianpeng Li et al.

NEURIPS 2025posterarXiv:2503.19470
56
citations
#412

Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems

Guibin Zhang, Yanwei Yue, Zhixun Li et al.

ICLR 2025oralarXiv:2410.02506
56
citations
#413

Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models

Lvmin Zhang, Shengqu Cai, Muyang Li et al.

NEURIPS 2025spotlightarXiv:2504.12626
56
citations
#414

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Orr Zohar, Xiaohan Wang, Yann Dubois et al.

CVPR 2025posterarXiv:2412.10360
55
citations
#415

Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent

Yangning Li, Yinghui Li, Xinyu Wang et al.

ICLR 2025posterarXiv:2411.02937
55
citations
#416

Hymba: A Hybrid-head Architecture for Small Language Models

Xin Dong, Yonggan Fu, Shizhe Diao et al.

ICLR 2025posterarXiv:2411.13676
55
citations
#417

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

Qi Qin, Le Zhuo, Yi Xin et al.

ICCV 2025posterarXiv:2503.21758
55
citations
#418

Self-Improvement in Language Models: The Sharpening Mechanism

Audrey Huang, Adam Block, Dylan Foster et al.

ICLR 2025posterarXiv:2412.01951
55
citations
#419

Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment

Lijie Liu, Tianxiang Ma, Bingchuan Li et al.

ICCV 2025highlightarXiv:2502.11079
55
citations
#420

AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

Patara Trirat, Wonyong Jeong, Sung Ju Hwang

ICML 2025posterarXiv:2410.02958
55
citations
#421

Sundial: A Family of Highly Capable Time Series Foundation Models

Yong Liu, Guo Qin, Zhiyuan Shi et al.

ICML 2025oralarXiv:2502.00816
55
citations
#422

LLM-SR: Scientific Equation Discovery via Programming with Large Language Models

Parshin Shojaee, Kazem Meidani, Shashank Gupta et al.

ICLR 2025posterarXiv:2404.18400
55
citations
#423

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi et al.

NEURIPS 2025posterarXiv:2506.04308
55
citations
#424

Simplifying Deep Temporal Difference Learning

Matteo Gallici, Mattie Fellows, Benjamin Ellis et al.

ICLR 2025oralarXiv:2407.04811
55
citations
#425

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Jianhong Bai, Menghan Xia, Xintao WANG et al.

ICLR 2025posterarXiv:2412.07760
55
citations
#426

Controlling Space and Time with Diffusion Models

Daniel Watson, Saurabh Saxena, Lala Li et al.

ICLR 2025posterarXiv:2407.07860
55
citations
#427

AgentSquare: Automatic LLM Agent Search in Modular Design Space

Yu Shang, Yu Li, Keyu Zhao et al.

ICLR 2025posterarXiv:2410.06153
55
citations
#428

Stable Flow: Vital Layers for Training-Free Image Editing

Omri Avrahami, Or Patashnik, Ohad Fried et al.

CVPR 2025posterarXiv:2411.14430
55
citations
#429

GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

Zebin Xing, Xingyu Zhang, Yang Hu et al.

CVPR 2025posterarXiv:2503.05689
54
citations
#430

Q-Insight: Understanding Image Quality via Visual Reinforcement Learning

Weiqi Li, Xuanyu Zhang, Shijie Zhao et al.

NEURIPS 2025spotlightarXiv:2503.22679
54
citations
#431

Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning

Yu Fu, Zefan Cai, Abedelkadir Asi et al.

ICLR 2025posterarXiv:2410.19258
54
citations
#432

ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration

Chaojun Ni, Guosheng Zhao, Xiaofeng Wang et al.

CVPR 2025posterarXiv:2411.19548
54
citations
#433

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Songhao Han, Wei Huang, Hairong Shi et al.

CVPR 2025posterarXiv:2411.14794
54
citations
#434

Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

Lei Chen, Yuan Meng, Chen Tang et al.

CVPR 2025posterarXiv:2406.17343
54
citations
#435

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

Zhengyao Lyu, Chenyang Si, Junhao Song et al.

ICLR 2025oralarXiv:2410.19355
54
citations
#436

How to Evaluate Reward Models for RLHF

Evan Frick, Tianle Li, Connor Chen et al.

ICLR 2025posterarXiv:2410.14872
54
citations
#437

3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

Qi Wu, Janick Martinez Esturo, Ashkan Mirzaei et al.

CVPR 2025posterarXiv:2412.12507
54
citations
#438

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

Daoguang Zan, Zhirong Huang, Wei Liu et al.

NEURIPS 2025posterarXiv:2504.02605
54
citations
#439

Wonderland: Navigating 3D Scenes from a Single Image

Hanwen Liang, Junli Cao, Vidit Goel et al.

CVPR 2025posterarXiv:2412.12091
54
citations
#440

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Shuangrui Ding, Rui Qian, Xiaoyi Dong et al.

ICCV 2025posterarXiv:2410.16268
54
citations
#441

Inductive Moment Matching

Linqi (Alex) Zhou, Stefano Ermon, Jiaming Song

ICML 2025oralarXiv:2503.07565
54
citations
#442

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Rang Meng, Xingyu Zhang, Yuming Li et al.

CVPR 2025posterarXiv:2411.10061
54
citations
#443

KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks

Kaijing Ma, Xeron Du, Yunran Wang et al.

ICLR 2025posterarXiv:2410.06526
54
citations
#444

Task Singular Vectors: Reducing Task Interference in Model Merging

Antonio Andrea Gargiulo, Donato Crisostomi, Maria Sofia Bucarelli et al.

CVPR 2025posterarXiv:2412.00081
53
citations
#445

TorchTitan: One-stop PyTorch native solution for production ready LLM pretraining

Wanchao Liang, Tianyu Liu, Less Wright et al.

ICLR 2025poster
53
citations
#446

CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control

Guy Tevet, Sigal Raab, Setareh Cohan et al.

ICLR 2025posterarXiv:2410.03441
53
citations
#447

Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge

Swarnadeep Saha, Xian Li, Marjan Ghazvininejad et al.

ICML 2025posterarXiv:2501.18099
53
citations
#448

VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis

Chao Pang, Xingxing Weng, Jiang Wu et al.

AAAI 2025paperarXiv:2403.20213
53
citations
#449

Goku: Flow Based Video Generative Foundation Models

Shoufa Chen, Chongjian GE, Yuqi Zhang et al.

CVPR 2025highlightarXiv:2502.04896
53
citations
#450

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Tianbao Xie, Jiaqi Deng, Xiaochuan Li et al.

NEURIPS 2025spotlightarXiv:2505.13227
53
citations
#451

Multiple Object Tracking as ID Prediction

Ruopeng Gao, Ji Qi, Limin Wang

CVPR 2025posterarXiv:2403.16848
53
citations
#452

WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild

Rolandos Alexandros Potamias, Jinglei Zhang, Jiankang Deng et al.

CVPR 2025posterarXiv:2409.12259
53
citations
#453

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

Zehui Chen, Kuikun Liu, Qiuchen Wang et al.

ICLR 2025posterarXiv:2407.20183
53
citations
#454

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen et al.

NEURIPS 2025posterarXiv:2507.16815
53
citations
#455

An Illusion of Progress? Assessing the Current State of Web Agents

Tianci Xue, Weijian Qi, Tianneng Shi et al.

COLM 2025paperarXiv:2504.01382
53
citations
#456

EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images

Wangbo Yu, Chaoran Feng, Jianing Li et al.

ICCV 2025posterarXiv:2405.20224
53
citations
#457

End-to-End Driving with Online Trajectory Evaluation via BEV World Model

Yingyan Li, Yuqi Wang, Yang Liu et al.

ICCV 2025posterarXiv:2504.01941
53
citations
#458

Proteina: Scaling Flow-based Protein Structure Generative Models

Tomas Geffner, Kieran Didi, Zuobai Zhang et al.

ICLR 2025posterarXiv:2503.00710
53
citations
#459

Tell me about yourself: LLMs are aware of their learned behaviors

Jan Betley, Xuchan Bao, Martín Soto et al.

ICLR 2025oralarXiv:2501.11120
53
citations
#460

Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection

Jia Guo, Shuai Lu, Weihang Zhang et al.

CVPR 2025posterarXiv:2405.14325
52
citations
#461

Flow Q-Learning

Seohong Park, Qiyang Li, Sergey Levine

ICML 2025posterarXiv:2502.02538
52
citations
#462

Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

Seyedmorteza Sadat, Otmar Hilliges, Romann Weber

ICLR 2025posterarXiv:2410.02416
52
citations
#463

Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models

Weihao Ye, Qiong Wu, Wenhao Lin et al.

AAAI 2025paperarXiv:2409.10197
52
citations
#464

OpenCodeReasoning: Advancing Data Distillation for Competitive Coding

Wasi Uddin Ahmad, Sean Narenthiran, Somshubra Majumdar et al.

COLM 2025paper
52
citations
#465

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology

Xiangyu Wang, Donglin Yang, Ziqin Wang et al.

ICLR 2025posterarXiv:2410.07087
52
citations
#466

Physics-Informed Diffusion Models

Jan-Hendrik Bastek, WaiChing Sun, Dennis Kochmann

ICLR 2025posterarXiv:2403.14404
52
citations
#467

RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph

Siru Ouyang, Wenhao Yu, Kaixin Ma et al.

ICLR 2025posterarXiv:2410.14684
52
citations
#468

BOND: Aligning LLMs with Best-of-N Distillation

Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot-Desenonges et al.

ICLR 2025posterarXiv:2407.14622
52
citations
#469

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Ziyang Ma, Yinghao Ma, Yanqiao Zhu et al.

NEURIPS 2025posterarXiv:2505.13032
52
citations
#470

A Decade's Battle on Dataset Bias: Are We There Yet?

Zhuang Liu, Kaiming He

ICLR 2025posterarXiv:2403.08632
52
citations
#471

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

Adam Karvonen, Can Rager, Johnny Lin et al.

ICML 2025posterarXiv:2503.09532
51
citations
#472

Does Spatial Cognition Emerge in Frontier Models?

Santhosh Kumar Ramakrishnan, Erik Wijmans, Philipp Krähenbühl et al.

ICLR 2025posterarXiv:2410.06468
51
citations
#473

NoLiMa: Long-Context Evaluation Beyond Literal Matching

Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt et al.

ICML 2025posterarXiv:2502.05167
51
citations
#474

TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers

Chuanrui Zhang, Yingshuang Zou, Zhuoling Li et al.

AAAI 2025paperarXiv:2408.13770
51
citations
#475

Inference Scaling for Long-Context Retrieval Augmented Generation

Zhenrui Yue, Honglei Zhuang, Aijun Bai et al.

ICLR 2025posterarXiv:2410.04343
51
citations
#476

FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

Xunhao Lai, Jianqiao Lu, Yao Luo et al.

ICLR 2025posterarXiv:2502.20766
51
citations
#477

Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient

Yongliang Wu, Shiji Zhou, Mingzhuo Yang et al.

AAAI 2025paperarXiv:2405.15304
51
citations
#478

OmniBench: Towards The Future of Universal Omni-Language Models

Yizhi Li, Ge Zhang, Yinghao Ma et al.

NEURIPS 2025posterarXiv:2409.15272
51
citations
#479

Calibrating Large Language Models with Sample Consistency

Qing Lyu, Kumar Shridhar, Chaitanya Malaviya et al.

AAAI 2025paperarXiv:2402.13904
50
citations
#480

MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation

Jinfeng Xu, Zheyu Chen, Shuo Yang et al.

AAAI 2025paperarXiv:2402.19407
50
citations
#481

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Yiheng Xu, Dunjie Lu, Zhennan Shen et al.

ICLR 2025posterarXiv:2412.09605
50
citations
#482

Model merging with SVD to tie the Knots

George Stoica, Pratik Ramesh, Boglarka Ecsedi et al.

ICLR 2025posterarXiv:2410.19735
50
citations
#483

WorldMem: Long-term Consistent World Simulation with Memory

Zeqi Xiao, Yushi LAN, Yifan Zhou et al.

NEURIPS 2025oralarXiv:2504.12369
50
citations
#484

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

Noam Razin, Sadhika Malladi, Adithya Bhaskar et al.

ICLR 2025posterarXiv:2410.08847
49
citations
#485

NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer

Meng YOU, Zhiyu Zhu, Hui LIU et al.

ICLR 2025posterarXiv:2405.15364
49
citations
#486

Energy-Based Diffusion Language Models for Text Generation

Minkai Xu, Tomas Geffner, Karsten Kreis et al.

ICLR 2025posterarXiv:2410.21357
49
citations
#487

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

Baorui Ma, Huachen Gao, Haoge Deng et al.

CVPR 2025highlightarXiv:2412.06699
49
citations
#488

MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs

Xuannan Liu, Zekun Li, Pei Li et al.

ICLR 2025posterarXiv:2406.08772
49
citations
#489

Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation

Sucheng Ren, Qihang Yu, Ju He et al.

ICCV 2025posterarXiv:2502.20388
49
citations
#490

S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models

Muzhi Dai, Chenxu Yang, Qingyi Si

NEURIPS 2025oralarXiv:2505.07686
49
citations
#491

Describe Anything: Detailed Localized Image and Video Captioning

Long Lian, Yifan Ding, Yunhao Ge et al.

ICCV 2025posterarXiv:2504.16072
49
citations
#492

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Qingyun Li, Zhe Chen, Weiyun Wang et al.

ICLR 2025posterarXiv:2406.08418
49
citations
#493

Learning Adaptive Parallel Reasoning with Language Models

Jiayi Pan, Xiuyu Li, Long Lian et al.

COLM 2025paperarXiv:2504.15466
49
citations
#494

X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents

Salman Rahman, Liwei Jiang, James Shiffer et al.

COLM 2025paperarXiv:2504.13203
49
citations
#495

Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference

Aniket Vashishtha, Abbavaram Gowtham Reddy, Abhinav Kumar et al.

ICLR 2025posterarXiv:2310.15117
48
citations
#496

Easi3R: Estimating Disentangled Motion from DUSt3R Without Training

Xingyu Chen, Yue Chen, Yuliang Xiu et al.

ICCV 2025posterarXiv:2503.24391
48
citations
#497

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Deepak Nathani, Lovish Madaan, Nicholas Roberts et al.

COLM 2025paper
48
citations
#498

SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models

Arijit Ray, Jiafei Duan, Ellis L Brown II et al.

COLM 2025paperarXiv:2412.07755
48
citations
#499

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Tian Ye, Zicheng Xu, Yuanzhi Li et al.

ICLR 2025posterarXiv:2408.16293
48
citations
#500

FineWeb2: One Pipeline to Scale Them All — Adapting Pre-Training Data Processing to Every Language

Guilherme Penedo, Hynek Kydlíček, Vinko Sabolčec et al.

COLM 2025paperarXiv:2506.20920
48
citations
#501

BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments

Yusuf Roohani, Andrew Lee, Qian Huang et al.

ICLR 2025posterarXiv:2405.17631
48
citations
#502

Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow

Fu-Yun Wang, Ling Yang, Zhaoyang Huang et al.

ICLR 2025posterarXiv:2410.07303
48
citations
#503

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Hang Hua, Yunlong Tang, Chenliang Xu et al.

AAAI 2025paperarXiv:2404.12353
48
citations
#504

GRIT: Teaching MLLMs to Think with Images

Yue Fan, Xuehai He, Diji Yang et al.

NEURIPS 2025posterarXiv:2505.15879
48
citations
#505

Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning Attack

Tiansheng Huang, Gautam Bhattacharya, Pratik Joshi et al.

ICML 2025poster
48
citations
#506

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Fanqing Meng, Jin Wang, Chuanhao Li et al.

ICLR 2025posterarXiv:2408.02718
48
citations
#507

RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval

Kaiyue Wen, Xingyu Dang, Kaifeng Lyu

ICLR 2025posterarXiv:2402.18510
48
citations
#508

VinePPO: Refining Credit Assignment in RL Training of LLMs

Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance et al.

ICML 2025posterarXiv:2410.01679
48
citations
#509

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Duy-Kien Nguyen, Mahmoud Assran, Unnat Jain et al.

ICLR 2025posterarXiv:2406.09415
48
citations
#510

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Han Lin, Jaemin Cho, Abhay Zala et al.

ICLR 2025oralarXiv:2404.09967
48
citations
#511

DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models

Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.

CVPR 2025posterarXiv:2503.02175
48
citations
#512

Rank1: Test-Time Compute for Reranking in Information Retrieval

Orion Weller, Kathryn Ricci, Eugene Yang et al.

COLM 2025paperarXiv:2502.18418
47
citations
#513

How Two-Layer Neural Networks Learn, One (Giant) Step at a Time

Yatin Dandi, Florent Krzakala, Bruno Loureiro et al.

ICLR 2025posterarXiv:2305.18270
47
citations
#514

Towards Interpreting Visual Information Processing in Vision-Language Models

Clement Neo, Luke Ong, Philip Torr et al.

ICLR 2025posterarXiv:2410.07149
47
citations
#515

Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking

Heli Ben-Hamu, Itai Gat, Daniel Severo et al.

NEURIPS 2025posterarXiv:2505.24857
47
citations
#516

ALLaM: Large Language Models for Arabic and English

M Saiful Bari, Yazeed Alnumay, Norah Alzahrani et al.

ICLR 2025posterarXiv:2407.15390
47
citations
#517

Aether: Geometric-Aware Unified World Modeling

Haoyi Zhu, Yifan Wang, Jianjun Zhou et al.

ICCV 2025posterarXiv:2503.18945
47
citations
#518

TabM: Advancing tabular deep learning with parameter-efficient ensembling

Yury Gorishniy, Akim Kotelnikov, Artem Babenko

ICLR 2025posterarXiv:2410.24210
47
citations
#519

LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

Zhengbo Wang, Jian Liang, Ran He et al.

ICLR 2025posterarXiv:2407.18242
47
citations
#520

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

Zhenting Qi, Hanlin Zhang, Eric P Xing et al.

ICLR 2025posterarXiv:2402.17840
47
citations
#521

Organize the Web: Constructing Domains Enhances Pre-Training Data Curation

Alexander Wettig, Kyle Lo, Sewon Min et al.

ICML 2025posterarXiv:2502.10341
47
citations
#522

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Yinlam Chow, Guy Tennenholtz, Izzeddin Gur et al.

ICLR 2025posterarXiv:2412.15287
47
citations
#523

Language Model Can Listen While Speaking

Ziyang Ma, Yakun Song, Chenpeng Du et al.

AAAI 2025paperarXiv:2408.02622
47
citations
#524

Learn Your Reference Model for Real Good Alignment

Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov et al.

ICLR 2025posterarXiv:2404.09656
47
citations
#525

CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models

Zhihang Lin, Mingbao Lin, Yuan Xie et al.

NEURIPS 2025posterarXiv:2503.22342
47
citations
#526

Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

Mehul Damani, Idan Shenfeld, Andi Peng et al.

ICLR 2025posterarXiv:2410.04707
47
citations
#527

Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models

Orion Weller, Ben Van Durme, Dawn Lawrie et al.

ICLR 2025posterarXiv:2409.11136
47
citations
#528

DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis

Pan Wang, Qiang Zhou, Yawen Wu et al.

AAAI 2025paperarXiv:2412.12225
47
citations
#529

EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers

Daiheng Gao, Shilin Lu, Wenbo Zhou et al.

ICML 2025posterarXiv:2412.20413
47
citations
#530

VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

Dominic Maggio, Hyungtae Lim, Luca Carlone

NEURIPS 2025posterarXiv:2505.12549
47
citations
#531

Eliminating Position Bias of Language Models: A Mechanistic Approach

Ziqi Wang, Hanlin Zhang, Xiner Li et al.

ICLR 2025posterarXiv:2407.01100
47
citations
#532

Image Conductor: Precision Control for Interactive Video Synthesis

Yaowei Li, Xintao Wang, Zhaoyang Zhang et al.

AAAI 2025paperarXiv:2406.15339
46
citations
#533

Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Yongxin Zhu, Bocheng Li, Yifei Xin et al.

ICCV 2025posterarXiv:2411.02038
46
citations
#534

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Rui Chen, Jianfeng Zhang, Yixun Liang et al.

CVPR 2025posterarXiv:2412.17808
46
citations
#535

SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Peixian Ma, Xialie Zhuang, Chengjin Xu et al.

NEURIPS 2025posterarXiv:2504.08600
46
citations
#536

Visual Agents as Fast and Slow Thinkers

Guangyan Sun, Mingyu Jin, Zhenting Wang et al.

ICLR 2025posterarXiv:2408.08862
46
citations
#537

NETS: A Non-equilibrium Transport Sampler

Michael Albergo, Eric Vanden-Eijnden

ICML 2025posterarXiv:2410.02711
46
citations
#538

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths

Kaixuan Huang, Xudong Guo, Mengdi Wang

COLM 2025paper
46
citations
#539

HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs

Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh

AAAI 2025paperarXiv:2412.14995
46
citations
#540

Vision Language Models are In-Context Value Learners

Yecheng Jason Ma, Joey Hejna, Chuyuan Fu et al.

ICLR 2025oralarXiv:2411.04549
46
citations
#541

MultiBooth: Towards Generating All Your Concepts in an Image from Text

Chenyang Zhu, Kai Li, Yue Ma et al.

AAAI 2025paperarXiv:2404.14239
46
citations
#542

The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control

Ruili Feng, Han Zhang, Zhilei Shu et al.

NEURIPS 2025posterarXiv:2412.03568
46
citations
#543

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models

Fan-Yun Sun, Weiyu Liu, Siyi Gu et al.

CVPR 2025posterarXiv:2412.02193
46
citations
#544

MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control

Ruiyuan Gao, Kai Chen, Bo Xiao et al.

ICCV 2025posterarXiv:2411.13807
46
citations
#545

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Zhuoqun Li, Xuanang Chen, Haiyang Yu et al.

ICLR 2025posterarXiv:2410.08815
46
citations
#546

M-LLM Based Video Frame Selection for Efficient Video Understanding

Kai Hu, Feng Gao, Xiaohan Nie et al.

CVPR 2025posterarXiv:2502.19680
46
citations
#547

Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

Mingfei Han, Linjie Yang, Xiaojun Chang et al.

ICLR 2025posterarXiv:2312.10300
46
citations
#548

WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks

Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori et al.

NEURIPS 2025posterarXiv:2504.18575
46
citations
#549

PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks

Matthew Chang, Gunjan Chhablani, Alexander Clegg et al.

ICLR 2025oralarXiv:2411.00081
46
citations
#550

Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better

Danny Driess, Jost Springenberg, Brian Ichter et al.

NEURIPS 2025spotlightarXiv:2505.23705
46
citations
#551

WorldScore: Unified Evaluation Benchmark for World Generation

Haoyi Duan, Hong-Xing Yu, Sirui Chen et al.

ICCV 2025poster
46
citations
#552

R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Naman Jain, Jaskirat Singh, Manish Shetty et al.

COLM 2025paper
46
citations
#553

NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions

Weizhe Yuan, Jane Yu, Song Jiang et al.

NEURIPS 2025posterarXiv:2502.13124
46
citations
#554

VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan et al.

CVPR 2025posterarXiv:2403.08764
46
citations
#555

MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses

Zonglin Yang, Wanhao Liu, Ben Gao et al.

ICLR 2025posterarXiv:2410.07076
45
citations
#556

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Linke Ouyang, Yuan Qu, Hongbin Zhou et al.

CVPR 2025posterarXiv:2412.07626
45
citations
#557

GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction

Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang et al.

CVPR 2025posterarXiv:2412.10373
45
citations
#558

Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach

Lingchen Sun, Rongyuan Wu, Zhiyuan Ma et al.

CVPR 2025posterarXiv:2412.03017
45
citations
#559

OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code

Maxence Faldor, Jenny Zhang, Antoine Cully et al.

ICLR 2025posterarXiv:2405.15568
45
citations
#560

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi et al.

NEURIPS 2025oralarXiv:2504.13180
45
citations
#561

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Keisuke Kamahori, Tian Tang, Yile Gu et al.

ICLR 2025posterarXiv:2402.07033
45
citations
#562

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

Junfei Wu, Jian Guan, Kaituo Feng et al.

NEURIPS 2025posterarXiv:2506.09965
45
citations
#563

Generator Matching: Generative modeling with arbitrary Markov processes

Peter Holderrieth, Marton Havasi, Jason Yim et al.

ICLR 2025posterarXiv:2410.20587
45
citations
#564

FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"

Yifei Ming, Senthil Purushwalkam, Shrey Pandit et al.

ICLR 2025poster
45
citations
#565

TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data

Jeremy Irvin, Emily Liu, Joyce Chen et al.

ICLR 2025oralarXiv:2410.06234
45
citations
#566

SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment

Katrin Renz, Long Chen, Elahe Arani et al.

CVPR 2025highlightarXiv:2503.09594
45
citations
#567

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

Minghong Cai, Xiaodong Cun, Xiaoyu Li et al.

CVPR 2025posterarXiv:2412.18597
45
citations
#568

OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

Mingjie Pan, Jiyao Zhang, Tianshu Wu et al.

CVPR 2025highlightarXiv:2501.03841
45
citations
#569

EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris et al.

ICML 2025posterarXiv:2502.09509
45
citations
#570

Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?

Egor Zverev, Sahar Abdelnabi, Soroush Tabesh et al.

ICLR 2025posterarXiv:2403.06833
45
citations
#571

Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning

Andy (DiJia) Su, Hanlin Zhu, Yingchen Xu et al.

ICML 2025posterarXiv:2502.03275
45
citations
#572

LLMs as Research Tools: A Large Scale Survey of Researchers’ Usage and Perceptions

Zhehui Liao, Maria Antoniak, Inyoung Cheong et al.

COLM 2025paperarXiv:2411.05025
45
citations
#573

DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers

Yuntao Chen, Yuqi Wang, Zhaoxiang Zhang

ICCV 2025posterarXiv:2412.18607
44
citations
#574

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Kai Chen, Yunhao Gou, Runhui Huang et al.

CVPR 2025posterarXiv:2409.18042
44
citations
#575

Depth Any Video with Scalable Synthetic Data

Honghui Yang, Di Huang, Wei Yin et al.

ICLR 2025oralarXiv:2410.10815
44
citations
#576

Atom of Thoughts for Markov LLM Test-Time Scaling

Fengwei Teng, Quan Shi, Zhaoyang Yu et al.

NEURIPS 2025posterarXiv:2502.12018
44
citations
#577

Preble: Efficient Distributed Prompt Scheduling for LLM Serving

Vikranth Srivatsa, Zijian He, Reyna Abhyankar et al.

ICLR 2025posterarXiv:2407.00023
44
citations
#578

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

Haotong Lin, Sida Peng, Jingxiao Chen et al.

CVPR 2025posterarXiv:2412.14015
44
citations
#579

Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

Seanie Lee, Minsu Kim, Lynn Cherif et al.

ICLR 2025posterarXiv:2405.18540
44
citations
#580

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Chengqi Lyu, Songyang Gao, Yuzhe Gu et al.

COLM 2025paper
44
citations
#581

On the Optimization and Generalization of Multi-head Attention

Christos Thrampoulidis, Rouzbeh Ghaderi, Hossein Taheri et al.

ICLR 2025posterarXiv:2310.12680
44
citations
#582

Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model

Long Le, Jason Xie, William Liang et al.

ICLR 2025posterarXiv:2410.13882
44
citations
#583

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Xingyu Fu, Minqian Liu, Zhengyuan Yang et al.

ICML 2025posterarXiv:2501.05452
44
citations
#584

End-to-End Autonomous Driving Through V2X Cooperation

Haibao Yu, Wenxian Yang, Jiaru Zhong et al.

AAAI 2025paperarXiv:2404.00717
44
citations
#585

RRM: Robust Reward Model Training Mitigates Reward Hacking

Tianqi Liu, Wei Xiong, Jie Ren et al.

ICLR 2025posterarXiv:2409.13156
44
citations
#586

Data Shapley in One Training Run

Jiachen (Tianhao) Wang, Prateek Mittal, Dawn Song et al.

ICLR 2025posterarXiv:2406.11011
44
citations
#587

ReasonIR: Training Retrievers for Reasoning Tasks

Rulin Shao, Rui Qiao, Varsha Kishore et al.

COLM 2025paperarXiv:2504.20595
44
citations
#588

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

Roman Bachmann, Jesse Allardice, David Mizrahi et al.

ICML 2025posterarXiv:2502.13967
43
citations
#589

Learning 4D Embodied World Models

Haoyu Zhen, Qiao Sun, Hongxin Zhang et al.

ICCV 2025posterarXiv:2504.20995
43
citations
#590

Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs

Qizhe Zhang, Aosong Cheng, Ming Lu et al.

ICCV 2025posterarXiv:2412.01818
43
citations
#591

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Zhangheng LI, Keen You, Haotian Zhang et al.

ICLR 2025posterarXiv:2410.18967
43
citations
#592

RMB: Comprehensively benchmarking reward models in LLM alignment

Enyu Zhou, Guodong Zheng, Binghai Wang et al.

ICLR 2025posterarXiv:2410.09893
43
citations
#593

MET3R: Measuring Multi-View Consistency in Generated Images

Mohammad Asim, Christopher Wewer, Thomas Wimmer et al.

CVPR 2025posterarXiv:2501.06336
43
citations
#594

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Hao Gao, Shaoyu Chen, Bo Jiang et al.

NEURIPS 2025posterarXiv:2502.13144
43
citations
#595

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization

Junjie He, Yifeng Geng, Liefeng Bo

ICCV 2025posterarXiv:2408.05939
43
citations
#596

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Jinbin Bai, Tian Ye, Wei Chow et al.

ICLR 2025posterarXiv:2410.08261
43
citations
#597

Catastrophic Failure of LLM Unlearning via Quantization

Zhiwei Zhang, Fali Wang, Xiaomin Li et al.

ICLR 2025posterarXiv:2410.16454
43
citations
#598

ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer

Zhen Han, Zeyinzi Jiang, Yulin Pan et al.

ICLR 2025posterarXiv:2410.00086
43
citations
#599

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

Chunwei Wang, Guansong Lu, Junwei Yang et al.

ICCV 2025posterarXiv:2412.06673
43
citations
#600

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Rong Li, Shijie Li, Lingdong Kong et al.

CVPR 2025posterarXiv:2412.04383
43
citations