Most Cited 2025 "consistency model distillation" Papers

22,274 papers found • Page 8 of 112

#1401

The Curse of Depth in Large Language Models

Wenfang Sun, Xinyuan Song, Pengxiang Li et al.

NEURIPS 2025arXiv:2502.05795
31
citations
#1402

Sparse autoencoders reveal selective remapping of visual concepts during adaptation

Hyesu Lim, Jinho Choi, Jaegul Choo et al.

ICLR 2025arXiv:2412.05276
31
citations
#1403

Spike No More: Stabilizing the Pre-training of Large Language Models

Sho Takase, Shun Kiyono, Sosuke Kobayashi et al.

COLM 2025paperarXiv:2312.16903
31
citations
#1404

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

Peng Xia, Siwei Han, Shi Qiu et al.

ICLR 2025arXiv:2410.10139
31
citations
#1405

OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling

Zhicheng YANG, Yiwei Wang, Yinya Huang et al.

ICLR 2025arXiv:2407.09887
31
citations
#1406

Overtrained Language Models Are Harder to Fine-Tune

Jacob Mitchell Springer, Sachin Goyal, Kaiyue Wen et al.

ICML 2025arXiv:2503.19206
31
citations
#1407

Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos

hanxue liang, Jiawei Ren, Ashkan Mirzaei et al.

NEURIPS 2025arXiv:2412.03526
31
citations
#1408

IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing

Chun Gu, Xiaofei Wei, Zixuan Zeng et al.

CVPR 2025arXiv:2412.15867
31
citations
#1409

Steer LLM Latents for Hallucination Detection

Seongheon Park, Xuefeng Du, Min-Hsuan Yeh et al.

ICML 2025arXiv:2503.01917
31
citations
#1410

World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model

Yupeng Zheng, Pengxuan Yang, Zebin Xing et al.

ICCV 2025arXiv:2507.00603
31
citations
#1411

Values in the Wild: Discovering and Mapping Values in Real-World Language Model Interactions

Saffron Huang, Esin DURMUS, Kunal Handa et al.

COLM 2025paper
31
citations
#1412

Forgetting Transformer: Softmax Attention with a Forget Gate

Zhixuan Lin, Evgenii Nikishin, Xu He et al.

ICLR 2025arXiv:2503.02130
31
citations
#1413

VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation

Hanzhi Chen, Boyang Sun, Anran Zhang et al.

CVPR 2025arXiv:2503.07135
31
citations
#1414

Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers

Shalev Lifshitz, Sheila A. McIlraith, Yilun Du

COLM 2025paperarXiv:2502.20379
31
citations
#1415

VideoGLaMM : A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Shehan Munasinghe, Hanan Gani, Wenqi Zhu et al.

CVPR 2025arXiv:2411.04923
31
citations
#1416

G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

Guibin Zhang, Muxin Fu, Kun Wang et al.

NEURIPS 2025spotlightarXiv:2506.07398
31
citations
#1417

CBQ: Cross-Block Quantization for Large Language Models

Xin Ding, Xiaoyu Liu, Zhijun Tu et al.

ICLR 2025arXiv:2312.07950
31
citations
#1418

VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning

Qiuchen Wang, Ruixue Ding, Yu Zeng et al.

NEURIPS 2025arXiv:2505.22019
31
citations
#1419

PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance

Haohan Weng, Yikai Wang, Tong Zhang et al.

ICLR 2025arXiv:2405.16890
31
citations
#1420

Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection

Wei Luo, Yunkang Cao, Haiming Yao et al.

CVPR 2025arXiv:2503.02424
31
citations
#1421

VCA: Video Curious Agent for Long Video Understanding

Zeyuan Yang, Delin Chen, Xueyang Yu et al.

ICCV 2025arXiv:2412.10471
31
citations
#1422

From Tokens to Words: On the Inner Lexicon of LLMs

Guy Kaplan, Matanel Oren, Yuval Reif et al.

ICLR 2025arXiv:2410.05864
31
citations
#1423

Language Model Alignment in Multilingual Trolley Problems

Zhijing Jin, Max Kleiman-Weiner, Giorgio Piatti et al.

ICLR 2025oralarXiv:2407.02273
31
citations
#1424

McEval: Massively Multilingual Code Evaluation

Linzheng Chai, Shukai Liu, Jian Yang et al.

ICLR 2025arXiv:2406.07436
31
citations
#1425

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Yuchen Lin, Chenguo Lin, Panwang Pan et al.

NEURIPS 2025arXiv:2506.05573
31
citations
#1426

TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval

Leqi Shen, Tianxiang Hao, Tao He et al.

ICLR 2025oralarXiv:2409.01156
31
citations
#1427

STAMP: Scalable Task- And Model-agnostic Collaborative Perception

Xiangbo Gao, Runsheng Xu, Jiachen Li et al.

ICLR 2025arXiv:2501.18616
31
citations
#1428

CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification

Yuchen Tian, Weixiang Yan, Qian Yang et al.

AAAI 2025paperarXiv:2405.00253
31
citations
#1429

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

Ruilin Luo, Zhuofan Zheng, Lei Wang et al.

NEURIPS 2025arXiv:2501.04686
31
citations
#1430

CViT: Continuous Vision Transformer for Operator Learning

Sifan Wang, Jacob Seidman, Shyam Sankaran et al.

ICLR 2025oralarXiv:2405.13998
31
citations
#1431

Best-of-N Jailbreaking

John Hughes, Sara Price, Aengus Lynch et al.

NEURIPS 2025arXiv:2412.03556
31
citations
#1432

PhysGen3D: Crafting a Miniature Interactive World from a Single Image

Boyuan Chen, Hanxiao Jiang, Shaowei Liu et al.

CVPR 2025arXiv:2503.20746
31
citations
#1433

How to build a consistency model: Learning flow maps via self-distillation

Nicholas Boffi, Michael Albergo, Eric Vanden-Eijnden

NEURIPS 2025arXiv:2505.18825
31
citations
#1434

GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS

Saman Kazemkhani, Aarav Pandya, Daphne Cornelisse et al.

ICLR 2025arXiv:2408.01584
31
citations
#1435

Visual Agentic AI for Spatial Reasoning with a Dynamic API

Damiano Marsili, Rohun Agrawal, Yisong Yue et al.

CVPR 2025arXiv:2502.06787
31
citations
#1436

From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs

Alireza Rezazadeh, Zichao Li, Wei Wei et al.

ICLR 2025arXiv:2410.14052
31
citations
#1437

GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing

Akashah Shabbir, Ilmuz Zaman Mohammed Zumri, Mohammed Bennamoun et al.

ICML 2025arXiv:2501.13925
31
citations
#1438

ASGO: Adaptive Structured Gradient Optimization

Kang An, Yuxing Liu, Rui Pan et al.

NEURIPS 2025arXiv:2503.20762
31
citations
#1439

LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior

Hanyu Wang, Saksham Suri, Yixuan Ren et al.

ICLR 2025arXiv:2410.21264
31
citations
#1440

FourierMamba: Fourier Learning Integration with State Space Models for Image Deraining

Dong Li, Yidi Liu, Xueyang Fu et al.

ICML 2025oralarXiv:2405.19450
31
citations
#1441

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

Jintao Zhang, Jia wei, Haoxu Wang et al.

NEURIPS 2025spotlightarXiv:2505.11594
31
citations
#1442

SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator

Guoxuan Chen, Han Shi, jiawei li et al.

ICML 2025arXiv:2412.12094
31
citations
#1443

Understanding Chain-of-Thought in LLMs through Information Theory

Jean-Francois Ton, Muhammad Faaiz Taufiq, Yang Liu

ICML 2025arXiv:2411.11984
31
citations
#1444

Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification

Eric Zhao, Pranjal Awasthi, Sreenivas Gollapudi

ICML 2025arXiv:2502.01839
31
citations
#1445

REEF: Representation Encoding Fingerprints for Large Language Models

Jie Zhang, Dongrui Liu, Chen Qian et al.

ICLR 2025arXiv:2410.14273
31
citations
#1446

Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View

Xuan Liu, Jie ZHANG, HaoYang Shang et al.

ICLR 2025arXiv:2405.14744
31
citations
#1447

ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing

Ziteng Wang, Jun Zhu, Jianfei Chen

ICLR 2025arXiv:2412.14711
31
citations
#1448

AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions

Polina Kirichenko, Mark Ibrahim, Kamalika Chaudhuri et al.

NEURIPS 2025arXiv:2506.09038
31
citations
#1449

Longhorn: State Space Models are Amortized Online Learners

Bo Liu, Rui Wang, Lemeng Wu et al.

ICLR 2025arXiv:2407.14207
31
citations
#1450

Efficient Online Reinforcement Learning for Diffusion Policy

Haitong Ma, Tianyi Chen, Kai Wang et al.

ICML 2025arXiv:2502.00361
31
citations
#1451

Herald: A Natural Language Annotated Lean 4 Dataset

Guoxiong Gao, Yutong Wang, Jiedong Jiang et al.

ICLR 2025arXiv:2410.10878
31
citations
#1452

GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding

Haoyi Jiang, Liu Liu, Tianheng Cheng et al.

CVPR 2025arXiv:2412.13193
31
citations
#1453

AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark

Li Lin, Santosh Santosh, Mingyang Wu et al.

CVPR 2025arXiv:2406.00783
31
citations
#1454

Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity

Eduard Gorbunov, Nazarii Tupitsa, Sayantan Choudhury et al.

ICLR 2025arXiv:2409.14989
31
citations
#1455

UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models

Xin Xu, Qiyun Xu, Tong Xiao et al.

ICML 2025arXiv:2502.00334
31
citations
#1456

ConDSeg: A General Medical Image Segmentation Framework via Contrast-Driven Feature Enhancement

Mengqi Lei, Haochen Wu, Xinhua Lv et al.

AAAI 2025paperarXiv:2412.08345
31
citations
#1457

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge

Haomiao Xiong, Zongxin Yang, Jiazuo Yu et al.

ICLR 2025arXiv:2501.13468
30
citations
#1458

DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes

Chensheng Peng, Chengwei Zhang, Yixiao Wang et al.

CVPR 2025arXiv:2411.11921
30
citations
#1459

Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens

Dongwon Kim, Ju He, Qihang Yu et al.

ICCV 2025arXiv:2501.07730
30
citations
#1460

LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application

Jian Jia, Yipei Wang, Yan Li et al.

AAAI 2025paperarXiv:2405.03988
30
citations
#1461

On Large Language Model Continual Unlearning

Chongyang Gao, Lixu Wang, Kaize Ding et al.

ICLR 2025arXiv:2407.10223
30
citations
#1462

XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation

Bowen Chen, Brynn zhao, Haomiao Sun et al.

NEURIPS 2025arXiv:2506.21416
30
citations
#1463

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Hongzhi Huang, Defa Zhu, Banggu Wu et al.

ICML 2025arXiv:2501.16975
30
citations
#1464

3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

Jan Held, Renaud Vandeghen, Abdullah J Hamdi et al.

CVPR 2025highlightarXiv:2411.14974
30
citations
#1465

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

Sheng Zhou, Junbin Xiao, Qingyun Li et al.

CVPR 2025arXiv:2502.07411
30
citations
#1466

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

Shi Qiu, Shaoyang Guo, Zhuo-Yang Song et al.

NEURIPS 2025arXiv:2504.16074
30
citations
#1467

Distillation Scaling Laws

Dan Busbridge, Amitis Shidani, Floris Weers et al.

ICML 2025arXiv:2502.08606
30
citations
#1468

Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark

Tsung-Han Wu, Giscard Biamby, Jerome Quenum et al.

ICLR 2025arXiv:2407.13766
30
citations
#1469

Epona: Autoregressive Diffusion World Model for Autonomous Driving

Kaiwen Zhang, Zhenyu Tang, Xiaotao Hu et al.

ICCV 2025arXiv:2506.24113
30
citations
#1470

Image Watermarks are Removable using Controllable Regeneration from Clean Noise

Yepeng Liu, Yiren Song, Hai Ci et al.

ICLR 2025arXiv:2410.05470
30
citations
#1471

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Jiacheng Chen, Tianhao Liang, Sherman Siu et al.

ICLR 2025arXiv:2410.10563
30
citations
#1472

Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers

Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao et al.

AAAI 2025paperarXiv:2402.17564
30
citations
#1473

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling

Jixun Yao, Hexin Liu, CHEN CHEN et al.

ICLR 2025arXiv:2502.02942
30
citations
#1474

Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning

Hanxun Yu, Wentong Li, Song Wang et al.

CVPR 2025highlightarXiv:2503.00513
30
citations
#1475

BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models

Peiyan Li, Yixiang Chen, Hongtao Wu et al.

NEURIPS 2025arXiv:2506.07961
30
citations
#1476

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models

Wenbo Hu, Jia-Chen Gu, Zi-Yi Dou et al.

ICLR 2025arXiv:2410.08182
30
citations
#1477

Faster Algorithms for Structured John Ellipsoid Computation

Yang Cao, Xiaoyu Li, Zhao Song et al.

NEURIPS 2025arXiv:2211.14407
30
citations
#1478

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Junyan Ye, Baichuan Zhou, Zilong Huang et al.

ICLR 2025arXiv:2410.09732
30
citations
#1479

One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

Hao Fang, Jiawei Kong, Wenbo Yu et al.

ICCV 2025arXiv:2406.05491
30
citations
#1480

Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs

Qi Wu, Yubo Zhao, Yifan Wang et al.

ICLR 2025arXiv:2405.17013
30
citations
#1481

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

zehan wang, Ziang Zhang, Minjie Hong et al.

ICLR 2025arXiv:2407.11895
30
citations
#1482

KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems

Jusheng Zhang, Zimeng Huang, Yijia Fan et al.

ICML 2025arXiv:2502.07350
30
citations
#1483

CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models

Zihui Cheng, Qiguang Chen, Jin Zhang et al.

AAAI 2025paperarXiv:2412.12932
30
citations
#1484

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Sicong Leng, Yun Xing, Zesen Cheng et al.

NEURIPS 2025arXiv:2410.12787
30
citations
#1485

InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences

Chenyang Zhu, Kai Li, Yue Ma et al.

ICLR 2025arXiv:2412.01197
30
citations
#1486

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Sihang Li, Jin Huang, Jiaxi Zhuang et al.

ICLR 2025arXiv:2408.15545
30
citations
#1487

Locate-then-edit for Multi-hop Factual Recall under Knowledge Editing

Zhuoran Zhang, Yongxiang Li, Zijian Kan et al.

ICML 2025arXiv:2410.06331
30
citations
#1488

K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

Ziheng Ouyang, Zhen Li, Qibin Hou

CVPR 2025arXiv:2502.18461
30
citations
#1489

RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control

Teng Li, Guangcong Zheng, Rui Jiang et al.

ICCV 2025arXiv:2502.10059
30
citations
#1490

Exploring Enhanced Contextual Information for Video-Level Object Tracking

Ben Kang, Xin Chen, Simiao Lai et al.

AAAI 2025paperarXiv:2412.11023
30
citations
#1491

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Shiji Zhao, Ranjie Duan, Fengxiang Wang et al.

ICCV 2025arXiv:2501.04931
30
citations
#1492

Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions

Stefan Andreas Baumann, Felix Krause, Michael Neumayr et al.

CVPR 2025arXiv:2403.17064
30
citations
#1493

Spurious Forgetting in Continual Learning of Language Models

Junhao Zheng, Xidi Cai, Shengjie Qiu et al.

ICLR 2025arXiv:2501.13453
30
citations
#1494

VSSD: Vision Mamba with Non-Causal State Space Duality

Yuheng Shi, Mingjia Li, Minjing Dong et al.

ICCV 2025arXiv:2407.18559
30
citations
#1495

Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective

Can Jin, Tianjin Huang, Yihua Zhang et al.

AAAI 2025paperarXiv:2312.01397
30
citations
#1496

Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs

Zicheng Zhang, Ziheng Jia, Haoning Wu et al.

CVPR 2025arXiv:2409.20063
30
citations
#1497

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Jianing "Jed" Yang, Xuweiyi Chen, Nikhil Madaan et al.

CVPR 2025arXiv:2406.05132
30
citations
#1498

LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding

Doohyuk Jang, Sihwan Park, June Yong Yang et al.

ICLR 2025arXiv:2410.03355
30
citations
#1499

Cascade Reward Sampling for Efficient Decoding-Time Alignment

Bolian Li, Yifan Wang, Anamika Lochab et al.

COLM 2025paperarXiv:2406.16306
30
citations
#1500

KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

Yongliang Wu, Zonghui Li, Xinting Hu et al.

NEURIPS 2025arXiv:2505.16707
30
citations
#1501

How to set AdamW's weight decay as you scale model and dataset size

Xi Wang, Laurence Aitchison

ICML 2025arXiv:2405.13698
30
citations
#1502

Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning

Moritz Reuss, Jyothish Pari, Pulkit Agrawal et al.

ICLR 2025arXiv:2412.12953
30
citations
#1503

Efficient Dictionary Learning with Switch Sparse Autoencoders

Anish Mudide, Josh Engels, Eric Michaud et al.

ICLR 2025arXiv:2410.08201
30
citations
#1504

Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer

Yanjun Zhao, Sizhe Dang, Haishan Ye et al.

ICLR 2025
30
citations
#1505

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Xiaojuan Wang, Boyang Zhou, Brian Curless et al.

ICLR 2025arXiv:2408.15239
30
citations
#1506

GOFA: A Generative One-For-All Model for Joint Graph Language Modeling

Lecheng Kong, Jiarui Feng, Hao Liu et al.

ICLR 2025arXiv:2407.09709
30
citations
#1507

Adversarial Diffusion Compression for Real-World Image Super-Resolution

Bin Chen, Gehui Li, Rongyuan Wu et al.

CVPR 2025arXiv:2411.13383
30
citations
#1508

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Zhiyuan Zhou, Andy Peng, Qiyang Li et al.

ICLR 2025arXiv:2412.07762
30
citations
#1509

MINIMA: Modality Invariant Image Matching

Jiangwei Ren, Xingyu Jiang, Zizhuo Li et al.

CVPR 2025arXiv:2412.19412
30
citations
#1510

Beyond Autoregression: Fast LLMs via Self-Distillation Through Time

Justin Deschenaux, Caglar Gulcehre

ICLR 2025arXiv:2410.21035
29
citations
#1511

LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations

Anian Ruoss, Fabio Pardo, Harris Chan et al.

ICML 2025arXiv:2412.01441
29
citations
#1512

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Maojia Song, Shang Hong Sim, Rishabh Bhardwaj et al.

ICLR 2025arXiv:2409.11242
29
citations
#1513

Enriching Multimodal Sentiment Analysis Through Textual Emotional Descriptions of Visual-Audio Content

Sheng Wu, Dongxiao He, Xiaobao Wang et al.

AAAI 2025paperarXiv:2412.10460
29
citations
#1514

Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning

Yiqun Chen, Lingyong Yan, Weiwei Sun et al.

NEURIPS 2025arXiv:2501.15228
29
citations
#1515

Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering

Sheng Liu, Haotian Ye, James Y Zou

ICLR 2025
29
citations
#1516

Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection

Lichen Bai, Shitong Shao, zikai zhou et al.

ICLR 2025arXiv:2412.10891
29
citations
#1517

Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale

Fan Zhou, Zengzhi Wang, Qian Liu et al.

ICML 2025arXiv:2409.17115
29
citations
#1518

LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences

Hongyan Zhi, Peihao Chen, Junyan Li et al.

CVPR 2025arXiv:2412.01292
29
citations
#1519

Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models

Zhenyu Pan, Haozheng Luo, Manling Li et al.

ICLR 2025arXiv:2403.17359
29
citations
#1520

GraphArena: Evaluating and Exploring Large Language Models on Graph Computation

Jianheng Tang, Qifan Zhang, Yuhan Li et al.

ICLR 2025arXiv:2407.00379
29
citations
#1521

How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework

Yinuo Ren, Haoxuan Chen, Grant Rotskoff et al.

ICLR 2025arXiv:2410.03601
29
citations
#1522

Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

Duo Zheng, shijia Huang, Yanyang Li et al.

NEURIPS 2025arXiv:2505.24625
29
citations
#1523

Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Navve Wasserman, Noam Rotstein, Roy Ganz et al.

CVPR 2025arXiv:2404.18212
29
citations
#1524

Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free

Ziyue Li, Tianyi Zhou

ICLR 2025arXiv:2410.10814
29
citations
#1525

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

Xin Zhou, DINGKANG LIANG, Sifan Tu et al.

ICCV 2025arXiv:2501.14729
29
citations
#1526

Reinforcement Learning with Action Chunking

Qiyang Li, Zhiyuan (Paul) Zhou, Sergey Levine

NEURIPS 2025oralarXiv:2507.07969
29
citations
#1527

AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents

Arman Zharmagambetov, Chuan Guo, Ivan Evtimov et al.

NEURIPS 2025arXiv:2503.09780
29
citations
#1528

Training Dynamics of In-Context Learning in Linear Attention

Yedi Zhang, Aaditya Singh, Peter Latham et al.

ICML 2025spotlightarXiv:2501.16265
29
citations
#1529

SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

Ying Chen, Guoan Wang, Yuanfeng Ji et al.

CVPR 2025arXiv:2410.11761
29
citations
#1530

Diving into Self-Evolving Training for Multimodal Reasoning

Wei Liu, Junlong Li, Xiwen Zhang et al.

ICML 2025arXiv:2412.17451
29
citations
#1531

MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space

Lixing Xiao, Shunlin Lu, Huaijin Pi et al.

ICCV 2025arXiv:2503.15451
29
citations
#1532

Can Large Language Models Understand Symbolic Graphics Programs?

Zeju Qiu, Weiyang Liu, Haiwen Feng et al.

ICLR 2025arXiv:2408.08313
29
citations
#1533

Energy-Weighted Flow Matching for Offline Reinforcement Learning

Shiyuan Zhang, Weitong Zhang, Quanquan Gu

ICLR 2025arXiv:2503.04975
29
citations
#1534

Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

Jiangjie Chen, Qianyu He, Siyu Yuan et al.

NEURIPS 2025spotlightarXiv:2505.19914
29
citations
#1535

Theoretical Benefit and Limitation of Diffusion Language Model

Guhao Feng, Yihan Geng, Jian Guan et al.

NEURIPS 2025arXiv:2502.09622
29
citations
#1536

The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling

Andre Cornman, Jacob West-Roberts, Antonio Camargo et al.

ICLR 2025
29
citations
#1537

DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life

Yu Ying Chiu, Liwei Jiang, Yejin Choi

ICLR 2025oralarXiv:2410.02683
29
citations
#1538

Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding

Zhongyi Shui, Jianpeng Zhang, Weiwei Cao et al.

ICLR 2025arXiv:2501.14548
29
citations
#1539

CURIE: Evaluating LLMs on Multitask Scientific Long-Context Understanding and Reasoning

Hao Cui, Zahra Shamsi, Gowoon Cheon et al.

ICLR 2025arXiv:2503.13517
29
citations
#1540

Bolt3D: Generating 3D Scenes in Seconds

Stanislaw Szymanowicz, Jason Y. Zhang, Pratul Srinivasan et al.

ICCV 2025arXiv:2503.14445
29
citations
#1541

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Zehan Wang, Ziang Zhang, Tianyu Pang et al.

ICML 2025arXiv:2412.18605
29
citations
#1542

LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity

Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu et al.

CVPR 2025arXiv:2412.09856
29
citations
#1543

Distilling Multi-modal Large Language Models for Autonomous Driving

Deepti Hegde, Rajeev Yasarla, Hong Cai et al.

CVPR 2025arXiv:2501.09757
29
citations
#1544

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Liming Jiang, Qing Yan, Yumin Jia et al.

ICCV 2025highlightarXiv:2503.16418
29
citations
#1545

Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models

Samira Abnar, Harshay Shah, Dan Busbridge et al.

ICML 2025arXiv:2501.12370
29
citations
#1546

Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems

Weibo Gao, Qi Liu, Linan Yue et al.

AAAI 2025paperarXiv:2501.10332
29
citations
#1547

KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse

Jingbo Yang, Bairu Hou, Wei Wei et al.

NEURIPS 2025arXiv:2502.16002
29
citations
#1548

Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors

Weixuan Wang, JINGYUAN YANG, Wei Peng

ICLR 2025arXiv:2410.12299
29
citations
#1549

RUN: Reversible Unfolding Network for Concealed Object Segmentation

Chunming He, Rihan Zhang, Fengyang Xiao et al.

ICML 2025arXiv:2501.18783
29
citations
#1550

DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding

Geng Li, Jinglin Xu, Yunzhen Zhao et al.

CVPR 2025highlightarXiv:2504.14920
29
citations
#1551

A Formal Framework for Understanding Length Generalization in Transformers

Xinting Huang, Andy Yang, Satwik Bhattamishra et al.

ICLR 2025arXiv:2410.02140
29
citations
#1552

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Yiyang Zhou, Yangfan He, Yaofeng Su et al.

NEURIPS 2025arXiv:2506.01300
29
citations
#1553

CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression

Yu-Ting Zhan, Cheng-Yuan Ho, He-Bi Yang et al.

ICLR 2025arXiv:2503.00357
29
citations
#1554

AdvAgent: Controllable Blackbox Red-teaming on Web Agents

Chejian Xu, Mintong Kang, Jiawei Zhang et al.

ICML 2025arXiv:2410.17401
29
citations
#1555

Machine Unlearning Fails to Remove Data Poisoning Attacks

Martin Pawelczyk, Jimmy Di, Yiwei Lu et al.

ICLR 2025arXiv:2406.17216
29
citations
#1556

A Bias-Free Training Paradigm for More General AI-generated Image Detection

Fabrizio Guillaro, Giada Zingarini, Ben Usman et al.

CVPR 2025arXiv:2412.17671
29
citations
#1557

PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training

Cong Chen, Mingyu Liu, Chenchen Jing et al.

ICLR 2025arXiv:2503.06486
29
citations
#1558

ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding

Zhengzhuo Xu, Bowen Qu, Yiyan Qi et al.

ICLR 2025arXiv:2409.03277
29
citations
#1559

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Zihan Zheng, Zerui Cheng, Zeyu Shen et al.

NEURIPS 2025arXiv:2506.11928
29
citations
#1560

Can Knowledge Editing Really Correct Hallucinations?

Baixiang Huang, Canyu Chen, Xiongxiao Xu et al.

ICLR 2025arXiv:2410.16251
29
citations
#1561

Holistically Evaluating the Environmental Impact of Creating Language Models

Jacob Morrison, Clara Na, Jared Fernandez et al.

ICLR 2025arXiv:2503.05804
29
citations
#1562

Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron

Yiran Zhao, Wenxuan Zhang, Yuxi Xie et al.

ICLR 2025
29
citations
#1563

MARS: Unleashing the Power of Variance Reduction for Training Large Models

Huizhuo Yuan, Yifeng Liu, Shuang Wu et al.

ICML 2025arXiv:2411.10438
29
citations
#1564

Multimodal Situational Safety

Kaiwen Zhou, Chengzhi Liu, Xuandong Zhao et al.

ICLR 2025arXiv:2410.06172
29
citations
#1565

Graphic Design with Large Multimodal Model

Yutao Cheng, Zhao Zhang, Maoke Yang et al.

AAAI 2025paperarXiv:2404.14368
29
citations
#1566

Interleaved-Modal Chain-of-Thought

Jun Gao, Yongqi Li, Ziqiang Cao et al.

CVPR 2025arXiv:2411.19488
29
citations
#1567

MegaMath: Pushing the Limits of Open Math Corpora

Fan Zhou, Zengzhi Wang, Nikhil Ranjan et al.

COLM 2025paperarXiv:2504.02807
29
citations
#1568

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

Zongyi Li, Shujie HU, Shujie LIU et al.

ICLR 2025oralarXiv:2410.20502
28
citations
#1569

HyperGraphRAG: Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation

Haoran Luo, Haihong E, Guanting Chen et al.

NEURIPS 2025arXiv:2503.21322
28
citations
#1570

Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures

Yiming Chen, Yuan Zhang, Liyuan Cao et al.

ICLR 2025arXiv:2410.07698
28
citations
#1571

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation

ZIYU ZHU, Xilin Wang, Yixuan Li et al.

ICCV 2025highlightarXiv:2507.04047
28
citations
#1572

AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N

Tianyu Zhang, Andrew Williams, Phillip Wozny et al.

ICML 2025arXiv:2208.07004
28
citations
#1573

Grounded Reinforcement Learning for Visual Reasoning

Gabriel Sarch, Snigdha Saha, Naitik Khandelwal et al.

NEURIPS 2025arXiv:2505.23678
28
citations
#1574

Simple ReFlow: Improved Techniques for Fast Flow Models

Beomsu Kim, Yu-Guan Hsieh, Michal Klein et al.

ICLR 2025arXiv:2410.07815
28
citations
#1575

TrackGo: A Flexible and Efficient Method for Controllable Video Generation

Haitao Zhou, Chuang Wang, Rui Nie et al.

AAAI 2025paperarXiv:2408.11475
28
citations
#1576

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer

Yiren Song, Danze Chen, Mike Zheng Shou

ICCV 2025arXiv:2502.01105
28
citations
#1577

From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions

Changle Qu, Sunhao Dai, Xiaochi Wei et al.

ICLR 2025arXiv:2410.08197
28
citations
#1578

SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding

Zhenyu Yang, Yuhang Hu, Zemin Du et al.

ICLR 2025oralarXiv:2502.10810
28
citations
#1579

Understanding Factual Recall in Transformers via Associative Memories

Eshaan Nichani, Jason Lee, Alberto Bietti

ICLR 2025arXiv:2412.06538
28
citations
#1580

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

Zimu Lu, Aojun Zhou, Ke Wang et al.

ICLR 2025arXiv:2410.08196
28
citations
#1581

SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation

Chun-Han Yao, Yiming Xie, Vikram Voleti et al.

ICCV 2025arXiv:2503.16396
28
citations
#1582

ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization

Zechun Liu, Changsheng Zhao, Hanxian Huang et al.

NEURIPS 2025arXiv:2502.02631
28
citations
#1583

LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

Parshin Shojaee, Ngoc Hieu Nguyen, Kazem Meidani et al.

ICML 2025oralarXiv:2504.10415
28
citations
#1584

Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning

Wenyi Xiao, Leilei Gan

NEURIPS 2025spotlightarXiv:2504.18458
28
citations
#1585

Star Attention: Efficient LLM Inference over Long Sequences

Shantanu Acharya, Fei Jia, Boris Ginsburg

ICML 2025arXiv:2411.17116
28
citations
#1586

Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems

Christian Walder, Deep Tejas Karkhanis

NEURIPS 2025spotlightarXiv:2505.15201
28
citations
#1587

The Superposition of Diffusion Models Using the Itô Density Estimator

Marta Skreta, Lazar Atanackovic, Joey Bose et al.

ICLR 2025arXiv:2412.17762
28
citations
#1588

Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads

Siqi Kou, Jiachun Jin, Zhihong Liu et al.

ICML 2025arXiv:2412.00127
28
citations
#1589

Subspace Optimization for Large Language Models with Convergence Guarantees

Yutong He, Pengrui Li, Yipeng Hu et al.

ICML 2025arXiv:2410.11289
28
citations
#1590

ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

Haiyang SHEN, Yue Li, Desong Meng et al.

ICLR 2025arXiv:2407.00132
28
citations
#1591

OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation

Jingjing Chang, Yixiao Fang, Peng Xing et al.

NEURIPS 2025arXiv:2506.07977
28
citations
#1592

Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking

Benjamin Feuer, Micah Goldblum, Teresa Datta et al.

ICLR 2025arXiv:2409.15268
28
citations
#1593

DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting

Hyunwoo Park, Gun Ryu, Wonjun Kim

CVPR 2025arXiv:2504.00773
28
citations
#1594

Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport

Zhenyi Zhang, Tiejun Li, Peijie Zhou

ICLR 2025arXiv:2410.00844
28
citations
#1595

Chain-of-Retrieval Augmented Generation

Liang Wang, Haonan Chen, Nan Yang et al.

NEURIPS 2025arXiv:2501.14342
28
citations
#1596

NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning

Xin Yi, Shunfan Zheng, Linlin Wang et al.

AAAI 2025paperarXiv:2412.12497
28
citations
#1597

When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs

Xiaomin Li, Zhou Yu, Zhiwei Zhang et al.

NEURIPS 2025spotlightarXiv:2505.11423
28
citations
#1598

VistaDream: Sampling multiview consistent images for single-view scene reconstruction

Haiping Wang, Yuan Liu, Ziwei Liu et al.

ICCV 2025arXiv:2410.16892
28
citations
#1599

Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping

Zijian Liu, Zhengyuan Zhou

ICLR 2025arXiv:2412.19529
28
citations
#1600

Diffusion-based Neural Network Weights Generation

Bedionita Soro, Bruno Andreis, Hayeon Lee et al.

ICLR 2025arXiv:2402.18153
28
citations