Most Cited 2025 "progressive synthesis" Papers

22,274 papers found • Page 5 of 112

#801

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Barys Liskavets, Maxim Ushakov, Shuvendu Roy et al.

AAAI 2025paperarXiv:2409.01227
31
citations
#802

CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes

Yang Liu, Chuanchen Luo, Zhongkai Mao et al.

ICLR 2025posterarXiv:2411.00771
31
citations
#803

ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models

Xubing Ye, Yukang Gan, Yixiao Ge et al.

CVPR 2025posterarXiv:2412.00447
31
citations
#804

TabArena: A Living Benchmark for Machine Learning on Tabular Data

Nick Erickson, Lennart Purucker, Andrej Tschalzev et al.

NEURIPS 2025spotlightarXiv:2506.16791
31
citations
#805

EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models

Yantai Yang, Yuhao Wang, Zichen Wen et al.

NEURIPS 2025oralarXiv:2506.10100
31
citations
#806

A Closer Look at Machine Unlearning for Large Language Models

Xiaojian Yuan, Tianyu Pang, Chao Du et al.

ICLR 2025posterarXiv:2410.08109
31
citations
#807

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

Peng Jin, Bo Zhu, Yuan Li et al.

ICLR 2025posterarXiv:2410.07348
31
citations
#808

Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation

Derong Xu, Xinhang Li, Ziheng Zhang et al.

AAAI 2025paperarXiv:2412.18537
31
citations
#809

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Yuheng Zhang, Dian Yu, Baolin Peng et al.

ICLR 2025posterarXiv:2407.00617
31
citations
#810

LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

Mushui Liu, Yuhang Ma, Zhen Yang et al.

AAAI 2025paperarXiv:2407.00737
31
citations
#811

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

Peijie Dong, Lujun Li, Yuedong Zhong et al.

ICLR 2025posterarXiv:2408.01803
31
citations
#812

Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step

Mingyuan Zhou, Huangjie Zheng, Yi Gu et al.

ICLR 2025posterarXiv:2410.14919
31
citations
#813

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Rui Qian, Shuangrui Ding, Xiaoyi Dong et al.

CVPR 2025posterarXiv:2501.03218
31
citations
#814

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Mantas Mazeika, Xuwang Yin, Rishub Tamirisa et al.

NEURIPS 2025spotlightarXiv:2502.08640
31
citations
#815

WorldModelBench: Judging Video Generation Models As World Models

Dacheng Li, Yunhao Fang, Yukang Chen et al.

NEURIPS 2025posterarXiv:2502.20694
31
citations
#816

RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards

Xinze Li, Sen Mei, Zhenghao Liu et al.

ICLR 2025posterarXiv:2410.13509
31
citations
#817

Number Cookbook: Number Understanding of Language Models and How to Improve It

Haotong Yang, Yi Hu, Shijia Kang et al.

ICLR 2025posterarXiv:2411.03766
31
citations
#818

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.

CVPR 2025posterarXiv:2406.04321
31
citations
#819

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Shuo Yang, Haocheng Xi, Yilong Zhao et al.

NEURIPS 2025spotlightarXiv:2505.18875
31
citations
#820

ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities

Ezra Karger, Houtan Bastani, Chen Yueh-Han et al.

ICLR 2025posterarXiv:2409.19839
31
citations
#821

PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance

Haohan Weng, Yikai Wang, Tong Zhang et al.

ICLR 2025posterarXiv:2405.16890
31
citations
#822

REEF: Representation Encoding Fingerprints for Large Language Models

Jie Zhang, Dongrui Liu, Chen Qian et al.

ICLR 2025posterarXiv:2410.14273
31
citations
#823

Policy learning “without” overlap: Pessimism and generalized empirical Bernstein’s inequality

Ying Jin, Zhimei Ren, Zhuoran Yang et al.

NEURIPS 2025posterarXiv:2212.09900
30
citations
#824

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Jianing "Jed" Yang, Xuweiyi Chen, Nikhil Madaan et al.

CVPR 2025posterarXiv:2406.05132
30
citations
#825

GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding

Haoyi Jiang, Liu Liu, Tianheng Cheng et al.

CVPR 2025posterarXiv:2412.13193
30
citations
#826

Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks

Hailong Guo, Bohan Zeng, Yiren Song et al.

ICCV 2025posterarXiv:2501.15891
30
citations
#827

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang et al.

ICLR 2025posterarXiv:2412.13337
30
citations
#828

Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs

Qi Wu, Yubo Zhao, Yifan Wang et al.

ICLR 2025posterarXiv:2405.17013
30
citations
#829

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

Will Merrill, Ashish Sabharwal

NEURIPS 2025posterarXiv:2503.03961
30
citations
#830

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Di Zhang, Jingdi Lei, Junxian Li et al.

CVPR 2025posterarXiv:2411.18203
30
citations
#831

From Tokens to Words: On the Inner Lexicon of LLMs

Guy Kaplan, Matanel Oren, Yuval Reif et al.

ICLR 2025posterarXiv:2410.05864
30
citations
#832

Training-free and Adaptive Sparse Attention for Efficient Long Video Generation

yifei xia, Suhan Ling, Fangcheng Fu et al.

ICCV 2025posterarXiv:2502.21079
30
citations
#833

CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models

Zihui Cheng, Qiguang Chen, Jin Zhang et al.

AAAI 2025paperarXiv:2412.12932
30
citations
#834

Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning

Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye et al.

AAAI 2025paperarXiv:2405.20535
30
citations
#835

PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

Qingdong He, Jiangning Zhang, Jinlong Peng et al.

AAAI 2025paperarXiv:2405.15214
30
citations
#836

Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark

Tsung-Han Wu, Giscard Biamby, Jerome Quenum et al.

ICLR 2025posterarXiv:2407.13766
30
citations
#837

VideoGLaMM : A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Shehan Munasinghe, Hanan Gani, Wenqi Zhu et al.

CVPR 2025posterarXiv:2411.04923
30
citations
#838

StarVector: Generating Scalable Vector Graphics Code from Images and Text

Juan Rodriguez, Abhay Puri, Shubham Agarwal et al.

CVPR 2025posterarXiv:2312.11556
30
citations
#839

Evolutionary Large Language Model for Automated Feature Transformation

Nanxu Gong, Chandan K Reddy, Wangyang Ying et al.

AAAI 2025paperarXiv:2405.16203
30
citations
#840

Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models Trained on Corrupted Data

Asad Aali, Giannis Daras, Brett Levac et al.

ICLR 2025posterarXiv:2403.08728
30
citations
#841

A New Perspective on Shampoo's Preconditioner

Depen Morwani, Itai Shapira, Nikhil Vyas et al.

ICLR 2025posterarXiv:2406.17748
30
citations
#842

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Tianhe Wu, Jian Zou, Jie Liang et al.

NEURIPS 2025spotlightarXiv:2505.14460
30
citations
#843

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

Tao Liu, Kai Wang, Senmao Li et al.

ICLR 2025posterarXiv:2501.13554
30
citations
#844

3D-HGS: 3D Half-Gaussian Splatting

Haolin Li, Jinyang Liu, Mario Sznaier et al.

CVPR 2025posterarXiv:2406.02720
30
citations
#845

Visual Agentic AI for Spatial Reasoning with a Dynamic API

Damiano Marsili, Rohun Agrawal, Yisong Yue et al.

CVPR 2025posterarXiv:2502.06787
30
citations
#846

Complexity Experts are Task-Discriminative Learners for Any Image Restoration

Eduard Zamfir, Zongwei Wu, Nancy Mehta et al.

CVPR 2025posterarXiv:2411.18466
30
citations
#847

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Zhe Kong, Feng Gao, Yong Zhang et al.

NEURIPS 2025posterarXiv:2505.22647
30
citations
#848

What Makes Large Language Models Reason in (Multi-Turn) Code Generation?

Kunhao Zheng, Juliette Decugis, Jonas Gehring et al.

ICLR 2025posterarXiv:2410.08105
30
citations
#849

McEval: Massively Multilingual Code Evaluation

Linzheng Chai, Shukai Liu, Jian Yang et al.

ICLR 2025posterarXiv:2406.07436
30
citations
#850

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

Qi Wang, Yanrui Yu, Ye Yuan et al.

NEURIPS 2025oralarXiv:2505.12434
30
citations
#851

The dark side of the forces: assessing non-conservative force models for atomistic machine learning

Filippo Bigi, Marcel Langer, Michele Ceriotti

ICML 2025oralarXiv:2412.11569
29
citations
#852

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

Ruilin Luo, Zhuofan Zheng, Lei Wang et al.

NEURIPS 2025posterarXiv:2501.04686
29
citations
#853

Can Knowledge Editing Really Correct Hallucinations?

Baixiang Huang, Canyu Chen, Xiongxiao Xu et al.

ICLR 2025posterarXiv:2410.16251
29
citations
#854

Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Navve Wasserman, Noam Rotstein, Roy Ganz et al.

CVPR 2025posterarXiv:2404.18212
29
citations
#855

From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs

Alireza Rezazadeh, Zichao Li, Wei Wei et al.

ICLR 2025posterarXiv:2410.14052
29
citations
#856

InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences

Chenyang Zhu, Kai Li, Yue Ma et al.

ICLR 2025posterarXiv:2412.01197
29
citations
#857

Self-Boosting Large Language Models with Synthetic Preference Data

Qingxiu Dong, Li Dong, Xingxing Zhang et al.

ICLR 2025posterarXiv:2410.06961
29
citations
#858

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models

Wenbo Hu, Jia-Chen Gu, Zi-Yi Dou et al.

ICLR 2025posterarXiv:2410.08182
29
citations
#859

LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior

Hanyu Wang, Saksham Suri, Yixuan Ren et al.

ICLR 2025posterarXiv:2410.21264
29
citations
#860

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Liming Jiang, Qing Yan, Yumin Jia et al.

ICCV 2025highlightarXiv:2503.16418
29
citations
#861

VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation

Hanzhi Chen, Boyang Sun, Anran Zhang et al.

CVPR 2025posterarXiv:2503.07135
29
citations
#862

Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation

Siwei Wen, junyan ye, Peilin Feng et al.

NEURIPS 2025posterarXiv:2503.14905
29
citations
#863

WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments

Jianhao Zheng, Zihan Zhu, Valentin Bieri et al.

CVPR 2025posterarXiv:2504.03886
29
citations
#864

FlatQuant: Flatness Matters for LLM Quantization

Yuxuan Sun, Ruikang Liu, Haoli Bai et al.

ICML 2025posterarXiv:2410.09426
29
citations
#865

Longhorn: State Space Models are Amortized Online Learners

Bo Liu, Rui Wang, Lemeng Wu et al.

ICLR 2025posterarXiv:2407.14207
29
citations
#866

OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation

Yuchen Lin, Chenguo Lin, Jianjin Xu et al.

ICLR 2025posterarXiv:2501.18982
29
citations
#867

MoDeGPT: Modular Decomposition for Large Language Model Compression

Chi-Heng Lin, Shangqian Gao, James Smith et al.

ICLR 2025posterarXiv:2408.09632
29
citations
#868

Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting

Siru Zhong, Weilin Ruan, Ming Jin et al.

ICML 2025oralarXiv:2502.04395
29
citations
#869

The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling

Andre Cornman, Jacob West-Roberts, Antonio Camargo et al.

ICLR 2025poster
29
citations
#870

Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering

Sheng Liu, Haotian Ye, James Y Zou

ICLR 2025poster
29
citations
#871

STAMP: Scalable Task- And Model-agnostic Collaborative Perception

Xiangbo Gao, Runsheng Xu, Jiachen Li et al.

ICLR 2025posterarXiv:2501.18616
29
citations
#872

Gramian Multimodal Representation Learning and Alignment

Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo et al.

ICLR 2025posterarXiv:2412.11959
29
citations
#873

Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View

Xuan Liu, Jie ZHANG, HaoYang Shang et al.

ICLR 2025posterarXiv:2405.14744
29
citations
#874

Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms

Yinuo Ren, Haoxuan Chen, Yuchen Zhu et al.

NEURIPS 2025posterarXiv:2502.00234
29
citations
#875

Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer

Yanjun Zhao, Sizhe Dang, Haishan Ye et al.

ICLR 2025poster
29
citations
#876

DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes

Chensheng Peng, Chengwei Zhang, Yixiao Wang et al.

CVPR 2025posterarXiv:2411.11921
29
citations
#877

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Xiaojuan Wang, Boyang Zhou, Brian Curless et al.

ICLR 2025posterarXiv:2408.15239
29
citations
#878

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Weifeng Lin, Xinyu Wei, Ruichuan An et al.

NEURIPS 2025posterarXiv:2506.05302
29
citations
#879

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key

Zhihe Yang, Xufang Luo, Dongqi Han et al.

CVPR 2025posterarXiv:2501.09695
29
citations
#880

VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge

Vishwesh Nath, Wenqi Li, Dong Yang et al.

CVPR 2025highlightarXiv:2411.12915
29
citations
#881

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Maojia Song, Shang Hong Sim, Rishabh Bhardwaj et al.

ICLR 2025posterarXiv:2409.11242
29
citations
#882

LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding

Doohyuk Jang, Sihwan Park, June Yong Yang et al.

ICLR 2025posterarXiv:2410.03355
29
citations
#883

xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition

Artyom Stitsyuk, Jaesik Choi

AAAI 2025paperarXiv:2412.17323
29
citations
#884

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge

Haomiao Xiong, Zongxin Yang, Jiazuo Yu et al.

ICLR 2025posterarXiv:2501.13468
28
citations
#885

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Yuchen Lin, Chenguo Lin, Panwang Pan et al.

NEURIPS 2025posterarXiv:2506.05573
28
citations
#886

Enriching Multimodal Sentiment Analysis Through Textual Emotional Descriptions of Visual-Audio Content

Sheng Wu, Dongxiao He, Xiaobao Wang et al.

AAAI 2025paperarXiv:2412.10460
28
citations
#887

Dataset Distillation with Neural Characteristic Function: A Minmax Perspective

Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.

CVPR 2025highlightarXiv:2502.20653
28
citations
#888

DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors

Keon Lee, Dong Won Kim, Jaehyeon Kim et al.

ICLR 2025posterarXiv:2406.11427
28
citations
#889

OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization

Yiyou Sun, Shawn Hu, Georgia Zhou et al.

NEURIPS 2025posterarXiv:2506.18880
28
citations
#890

Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport

Zhenyi Zhang, Tiejun Li, Peijie Zhou

ICLR 2025posterarXiv:2410.00844
28
citations
#891

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Zhongwei Ren, Yunchao Wei, Xun Guo et al.

CVPR 2025posterarXiv:2501.09781
28
citations
#892

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Shiji Zhao, Ranjie Duan, Fengxiang Wang et al.

ICCV 2025posterarXiv:2501.04931
28
citations
#893

MAT-Agent: Adaptive Multi-Agent Training Optimization

jusheng zhang, Kaitong Cai, Yijia Fan et al.

NEURIPS 2025posterarXiv:2510.17845
28
citations
#894

Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness

Haochen Wang, Yucheng Zhao, Tiancai Wang et al.

ICCV 2025posterarXiv:2504.01901
28
citations
#895

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Yiyang Zhou, Yangfan He, Yaofeng Su et al.

NEURIPS 2025posterarXiv:2506.01300
28
citations
#896

Machine Unlearning Fails to Remove Data Poisoning Attacks

Martin Pawelczyk, Jimmy Di, Yiwei Lu et al.

ICLR 2025posterarXiv:2406.17216
28
citations
#897

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Zehan Wang, Ziang Zhang, Tianyu Pang et al.

ICML 2025posterarXiv:2412.18605
28
citations
#898

RUN: Reversible Unfolding Network for Concealed Object Segmentation

Chunming He, Rihan Zhang, Fengyang Xiao et al.

ICML 2025posterarXiv:2501.18783
28
citations
#899

Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems

Weibo Gao, Qi Liu, Linan Yue et al.

AAAI 2025paperarXiv:2501.10332
28
citations
#900

GOFA: A Generative One-For-All Model for Joint Graph Language Modeling

Lecheng Kong, Jiarui Feng, Hao Liu et al.

ICLR 2025posterarXiv:2407.09709
28
citations
#901

When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs

Xiaomin Li, Zhou Yu, Zhiwei Zhang et al.

NEURIPS 2025spotlightarXiv:2505.11423
28
citations
#902

Context-Alignment: Activating and Enhancing LLMs Capabilities in Time Series

Yuxiao Hu, Qian Li, Dongxiao Zhang et al.

ICLR 2025posterarXiv:2501.03747
28
citations
#903

ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing

Ziteng Wang, Jun Zhu, Jianfei Chen

ICLR 2025posterarXiv:2412.14711
28
citations
#904

From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models

Etowah Adams, Liam Bai, Minji Lee et al.

ICML 2025spotlight
28
citations
#905

DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding

Geng Li, Jinglin Xu, Yunzhen Zhao et al.

CVPR 2025highlightarXiv:2504.14920
28
citations
#906

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Le Zhuo, Liangbing Zhao, Sayak Paul et al.

ICCV 2025posterarXiv:2504.16080
28
citations
#907

Image Watermarks are Removable using Controllable Regeneration from Clean Noise

Yepeng Liu, Yiren Song, Hai Ci et al.

ICLR 2025posterarXiv:2410.05470
28
citations
#908

Diffusion-based Neural Network Weights Generation

Bedionita Soro, Bruno Andreis, Hayeon Lee et al.

ICLR 2025posterarXiv:2402.18153
28
citations
#909

Spurious Forgetting in Continual Learning of Language Models

Junhao Zheng, Xidi Cai, Shengjie Qiu et al.

ICLR 2025posterarXiv:2501.13453
28
citations
#910

Overtrained Language Models Are Harder to Fine-Tune

Jacob Mitchell Springer, Sachin Goyal, Kaiyue Wen et al.

ICML 2025posterarXiv:2503.19206
28
citations
#911

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

Sheng Zhou, Junbin Xiao, Qingyun Li et al.

CVPR 2025posterarXiv:2502.07411
28
citations
#912

Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

Zemin Huang, Zhiyang Chen, Zijun Wang et al.

NEURIPS 2025posterarXiv:2505.10446
28
citations
#913

Herald: A Natural Language Annotated Lean 4 Dataset

Guoxiong Gao, Yutong Wang, Jiedong Jiang et al.

ICLR 2025posterarXiv:2410.10878
28
citations
#914

Simple ReFlow: Improved Techniques for Fast Flow Models

Beomsu Kim, Yu-Guan Hsieh, Michal Klein et al.

ICLR 2025posterarXiv:2410.07815
28
citations
#915

Can Large Language Models Understand Symbolic Graphics Programs?

Zeju Qiu, Weiyang Liu, Haiwen Feng et al.

ICLR 2025posterarXiv:2408.08313
28
citations
#916

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Thomas Fel, Ekdeep Singh Lubana, Jacob Prince et al.

ICML 2025posterarXiv:2502.12892
28
citations
#917

Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection

Wei Luo, Yunkang Cao, Haiming Yao et al.

CVPR 2025posterarXiv:2503.02424
28
citations
#918

Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models

Zhejun Zhang, Peter Karkus, Maximilian Igl et al.

CVPR 2025posterarXiv:2412.05334
28
citations
#919

Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads

Siqi Kou, Jiachun Jin, Zhihong Liu et al.

ICML 2025posterarXiv:2412.00127
28
citations
#920

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Junyan Ye, Baichuan Zhou, Zilong Huang et al.

ICLR 2025posterarXiv:2410.09732
28
citations
#921

Long-Context State-Space Video World Models

Ryan Po, Yotam Nitzan, Richard Zhang et al.

ICCV 2025posterarXiv:2505.20171
28
citations
#922

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Jiacheng Chen, Tianhao Liang, Sherman Siu et al.

ICLR 2025posterarXiv:2410.10563
28
citations
#923

DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion

Jinyuan Liu, Bowei Zhang, Qingyun Mei et al.

CVPR 2025posterarXiv:2503.17673
28
citations
#924

Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning

Hanxun Yu, Wentong Li, Song Wang et al.

CVPR 2025highlightarXiv:2503.00513
28
citations
#925

No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces

Daniel Marczak, Simone Magistri, Sebastian Cygert et al.

ICML 2025posterarXiv:2502.04959
27
citations
#926

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

Zongyi Li, Shujie HU, Shujie LIU et al.

ICLR 2025oralarXiv:2410.20502
27
citations
#927

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling

Jixun Yao, Hexin Liu, CHEN CHEN et al.

ICLR 2025posterarXiv:2502.02942
27
citations
#928

Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering

Cheng Sun, Jaesung Choe, Charles Loop et al.

CVPR 2025posterarXiv:2412.04459
27
citations
#929

Graphic Design with Large Multimodal Model

Yutao Cheng, Zhao Zhang, Maoke Yang et al.

AAAI 2025paperarXiv:2404.14368
27
citations
#930

Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

Junkang Wu, Yuexiang Xie, Zhengyi Yang et al.

ICLR 2025posterarXiv:2407.07880
27
citations
#931

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Zhiyuan Zhou, Andy Peng, Qiyang Li et al.

ICLR 2025posterarXiv:2412.07762
27
citations
#932

Towards Understanding Camera Motions in Any Video

Zhiqiu Lin, Siyuan Cen, Daniel Jiang et al.

NEURIPS 2025spotlightarXiv:2504.15376
27
citations
#933

Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning

Yiqun Chen, Lingyong Yan, Weiwei Sun et al.

NEURIPS 2025posterarXiv:2501.15228
27
citations
#934

Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation within Slow Reasoning

Hao Chen, Jiaming Liu, Chenyang Gu et al.

NEURIPS 2025poster
27
citations
#935

PersonalLLM: Tailoring LLMs to Individual Preferences

Thomas Zollo, Andrew Siah, Naimeng Ye et al.

ICLR 2025posterarXiv:2409.20296
27
citations
#936

Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models

Zhenyu Pan, Haozheng Luo, Manling Li et al.

ICLR 2025posterarXiv:2403.17359
27
citations
#937

When Hypergraph Meets Heterophily: New Benchmark Datasets and Baseline

Ming Li, Yongchun Gu, Yi Wang et al.

AAAI 2025paper
27
citations
#938

Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity

Eduard Gorbunov, Nazarii Tupitsa, Sayantan Choudhury et al.

ICLR 2025posterarXiv:2409.14989
27
citations
#939

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Wenda Xu, Rujun Han, Zifeng Wang et al.

ICLR 2025posterarXiv:2410.11325
27
citations
#940

Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding

Zhongyi Shui, Jianpeng Zhang, Weiwei Cao et al.

ICLR 2025posterarXiv:2501.14548
27
citations
#941

Theoretical Benefit and Limitation of Diffusion Language Model

Guhao Feng, Yihan Geng, Jian Guan et al.

NEURIPS 2025posterarXiv:2502.09622
27
citations
#942

Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Ge Wu, Shen Zhang, Ruijing Shi et al.

NEURIPS 2025oralarXiv:2507.01467
27
citations
#943

BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models

Peiyan Li, Yixiang Chen, Hongtao Wu et al.

NEURIPS 2025posterarXiv:2506.07961
27
citations
#944

Language-Image Models with 3D Understanding

Jang Hyun Cho, Boris Ivanovic, Yulong Cao et al.

ICLR 2025posterarXiv:2405.03685
27
citations
#945

VistaDream: Sampling multiview consistent images for single-view scene reconstruction

Haiping Wang, Yuan Liu, Ziwei Liu et al.

ICCV 2025posterarXiv:2410.16892
27
citations
#946

AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion

Mingzhen Sun, Weining Wang, Li et al.

CVPR 2025posterarXiv:2503.07418
27
citations
#947

Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking

Benjamin Feuer, Micah Goldblum, Teresa Datta et al.

ICLR 2025posterarXiv:2409.15268
27
citations
#948

Distilling Multi-modal Large Language Models for Autonomous Driving

Deepti Hegde, Rajeev Yasarla, Hong Cai et al.

CVPR 2025posterarXiv:2501.09757
27
citations
#949

Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free

Ziyue Li, Tianyi Zhou

ICLR 2025posterarXiv:2410.10814
27
citations
#950

DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification

Yuhao Wang, Yang Liu, Aihua Zheng et al.

AAAI 2025paperarXiv:2412.10650
27
citations
#951

A Comprehensive Overhaul of Multimodal Assistant with Small Language Models

Minjie Zhu, Yichen Zhu, Ning Liu et al.

AAAI 2025paperarXiv:2403.06199
27
citations
#952

DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life

Yu Ying Chiu, Liwei Jiang, Yejin Choi

ICLR 2025oralarXiv:2410.02683
27
citations
#953

Improving Uncertainty Estimation through Semantically Diverse Language Generation

Lukas Aichberger, Kajetan Schweighofer, Mykyta Ielanskyi et al.

ICLR 2025posterarXiv:2406.04306
27
citations
#954

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

Zimu Lu, Aojun Zhou, Ke Wang et al.

ICLR 2025posterarXiv:2410.08196
27
citations
#955

More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Zhongxing Xu, Chengzhi Liu, Qingyue Wei et al.

NEURIPS 2025posterarXiv:2505.21523
27
citations
#956

Sparse autoencoders reveal selective remapping of visual concepts during adaptation

Hyesu Lim, Jinho Choi, Jaegul Choo et al.

ICLR 2025posterarXiv:2412.05276
27
citations
#957

OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling

Zhicheng YANG, Yiwei Wang, Yinya Huang et al.

ICLR 2025posterarXiv:2407.09887
27
citations
#958

Exploring Enhanced Contextual Information for Video-Level Object Tracking

Ben Kang, Xin Chen, Simiao Lai et al.

AAAI 2025paperarXiv:2412.11023
27
citations
#959

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Xiaoshuai Song, Muxi Diao, Guanting Dong et al.

ICLR 2025posterarXiv:2406.08587
27
citations
#960

Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model

Xiu Yuan, Tongzhou Mu, Stone Tao et al.

ICLR 2025posterarXiv:2412.13630
27
citations
#961

LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations

Anian Ruoss, Fabio Pardo, Harris Chan et al.

ICML 2025posterarXiv:2412.01441
27
citations
#962

Light3R-SfM: Towards Feed-forward Structure-from-Motion

Sven Elflein, Qunjie Zhou, Laura Leal-Taixe

CVPR 2025highlightarXiv:2501.14914
27
citations
#963

VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching

Siyu Xu, Yunke Wang, Chenghao Xia et al.

NEURIPS 2025oralarXiv:2502.02175
27
citations
#964

Estimating Body and Hand Motion in an Ego‑sensed World

Brent Yi, Vickie Ye, Maya Zheng et al.

CVPR 2025highlightarXiv:2410.03665
27
citations
#965

PhysGen3D: Crafting a Miniature Interactive World from a Single Image

Boyuan Chen, Hanxiao Jiang, Shaowei Liu et al.

CVPR 2025posterarXiv:2503.20746
26
citations
#966

ASGO: Adaptive Structured Gradient Optimization

Kang An, Yuxing Liu, Rui Pan et al.

NEURIPS 2025posterarXiv:2503.20762
26
citations
#967

Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

Cong Lu, Shengran Hu, Jeff Clune

ICLR 2025posterarXiv:2405.15143
26
citations
#968

Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory

Yuqi Wu, Wenzhao Zheng, Jie Zhou et al.

NEURIPS 2025posterarXiv:2507.02863
26
citations
#969

InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling

Muhammad Gohar Javed, chuan guo, Li Cheng et al.

ICLR 2025oralarXiv:2410.10010
26
citations
#970

CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression

Yu-Ting Zhan, Cheng-Yuan Ho, He-Bi Yang et al.

ICLR 2025posterarXiv:2503.00357
26
citations
#971

AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions

Polina Kirichenko, Mark Ibrahim, Kamalika Chaudhuri et al.

NEURIPS 2025posterarXiv:2506.09038
26
citations
#972

Perception-Guided Jailbreak Against Text-to-Image Models

Yihao Huang, Le Liang, Tianlin Li et al.

AAAI 2025paperarXiv:2408.10848
26
citations
#973

Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection

Lichen Bai, Shitong Shao, zikai zhou et al.

ICLR 2025posterarXiv:2412.10891
26
citations
#974

InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences

Hongkai Zheng, Wenda Chu, Bingliang Zhang et al.

ICLR 2025posterarXiv:2503.11043
26
citations
#975

Chain-of-Retrieval Augmented Generation

Liang Wang, Haonan Chen, Nan Yang et al.

NEURIPS 2025posterarXiv:2501.14342
26
citations
#976

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

CVPR 2025posterarXiv:2407.15811
26
citations
#977

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Sicong Leng, Yun Xing, Zesen Cheng et al.

NEURIPS 2025posterarXiv:2410.12787
26
citations
#978

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

Weifeng Lin, Xinyu Wei, Renrui Zhang et al.

ICLR 2025posterarXiv:2409.15278
26
citations
#979

LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application

Jian Jia, Yipei Wang, Yan Li et al.

AAAI 2025paperarXiv:2405.03988
26
citations
#980

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Hongzhi Huang, Defa Zhu, Banggu Wu et al.

ICML 2025posterarXiv:2501.16975
26
citations
#981

Self-Improvement for Neural Combinatorial Optimization: Sample Without Replacement, but Improvement

Dominik Grimm, Jonathan Pirnay

ICLR 2025posterarXiv:2403.15180
26
citations
#982

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer

Yiren Song, Danze Chen, Mike Zheng Shou

ICCV 2025posterarXiv:2502.01105
26
citations
#983

Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

Jinjin Zhang, qiuyu Huang, Junjie Liu et al.

CVPR 2025posterarXiv:2503.18352
26
citations
#984

DeFoG: Discrete Flow Matching for Graph Generation

Yiming Qin, Manuel Madeira, Dorina Thanou et al.

ICML 2025oralarXiv:2410.04263
26
citations
#985

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

Shi Qiu, Shaoyang Guo, Zhuo-Yang Song et al.

NEURIPS 2025posterarXiv:2504.16074
26
citations
#986

ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

Haiyang SHEN, Yue Li, Desong Meng et al.

ICLR 2025posterarXiv:2407.00132
26
citations
#987

On Large Language Model Continual Unlearning

Chongyang Gao, Lixu Wang, Kaize Ding et al.

ICLR 2025posterarXiv:2407.10223
26
citations
#988

CViT: Continuous Vision Transformer for Operator Learning

Sifan Wang, Jacob Seidman, Shyam Sankaran et al.

ICLR 2025oralarXiv:2405.13998
26
citations
#989

KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems

Jusheng Zhang, Zimeng Huang, Yijia Fan et al.

ICML 2025posterarXiv:2502.07350
26
citations
#990

CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs

Siyu Wang, Cailian Chen, Xinyi Le et al.

AAAI 2025paperarXiv:2412.19663
26
citations
#991

SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning

Yichen Wu, Hongming Piao, Long-Kai Huang et al.

ICLR 2025posterarXiv:2501.13198
26
citations
#992

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

Qingming LIU, Yuan Liu, Jiepeng Wang et al.

ICLR 2025posterarXiv:2406.00434
26
citations
#993

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry

Sai Sumedh R. Hindupur, Ekdeep S Lubana, Thomas Fel et al.

NEURIPS 2025posterarXiv:2503.01822
26
citations
#994

Fast Feedforward 3D Gaussian Splatting Compression

Yihang Chen, Qianyi Wu, Mengyao Li et al.

ICLR 2025posterarXiv:2410.08017
26
citations
#995

Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond

Qizhou Wang, Jin Zhou, (Andrew) Zhanke Zhou et al.

ICLR 2025posterarXiv:2502.19301
26
citations
#996

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

Baichuan Zhou, Haote Yang, Dairong Chen et al.

AAAI 2025paperarXiv:2408.17267
26
citations
#997

Erasing Undesirable Influence in Diffusion Models

Jing Wu, Trung Le, Munawar Hayat et al.

CVPR 2025posterarXiv:2401.05779
26
citations
#998

SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

Ying Chen, Guoan Wang, Yuanfeng Ji et al.

CVPR 2025posterarXiv:2410.11761
26
citations
#999

DiffuseHigh: Training-Free Progressive High-Resolution Image Synthesis Through Structure Guidance

Younghyun Kim, Geunmin Hwang, Junyu Zhang et al.

AAAI 2025paperarXiv:2406.18459
26
citations
#1000

OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.

CVPR 2025posterarXiv:2412.01169
26
citations