Most Cited 2025 "conversational state understanding" Papers

22,274 papers found • Page 10 of 112

#1801

AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors

Ruoxuan Feng, Jiangyu Hu, Wenke Xia et al.

ICLR 2025 • arXiv:2502.12191
25
citations
#1802

Self-Adapting Language Models

Adam Zweiger, Jyo Pari, Han Guo et al.

NEURIPS 2025 • arXiv:2506.10943
25
citations
#1803

SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.

CVPR 2025 • arXiv:2411.17190
25
citations
#1804

OmniSVG: A Unified Scalable Vector Graphics Generation Model

Yiying Yang, Wei Cheng, Sijin Chen et al.

NEURIPS 2025 • arXiv:2504.06263
25
citations
#1805

OrcaLoca: An LLM Agent Framework for Software Issue Localization

Zhongming Yu, Hejia Zhang, Yujie Zhao et al.

ICML 2025 • arXiv:2502.00350
25
citations
#1806

Towards a Mechanistic Explanation of Diffusion Model Generalization

Matthew Niedoba, Berend Zwartsenberg, Kevin Murphy et al.

ICML 2025 spotlight • arXiv:2411.19339
25
citations
#1807

When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers

Hongkang Li, Yihua Zhang, Shuai Zhang et al.

ICLR 2025 • arXiv:2504.10957
25
citations
#1808

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

Zizheng Pan, Bohan Zhuang, De-An Huang et al.

ICLR 2025 • arXiv:2402.14167
25
citations
#1809

Attention Distillation: A Unified Approach to Visual Characteristics Transfer

Yang Zhou, Xu Gao, Zichong Chen et al.

CVPR 2025 • arXiv:2502.20235
25
citations
#1810

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

Yunxiang Fu, Meng Lou, Yizhou Yu

CVPR 2025 • arXiv:2412.11890
25
citations
#1811

LICO: Large Language Models for In-Context Molecular Optimization

Tung Nguyen, Aditya Grover

ICLR 2025 • arXiv:2406.18851
25
citations
#1812

STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes

Jiawei Yang, Jiahui Huang, Boris Ivanovic et al.

ICLR 2025 oral • arXiv:2501.00602
25
citations
#1813

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA

Sangmin Bae, Adam Fisch, Hrayr Harutyunyan et al.

ICLR 2025 • arXiv:2410.20672
25
citations
#1814

Modifying Large Language Model Post-Training for Diverse Creative Writing

John Joon Young Chung, Vishakh Padmakumar, Melissa Roemmele et al.

COLM 2025 paper • arXiv:2503.17126
25
citations
#1815

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Kai Wang, Mingjia Shi, YuKun Zhou et al.

CVPR 2025 • arXiv:2405.17403
25
citations
#1816

Self-Consistency Preference Optimization

Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang et al.

ICML 2025 • arXiv:2411.04109
25
citations
#1817

OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining

Ming Hu, Kun Yuan, Yaling Shen et al.

ICCV 2025 • arXiv:2411.15421
25
citations
#1818

CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models

Felix Taubner, Ruihang Zhang, Mathieu Tuli et al.

CVPR 2025 • arXiv:2412.12093
25
citations
#1819

3D Vision-Language Gaussian Splatting

Qucheng Peng, Benjamin Planche, Zhongpai Gao et al.

ICLR 2025 • arXiv:2410.07577
25
citations
#1820

Generalizable Human Gaussians from Single-View Image

Jinnan Chen, Chen Li, Jianfeng Zhang et al.

ICLR 2025 • arXiv:2406.06050
25
citations
#1821

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding

Yunlong Tang, Daiki Shimada, Jing Bi et al.

AAAI 2025 paper • arXiv:2403.16276
25
citations
#1822

Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models

Hongbang Yuan, Zhuoran Jin, Pengfei Cao et al.

AAAI 2025 paper • arXiv:2408.10682
25
citations
#1823

VisRL: Intention-Driven Visual Perception via Reinforced Reasoning

Zhangquan Chen, Xufang Luo, Dongsheng Li

ICCV 2025 • arXiv:2503.07523
25
citations
#1824

Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination

Leonardo Barcellona, Andrii Zadaianchuk, Davide Allegro et al.

ICLR 2025 • arXiv:2412.14957
25
citations
#1825

Fast Exact Unlearning for In-Context Learning Data for LLMs

Andrei Muresanu, Anvith Thudi, Michael Zhang et al.

ICML 2025 • arXiv:2402.00751
25
citations
#1826

FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes

Lue Fan, Hao Zhang, Qitai Wang et al.

CVPR 2025 • arXiv:2412.03566
25
citations
#1827

Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking

Cassidy Laidlaw, Shivam Singhal, Anca Dragan

ICLR 2025 • arXiv:2403.03185
25
citations
#1828

Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators

Yilun Zhou, Austin Xu, PeiFeng Wang et al.

ICML 2025 • arXiv:2504.15253
25
citations
#1829

Core Knowledge Deficits in Multi-Modal Language Models

Yijiang Li, Qingying Gao, Tianwei Zhao et al.

ICML 2025 • arXiv:2410.10855
25
citations
#1830

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Wei Pang, Kevin Qinghong Lin, Xiangru Jian et al.

NEURIPS 2025 • arXiv:2505.21497
25
citations
#1831

BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion

Huafeng Li, Dayong Su, Qing Cai et al.

AAAI 2025 paper • arXiv:2412.08050
25
citations
#1832

Regularization by Texts for Latent Diffusion Inverse Solvers

Jeongsol Kim, Geon Yeong Park, Hyungjin Chung et al.

ICLR 2025 • arXiv:2311.15658
25
citations
#1833

Instant Policy: In-Context Imitation Learning via Graph Diffusion

Vitalis Vosylius, Edward Johns

ICLR 2025 • arXiv:2411.12633
25
citations
#1834

The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models Via Visual Information Steering

Zhuowei Li, Haizhou Shi, Yunhe Gao et al.

ICML 2025 • arXiv:2502.03628
25
citations
#1835

SWE-bench Goes Live!

Linghao Zhang, Shilin He, Chaoyun Zhang et al.

NEURIPS 2025 • arXiv:2505.23419
25
citations
#1836

Faster Diffusion Sampling with Randomized Midpoints: Sequential and Parallel

Shivam Gupta, Linda Cai, Sitan Chen

ICLR 2025 • arXiv:2406.00924
25
citations
#1837

rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset

Yifei Liu, Li Lyna Zhang, Yi Zhu et al.

NEURIPS 2025 • arXiv:2505.21297
25
citations
#1838

Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives

Qinsi Wang, Jinghan Ke, Masayoshi Tomizuka et al.

ICLR 2025 • arXiv:2502.02723
25
citations
#1839

SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures

Hui Liu, Chen Jia, Fan Shi et al.

CVPR 2025 • arXiv:2503.01113
25
citations
#1840

CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation

Jiahao Li, Weijian Ma, Xueyang Li et al.

CVPR 2025 • arXiv:2505.04481
25
citations
#1841

AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM

Wang Jiarui, Huiyu Duan, Guangtao Zhai et al.

CVPR 2025 • arXiv:2411.17221
25
citations
#1842

Weight ensembling improves reasoning in language models

Xingyu Dang, Christina Baek, Kaiyue Wen et al.

COLM 2025 paper • arXiv:2504.10478
25
citations
#1843

MMRL: Multi-Modal Representation Learning for Vision-Language Models

Yuncheng Guo, Xiaodong Gu

CVPR 2025 • arXiv:2503.08497
25
citations
#1844

Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine

Xiaoshuang Huang, Lingdong Shen, Jia Liu et al.

AAAI 2025 paper • arXiv:2412.09278
25
citations
#1845

SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data

Wenkai Fang, Shunyu Liu, Yang Zhou et al.

NEURIPS 2025 • arXiv:2505.20347
25
citations
#1846

PhD: A ChatGPT-Prompted Visual Hallucination Evaluation Dataset

Jiazhen Liu, Yuhan Fu, Ruobing Xie et al.

CVPR 2025 highlight • arXiv:2403.11116
25
citations
#1847

PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop

Chenyu Li, Oscar Michel, Xichen Pan et al.

ICML 2025 • arXiv:2503.09595
25
citations
#1848

ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features

Alec Helbling, Tuna Han Salih Meral, Benjamin Hoover et al.

ICML 2025 oral • arXiv:2502.04320
25
citations
#1849

JetFormer: An autoregressive generative model of raw images and text

Michael Tschannen, André Susano Pinto, Alexander Kolesnikov

ICLR 2025 • arXiv:2411.19722
25
citations
#1850

Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

Anja Šurina, Amin Mansouri, Lars C.P.M. Quaedvlieg et al.

COLM 2025 paper • arXiv:2504.05108
25
citations
#1851

FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model

Chongkai Gao, Haozhuo Zhang, Zhixuan Xu et al.

ICLR 2025 • arXiv:2412.08261
25
citations
#1852

Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation

Ying Jin, Jinlong Peng, Qingdong He et al.

CVPR 2025 • arXiv:2408.13509
25
citations
#1853

LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity

Walid Bousselham, Angie Boggust, Sofian Chaybouti et al.

ICCV 2025 • arXiv:2404.03214
25
citations
#1854

Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

Chengqian Gao, Haonan Li, Liu Liu et al.

ICML 2025 • arXiv:2502.09650
25
citations
#1855

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates

Xiaosen Zheng, Tianyu Pang, Chao Du et al.

ICLR 2025 • arXiv:2410.07137
25
citations
#1856

Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

Feilong Tang, Chengzhi Liu, Zhongxing Xu et al.

CVPR 2025 • arXiv:2505.16652
25
citations
#1857

ParGo: Bridging Vision-Language with Partial and Global Views

An-Lan Wang, Bin Shan, Wei Shi et al.

AAAI 2025 paper • arXiv:2408.12928
25
citations
#1858

UFT: Unifying Supervised and Reinforcement Fine-Tuning

Mingyang Liu, Gabriele Farina, Asuman Ozdaglar

NEURIPS 2025 • arXiv:2505.16984
25
citations
#1859

Min-K%++: Improved Baseline for Pre-Training Data Detection from Large Language Models

Jingyang Zhang, Jingwei Sun, Eric Yeats et al.

ICLR 2025
24
citations
#1860

Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images

Sichen Zhu, Yuchen Zhu, Molei Tao et al.

ICLR 2025 • arXiv:2501.15598
24
citations
#1861

Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements

Jingyu Zhang, Ahmed Elgohary Ghoneim, Ahmed Magooda et al.

ICLR 2025 • arXiv:2410.08968
24
citations
#1862

M-Prometheus: A Suite of Open Multilingual LLM Judges

José Pombal, Dongkeun Yoon, Patrick Fernandes et al.

COLM 2025 paper • arXiv:2504.04953
24
citations
#1863

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Peiwen Sun, Sitong Cheng, Xiangtai Li et al.

ICLR 2025 • arXiv:2410.10676
24
citations
#1864

Leveraging Large Language Models for Node Generation in Few-Shot Learning on Text-Attributed Graphs

Jianxiang Yu, Yuxiang Ren, Chenghua Gong et al.

AAAI 2025 paper • arXiv:2310.09872
24
citations
#1865

Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs

Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.

ICLR 2025 • arXiv:2502.15938
24
citations
#1866

Backtracking Improves Generation Safety

Yiming Zhang, Jianfeng Chi, Hailey Nguyen et al.

ICLR 2025 • arXiv:2409.14586
24
citations
#1867

miniCTX: Neural Theorem Proving with (Long-)Contexts

Jiewen Hu, Thomas Zhu, Sean Welleck

ICLR 2025 • arXiv:2408.03350
24
citations
#1868

EditAR: Unified Conditional Generation with Autoregressive Models

Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang

CVPR 2025 • arXiv:2501.04699
24
citations
#1869

Universal Image Restoration Pre-training via Degradation Classification

Jiakui Hu, Lujia Jin, Zhengjian Yao et al.

ICLR 2025 • arXiv:2501.15510
24
citations
#1870

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

Xinyan Chen, Renrui Zhang, Dongzhi Jiang et al.

NEURIPS 2025 • arXiv:2506.05331
24
citations
#1871

Mixture Compressor for Mixture-of-Experts LLMs Gains More

Wei Huang, Yue Liao, Jianhui Liu et al.

ICLR 2025 • arXiv:2410.06270
24
citations
#1872

TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model

Cheng Yang, Yang Sui, Jinqi Xiao et al.

CVPR 2025 • arXiv:2503.18278
24
citations
#1873

Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration

Junseong Kim, GeonU Kim, Kim Yu-Ji et al.

CVPR 2025 highlight • arXiv:2502.16652
24
citations
#1874

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models

Rylan Schaeffer, Dan Valentine, Luke Bailey et al.

ICLR 2025 • arXiv:2407.15211
24
citations
#1875

Agent-Oriented Planning in Multi-Agent Systems

Ao Li, Yuexiang Xie, Songze Li et al.

ICLR 2025 • arXiv:2410.02189
24
citations
#1876

MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes

Xinjie Zhang, Zhening Liu, Yifan Zhang et al.

ICCV 2025 highlight • arXiv:2410.13613
24
citations
#1877

GOAL: A Generalist Combinatorial Optimization Agent Learner

Darko Drakulić, Sofia Michel, Jean-Marc Andreoli

ICLR 2025 • arXiv:2406.15079
24
citations
#1878

SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models

Wufei Ma, Luoxin Ye, Nessa McWeeney et al.

CVPR 2025 highlight • arXiv:2505.00788
24
citations
#1879

Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking

Yaozong Zheng, Bineng Zhong, Qihua Liang et al.

AAAI 2025 paper • arXiv:2507.21606
24
citations
#1880

VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration

Dezhan Tu, Danylo Vashchilenko, Yuzhe Lu et al.

ICLR 2025 • arXiv:2410.23317
24
citations
#1881

GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion

Jiapeng Tang, Davide Davoli, Tobias Kirschstein et al.

CVPR 2025 • arXiv:2412.10209
24
citations
#1882

Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning

Chen Qian, Dongrui Liu, Hao Wen et al.

NEURIPS 2025 • arXiv:2506.02867
24
citations
#1883

Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging

Shiqi Chen, Jinghan Zhang, Tongyao Zhu et al.

ICML 2025 • arXiv:2505.05464
24
citations
#1884

Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing

Jaihoon Kim, Taehoon Yoon, Jisung Hwang et al.

NEURIPS 2025 • arXiv:2503.19385
24
citations
#1885

A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language

Ekdeep Singh Lubana, Kyogo Kawaguchi, Robert Dick et al.

ICLR 2025 • arXiv:2408.12578
24
citations
#1886

FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Vision Language Models

Tianyu Fu, Tengxuan Liu, Qinghao Han et al.

ICCV 2025 • arXiv:2501.01986
24
citations
#1887

Halton Scheduler for Masked Generative Image Transformer

Victor Besnier, Mickael Chen, David Hurych et al.

ICLR 2025 • arXiv:2503.17076
24
citations
#1888

Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction

Edgar Sucar, Zihang Lai, Eldar Insafutdinov et al.

ICCV 2025 highlight • arXiv:2503.16318
24
citations
#1889

Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning

Jie Cheng, Gang Xiong, Ruixi Qiao et al.

NEURIPS 2025 • arXiv:2504.15275
24
citations
#1890

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Haoyi Zhu, Honghui Yang, Yating Wang et al.

ICLR 2025 • arXiv:2410.08208
24
citations
#1891

Discrepancy Minimization in Input-Sparsity Time

Yichuan Deng, Xiaoyu Li, Zhao Song et al.

ICML 2025 spotlight • arXiv:2210.12468
24
citations
#1892

ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World

Weixiang Yan, Haitian Liu, Tengxiao Wu et al.

NEURIPS 2025 • arXiv:2406.13890
24
citations
#1893

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

Kaifeng Gao, Jiaxin Shi, Hanwang Zhang et al.

ICML 2025 • arXiv:2411.16375
24
citations
#1894

ARB-LLM: Alternating Refined Binarizations for Large Language Models

Zhiteng Li, Xianglong Yan, Tianao Zhang et al.

ICLR 2025 • arXiv:2410.03129
24
citations
#1895

Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models

Logan Cross, Violet Xiang, Agam Bhatia et al.

ICLR 2025 • arXiv:2407.07086
24
citations
#1896

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction

Zhen Xing, Qi Dai, Zejia Weng et al.

ICCV 2025 • arXiv:2406.06465
24
citations
#1897

Curriculum Direct Preference Optimization for Diffusion and Consistency Models

Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.

CVPR 2025 • arXiv:2405.13637
24
citations
#1898

Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders

Qichao Shentu, Beibu Li, Kai Zhao et al.

ICLR 2025 • arXiv:2405.15273
24
citations
#1899

MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt

Yuhao Wang, Xuehu Liu, Tianyu Yan et al.

AAAI 2025 paper • arXiv:2412.10707
24
citations
#1900

CleanDIFT: Diffusion Features without Noise

Nick Stracke, Stefan Andreas Baumann, Kolja Bauer et al.

CVPR 2025 • arXiv:2412.03439
24
citations
#1901

Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering

Yutao Feng, Xiang Feng, Yintong Shang et al.

CVPR 2025 • arXiv:2401.15318
24
citations
#1902

On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity

Quentin Bertrand, Anne Gagneux, Mathurin Massias et al.

NEURIPS 2025 oral • arXiv:2506.03719
24
citations
#1903

PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering

Yifan Gao, Zihang Lin, Chuanbin Liu et al.

CVPR 2025 • arXiv:2504.06632
24
citations
#1904

Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model

Yue-Hua Han, Tai-Ming Huang, Kailung Hua et al.

CVPR 2025 • arXiv:2404.05583
24
citations
#1905

MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Bhavya, Stelian Coros, Andreas Krause et al.

ICLR 2025 • arXiv:2412.12098
24
citations
#1906

Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons

Jianhui Chen, Xiaozhi Wang, Zijun Yao et al.

NEURIPS 2025 • arXiv:2406.14144
24
citations
#1907

Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.

CVPR 2025 • arXiv:2412.03283
24
citations
#1908

NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens

Cunxiang Wang, Ruoxi Ning, Boqi Pan et al.

ICLR 2025 • arXiv:2403.12766
24
citations
#1909

GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images

Xiang Lan, Feng Wu, Kai He et al.

NEURIPS 2025 • arXiv:2503.06073
24
citations
#1910

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

Hao Wen, Zehuan Huang, Yaohui Wang et al.

CVPR 2025 • arXiv:2406.03184
24
citations
#1911

Safety Pretraining: Toward the Next Generation of Safe AI

Pratyush Maini, Sachin Goyal, Dylan Sam et al.

NEURIPS 2025 oral • arXiv:2504.16980
24
citations
#1912

Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search

Yuichi Inoue, Kou Misaki, Yuki Imajuku et al.

NEURIPS 2025 spotlight • arXiv:2503.04412
24
citations
#1913

Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods

Daniil Vankov, Anton Rodomanov, Angelia Nedich et al.

ICLR 2025 • arXiv:2410.10800
24
citations
#1914

Fantastic Copyrighted Beasts and How (Not) to Generate Them

Luxi He, Yangsibo Huang, Weijia Shi et al.

ICLR 2025 • arXiv:2406.14526
24
citations
#1915

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons

Andrew Szot, Bogdan Mazoure, Omar Attia et al.

CVPR 2025 • arXiv:2412.08442
24
citations
#1916

Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving

Peidong Li, Dixiao Cui

ICLR 2025 oral • arXiv:2409.18341
24
citations
#1917

CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination

Kaicheng Yang, Tiancheng Gu, Xiang An et al.

AAAI 2025 paper • arXiv:2408.09441
24
citations
#1918

MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs

Yunqiu Xu, Linchao Zhu, Yi Yang

ICCV 2025 • arXiv:2410.12332
24
citations
#1919

NightHaze: Nighttime Image Dehazing via Self-Prior Learning

Beibei Lin, Yeying Jin, Yan Wending et al.

AAAI 2025 paper • arXiv:2403.07408
24
citations
#1920

Training on the Test Task Confounds Evaluation and Emergence

Ricardo Dominguez-Olmedo, Florian Eddie Dorner, Moritz Hardt

ICLR 2025 • arXiv:2407.07890
24
citations
#1921

Model Poisoning Attacks to Federated Learning via Multi-Round Consistency

Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong

CVPR 2025 • arXiv:2404.15611
24
citations
#1922

POSTA: A Go-to Framework for Customized Artistic Poster Generation

Haoyu Chen, Xiaojie Xu, Wenbo Li et al.

CVPR 2025 • arXiv:2503.14908
24
citations
#1923

OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning

Xiaoqiang Wang, Bang Liu

ICLR 2025 • arXiv:2410.18963
24
citations
#1924

Towards Foundation Models for Mixed Integer Linear Programming

Sirui Li, Janardhan Kulkarni, Ishai Menache et al.

ICLR 2025 • arXiv:2410.08288
24
citations
#1925

Specialized Foundation Models Struggle to Beat Supervised Baselines

Zongzhe Xu, Ritvik Gupta, Wenduo Cheng et al.

ICLR 2025 • arXiv:2411.02796
24
citations
#1926

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

Xianfu Cheng, Wei Zhang, Shiwei Zhang et al.

ICCV 2025 • arXiv:2502.13059
24
citations
#1927

QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?

Belinda Li, Been Kim, Zi Wang

NEURIPS 2025 • arXiv:2503.22674
24
citations
#1928

Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data

Florian Eddie Dorner, Vivian Nastl, Moritz Hardt

ICLR 2025
24
citations
#1929

What is the Visual Cognition Gap between Humans and Multimodal LLMs?

Xu Cao, Yifan Shen, Bolin Lai et al.

COLM 2025 paper • arXiv:2406.10424
24
citations
#1930

Nemotron-CLIMB: Clustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Shizhe Diao, Yu Yang, Yonggan Fu et al.

NEURIPS 2025 spotlight • arXiv:2504.13161
24
citations
#1931

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

George Wang, Jesse Hoogland, Stan van Wingerden et al.

ICLR 2025 • arXiv:2410.02984
24
citations
#1932

Exploiting Diffusion Prior for Real-World Image Dehazing with Unpaired Training

Yunwei Lan, Zhigao Cui, Chang Liu et al.

AAAI 2025 paper • arXiv:2503.15017
24
citations
#1933

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning

Yiwu Zhong, Zhuoming Liu, Yin Li et al.

ICCV 2025 • arXiv:2412.03248
24
citations
#1934

RouteLLM: Learning to Route LLMs from Preference Data

Isaac Ong, Amjad Almahairi, Vincent Wu et al.

ICLR 2025
24
citations
#1935

Towards General-Purpose Model-Free Reinforcement Learning

Scott Fujimoto, Pierluca D'Oro, Amy Zhang et al.

ICLR 2025 • arXiv:2501.16142
24
citations
#1936

Teaching Language Models to Critique via Reinforcement Learning

Zhihui Xie, Jie Chen, Liyu Chen et al.

ICML 2025 • arXiv:2502.03492
24
citations
#1937

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Rongyao Fang, Chengqi Duan, Kun Wang et al.

ICCV 2025 • arXiv:2410.13861
24
citations
#1938

RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning

Kaiwen Zha, Zhengqi Gao, Maohao Shen et al.

NEURIPS 2025 • arXiv:2505.15034
24
citations
#1939

Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models

Xiao Cui, Mo Zhu, Yulei Qin et al.

AAAI 2025 paper • arXiv:2412.14528
24
citations
#1940

FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait

Taekyung Ki, Dongchan Min, Gyeongsu Chae

ICCV 2025 • arXiv:2412.01064
24
citations
#1941

FastLGS: Speeding Up Language Embedded Gaussians with Feature Grid Mapping

Yuzhou Ji, He Zhu, Junshu Tang et al.

AAAI 2025 paper • arXiv:2406.01916
24
citations
#1942

CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction

Zhefei Gong, Pengxiang Ding, Shangke Lyu et al.

ICCV 2025 • arXiv:2412.06782
24
citations
#1943

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing

Wenhao Zheng, Yixiao Chen, Weitong Zhang et al.

COLM 2025 paper • arXiv:2502.01976
24
citations
#1944

EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

Muye Huang, Han Lai, Xinyu Zhang et al.

AAAI 2025 paper • arXiv:2409.01577
24
citations
#1945

MC^2: Multi-concept Guidance for Customized Multi-concept Generation

Jiaxiu Jiang, Yabo Zhang, Kailai Feng et al.

CVPR 2025 • arXiv:2404.05268
24
citations
#1946

GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration

Yuhang Li, Ruokai Yin, Donghyun Lee et al.

ICML 2025 • arXiv:2504.02692
24
citations
#1947

I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength

Wanquan Feng, Jiawei Liu, Pengqi Tu et al.

ICLR 2025 • arXiv:2411.06525
24
citations
#1948

Emergence of meta-stable clustering in mean-field transformer models

Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi

ICLR 2025 • arXiv:2410.23228
24
citations
#1949

WPMixer: Efficient Multi-Resolution Mixing for Long-Term Time Series Forecasting

Md Mahmuddun Nabi Murad, Mehmet Aktukmak, Yasin Yilmaz

AAAI 2025 paper • arXiv:2412.17176
24
citations
#1950

AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models

Mintong Kang, Chejian Xu, Bo Li

ICLR 2025 oral • arXiv:2412.08608
24
citations
#1951

OpenVIS: Open-vocabulary Video Instance Segmentation

Pinxue Guo, Hao Huang, Peiyang He et al.

AAAI 2025 paper • arXiv:2305.16835
24
citations
#1952

Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning

Yun Qu, Yuhang Jiang, Boyuan Wang et al.

AAAI 2025 paper • arXiv:2412.11120
24
citations
#1953

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny et al.

ICLR 2025 • arXiv:2408.15313
24
citations
#1954

ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning

Hongyin Zhang, Zifeng Zhuang, Han Zhao et al.

ICML 2025 • arXiv:2505.07395
24
citations
#1955

Does Thinking More Always Help? Mirage of Test-Time Scaling in Reasoning Models

Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy et al.

NEURIPS 2025 • arXiv:2506.04210
24
citations
#1956

Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods

Oussama Zekri, Nicolas Boulle

NEURIPS 2025 • arXiv:2502.01384
24
citations
#1957

AnimateAnything: Consistent and Controllable Animation for Video Generation

Guojun Lei, Chi Wang, Rong Zhang et al.

CVPR 2025 • arXiv:2411.10836
24
citations
#1958

Diverse Preference Learning for Capabilities and Alignment

Stewart Slocum, Asher Parker-Sartori, Dylan Hadfield-Menell

ICLR 2025 • arXiv:2511.08594
24
citations
#1959

Non-myopic Generation of Language Models for Reasoning and Planning

Chang Ma, Haiteng Zhao, Junlei Zhang et al.

ICLR 2025 • arXiv:2410.17195
24
citations
#1960

Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search

Boyan Li, Jiayi Zhang, Ju Fan et al.

ICML 2025 • arXiv:2502.17248
24
citations
#1961

When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning

Nishad Singhi, Hritik Bansal, Arian Hosseini et al.

COLM 2025 paper
24
citations
#1962

From Language Models over Tokens to Language Models over Characters

Tim Vieira, Benjamin LeBrun, Mario Giulianelli et al.

ICML 2025 spotlight • arXiv:2412.03719
23
citations
#1963

Manifolds, Random Matrices and Spectral Gaps: The geometric phases of generative diffusion

Enrico Ventura, Beatrice Achilli, Gianluigi Silvestri et al.

ICLR 2025 • arXiv:2410.05898
23
citations
#1964

Population Transformer: Learning Population-level Representations of Neural Activity

Geeling Chau, Christopher Wang, Sabera Talukder et al.

ICLR 2025 oral • arXiv:2406.03044
23
citations
#1965

HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery

Jingtao Li, Yingyi Liu, Xinyu Wang et al.

CVPR 2025 • arXiv:2503.21841
23
citations
#1966

MeshArt: Generating Articulated Meshes with Structure-Guided Transformers

Daoyi Gao, Mohd Yawar Nihal Siddiqui, Lei Li et al.

CVPR 2025 • arXiv:2412.11596
23
citations
#1967

Faster Cascades via Speculative Decoding

Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat et al.

ICLR 2025 • arXiv:2405.19261
23
citations
#1968

Explore Theory of Mind: program-guided adversarial data generation for theory of mind reasoning

Melanie Sclar, Jane Dwivedi-Yu, Maryam Fazel-Zarandi et al.

ICLR 2025 • arXiv:2412.12175
23
citations
#1969

SyllableLM: Learning Coarse Semantic Units for Speech Language Models

Alan Baade, Puyuan Peng, David Harwath

ICLR 2025 • arXiv:2410.04029
23
citations
#1970

Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models

Angela Castillo, Jonas Kohler, Juan C. Pérez et al.

AAAI 2025 paper • arXiv:2312.12487
23
citations
#1971

Language Imbalance Driven Rewarding for Multilingual Self-improving

Wen Yang, Junhong Wu, Chen Wang et al.

ICLR 2025 • arXiv:2410.08964
23
citations
#1972

A Simple Model of Inference Scaling Laws

Noam Levi

ICML 2025 • arXiv:2410.16377
23
citations
#1973

PhysAnimator: Physics-Guided Generative Cartoon Animation

Tianyi Xie, Yiwei Zhao, Ying Jiang et al.

CVPR 2025 • arXiv:2501.16550
23
citations
#1974

FLAIR: VLM with Fine-grained Language-informed Image Representations

Rui Xiao, Sanghwan Kim, Iuliana Georgescu et al.

CVPR 2025 • arXiv:2412.03561
23
citations
#1975

MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions

Jian Wu, Linyi Yang, Dongyuan Li et al.

ICLR 2025
23
citations
#1976

Erasing Conceptual Knowledge from Language Models

Rohit Gandikota, Sheridan Feucht, Samuel Marks et al.

NEURIPS 2025 • arXiv:2410.02760
23
citations
#1977

Language Models are Advanced Anonymizers

Robin Staab, Mark Vero, Mislav Balunovic et al.

ICLR 2025 • arXiv:2402.13846
23
citations
#1978

Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs

Shaojie Zhang, Jiahui Yang, Jianqin Yin et al.

ICCV 2025 • arXiv:2506.22139
23
citations
#1979

Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge

Hanna Wallach, Meera Desai, A. Feder Cooper et al.

ICML 2025 • arXiv:2502.00561
23
citations
#1980

Oscillatory State-Space Models

T. Konstantin Rusch, Daniela Rus

ICLR 2025 • arXiv:2410.03943
23
citations
#1981

Heavy-Tailed Diffusion Models

Kushagra Pandey, Jaideep Pathak, Yilun Xu et al.

ICLR 2025 • arXiv:2410.14171
23
citations
#1982

ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

Jiaxiang Cheng, Pan Xie, Xin Xia et al.

AAAI 2025 paper • arXiv:2403.02084
23
citations
#1983

Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content

Zicheng Zhang, Tengchuan Kou, Chunyi Li et al.

CVPR 2025 • arXiv:2503.02357
23
citations
#1984

ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks

Qiang Liu, Mengyu Chu, Nils Thuerey

ICLR 2025 • arXiv:2408.11104
23
citations
#1985

Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images

Tianhao Wu, Chuanxia Zheng, Frank Guan et al.

ICCV 2025 • arXiv:2503.13439
23
citations
#1986

On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality

Jerry Yao-Chieh Hu, Weimin Wu, Yi-Chen Lee et al.

ICLR 2025 • arXiv:2411.17522
23
citations
#1987

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

Jiaqi Chen, Bang Zhang, Ruotian Ma et al.

NEURIPS 2025 • arXiv:2504.19162
23
citations
#1988

Transformers are Universal In-context Learners

Takashi Furuya, Maarten V de Hoop, Gabriel Peyré

ICLR 2025 • arXiv:2408.01367
23
citations
#1989

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Gleb Rodionov, Roman Garipov, Alina Shutova et al.

NEURIPS 2025 spotlight • arXiv:2504.06261
23
citations
#1990

MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation

Zhenyu Wu, Yuheng Zhou, Xiuwei Xu et al.

CVPR 2025 • arXiv:2503.13446
23
citations
#1991

Grounding Video Models to Actions through Goal Conditioned Exploration

Yunhao Luo, Yilun Du

ICLR 2025 • arXiv:2411.07223
23
citations
#1992

HELMET: How to Evaluate Long-context Models Effectively and Thoroughly

Howard Yen, Tianyu Gao, Minmin Hou et al.

ICLR 2025
23
citations
#1993

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

Luca Barsellotti, Lorenzo Bianchi, Nicola Messina et al.

ICCV 2025 • arXiv:2411.19331
23
citations
#1994

Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians

Ishan Amin, Sanjeev Raja, Aditi Krishnapriyan

ICLR 2025 • arXiv:2501.09009
23
citations
#1995

Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning

Puning Yang, Qizhou Wang, Zhuo Huang et al.

ICML 2025 • arXiv:2505.11953
23
citations
#1996

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Zihan Liu, Shuangrui Ding, Zhixiong Zhang et al.

ICML 2025 • arXiv:2502.13128
23
citations
#1997

Mastering Board Games by External and Internal Planning with Language Models

John Schultz, Jakub Adamek, Matej Jusup et al.

ICML 2025 spotlight • arXiv:2412.12119
23
citations
#1998

CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology

Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.

CVPR 2025 • arXiv:2412.12077
23
citations
#1999

Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning

Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish et al.

ICLR 2025 • arXiv:2406.16257
23
citations
#2000

SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition

Yongkun Du, Zhineng Chen, Hongtao Xie et al.

ICCV 2025 • arXiv:2411.15858
23
citations