Most Cited 2025 "region representation" Papers

22,274 papers found • Page 5 of 112

#801

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Yoad Tewel, Rinon Gal, Dvir Samuel et al.

ICLR 2025posterarXiv:2411.07232
34
citations
#802

Accelerating Diffusion LLMs via Adaptive Parallel Decoding

Daniel Israel, Guy Van den Broeck, Aditya Grover

NEURIPS 2025spotlightarXiv:2506.00413
34
citations
#803

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models

Shenghao Fu, Qize Yang, Qijie Mo et al.

CVPR 2025highlightarXiv:2501.18954
34
citations
#804

DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input

Qijian Tian, Xin Tan, Yuan Xie et al.

AAAI 2025paperarXiv:2409.12753
34
citations
#805

Multi-Objective Evolution of Heuristic Using Large Language Model

Shunyu Yao, Fei Liu, Xi Lin et al.

AAAI 2025paperarXiv:2409.16867
34
citations
#806

Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation

Derong Xu, Xinhang Li, Ziheng Zhang et al.

AAAI 2025paperarXiv:2412.18537
34
citations
#807

$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning

Mintong Kang, Bo Li

ICLR 2025posterarXiv:2407.05557
34
citations
#808

Text4Seg: Reimagining Image Segmentation as Text Generation

Mengcheng Lan, Chaofeng Chen, Yue Zhou et al.

ICLR 2025posterarXiv:2410.09855
34
citations
#809

ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities

Ezra Karger, Houtan Bastani, Chen Yueh-Han et al.

ICLR 2025posterarXiv:2409.19839
34
citations
#810

Preference Optimization for Reasoning with Pseudo Feedback

Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.

ICLR 2025posterarXiv:2411.16345
34
citations
#811

Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling

Zhihao Li, Yufei Wang, Heliang Zheng et al.

NEURIPS 2025posterarXiv:2505.14521
34
citations
#812

Efficient Evolutionary Search Over Chemical Space with Large Language Models

Haorui Wang, Marta Skreta, Cher-Tian Ser et al.

ICLR 2025posterarXiv:2406.16976
34
citations
#813

ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions

Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na et al.

CVPR 2025posterarXiv:2401.10232
34
citations
#814

AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities

Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.

CVPR 2025highlightarXiv:2412.14123
34
citations
#815

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Xuehai He, Weixi Feng, Kaizhi Zheng et al.

ICLR 2025posterarXiv:2406.08407
34
citations
#816

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

Yaxi Lu, Shenzhi Yang, Cheng Qian et al.

ICLR 2025posterarXiv:2410.12361
34
citations
#817

WISA: World simulator assistant for physics-aware text-to-video generation

Jing Wang, Ao Ma, Ke Cao et al.

NEURIPS 2025spotlightarXiv:2503.08153
34
citations
#818

Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering

Ziyu Zhao, tao shen, Didi Zhu et al.

ICLR 2025posterarXiv:2409.16167
34
citations
#819

Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts

Marta Skreta, Tara Akhound-Sadegh, Viktor Ohanesian et al.

ICML 2025spotlightarXiv:2503.02819
34
citations
#820

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Fangxun Shu, Yue Liao, Lei Zhang et al.

ICLR 2025posterarXiv:2408.15881
34
citations
#821

One Diffusion to Generate Them All

Duong H. Le, Tuan Pham, Sangho Lee et al.

CVPR 2025posterarXiv:2411.16318
34
citations
#822

SuperBPE: Space Travel for Language Models

Alisa Liu, Jonathan Hayase, Valentin Hofmann et al.

COLM 2025paperarXiv:2503.13423
34
citations
#823

xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition

Artyom Stitsyuk, Jaesik Choi

AAAI 2025paperarXiv:2412.17323
34
citations
#824

EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models

Yantai Yang, Yuhao Wang, Zichen Wen et al.

NEURIPS 2025oralarXiv:2506.10100
34
citations
#825

HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction

Shengji Tang, Weicai Ye, Peng Ye et al.

ICLR 2025posterarXiv:2410.06245
34
citations
#826

A Closer Look at Machine Unlearning for Large Language Models

Xiaojian Yuan, Tianyu Pang, Chao Du et al.

ICLR 2025posterarXiv:2410.08109
34
citations
#827

YOLOE: Real-Time Seeing Anything

Ao Wang, Lihao Liu, Hui Chen et al.

ICCV 2025posterarXiv:2503.07465
34
citations
#828

Which Attention Heads Matter for In-Context Learning?

Kayo Yin, Jacob Steinhardt

ICML 2025posterarXiv:2502.14010
34
citations
#829

LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs

Yuhao Wu, Ming Shan Hee, Zhiqiang Hu et al.

ICLR 2025posterarXiv:2409.02076
34
citations
#830

Improving the Diffusability of Autoencoders

Ivan Skorokhodov, Sharath Girish, Benran Hu et al.

ICML 2025posterarXiv:2502.14831
34
citations
#831

Think while You Generate: Discrete Diffusion with Planned Denoising

Sulin Liu, Juno Nam, Andrew Campbell et al.

ICLR 2025posterarXiv:2410.06264
34
citations
#832

Persistent Pre-training Poisoning of LLMs

Yiming Zhang, Javier Rando, Ivan Evtimov et al.

ICLR 2025posterarXiv:2410.13722
34
citations
#833

Dynamic Diffusion Transformer

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

ICLR 2025posterarXiv:2410.03456
34
citations
#834

Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models

Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo et al.

ICLR 2025posterarXiv:2406.03136
34
citations
#835

Compositional Entailment Learning for Hyperbolic Vision-Language Models

Avik Pal, Max van Spengler, Guido D'Amely di Melendugno et al.

ICLR 2025posterarXiv:2410.06912
34
citations
#836

Robust Autonomy Emerges from Self-Play

Marco Cusumano-Towner, David Hafner, Alexander Hertzberg et al.

ICML 2025posterarXiv:2502.03349
34
citations
#837

LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

Mushui Liu, Yuhang Ma, Zhen Yang et al.

AAAI 2025paperarXiv:2407.00737
33
citations
#838

CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning

Ji Qi, Ming Ding, Weihan Wang et al.

ICLR 2025posterarXiv:2402.04236
33
citations
#839

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

Hongxin Zhang, Zeyuan Wang, Qiushi Lyu et al.

ICLR 2025posterarXiv:2404.10775
33
citations
#840

The Diffusion Duality

Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan et al.

ICML 2025posterarXiv:2506.10892
33
citations
#841

Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment

Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola et al.

ICLR 2025posterarXiv:2501.19309
33
citations
#842

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

Hui Zhang, Dexiang Hong, Yitong Wang et al.

ICCV 2025posterarXiv:2412.03859
33
citations
#843

Steering Large Language Model Activations in Sparse Spaces

Reza Bayat, Ali Rahimi-Kalahroudi, Mohammad Pezeshki et al.

COLM 2025paperarXiv:2503.00177
33
citations
#844

On the Relation between Trainability and Dequantization of Variational Quantum Learning Models

Elies Gil-Fuster, Casper Gyurik, Adrian Perez-Salinas et al.

ICLR 2025posterarXiv:2406.07072
33
citations
#845

What to align in multimodal contrastive learning?

Benoit Dufumier, Javiera Castillo Navarro, Devis Tuia et al.

ICLR 2025posterarXiv:2409.07402
33
citations
#846

Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda et al.

ICML 2025posterarXiv:2406.04391
33
citations
#847

Reasoning Models Better Express Their Confidence

Dongkeun Yoon, Seungone Kim, Sohee Yang et al.

NEURIPS 2025posterarXiv:2505.14489
33
citations
#848

SCALM: Detecting Bad Practices in Smart Contracts Through LLMs

Zongwei Li, Xiaoqi Li, Wenkai Li et al.

AAAI 2025paperarXiv:2502.04347
33
citations
#849

Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models

Lingzhi Wang, Xingshan Zeng, Jinsong Guo et al.

AAAI 2025paperarXiv:2402.05813
33
citations
#850

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Zekun Qi, Wenyao Zhang, Yufei Ding et al.

NEURIPS 2025spotlightarXiv:2502.13143
33
citations
#851

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

Wei Li, Bing Hu, Rui Shao et al.

CVPR 2025posterarXiv:2503.03663
33
citations
#852

LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning

Zhe Li, Weihao Yuan, Yisheng He et al.

ICLR 2025posterarXiv:2410.07093
33
citations
#853

SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

Zixuan Huang, Mark Boss, Aaryaman Vasishta et al.

CVPR 2025posterarXiv:2501.04689
33
citations
#854

PartGen: Part-level 3D Generation and Reconstruction with Multi-view Diffusion Models

Minghao Chen, Roman Shapovalov, Iro Laina et al.

CVPR 2025highlightarXiv:2412.18608
33
citations
#855

Tensor Product Attention Is All You Need

Yifan Zhang, Yifeng Liu, Huizhuo Yuan et al.

NEURIPS 2025spotlightarXiv:2501.06425
33
citations
#856

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Ailin Deng, Tri Cao, Zhirui Chen et al.

CVPR 2025posterarXiv:2503.02199
33
citations
#857

Stable-Hair: Real-World Hair Transfer via Diffusion Model

Yuxuan Zhang, Qing Zhang, Yiren Song et al.

AAAI 2025paperarXiv:2407.14078
33
citations
#858

Scaling Wearable Foundation Models

Girish Narayanswamy, Xin Liu, Kumar Ayush et al.

ICLR 2025posterarXiv:2410.13638
33
citations
#859

Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Yongxin Guo, Zhenglin Cheng, Xiaoying Tang et al.

ICLR 2025posterarXiv:2405.14297
33
citations
#860

WorldModelBench: Judging Video Generation Models As World Models

Dacheng Li, Yunhao Fang, Yukang Chen et al.

NEURIPS 2025posterarXiv:2502.20694
33
citations
#861

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

Qi Wang, Yanrui Yu, Ye Yuan et al.

NEURIPS 2025oralarXiv:2505.12434
33
citations
#862

Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming

Yilun Hao, Yang Zhang, Chuchu Fan

ICLR 2025posterarXiv:2410.12112
33
citations
#863

On the Emergence of Position Bias in Transformers

Xinyi Wu, Yifei Wang, Stefanie Jegelka et al.

ICML 2025posterarXiv:2502.01951
33
citations
#864

CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes

Yang Liu, Chuanchen Luo, Zhongkai Mao et al.

ICLR 2025posterarXiv:2411.00771
32
citations
#865

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Xiangyu Zhao, Peiyuan Zhang, Kexian Tang et al.

NEURIPS 2025oralarXiv:2504.02826
32
citations
#866

MonSter: Marry Monodepth to Stereo Unleashes Power

JunDa Cheng, Longliang Liu, Gangwei Xu et al.

CVPR 2025highlight
32
citations
#867

PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion

Sophia Tang, Yinuo Zhang, Pranam Chatterjee, PhD

ICML 2025posterarXiv:2412.17780
32
citations
#868

TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning

Andreas Auer, Patrick Podest, Daniel Klotz et al.

NEURIPS 2025posterarXiv:2505.23719
32
citations
#869

Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation

Yiming Wang, Pei Zhang, Baosong Yang et al.

ICLR 2025posterarXiv:2410.13640
32
citations
#870

AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

Ximing Lu, Melanie Sclar, Skyler Hallinan et al.

ICLR 2025posterarXiv:2410.04265
32
citations
#871

SCBench: A KV Cache-Centric Analysis of Long-Context Methods

Yucheng Li, Huiqiang Jiang, Qianhui Wu et al.

ICLR 2025posterarXiv:2412.10319
32
citations
#872

Diffusion Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models

Ruofan Liang, Žan Gojčič, Huan Ling et al.

CVPR 2025poster
32
citations
#873

Retrieval-Augmented Generation with Conflicting Evidence

Han Wang, Archiki Prasad, Elias Stengel-Eskin et al.

COLM 2025paperarXiv:2504.13079
32
citations
#874

Training-free and Adaptive Sparse Attention for Efficient Long Video Generation

yifei xia, Suhan Ling, Fangcheng Fu et al.

ICCV 2025posterarXiv:2502.21079
32
citations
#875

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

Hao Chen, Ze Wang, Xiang Li et al.

CVPR 2025posterarXiv:2412.10958
32
citations
#876

Generative Gaussian Splatting for Unbounded 3D City Generation

Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.

CVPR 2025posterarXiv:2406.06526
32
citations
#877

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

Peng Jin, Bo Zhu, Yuan Li et al.

ICLR 2025posterarXiv:2410.07348
32
citations
#878

3D-HGS: 3D Half-Gaussian Splatting

Haolin Li, Jinyang Liu, Mario Sznaier et al.

CVPR 2025posterarXiv:2406.02720
32
citations
#879

Can LLMs Understand Time Series Anomalies?

Zihao Zhou, Rose Yu

ICLR 2025posterarXiv:2410.05440
32
citations
#880

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

Tiantian Geng, Jinrui Zhang, Qingni Wang et al.

CVPR 2025posterarXiv:2411.19772
32
citations
#881

Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail

Luca Bartolomei, Fabio Tosi, Matteo Poggi et al.

CVPR 2025posterarXiv:2412.04472
32
citations
#882

LEGION: Learning to Ground and Explain for Synthetic Image Detection

Hengrui Kang, Siwei Wen, Zichen Wen et al.

ICCV 2025highlightarXiv:2503.15264
32
citations
#883

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Lital Binyamin, Yoad Tewel, Hilit Segev et al.

CVPR 2025posterarXiv:2406.10210
32
citations
#884

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.

CVPR 2025posterarXiv:2406.04321
32
citations
#885

What Makes Large Language Models Reason in (Multi-Turn) Code Generation?

Kunhao Zheng, Juliette Decugis, Jonas Gehring et al.

ICLR 2025posterarXiv:2410.08105
32
citations
#886

Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries

HUAKUN LUO, Haixu Wu, Hang Zhou et al.

ICML 2025posterarXiv:2502.02414
32
citations
#887

Emergence of a High-Dimensional Abstraction Phase in Language Transformers

Emily Cheng, Diego Doimo, Corentin Kervadec et al.

ICLR 2025posterarXiv:2405.15471
32
citations
#888

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

Zhilin Wang, Jiaqi Zeng, Olivier Delalleau et al.

NEURIPS 2025posterarXiv:2505.11475
32
citations
#889

Values in the Wild: Discovering and Mapping Values in Real-World Language Model Interactions

Saffron Huang, Esin DURMUS, Kunal Handa et al.

COLM 2025paper
31
citations
#890

System 1.x: Learning to Balance Fast and Slow Planning with Language Models

Swarnadeep Saha, Archiki Prasad, Justin Chen et al.

ICLR 2025posterarXiv:2407.14414
31
citations
#891

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Mantas Mazeika, Xuwang Yin, Rishub Tamirisa et al.

NEURIPS 2025spotlightarXiv:2502.08640
31
citations
#892

Spike No More: Stabilizing the Pre-training of Large Language Models

Sho Takase, Shun Kiyono, Sosuke Kobayashi et al.

COLM 2025paperarXiv:2312.16903
31
citations
#893

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Barys Liskavets, Maxim Ushakov, Shuvendu Roy et al.

AAAI 2025paperarXiv:2409.01227
31
citations
#894

Complexity Experts are Task-Discriminative Learners for Any Image Restoration

Eduard Zamfir, Zongwei Wu, Nancy Mehta et al.

CVPR 2025posterarXiv:2411.18466
31
citations
#895

Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers

Shalev Lifshitz, Sheila A. McIlraith, Yilun Du

COLM 2025paperarXiv:2502.20379
31
citations
#896

MAT-Agent: Adaptive Multi-Agent Training Optimization

jusheng zhang, Kaitong Cai, Yijia Fan et al.

NEURIPS 2025posterarXiv:2510.17845
31
citations
#897

From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs

Alireza Rezazadeh, Zichao Li, Wei Wei et al.

ICLR 2025posterarXiv:2410.14052
31
citations
#898

Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding

seil kang, Jinyeong Kim, Junhyeok Kim et al.

CVPR 2025highlightarXiv:2503.06287
31
citations
#899

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Hyeonho Jeong, Chun-Hao P. Huang, Jong Chul Ye et al.

CVPR 2025posterarXiv:2412.06016
31
citations
#900

Guided Real Image Dehazing Using YCbCr Color Space

Wenxuan Fang, Junkai Fan, Yu Zheng et al.

AAAI 2025paperarXiv:2412.17496
31
citations
#901

McEval: Massively Multilingual Code Evaluation

Linzheng Chai, Shukai Liu, Jian Yang et al.

ICLR 2025posterarXiv:2406.07436
31
citations
#902

PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

Qingdong He, Jiangning Zhang, Jinlong Peng et al.

AAAI 2025paperarXiv:2405.15214
31
citations
#903

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Rui Qian, Shuangrui Ding, Xiaoyi Dong et al.

CVPR 2025posterarXiv:2501.03218
31
citations
#904

Informed Correctors for Discrete Diffusion Models

Yixiu Zhao, Jiaxin Shi, Feng Chen et al.

NEURIPS 2025posterarXiv:2407.21243
31
citations
#905

VCA: Video Curious Agent for Long Video Understanding

Zeyuan Yang, Delin Chen, Xueyang Yu et al.

ICCV 2025posterarXiv:2412.10471
31
citations
#906

Gramian Multimodal Representation Learning and Alignment

Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo et al.

ICLR 2025posterarXiv:2412.11959
31
citations
#907

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

Will Merrill, Ashish Sabharwal

NEURIPS 2025posterarXiv:2503.03961
31
citations
#908

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

Peijie Dong, Lujun Li, Yuedong Zhong et al.

ICLR 2025posterarXiv:2408.01803
31
citations
#909

REEF: Representation Encoding Fingerprints for Large Language Models

Jie Zhang, Dongrui Liu, Chen Qian et al.

ICLR 2025posterarXiv:2410.14273
31
citations
#910

VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing

Xiangpeng Yang, Linchao Zhu, Hehe Fan et al.

ICLR 2025posterarXiv:2502.17258
31
citations
#911

RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards

Xinze Li, Sen Mei, Zhenghao Liu et al.

ICLR 2025posterarXiv:2410.13509
31
citations
#912

Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View

Xuan Liu, Jie ZHANG, HaoYang Shang et al.

ICLR 2025posterarXiv:2405.14744
31
citations
#913

ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models

Xubing Ye, Yukang Gan, Yixiao Ge et al.

CVPR 2025posterarXiv:2412.00447
31
citations
#914

Number Cookbook: Number Understanding of Language Models and How to Improve It

Haotong Yang, Yi Hu, Shijia Kang et al.

ICLR 2025posterarXiv:2411.03766
31
citations
#915

Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step

Mingyuan Zhou, Huangjie Zheng, Yi Gu et al.

ICLR 2025posterarXiv:2410.14919
31
citations
#916

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Yuheng Zhang, Dian Yu, Baolin Peng et al.

ICLR 2025posterarXiv:2407.00617
31
citations
#917

PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance

Haohan Weng, Yikai Wang, Tong Zhang et al.

ICLR 2025posterarXiv:2405.16890
31
citations
#918

From Tokens to Words: On the Inner Lexicon of LLMs

Guy Kaplan, Matanel Oren, Yuval Reif et al.

ICLR 2025posterarXiv:2410.05864
30
citations
#919

Spurious Forgetting in Continual Learning of Language Models

Junhao Zheng, Xidi Cai, Shengjie Qiu et al.

ICLR 2025posterarXiv:2501.13453
30
citations
#920

Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning

Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye et al.

AAAI 2025paperarXiv:2405.20535
30
citations
#921

ConDSeg: A General Medical Image Segmentation Framework via Contrast-Driven Feature Enhancement

Mengqi Lei, Haochen Wu, Xinhua Lv et al.

AAAI 2025paperarXiv:2412.08345
30
citations
#922

Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models Trained on Corrupted Data

Asad Aali, Giannis Daras, Brett Levac et al.

ICLR 2025posterarXiv:2403.08728
30
citations
#923

Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks

Hailong Guo, Bohan Zeng, Yiren Song et al.

ICCV 2025posterarXiv:2501.15891
30
citations
#924

GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding

Haoyi Jiang, Liu Liu, Tianheng Cheng et al.

CVPR 2025posterarXiv:2412.13193
30
citations
#925

Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer

Yanjun Zhao, Sizhe Dang, Haishan Ye et al.

ICLR 2025poster
30
citations
#926

Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark

Tsung-Han Wu, Giscard Biamby, Jerome Quenum et al.

ICLR 2025posterarXiv:2407.13766
30
citations
#927

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Tianhe Wu, Jian Zou, Jie Liang et al.

NEURIPS 2025spotlightarXiv:2505.14460
30
citations
#928

VideoGLaMM : A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Shehan Munasinghe, Hanan Gani, Wenqi Zhu et al.

CVPR 2025posterarXiv:2411.04923
30
citations
#929

Self-Boosting Large Language Models with Synthetic Preference Data

Qingxiu Dong, Li Dong, Xingxing Zhang et al.

ICLR 2025posterarXiv:2410.06961
30
citations
#930

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Di Zhang, Jingdi Lei, Junxian Li et al.

CVPR 2025posterarXiv:2411.18203
30
citations
#931

CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models

Zihui Cheng, Qiguang Chen, Jin Zhang et al.

AAAI 2025paperarXiv:2412.12932
30
citations
#932

Policy learning “without” overlap: Pessimism and generalized empirical Bernstein’s inequality

Ying Jin, Zhimei Ren, Zhuoran Yang et al.

NEURIPS 2025posterarXiv:2212.09900
30
citations
#933

Empowering LLMs to Understand and Generate Complex Vector Graphics

XiMing Xing, Juncheng Hu, Guotao Liang et al.

CVPR 2025posterarXiv:2412.11102
30
citations
#934

Checklists Are Better Than Reward Models For Aligning Language Models

Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.

NEURIPS 2025spotlightarXiv:2507.18624
30
citations
#935

LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application

Jian Jia, Yipei Wang, Yan Li et al.

AAAI 2025paperarXiv:2405.03988
30
citations
#936

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Yuchen Lin, Chenguo Lin, Panwang Pan et al.

NEURIPS 2025posterarXiv:2506.05573
30
citations
#937

A New Perspective on Shampoo's Preconditioner

Depen Morwani, Itai Shapira, Nikhil Vyas et al.

ICLR 2025posterarXiv:2406.17748
30
citations
#938

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Jiacheng Chen, Tianhao Liang, Sherman Siu et al.

ICLR 2025posterarXiv:2410.10563
30
citations
#939

Visual Agentic AI for Spatial Reasoning with a Dynamic API

Damiano Marsili, Rohun Agrawal, Yisong Yue et al.

CVPR 2025posterarXiv:2502.06787
30
citations
#940

Evolutionary Large Language Model for Automated Feature Transformation

Nanxu Gong, Chandan K Reddy, Wangyang Ying et al.

AAAI 2025paperarXiv:2405.16203
30
citations
#941

Cascade Reward Sampling for Efficient Decoding-Time Alignment

Bolian Li, Yifan Wang, Anamika Lochab et al.

COLM 2025paperarXiv:2406.16306
30
citations
#942

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

Tao Liu, Kai Wang, Senmao Li et al.

ICLR 2025posterarXiv:2501.13554
30
citations
#943

StarVector: Generating Scalable Vector Graphics Code from Images and Text

Juan Rodriguez, Abhay Puri, Shubham Agarwal et al.

CVPR 2025posterarXiv:2312.11556
30
citations
#944

Herald: A Natural Language Annotated Lean 4 Dataset

Guoxiong Gao, Yutong Wang, Jiedong Jiang et al.

ICLR 2025posterarXiv:2410.10878
30
citations
#945

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Zhe Kong, Feng Gao, Yong Zhang et al.

NEURIPS 2025posterarXiv:2505.22647
30
citations
#946

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang et al.

ICLR 2025posterarXiv:2412.13337
30
citations
#947

Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs

Qi Wu, Yubo Zhao, Yifan Wang et al.

ICLR 2025posterarXiv:2405.17013
30
citations
#948

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Jianing "Jed" Yang, Xuweiyi Chen, Nikhil Madaan et al.

CVPR 2025posterarXiv:2406.05132
30
citations
#949

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Le Zhuo, Liangbing Zhao, Sayak Paul et al.

ICCV 2025posterarXiv:2504.16080
30
citations
#950

LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding

Doohyuk Jang, Sihwan Park, June Yong Yang et al.

ICLR 2025posterarXiv:2410.03355
30
citations
#951

Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron

Yiran Zhao, Wenxuan Zhang, Yuxi Xie et al.

ICLR 2025poster
29
citations
#952

ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing

Ziteng Wang, Jun Zhu, Jianfei Chen

ICLR 2025posterarXiv:2412.14711
29
citations
#953

Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model

Xiu Yuan, Tongzhou Mu, Stone Tao et al.

ICLR 2025posterarXiv:2412.13630
29
citations
#954

DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes

Chensheng Peng, Chengwei Zhang, Yixiao Wang et al.

CVPR 2025posterarXiv:2411.11921
29
citations
#955

The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling

Andre Cornman, Jacob West-Roberts, Antonio Camargo et al.

ICLR 2025poster
29
citations
#956

AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions

Polina Kirichenko, Mark Ibrahim, Kamalika Chaudhuri et al.

NEURIPS 2025posterarXiv:2506.09038
29
citations
#957

OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation

Yuchen Lin, Chenguo Lin, Jianjin Xu et al.

ICLR 2025posterarXiv:2501.18982
29
citations
#958

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Maojia Song, Shang Hong Sim, Rishabh Bhardwaj et al.

ICLR 2025posterarXiv:2409.11242
29
citations
#959

On Large Language Model Continual Unlearning

Chongyang Gao, Lixu Wang, Kaize Ding et al.

ICLR 2025posterarXiv:2407.10223
29
citations
#960

STAMP: Scalable Task- And Model-agnostic Collaborative Perception

Xiangbo Gao, Runsheng Xu, Jiachen Li et al.

ICLR 2025posterarXiv:2501.18616
29
citations
#961

WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments

Jianhao Zheng, Zihan Zhu, Valentin Bieri et al.

CVPR 2025posterarXiv:2504.03886
29
citations
#962

Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms

Yinuo Ren, Haoxuan Chen, Yuchen Zhu et al.

NEURIPS 2025posterarXiv:2502.00234
29
citations
#963

Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation

Siwei Wen, junyan ye, Peilin Feng et al.

NEURIPS 2025posterarXiv:2503.14905
29
citations
#964

MoDeGPT: Modular Decomposition for Large Language Model Compression

Chi-Heng Lin, Shangqian Gao, James Smith et al.

ICLR 2025posterarXiv:2408.09632
29
citations
#965

Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Navve Wasserman, Noam Rotstein, Roy Ganz et al.

CVPR 2025posterarXiv:2404.18212
29
citations
#966

VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation

Hanzhi Chen, Boyang Sun, Anran Zhang et al.

CVPR 2025posterarXiv:2503.07135
29
citations
#967

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Yiyang Zhou, Yangfan He, Yaofeng Su et al.

NEURIPS 2025posterarXiv:2506.01300
29
citations
#968

Image Watermarks are Removable using Controllable Regeneration from Clean Noise

Yepeng Liu, Yiren Song, Hai Ci et al.

ICLR 2025posterarXiv:2410.05470
29
citations
#969

InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences

Chenyang Zhu, Kai Li, Yue Ma et al.

ICLR 2025posterarXiv:2412.01197
29
citations
#970

Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems

Weibo Gao, Qi Liu, Linan Yue et al.

AAAI 2025paperarXiv:2501.10332
29
citations
#971

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling

Jixun Yao, Hexin Liu, CHEN CHEN et al.

ICLR 2025posterarXiv:2502.02942
29
citations
#972

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key

Zhihe Yang, Xufang Luo, Dongqi Han et al.

CVPR 2025posterarXiv:2501.09695
29
citations
#973

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Xiaojuan Wang, Boyang Zhou, Brian Curless et al.

ICLR 2025posterarXiv:2408.15239
29
citations
#974

Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection

Wei Luo, Yunkang Cao, Haiming Yao et al.

CVPR 2025posterarXiv:2503.02424
29
citations
#975

Longhorn: State Space Models are Amortized Online Learners

Bo Liu, Rui Wang, Lemeng Wu et al.

ICLR 2025posterarXiv:2407.14207
29
citations
#976

The dark side of the forces: assessing non-conservative force models for atomistic machine learning

Filippo Bigi, Marcel Langer, Michele Ceriotti

ICML 2025oralarXiv:2412.11569
29
citations
#977

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

Sheng Zhou, Junbin Xiao, Qingyun Li et al.

CVPR 2025posterarXiv:2502.07411
29
citations
#978

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Weifeng Lin, Xinyu Wei, Ruichuan An et al.

NEURIPS 2025posterarXiv:2506.05302
29
citations
#979

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models

Wenbo Hu, Jia-Chen Gu, Zi-Yi Dou et al.

ICLR 2025posterarXiv:2410.08182
29
citations
#980

FlatQuant: Flatness Matters for LLM Quantization

Yuxuan Sun, Ruikang Liu, Haoli Bai et al.

ICML 2025posterarXiv:2410.09426
29
citations
#981

LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior

Hanyu Wang, Saksham Suri, Yixuan Ren et al.

ICLR 2025posterarXiv:2410.21264
29
citations
#982

Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering

Sheng Liu, Haotian Ye, James Y Zou

ICLR 2025poster
29
citations
#983

VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge

Vishwesh Nath, Wenqi Li, Dong Yang et al.

CVPR 2025highlightarXiv:2411.12915
29
citations
#984

Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting

Siru Zhong, Weilin Ruan, Ming Jin et al.

ICML 2025oralarXiv:2502.04395
29
citations
#985

MegaMath: Pushing the Limits of Open Math Corpora

Fan Zhou, Zengzhi Wang, Nikhil Ranjan et al.

COLM 2025paperarXiv:2504.02807
29
citations
#986

Machine Unlearning Fails to Remove Data Poisoning Attacks

Martin Pawelczyk, Jimmy Di, Yiwei Lu et al.

ICLR 2025posterarXiv:2406.17216
29
citations
#987

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

Ruilin Luo, Zhuofan Zheng, Lei Wang et al.

NEURIPS 2025posterarXiv:2501.04686
29
citations
#988

Can Knowledge Editing Really Correct Hallucinations?

Baixiang Huang, Canyu Chen, Xiongxiao Xu et al.

ICLR 2025posterarXiv:2410.16251
29
citations
#989

Sparse autoencoders reveal selective remapping of visual concepts during adaptation

Hyesu Lim, Jinho Choi, Jaegul Choo et al.

ICLR 2025posterarXiv:2412.05276
29
citations
#990

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Liming Jiang, Qing Yan, Yumin Jia et al.

ICCV 2025highlightarXiv:2503.16418
29
citations
#991

Competing Large Language Models in Multi-Agent Gaming Environments

Jen-Tse Huang, Eric John Li, Man Ho LAM et al.

ICLR 2025poster
28
citations
#992

Denoising Autoregressive Transformers for Scalable Text-to-Image Generation

Jiatao Gu, Yuyang Wang, Yizhe Zhang et al.

ICLR 2025posterarXiv:2410.08159
28
citations
#993

Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification

Yunzhen Feng, Elvis Dohmatob, Pu Yang et al.

ICLR 2025posterarXiv:2406.07515
28
citations
#994

Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport

Zhenyi Zhang, Tiejun Li, Peijie Zhou

ICLR 2025posterarXiv:2410.00844
28
citations
#995

Distilling Multi-modal Large Language Models for Autonomous Driving

Deepti Hegde, Rajeev Yasarla, Hong Cai et al.

CVPR 2025posterarXiv:2501.09757
28
citations
#996

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Thomas Fel, Ekdeep Singh Lubana, Jacob Prince et al.

ICML 2025posterarXiv:2502.12892
28
citations
#997

Exploring Enhanced Contextual Information for Video-Level Object Tracking

Ben Kang, Xin Chen, Simiao Lai et al.

AAAI 2025paperarXiv:2412.11023
28
citations
#998

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

Shuming Liu, Chen Zhao, Tianqi Xu et al.

CVPR 2025posterarXiv:2503.21483
28
citations
#999

Long-Context State-Space Video World Models

Ryan Po, Yotam Nitzan, Richard Zhang et al.

ICCV 2025posterarXiv:2505.20171
28
citations
#1000

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Junyan Ye, Baichuan Zhou, Zilong Huang et al.

ICLR 2025posterarXiv:2410.09732
28
citations