Most Cited 2025 "helmholtz equations" Papers

22,274 papers found • Page 6 of 112

#1001

Denoising Autoregressive Transformers for Scalable Text-to-Image Generation

Jiatao Gu, Yuyang Wang, Yizhe Zhang et al.

ICLR 2025posterarXiv:2410.08159
28
citations
#1002

RUN: Reversible Unfolding Network for Concealed Object Segmentation

Chunming He, Rihan Zhang, Fengyang Xiao et al.

ICML 2025posterarXiv:2501.18783
28
citations
#1003

Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification

Yunzhen Feng, Elvis Dohmatob, Pu Yang et al.

ICLR 2025posterarXiv:2406.07515
28
citations
#1004

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

Zongyi Li, Shujie HU, Shujie LIU et al.

ICLR 2025oralarXiv:2410.20502
28
citations
#1005

Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models

Zhejun Zhang, Peter Karkus, Maximilian Igl et al.

CVPR 2025posterarXiv:2412.05334
28
citations
#1006

Simple ReFlow: Improved Techniques for Fast Flow Models

Beomsu Kim, Yu-Guan Hsieh, Michal Klein et al.

ICLR 2025posterarXiv:2410.07815
28
citations
#1007

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge

Haomiao Xiong, Zongxin Yang, Jiazuo Yu et al.

ICLR 2025posterarXiv:2501.13468
28
citations
#1008

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Shiji Zhao, Ranjie Duan, Fengxiang Wang et al.

ICCV 2025posterarXiv:2501.04931
28
citations
#1009

DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding

Geng Li, Jinglin Xu, Yunzhen Zhao et al.

CVPR 2025highlightarXiv:2504.14920
28
citations
#1010

ASGO: Adaptive Structured Gradient Optimization

Kang An, Yuxing Liu, Rui Pan et al.

NEURIPS 2025posterarXiv:2503.20762
28
citations
#1011

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Junyan Ye, Baichuan Zhou, Zilong Huang et al.

ICLR 2025posterarXiv:2410.09732
28
citations
#1012

Context-Alignment: Activating and Enhancing LLMs Capabilities in Time Series

Yuxiao Hu, Qian Li, Dongxiao Zhang et al.

ICLR 2025posterarXiv:2501.03747
28
citations
#1013

When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs

Xiaomin Li, Zhou Yu, Zhiwei Zhang et al.

NEURIPS 2025spotlightarXiv:2505.11423
28
citations
#1014

Long-Context State-Space Video World Models

Ryan Po, Yotam Nitzan, Richard Zhang et al.

ICCV 2025posterarXiv:2505.20171
28
citations
#1015

Distilling Multi-modal Large Language Models for Autonomous Driving

Deepti Hegde, Rajeev Yasarla, Hong Cai et al.

CVPR 2025posterarXiv:2501.09757
28
citations
#1016

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

Shuming Liu, Chen Zhao, Tianqi Xu et al.

CVPR 2025posterarXiv:2503.21483
28
citations
#1017

OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization

Yiyou Sun, Shawn Hu, Georgia Zhou et al.

NEURIPS 2025posterarXiv:2506.18880
28
citations
#1018

Improving Uncertainty Estimation through Semantically Diverse Language Generation

Lukas Aichberger, Kajetan Schweighofer, Mykyta Ielanskyi et al.

ICLR 2025posterarXiv:2406.04306
28
citations
#1019

Exploring Enhanced Contextual Information for Video-Level Object Tracking

Ben Kang, Xin Chen, Simiao Lai et al.

AAAI 2025paperarXiv:2412.11023
28
citations
#1020

Overtrained Language Models Are Harder to Fine-Tune

Jacob Mitchell Springer, Sachin Goyal, Kaiyue Wen et al.

ICML 2025posterarXiv:2503.19206
28
citations
#1021

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Zhiyuan Zhou, Andy Peng, Qiyang Li et al.

ICLR 2025posterarXiv:2412.07762
28
citations
#1022

Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness

Haochen Wang, Yucheng Zhao, Tiancai Wang et al.

ICCV 2025posterarXiv:2504.01901
28
citations
#1023

From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models

Etowah Adams, Liam Bai, Minji Lee et al.

ICML 2025spotlight
28
citations
#1024

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Thomas Fel, Ekdeep Singh Lubana, Jacob Prince et al.

ICML 2025posterarXiv:2502.12892
28
citations
#1025

Dataset Distillation with Neural Characteristic Function: A Minmax Perspective

Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.

CVPR 2025highlightarXiv:2502.20653
28
citations
#1026

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Zhongwei Ren, Yunchao Wei, Xun Guo et al.

CVPR 2025posterarXiv:2501.09781
28
citations
#1027

Can Large Language Models Understand Symbolic Graphics Programs?

Zeju Qiu, Weiyang Liu, Haiwen Feng et al.

ICLR 2025posterarXiv:2408.08313
28
citations
#1028

Diffusion-based Neural Network Weights Generation

Bedionita Soro, Bruno Andreis, Hayeon Lee et al.

ICLR 2025posterarXiv:2402.18153
28
citations
#1029

NoveltyBench: Evaluating Language Models for Humanlike Diversity

Yiming Zhang, Harshita Diddee, Susan Holm et al.

COLM 2025paperarXiv:2504.05228
28
citations
#1030

DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors

Keon Lee, Dong Won Kim, Jaehyeon Kim et al.

ICLR 2025posterarXiv:2406.11427
28
citations
#1031

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Zehan Wang, Ziang Zhang, Tianyu Pang et al.

ICML 2025posterarXiv:2412.18605
28
citations
#1032

When Hypergraph Meets Heterophily: New Benchmark Datasets and Baseline

Ming Li, Yongchun Gu, Yi Wang et al.

AAAI 2025paper
27
citations
#1033

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos

Qirui Chen, Shangzhe Di, Weidi Xie

AAAI 2025paperarXiv:2408.14469
27
citations
#1034

Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

Junkang Wu, Yuexiang Xie, Zhengyi Yang et al.

ICLR 2025posterarXiv:2407.07880
27
citations
#1035

Evaluating the Diversity and Quality of LLM Generated Content

Alexander Shypula, Shuo Li, Botong Zhang et al.

COLM 2025paperarXiv:2504.12522
27
citations
#1036

No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces

Daniel Marczak, Simone Magistri, Sebastian Cygert et al.

ICML 2025posterarXiv:2502.04959
27
citations
#1037

LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations

Anian Ruoss, Fabio Pardo, Harris Chan et al.

ICML 2025posterarXiv:2412.01441
27
citations
#1038

Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking

Benjamin Feuer, Micah Goldblum, Teresa Datta et al.

ICLR 2025posterarXiv:2409.15268
27
citations
#1039

DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life

Yu Ying Chiu, Liwei Jiang, Yejin Choi

ICLR 2025oralarXiv:2410.02683
27
citations
#1040

VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching

Siyu Xu, Yunke Wang, Chenghao Xia et al.

NEURIPS 2025oralarXiv:2502.02175
27
citations
#1041

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Sicong Leng, Yun Xing, Zesen Cheng et al.

NEURIPS 2025posterarXiv:2410.12787
27
citations
#1042

VistaDream: Sampling multiview consistent images for single-view scene reconstruction

Haiping Wang, Yuan Liu, Ziwei Liu et al.

ICCV 2025posterarXiv:2410.16892
27
citations
#1043

Rethinking Reward Modeling in Preference-based Large Language Model Alignment

Hao Sun, Yunyi Shen, Jean-Francois Ton

ICLR 2025poster
27
citations
#1044

Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding

Zhongyi Shui, Jianpeng Zhang, Weiwei Cao et al.

ICLR 2025posterarXiv:2501.14548
27
citations
#1045

Bolt3D: Generating 3D Scenes in Seconds

Stanislaw Szymanowicz, Jason Y. Zhang, Pratul Srinivasan et al.

ICCV 2025posterarXiv:2503.14445
27
citations
#1046

Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Ge Wu, Shen Zhang, Ruijing Shi et al.

NEURIPS 2025oralarXiv:2507.01467
27
citations
#1047

Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free

Ziyue Li, Tianyi Zhou

ICLR 2025posterarXiv:2410.10814
27
citations
#1048

DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting

Hyunwoo Park, Gun Ryu, Wonjun Kim

CVPR 2025posterarXiv:2504.00773
27
citations
#1049

PersonalLLM: Tailoring LLMs to Individual Preferences

Thomas Zollo, Andrew Siah, Naimeng Ye et al.

ICLR 2025posterarXiv:2409.20296
27
citations
#1050

A Comprehensive Overhaul of Multimodal Assistant with Small Language Models

Minjie Zhu, Yichen Zhu, Ning Liu et al.

AAAI 2025paperarXiv:2403.06199
27
citations
#1051

Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning

Yiqun Chen, Lingyong Yan, Weiwei Sun et al.

NEURIPS 2025posterarXiv:2501.15228
27
citations
#1052

Light3R-SfM: Towards Feed-forward Structure-from-Motion

Sven Elflein, Qunjie Zhou, Laura Leal-Taixe

CVPR 2025highlightarXiv:2501.14914
27
citations
#1053

Fast Feedforward 3D Gaussian Splatting Compression

Yihang Chen, Qianyi Wu, Mengyao Li et al.

ICLR 2025posterarXiv:2410.08017
27
citations
#1054

Beyond Autoregression: Fast LLMs via Self-Distillation Through Time

Justin Deschenaux, Caglar Gulcehre

ICLR 2025posterarXiv:2410.21035
27
citations
#1055

DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification

Yuhao Wang, Yang Liu, Aihua Zheng et al.

AAAI 2025paperarXiv:2412.10650
27
citations
#1056

More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Zhongxing Xu, Chengzhi Liu, Qingyue Wei et al.

NEURIPS 2025posterarXiv:2505.21523
27
citations
#1057

Perception-Guided Jailbreak Against Text-to-Image Models

Yihao Huang, Le Liang, Tianlin Li et al.

AAAI 2025paperarXiv:2408.10848
27
citations
#1058

Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation within Slow Reasoning

Hao Chen, Jiaming Liu, Chenyang Gu et al.

NEURIPS 2025poster
27
citations
#1059

AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion

Mingzhen Sun, Weining Wang, Li et al.

CVPR 2025posterarXiv:2503.07418
27
citations
#1060

Erasing Undesirable Influence in Diffusion Models

Jing Wu, Trung Le, Munawar Hayat et al.

CVPR 2025posterarXiv:2401.05779
27
citations
#1061

SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

Ying Chen, Guoan Wang, Yuanfeng Ji et al.

CVPR 2025posterarXiv:2410.11761
27
citations
#1062

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

Zimu Lu, Aojun Zhou, Ke Wang et al.

ICLR 2025posterarXiv:2410.08196
27
citations
#1063

BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models

Peiyan Li, Yixiang Chen, Hongtao Wu et al.

NEURIPS 2025posterarXiv:2506.07961
27
citations
#1064

Estimating Body and Hand Motion in an Ego‑sensed World

Brent Yi, Vickie Ye, Maya Zheng et al.

CVPR 2025highlightarXiv:2410.03665
27
citations
#1065

Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models

Zhenyu Pan, Haozheng Luo, Manling Li et al.

ICLR 2025posterarXiv:2403.17359
27
citations
#1066

Towards Understanding Camera Motions in Any Video

Zhiqiu Lin, Siyuan Cen, Daniel Jiang et al.

NEURIPS 2025spotlightarXiv:2504.15376
27
citations
#1067

KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

Yongliang Wu, Zonghui Li, Xinting Hu et al.

NEURIPS 2025posterarXiv:2505.16707
27
citations
#1068

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

Shi Qiu, Shaoyang Guo, Zhuo-Yang Song et al.

NEURIPS 2025posterarXiv:2504.16074
27
citations
#1069

OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling

Zhicheng YANG, Yiwei Wang, Yinya Huang et al.

ICLR 2025posterarXiv:2407.09887
27
citations
#1070

Theoretical Benefit and Limitation of Diffusion Language Model

Guhao Feng, Yihan Geng, Jian Guan et al.

NEURIPS 2025posterarXiv:2502.09622
27
citations
#1071

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Xiaoshuai Song, Muxi Diao, Guanting Dong et al.

ICLR 2025posterarXiv:2406.08587
27
citations
#1072

Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity

Eduard Gorbunov, Nazarii Tupitsa, Sayantan Choudhury et al.

ICLR 2025posterarXiv:2409.14989
27
citations
#1073

Chain-of-Retrieval Augmented Generation

Liang Wang, Haonan Chen, Nan Yang et al.

NEURIPS 2025posterarXiv:2501.14342
27
citations
#1074

How to build a consistency model: Learning flow maps via self-distillation

Nicholas Boffi, Michael Albergo, Eric Vanden-Eijnden

NEURIPS 2025posterarXiv:2505.18825
27
citations
#1075

Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering

Cheng Sun, Jaesung Choe, Charles Loop et al.

CVPR 2025posterarXiv:2412.04459
27
citations
#1076

Language-Image Models with 3D Understanding

Jang Hyun Cho, Boris Ivanovic, Yulong Cao et al.

ICLR 2025posterarXiv:2405.03685
27
citations
#1077

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Wenda Xu, Rujun Han, Zifeng Wang et al.

ICLR 2025posterarXiv:2410.11325
27
citations
#1078

Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline

Junlong Cheng, Bin Fu, Jin Ye et al.

CVPR 2025posterarXiv:2411.12814
27
citations
#1079

Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection

Lichen Bai, Shitong Shao, zikai zhou et al.

ICLR 2025posterarXiv:2412.10891
26
citations
#1080

InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling

Muhammad Gohar Javed, chuan guo, Li Cheng et al.

ICLR 2025oralarXiv:2410.10010
26
citations
#1081

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry

Sai Sumedh R. Hindupur, Ekdeep S Lubana, Thomas Fel et al.

NEURIPS 2025posterarXiv:2503.01822
26
citations
#1082

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.

NEURIPS 2025posterarXiv:2505.17017
26
citations
#1083

VideoGigaGAN: Towards Detail-rich Video Super-Resolution

Yiran Xu, Taesung Park, Richard Zhang et al.

CVPR 2025posterarXiv:2404.12388
26
citations
#1084

LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences

Hongyan Zhi, Peihao Chen, Junyan Li et al.

CVPR 2025posterarXiv:2412.01292
26
citations
#1085

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

Lawrence Jang, Yinheng Li, Dan Zhao et al.

ICLR 2025posterarXiv:2410.19100
26
citations
#1086

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

Qingming LIU, Yuan Liu, Jiepeng Wang et al.

ICLR 2025posterarXiv:2406.00434
26
citations
#1087

CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression

Yu-Ting Zhan, Cheng-Yuan Ho, He-Bi Yang et al.

ICLR 2025posterarXiv:2503.00357
26
citations
#1088

Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models

Laura Ruis, Maximilian Mozes, Juhan Bae et al.

ICLR 2025posterarXiv:2411.12580
26
citations
#1089

Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond

Qizhou Wang, Jin Zhou, (Andrew) Zhanke Zhou et al.

ICLR 2025posterarXiv:2502.19301
26
citations
#1090

Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction

Ziyang Wu, Tianjiao Ding, Yifu Lu et al.

ICLR 2025posterarXiv:2412.17810
26
citations
#1091

Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

Jinjin Zhang, qiuyu Huang, Junjie Liu et al.

CVPR 2025posterarXiv:2503.18352
26
citations
#1092

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

Baichuan Zhou, Haote Yang, Dairong Chen et al.

AAAI 2025paperarXiv:2408.17267
26
citations
#1093

What Makes a Good Diffusion Planner for Decision Making?

Haofei Lu, Dongqi Han, Yifei Shen et al.

ICLR 2025posterarXiv:2503.00535
26
citations
#1094

OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.

CVPR 2025posterarXiv:2412.01169
26
citations
#1095

Steering Large Language Models between Code Execution and Textual Reasoning

Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma et al.

ICLR 2025posterarXiv:2410.03524
26
citations
#1096

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Hongxiang Li, Yaowei Li, Yuhang Yang et al.

ICLR 2025posterarXiv:2412.09349
26
citations
#1097

DeFoG: Discrete Flow Matching for Graph Generation

Yiming Qin, Manuel Madeira, Dorina Thanou et al.

ICML 2025oralarXiv:2410.04263
26
citations
#1098

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

CVPR 2025posterarXiv:2412.03324
26
citations
#1099

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Hongzhi Huang, Defa Zhu, Banggu Wu et al.

ICML 2025posterarXiv:2501.16975
26
citations
#1100

Frequency Dynamic Convolution for Dense Image Prediction

Linwei Chen, Lin Gu, Liang Li et al.

CVPR 2025posterarXiv:2503.18783
26
citations
#1101

CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs

Siyu Wang, Cailian Chen, Xinyi Le et al.

AAAI 2025paperarXiv:2412.19663
26
citations
#1102

Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

Cong Lu, Shengran Hu, Jeff Clune

ICLR 2025posterarXiv:2405.15143
26
citations
#1103

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

CVPR 2025posterarXiv:2407.15811
26
citations
#1104

PhysGen3D: Crafting a Miniature Interactive World from a Single Image

Boyuan Chen, Hanxiao Jiang, Shaowei Liu et al.

CVPR 2025posterarXiv:2503.20746
26
citations
#1105

DiffuseHigh: Training-Free Progressive High-Resolution Image Synthesis Through Structure Guidance

Younghyun Kim, Geunmin Hwang, Junyu Zhang et al.

AAAI 2025paperarXiv:2406.18459
26
citations
#1106

ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

Haiyang SHEN, Yue Li, Desong Meng et al.

ICLR 2025posterarXiv:2407.00132
26
citations
#1107

InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences

Hongkai Zheng, Wenda Chu, Bingliang Zhang et al.

ICLR 2025posterarXiv:2503.11043
26
citations
#1108

Self-Improvement for Neural Combinatorial Optimization: Sample Without Replacement, but Improvement

Dominik Grimm, Jonathan Pirnay

ICLR 2025posterarXiv:2403.15180
26
citations
#1109

The Superposition of Diffusion Models Using the Itô Density Estimator

Marta Skreta, Lazar Atanackovic, Joey Bose et al.

ICLR 2025posterarXiv:2412.17762
26
citations
#1110

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

Weifeng Lin, Xinyu Wei, Renrui Zhang et al.

ICLR 2025posterarXiv:2409.15278
26
citations
#1111

KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems

Jusheng Zhang, Zimeng Huang, Yijia Fan et al.

ICML 2025posterarXiv:2502.07350
26
citations
#1112

KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse

Jingbo Yang, Bairu Hou, Wei Wei et al.

NEURIPS 2025posterarXiv:2502.16002
26
citations
#1113

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer

Yiren Song, Danze Chen, Mike Zheng Shou

ICCV 2025posterarXiv:2502.01105
26
citations
#1114

The AdEMAMix Optimizer: Better, Faster, Older

Matteo Pagliardini, Pierre Ablin, David Grangier

ICLR 2025posterarXiv:2409.03137
26
citations
#1115

Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond

Guanyao Wu, Haoyu Liu, Hongming Fu et al.

CVPR 2025posterarXiv:2503.01210
26
citations
#1116

Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory

Yuqi Wu, Wenzhao Zheng, Jie Zhou et al.

NEURIPS 2025posterarXiv:2507.02863
26
citations
#1117

Your ViT is Secretly an Image Segmentation Model

Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.

CVPR 2025highlightarXiv:2503.19108
25
citations
#1118

ICLR: In-Context Learning of Representations

Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana et al.

ICLR 2025posterarXiv:2501.00070
25
citations
#1119

EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality

Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim

CVPR 2025posterarXiv:2411.15241
25
citations
#1120

Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models

Lucio La Cava, Andrea Tagarelli

AAAI 2025paperarXiv:2401.07115
25
citations
#1121

Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

Anja Šurina, Amin Mansouri, Lars C.P.M. Quaedvlieg et al.

COLM 2025paperarXiv:2504.05108
25
citations
#1122

Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists

Bojia Zi, Penghui Ruan, Marco Chen et al.

NEURIPS 2025posterarXiv:2502.06734
25
citations
#1123

FineVQ: Fine-Grained User Generated Content Video Quality Assessment

Huiyu Duan, Qiang Hu, Wang Jiarui et al.

CVPR 2025highlightarXiv:2412.19238
25
citations
#1124

Calibrated Multi-Preference Optimization for Aligning Diffusion Models

Kyungmin Lee, Xiaohang Li, Qifei Wang et al.

CVPR 2025posterarXiv:2502.02588
25
citations
#1125

Weight ensembling improves reasoning in language models

Xingyu Dang, Christina Baek, Kaiyue Wen et al.

COLM 2025paperarXiv:2504.10478
25
citations
#1126

Epona: Autoregressive Diffusion World Model for Autonomous Driving

Kaiwen Zhang, Zhenyu Tang, Xiaotao Hu et al.

ICCV 2025posterarXiv:2506.24113
25
citations
#1127

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NEURIPS 2025posterarXiv:2505.19591
25
citations
#1128

Generating CAD Code with Vision-Language Models for 3D Designs

Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Haider Zaidi et al.

ICLR 2025posterarXiv:2410.05340
25
citations
#1129

Understanding Factual Recall in Transformers via Associative Memories

Eshaan Nichani, Jason Lee, Alberto Bietti

ICLR 2025posterarXiv:2412.06538
25
citations
#1130

PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training

Cong Chen, Mingyu Liu, Chenchen Jing et al.

ICLR 2025posterarXiv:2503.06486
25
citations
#1131

Adversarial Search Engine Optimization for Large Language Models

Fredrik Nestaas, Edoardo Debenedetti, Florian Tramer

ICLR 2025posterarXiv:2406.18382
25
citations
#1132

GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models

Zewei Zhang, Huan Liu, Jun Chen et al.

ICLR 2025posterarXiv:2404.07206
25
citations
#1133

Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction

Jarrid Rector-Brooks, Mohsin Hasan, Zhangzhi Peng et al.

ICLR 2025posterarXiv:2410.08134
25
citations
#1134

Moral Alignment for LLM Agents

Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

ICLR 2025oralarXiv:2410.01639
25
citations
#1135

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks

Muhammad Danish, Muhammad Akhtar Munir, Syed Shah et al.

ICCV 2025highlightarXiv:2411.19325
25
citations
#1136

CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

Xinhao Liu, Jintong Li, Yicheng Jiang et al.

CVPR 2025posterarXiv:2411.17820
25
citations
#1137

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding

Dongping Chen, Yue Huang, Siyuan Wu et al.

ICLR 2025oralarXiv:2406.10819
25
citations
#1138

Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces I: the compact case

Iskander Azangulov, Andrei Smolensky, Alexander Terenin et al.

NEURIPS 2025oralarXiv:2208.14960
25
citations
#1139

Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning

Hyun Ryu, Gyeongman Kim, Hyemin S. Lee et al.

ICLR 2025posterarXiv:2410.08047
25
citations
#1140

ResearchTown: Simulator of Human Research Community

Haofei Yu, Zhaochen Hong, Zirui Cheng et al.

ICML 2025posterarXiv:2412.17767
25
citations
#1141

A Formal Framework for Understanding Length Generalization in Transformers

Xinting Huang, Andy Yang, Satwik Bhattamishra et al.

ICLR 2025posterarXiv:2410.02140
25
citations
#1142

Interleaved-Modal Chain-of-Thought

Jun Gao, Yongqi Li, Ziqiang Cao et al.

CVPR 2025posterarXiv:2411.19488
25
citations
#1143

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

Hanlin Wang, Hao Ouyang, Qiuyu Wang et al.

CVPR 2025highlightarXiv:2412.15214
25
citations
#1144

Grounded Reinforcement Learning for Visual Reasoning

Gabriel Sarch, Snigdha Saha, Naitik Khandelwal et al.

NEURIPS 2025posterarXiv:2505.23678
25
citations
#1145

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

Yuxuan Cai, Jiangning Zhang, Haoyang He et al.

ICCV 2025posterarXiv:2410.16236
25
citations
#1146

MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding

Rongchang Xie, Chen Du, Ping Song et al.

ICCV 2025posterarXiv:2411.17762
25
citations
#1147

XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation

Bowen Chen, Brynn zhao, Haomiao Sun et al.

NEURIPS 2025posterarXiv:2506.21416
25
citations
#1148

Modifying Large Language Model Post-Training for Diverse Creative Writing

John Joon Young Chung, Vishakh Padmakumar, Melissa Roemmele et al.

COLM 2025paperarXiv:2503.17126
25
citations
#1149

CityNav: A Large-Scale Dataset for Real-World Aerial Navigation

Jungdae Lee, Taiki Miyanishi, Shuhei Kurita et al.

ICCV 2025posterarXiv:2406.14240
25
citations
#1150

Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation

Ziyang Xie, Zhizheng Liu, Zhenghao Peng et al.

CVPR 2025posterarXiv:2501.06693
25
citations
#1151

TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation

haiyang liu, Xingchao Yang, Tomoya Akiyama et al.

ICLR 2025posterarXiv:2410.04221
25
citations
#1152

AutoPresent: Designing Structured Visuals from Scratch

Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou et al.

CVPR 2025posterarXiv:2501.00912
25
citations
#1153

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding

Yunlong Tang, Daiki Shimada, Jing Bi et al.

AAAI 2025paperarXiv:2403.16276
25
citations
#1154

UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models

Xin Xu, Qiyun Xu, Tong Xiao et al.

ICML 2025posterarXiv:2502.00334
25
citations
#1155

Can LLMs Solve Longer Math Word Problems Better?

Xin Xu, Tong Xiao, Zitong Chao et al.

ICLR 2025posterarXiv:2405.14804
25
citations
#1156

FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes

Lue Fan, Hao ZHANG, Qitai Wang et al.

CVPR 2025posterarXiv:2412.03566
25
citations
#1157

MagicQuill: An Intelligent Interactive Image Editing System

Zichen Liu, Yue Yu, Hao Ouyang et al.

CVPR 2025posterarXiv:2411.09703
25
citations
#1158

SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich et al.

NEURIPS 2025posterarXiv:2505.20411
25
citations
#1159

STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes

Jiawei Yang, Jiahui Huang, Boris Ivanovic et al.

ICLR 2025oralarXiv:2501.00602
25
citations
#1160

Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold

Lazar Atanackovic, Xi (Nicole) Zhang, Brandon Amos et al.

ICLR 2025oralarXiv:2408.14608
25
citations
#1161

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

Shenghai Yuan, Xianyi He, Yufan Deng et al.

NEURIPS 2025posterarXiv:2505.20292
25
citations
#1162

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Wei Pang, Kevin Qinghong Lin, Xiangru Jian et al.

NEURIPS 2025posterarXiv:2505.21497
25
citations
#1163

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

Pengxiang Li, Lu Yin, Shiwei Liu

ICLR 2025posterarXiv:2412.13795
25
citations
#1164

VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents

Ryota Tanaka, Taichi Iki, Taku Hasegawa et al.

CVPR 2025posterarXiv:2504.09795
25
citations
#1165

KGGen: Extracting Knowledge Graphs from Plain Text with Language Models

Belinda Mo, Kyssen Yu, Joshua Kazdan et al.

NEURIPS 2025posterarXiv:2502.09956
25
citations
#1166

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

Lijun Li, Zhelun Shi, Xuhao Hu et al.

CVPR 2025posterarXiv:2501.12612
25
citations
#1167

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation

Xiaofeng Wang, Kang Zhao, Feng Liu et al.

NEURIPS 2025posterarXiv:2411.08380
25
citations
#1168

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

Yun Liu, Chengwen Zhang, Ruofan Xing et al.

CVPR 2025posterarXiv:2406.19353
25
citations
#1169

AffordDP: Generalizable Diffusion Policy with Transferable Affordance

Shijie Wu, Yihang Zhu, Yunao Huang et al.

CVPR 2025posterarXiv:2412.03142
25
citations
#1170

Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift

Siyuan Liang, Jiawei Liang, Tianyu Pang et al.

CVPR 2025posterarXiv:2406.18844
25
citations
#1171

An Intelligent Agentic System for Complex Image Restoration Problems

Kaiwen Zhu, Jinjin Gu, Zhiyuan You et al.

ICLR 2025posterarXiv:2410.17809
25
citations
#1172

Diffusion Beats Autoregressive in Data-Constrained Settings

Mihir Prabhudesai, Mengning Wu, Amir Zadeh et al.

NEURIPS 2025posterarXiv:2507.15857
25
citations
#1173

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

Xiao Guo, Xiufeng Song, Yue Zhang et al.

CVPR 2025posterarXiv:2503.20188
25
citations
#1174

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

Zizheng Pan, Bohan Zhuang, De-An Huang et al.

ICLR 2025posterarXiv:2402.14167
25
citations
#1175

Hyper-Connections

Defa Zhu, Hongzhi Huang, Zihao Huang et al.

ICLR 2025posterarXiv:2409.19606
25
citations
#1176

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Zihan Zheng, Zerui Cheng, Zeyu Shen et al.

NEURIPS 2025posterarXiv:2506.11928
25
citations
#1177

NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning

Xin Yi, Shunfan Zheng, Linlin Wang et al.

AAAI 2025paperarXiv:2412.12497
25
citations
#1178

Adversarial Diffusion Compression for Real-World Image Super-Resolution

Bin Chen, Gehui Li, Rongyuan Wu et al.

CVPR 2025posterarXiv:2411.13383
25
citations
#1179

Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation

Chengwen Qi, Ren Ma, Bowen Li et al.

ICLR 2025posterarXiv:2502.06563
25
citations
#1180

Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation

Xinpeng Wang, Chengzhi (Martin) Hu, Paul Röttger et al.

ICLR 2025posterarXiv:2410.03415
25
citations
#1181

SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures

Hui Liu, Chen Jia, Fan Shi et al.

CVPR 2025posterarXiv:2503.01113
24
citations
#1182

miniCTX: Neural Theorem Proving with (Long-)Contexts

Jiewen Hu, Thomas Zhu, Sean Welleck

ICLR 2025posterarXiv:2408.03350
24
citations
#1183

XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?

Fengxiang Wang, hongzhen wang, Zonghao Guo et al.

CVPR 2025highlightarXiv:2503.23771
24
citations
#1184

Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection

Zhen Qu, Xian Tao, Xinyi Gong et al.

CVPR 2025posterarXiv:2503.10080
24
citations
#1185

SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks

Meng Lou, Yunxiang Fu, Yizhou Yu

AAAI 2025paperarXiv:2409.09649
24
citations
#1186

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing

Wenhao Zheng, Yixiao Chen, Weitong Zhang et al.

COLM 2025paperarXiv:2502.01976
24
citations
#1187

Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

Duo Zheng, shijia Huang, Yanyang Li et al.

NEURIPS 2025posterarXiv:2505.24625
24
citations
#1188

Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking

Yaozong Zheng, Bineng Zhong, Qihua Liang et al.

AAAI 2025paperarXiv:2507.21606
24
citations
#1189

Artificial Kuramoto Oscillatory Neurons

Takeru Miyato, Sindy Löwe, Andreas Geiger et al.

ICLR 2025oralarXiv:2410.13821
24
citations
#1190

Inverse Constitutional AI: Compressing Preferences into Principles

Arduin Findeis, Timo Kaufmann, Eyke Hüllermeier et al.

ICLR 2025posterarXiv:2406.06560
24
citations
#1191

Curriculum Direct Preference Optimization for Diffusion and Consistency Models

Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.

CVPR 2025posterarXiv:2405.13637
24
citations
#1192

Attention Distillation: A Unified Approach to Visual Characteristics Transfer

Yang Zhou, Xu Gao, Zichong Chen et al.

CVPR 2025posterarXiv:2502.20235
24
citations
#1193

CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution

Xin Liu, Jie Liu, Jie Tang et al.

CVPR 2025posterarXiv:2503.06896
24
citations
#1194

Efficient Online Reinforcement Learning for Diffusion Policy

Haitong Ma, Tianyi Chen, Kai Wang et al.

ICML 2025posterarXiv:2502.00361
24
citations
#1195

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates

Xiaosen Zheng, Tianyu Pang, Chao Du et al.

ICLR 2025posterarXiv:2410.07137
24
citations
#1196

Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images

Sichen Zhu, Yuchen Zhu, Molei Tao et al.

ICLR 2025posterarXiv:2501.15598
24
citations
#1197

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Kai Wang, Mingjia Shi, YuKun Zhou et al.

CVPR 2025posterarXiv:2405.17403
24
citations
#1198

Results of the Big ANN: NeurIPS’23 competition

Harsha Vardhan simhadri, Martin Aumüller, Matthijs Douze et al.

NEURIPS 2025posterarXiv:2409.17424
24
citations
#1199

Self-Adapting Language Models

Adam Zweiger, Jyo Pari, Han Guo et al.

NEURIPS 2025posterarXiv:2506.10943
24
citations
#1200

Energy-Weighted Flow Matching for Offline Reinforcement Learning

Shiyuan Zhang, Weitong Zhang, Quanquan Gu

ICLR 2025posterarXiv:2503.04975
24
citations