Most Cited 2025 "real-image dataset" Papers

22,274 papers found • Page 9 of 112

#1601

PersonalLLM: Tailoring LLMs to Individual Preferences

Thomas Zollo, Andrew Siah, Naimeng Ye et al.

ICLR 2025 • arXiv:2409.20296
28 citations
#1602

Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification

Yunzhen Feng, Elvis Dohmatob, Pu Yang et al.

ICLR 2025 • arXiv:2406.07515
28 citations
#1603

Erasing Undesirable Influence in Diffusion Models

Jing Wu, Trung Le, Munawar Hayat et al.

CVPR 2025 • arXiv:2401.05779
28 citations
#1604

Simple ReFlow: Improved Techniques for Fast Flow Models

Beomsu Kim, Yu-Guan Hsieh, Michal Klein et al.

ICLR 2025 • arXiv:2410.07815
28 citations
#1605

Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems

Christian Walder, Deep Tejas Karkhanis

NeurIPS 2025 • spotlight • arXiv:2505.15201
28 citations
#1606

Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation

Ziyang Xie, Zhizheng Liu, Zhenghao Peng et al.

CVPR 2025 • arXiv:2501.06693
28 citations
#1607

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

Shuming Liu, Chen Zhao, Tianqi Xu et al.

CVPR 2025 • arXiv:2503.21483
28 citations
#1608

NoveltyBench: Evaluating Language Models for Humanlike Diversity

Yiming Zhang, Harshita Diddee, Susan Holm et al.

COLM 2025 • paper • arXiv:2504.05228
28 citations
#1609

When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs

Xiaomin Li, Zhou Yu, Zhiwei Zhang et al.

NeurIPS 2025 • spotlight • arXiv:2505.11423
28 citations
#1610

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

Zimu Lu, Aojun Zhou, Ke Wang et al.

ICLR 2025 • arXiv:2410.08196
28 citations
#1611

DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Jinyoung Park, Jeehye Na, Jinyoung Kim et al.

NeurIPS 2025 • arXiv:2506.07464
28 citations
#1612

TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

Liang Pan, Zeshi Yang, Zhiyang Dou et al.

CVPR 2025 • arXiv:2503.19901
28 citations
#1613

Uncovering Overfitting in Large Language Model Editing

Mengqi Zhang, Xiaotian Ye, Qiang Liu et al.

ICLR 2025 • arXiv:2410.07819
28 citations
#1614

ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization

Zechun Liu, Changsheng Zhao, Hanxian Huang et al.

NeurIPS 2025 • arXiv:2502.02631
28 citations
#1615

OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation

Jingjing Chang, Yixiao Fang, Peng Xing et al.

NeurIPS 2025 • arXiv:2506.07977
28 citations
#1616

Star Attention: Efficient LLM Inference over Long Sequences

Shantanu Acharya, Fei Jia, Boris Ginsburg

ICML 2025 • arXiv:2411.17116
28 citations
#1617

Chain-of-Retrieval Augmented Generation

Liang Wang, Haonan Chen, Nan Yang et al.

NeurIPS 2025 • arXiv:2501.14342
28 citations
#1618

NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning

Xin Yi, Shunfan Zheng, Linlin Wang et al.

AAAI 2025 • paper • arXiv:2412.12497
28 citations
#1619

Towards Understanding Camera Motions in Any Video

Zhiqiu Lin, Siyuan Cen, Daniel Jiang et al.

NeurIPS 2025 • spotlight • arXiv:2504.15376
28 citations
#1620

A Geometric Framework for Understanding Memorization in Generative Models

Brendan Ross, Hamidreza Kamkari, Tongzi Wu et al.

ICLR 2025 • arXiv:2411.00113
28 citations
#1621

The Superposition of Diffusion Models Using the Itô Density Estimator

Marta Skreta, Lazar Atanackovic, Joey Bose et al.

ICLR 2025 • arXiv:2412.17762
28 citations
#1622

Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads

Siqi Kou, Jiachun Jin, Zhihong Liu et al.

ICML 2025 • arXiv:2412.00127
28 citations
#1623

Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping

Zijian Liu, Zhengyuan Zhou

ICLR 2025 • arXiv:2412.19529
28 citations
#1624

LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

Parshin Shojaee, Ngoc Hieu Nguyen, Kazem Meidani et al.

ICML 2025 • oral • arXiv:2504.10415
28 citations
#1625

Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory

Yuqi Wu, Wenzhao Zheng, Jie Zhou et al.

NeurIPS 2025 • arXiv:2507.02863
28 citations
#1626

DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors

Keon Lee, Dong Won Kim, Jaehyeon Kim et al.

ICLR 2025 • arXiv:2406.11427
28 citations
#1627

KGGen: Extracting Knowledge Graphs from Plain Text with Language Models

Belinda Mo, Kyssen Yu, Joshua Kazdan et al.

NeurIPS 2025 • arXiv:2502.09956
28 citations
#1628

MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing

Cong Wang, Di Kang, Heyi Sun et al.

CVPR 2025 • arXiv:2404.19026
28 citations
#1629

CREAM: Consistency Regularized Self-Rewarding Language Models

Zhaoyang Wang, Weilei He, Zhiyuan Liang et al.

ICLR 2025 • arXiv:2410.12735
28 citations
#1630

Subspace Optimization for Large Language Models with Convergence Guarantees

Yutong He, Pengrui Li, Yipeng Hu et al.

ICML 2025 • arXiv:2410.11289
28 citations
#1631

Improving Uncertainty Estimation through Semantically Diverse Language Generation

Lukas Aichberger, Kajetan Schweighofer, Mykyta Ielanskyi et al.

ICLR 2025 • arXiv:2406.04306
28 citations
#1632

AutoPresent: Designing Structured Visuals from Scratch

Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou et al.

CVPR 2025 • arXiv:2501.00912
28 citations
#1633

Diffusion-based Neural Network Weights Generation

Bedionita Soro, Bruno Andreis, Hayeon Lee et al.

ICLR 2025 • arXiv:2402.18153
28 citations
#1634

From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models

Etowah Adams, Liam Bai, Minji Lee et al.

ICML 2025 • spotlight
28 citations
#1635

ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

Haiyang Shen, Yue Li, Desong Meng et al.

ICLR 2025 • arXiv:2407.00132
28 citations
#1636

Scalable Equilibrium Sampling with Sequential Boltzmann Generators

Charlie Tan, Joey Bose, Chen Lin et al.

ICML 2025 • arXiv:2502.18462
28 citations
#1637

EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality

Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim

CVPR 2025 • arXiv:2411.15241
27 citations
#1638

Contrastive Localized Language-Image Pre-Training

Hong-You Chen, Zhengfeng Lai, Haotian Zhang et al.

ICML 2025 • arXiv:2410.02746
27 citations
#1639

Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-Based Decision-Making Systems

Ruochen Jiao, Shaoyuan Xie, Justin Yue et al.

ICLR 2025 • arXiv:2405.20774
27 citations
#1640

Efficient Visual State Space Model for Image Deblurring

Lingshun Kong, Jiangxin Dong, Jinhui Tang et al.

CVPR 2025 • arXiv:2405.14343
27 citations
#1641

Language-Image Models with 3D Understanding

Jang Hyun Cho, Boris Ivanovic, Yulong Cao et al.

ICLR 2025 • arXiv:2405.03685
27 citations
#1642

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos

Qirui Chen, Shangzhe Di, Weidi Xie

AAAI 2025 • paper • arXiv:2408.14469
27 citations
#1643

VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching

Siyu Xu, Yunke Wang, Chenghao Xia et al.

NeurIPS 2025 • oral • arXiv:2502.02175
27 citations
#1644

VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models

Xiangdong Zhang, Jiaqi Liao, Shaofeng Zhang et al.

NeurIPS 2025 • oral • arXiv:2505.23656
27 citations
#1645

Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction

Jarrid Rector-Brooks, Mohsin Hasan, Zhangzhi Peng et al.

ICLR 2025 • arXiv:2410.08134
27 citations
#1646

Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

Jinjin Zhang, Qiuyu Huang, Junjie Liu et al.

CVPR 2025 • arXiv:2503.18352
27 citations
#1647

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

Qingming Liu, Yuan Liu, Jiepeng Wang et al.

ICLR 2025 • arXiv:2406.00434
27 citations
#1648

InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling

Muhammad Gohar Javed, Chuan Guo, Li Cheng et al.

ICLR 2025 • oral • arXiv:2410.10010
27 citations
#1649

Frequency Dynamic Convolution for Dense Image Prediction

Linwei Chen, Lin Gu, Liang Li et al.

CVPR 2025 • arXiv:2503.18783
27 citations
#1650

Results of the Big ANN: NeurIPS’23 competition

Harsha Vardhan Simhadri, Martin Aumüller, Matthijs Douze et al.

NeurIPS 2025 • arXiv:2409.17424
27 citations
#1651

CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

Xinhao Liu, Jintong Li, Yicheng Jiang et al.

CVPR 2025 • arXiv:2411.17820
27 citations
#1652

Multi-Robot Motion Planning with Diffusion Models

Yorai Shaoul, Itamar Mishani, Shivam Vats et al.

ICLR 2025 • arXiv:2410.03072
27 citations
#1653

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Rui Xie, Yinhong Liu, Penghao Zhou et al.

ICCV 2025 • arXiv:2501.02976
27 citations
#1654

OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.

CVPR 2025 • arXiv:2412.01169
27 citations
#1655

A Comprehensive Overhaul of Multimodal Assistant with Small Language Models

Minjie Zhu, Yichen Zhu, Ning Liu et al.

AAAI 2025 • paper • arXiv:2403.06199
27 citations
#1656

MuPT: A Generative Symbolic Music Pretrained Transformer

Xingwei Qu, Yuelin Bai, Yinghao Ma et al.

ICLR 2025 • arXiv:2404.06393
27 citations
#1657

Generating CAD Code with Vision-Language Models for 3D Designs

Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Haider Zaidi et al.

ICLR 2025 • arXiv:2410.05340
27 citations
#1658

When Hypergraph Meets Heterophily: New Benchmark Datasets and Baseline

Ming Li, Yongchun Gu, Yi Wang et al.

AAAI 2025 • paper
27 citations
#1659

PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts

Yiming Wang, Pei Zhang, Jialong Tang et al.

NeurIPS 2025 • arXiv:2504.18428
27 citations
#1660

Steering Large Language Models between Code Execution and Textual Reasoning

Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma et al.

ICLR 2025 • arXiv:2410.03524
27 citations
#1661

Rethinking Reward Modeling in Preference-based Large Language Model Alignment

Hao Sun, Yunyi Shen, Jean-Francois Ton

ICLR 2025
27 citations
#1662

AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion

Mingzhen Sun, Weining Wang, Li et al.

CVPR 2025 • arXiv:2503.07418
27 citations
#1663

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

CVPR 2025 • arXiv:2412.03324
27 citations
#1664

Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

Cong Lu, Shengran Hu, Jeff Clune

ICLR 2025 • arXiv:2405.15143
27 citations
#1665

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks

Muhammad Danish, Muhammad Akhtar Munir, Syed Shah et al.

ICCV 2025 • highlight • arXiv:2411.19325
27 citations
#1666

TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting

Yifan Hu, Guibin Zhang, Peiyuan Liu et al.

ICML 2025 • oral • arXiv:2501.13041
27 citations
#1667

CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs

Siyu Wang, Cailian Chen, Xinyi Le et al.

AAAI 2025 • paper • arXiv:2412.19663
27 citations
#1668

SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation

Claudia Cuttano, Gabriele Trivigno, Gabriele Rosi et al.

CVPR 2025 • highlight • arXiv:2411.17646
27 citations
#1669

Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

Kyle Sargent, Kyle Hsu, Justin Johnson et al.

ICCV 2025 • arXiv:2503.11056
27 citations
#1670

Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation within Slow Reasoning

Hao Chen, Jiaming Liu, Chenyang Gu et al.

NeurIPS 2025
27 citations
#1671

Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment

Harrish Thasarathan, Julian Forsyth, Thomas Fel et al.

ICML 2025 • arXiv:2502.03714
27 citations
#1672

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

Yun Zhu, Jia-Chen Gu, Caitlin Sikora et al.

ICLR 2025 • arXiv:2405.16178
27 citations
#1673

What Makes a Good Diffusion Planner for Decision Making?

Haofei Lu, Dongqi Han, Yifei Shen et al.

ICLR 2025 • arXiv:2503.00535
27 citations
#1674

DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification

Yuhao Wang, Yang Liu, Aihua Zheng et al.

AAAI 2025 • paper • arXiv:2412.10650
27 citations
#1675

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Wenyi Hong, Yean Cheng, Zhuoyi Yang et al.

CVPR 2025 • arXiv:2501.02955
27 citations
#1676

ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation

Jiawen Yu, Hairuo Liu, Qiaojun Yu et al.

NeurIPS 2025 • arXiv:2505.22159
27 citations
#1677

Why Does Dropping Edges Usually Outperform Adding Edges in Graph Contrastive Learning?

Yanchen Xu, Siqi Huang, Hongyuan Zhang et al.

AAAI 2025 • paper • arXiv:2412.08128
27 citations
#1678

MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding

Rongchang Xie, Chen Du, Ping Song et al.

ICCV 2025 • arXiv:2411.17762
27 citations
#1679

Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models

Jiacong Xu, Shao-Yuan Lo, Bardia Safaei et al.

CVPR 2025 • highlight • arXiv:2502.07601
27 citations
#1680

VideoGigaGAN: Towards Detail-rich Video Super-Resolution

Yiran Xu, Taesung Park, Richard Zhang et al.

CVPR 2025 • arXiv:2404.12388
27 citations
#1681

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

Wei Shen, Guanlin Liu, Yu Yue et al.

NeurIPS 2025 • arXiv:2503.22230
27 citations
#1682

Perception-Guided Jailbreak Against Text-to-Image Models

Yihao Huang, Le Liang, Tianlin Li et al.

AAAI 2025 • paper • arXiv:2408.10848
27 citations
#1683

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

Yuxuan Cai, Jiangning Zhang, Haoyang He et al.

ICCV 2025 • arXiv:2410.16236
27 citations
#1684

EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing

Zexuan Yan, Yue Ma, Chang Zou et al.

ICCV 2025 • arXiv:2503.10270
27 citations
#1685

Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization Through Spare-Coding Transformer

Lei Su, Xiaochen Ma, Xuekang Zhu et al.

AAAI 2025 • paper • arXiv:2412.14598
27 citations
#1686

CityNav: A Large-Scale Dataset for Real-World Aerial Navigation

Jungdae Lee, Taiki Miyanishi, Shuhei Kurita et al.

ICCV 2025 • arXiv:2406.14240
27 citations
#1687

The AdEMAMix Optimizer: Better, Faster, Older

Matteo Pagliardini, Pierre Ablin, David Grangier

ICLR 2025 • arXiv:2409.03137
27 citations
#1688

LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging

Ke Wang, Nikos Dimitriadis, Alessandro Favero et al.

ICLR 2025 • arXiv:2410.17146
27 citations
#1689

Estimating Body and Hand Motion in an Ego‑sensed World

Brent Yi, Vickie Ye, Maya Zheng et al.

CVPR 2025 • highlight • arXiv:2410.03665
27 citations
#1690

Evaluating the Diversity and Quality of LLM Generated Content

Alexander Shypula, Shuo Li, Botong Zhang et al.

COLM 2025 • paper • arXiv:2504.12522
27 citations
#1691

KV-Edit: Training-Free Image Editing for Precise Background Preservation

Tianrui Zhu, Shiyi Zhang, Jiawei Shao et al.

ICCV 2025 • arXiv:2502.17363
27 citations
#1692

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

Shenghai Yuan, Xianyi He, Yufan Deng et al.

NeurIPS 2025 • arXiv:2505.20292
27 citations
#1693

VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents

Ryota Tanaka, Taichi Iki, Taku Hasegawa et al.

CVPR 2025 • arXiv:2504.09795
27 citations
#1694

DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes

Hengwei Bian, Lingdong Kong, Haozhe Xie et al.

ICLR 2025 • arXiv:2410.18084
27 citations
#1695

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

Lijun Li, Zhelun Shi, Xuhao Hu et al.

CVPR 2025 • arXiv:2501.12612
27 citations
#1696

Reward Guided Latent Consistency Distillation

William Wang, Jiachen Li, Weixi Feng et al.

ICLR 2025 • arXiv:2403.11027
27 citations
#1697

Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions

Sarah Wiegreffe, Oyvind Tafjord, Yonatan Belinkov et al.

ICLR 2025 • arXiv:2407.15018
27 citations
#1698

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Xiaoshuai Song, Muxi Diao, Guanting Dong et al.

ICLR 2025 • arXiv:2406.08587
27 citations
#1699

Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists

Bojia Zi, Penghui Ruan, Marco Chen et al.

NeurIPS 2025 • arXiv:2502.06734
27 citations
#1700

Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering

Cheng Sun, Jaesung Choe, Charles Loop et al.

CVPR 2025 • arXiv:2412.04459
27 citations
#1701

Autoformulation of Mathematical Optimization Models Using LLMs

Nicolás Astorga, Tennison Liu, Yuanzhang Xiao et al.

ICML 2025 • arXiv:2411.01679
27 citations
#1702

EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration

Allen Nie, Yi Su, Bo Chang et al.

ICML 2025 • arXiv:2410.06238
27 citations
#1703

Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline

Junlong Cheng, Bin Fu, Jin Ye et al.

CVPR 2025 • arXiv:2411.12814
27 citations
#1704

Training a Scientific Reasoning Model for Chemistry

Siddharth Narayanan, James Braza, Ryan-Rhys Griffiths et al.

NeurIPS 2025 • arXiv:2506.17238
26 citations
#1705

Your ViT is Secretly an Image Segmentation Model

Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.

CVPR 2025 • highlight • arXiv:2503.19108
26 citations
#1706

Diffusion Beats Autoregressive in Data-Constrained Settings

Mihir Prabhudesai, Mengning Wu, Amir Zadeh et al.

NeurIPS 2025 • arXiv:2507.15857
26 citations
#1707

LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics

Thomas Robert, Mher Safaryan, Ionut-Vlad Modoranu et al.

ICLR 2025 • arXiv:2410.16103
26 citations
#1708

XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?

Fengxiang Wang, Hongzhen Wang, Zonghao Guo et al.

CVPR 2025 • highlight • arXiv:2503.23771
26 citations
#1709

Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection

Zhen Qu, Xian Tao, Xinyi Gong et al.

CVPR 2025 • arXiv:2503.10080
26 citations
#1710

Calibrated Multi-Preference Optimization for Aligning Diffusion Models

Kyungmin Lee, Xiaohang Li, Qifei Wang et al.

CVPR 2025 • arXiv:2502.02588
26 citations
#1711

FineVQ: Fine-Grained User Generated Content Video Quality Assessment

Huiyu Duan, Qiang Hu, Wang Jiarui et al.

CVPR 2025 • highlight • arXiv:2412.19238
26 citations
#1712

Concept Bottleneck Large Language Models

Chung-En Sun, Tuomas Oikarinen, Berk Ustun et al.

ICLR 2025 • arXiv:2412.07992
26 citations
#1713

MagicQuill: An Intelligent Interactive Image Editing System

Zichen Liu, Yue Yu, Hao Ouyang et al.

CVPR 2025 • arXiv:2411.09703
26 citations
#1714

Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing

Xinghe Fu, Zhiyuan Yan, Taiping Yao et al.

AAAI 2025 • paper • arXiv:2501.04376
26 citations
#1715

GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models

Zewei Zhang, Huan Liu, Jun Chen et al.

ICLR 2025 • arXiv:2404.07206
26 citations
#1716

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

Hanlin Wang, Hao Ouyang, Qiuyu Wang et al.

CVPR 2025 • highlight • arXiv:2412.15214
26 citations
#1717

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.

NeurIPS 2025 • arXiv:2505.17017
26 citations
#1718

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

Lawrence Jang, Yinheng Li, Dan Zhao et al.

ICLR 2025 • arXiv:2410.19100
26 citations
#1719

Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation

Xinpeng Wang, Chengzhi (Martin) Hu, Paul Röttger et al.

ICLR 2025 • arXiv:2410.03415
26 citations
#1720

ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation

Guosheng Zhao, Xiaofeng Wang, Chaojun Ni et al.

ICCV 2025 • arXiv:2503.18438
26 citations
#1721

ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model

Shunlin Lu, Jingbo Wang, Zeyu Lu et al.

CVPR 2025 • arXiv:2412.14559
26 citations
#1722

Inverse Constitutional AI: Compressing Preferences into Principles

Arduin Findeis, Timo Kaufmann, Eyke Hüllermeier et al.

ICLR 2025 • arXiv:2406.06560
26 citations
#1723

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

Teng Xiao, Yige Yuan, Zhengyu Chen et al.

ICLR 2025 • arXiv:2502.00883
26 citations
#1724

Moral Alignment for LLM Agents

Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

ICLR 2025 • oral • arXiv:2410.01639
26 citations
#1725

Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models

Lucio La Cava, Andrea Tagarelli

AAAI 2025 • paper • arXiv:2401.07115
26 citations
#1726

CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution

Xin Liu, Jie Liu, Jie Tang et al.

CVPR 2025 • arXiv:2503.06896
26 citations
#1727

EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities

Talor Abramovich, Meet Udeshi, Minghao Shao et al.

ICML 2025 • arXiv:2409.16165
26 citations
#1728

DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

Yuxuan Luo, Zhengkun Rong, Lizhen Wang et al.

ICCV 2025 • arXiv:2504.01724
26 citations
#1729

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

Baichuan Zhou, Haote Yang, Dairong Chen et al.

AAAI 2025 • paper • arXiv:2408.17267
26 citations
#1730

The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models

Ke Ji, Jiahao Xu, Tian Liang et al.

NeurIPS 2025 • arXiv:2503.02875
26 citations
#1731

REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment

Haonan Han, Rui Yang, Huan Liao et al.

ICCV 2025 • arXiv:2405.18525
26 citations
#1732

DisCo: Graph-Based Disentangled Contrastive Learning for Cold-Start Cross-Domain Recommendation

Hourun Li, Yifan Wang, Zhiping Xiao et al.

AAAI 2025 • paper • arXiv:2412.15005
26 citations
#1733

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

Xiao Guo, Xiufeng Song, Yue Zhang et al.

CVPR 2025 • arXiv:2503.20188
26 citations
#1734

GenMol: A Drug Discovery Generalist with Discrete Diffusion

Seul Lee, Karsten Kreis, Srimukh Veccham et al.

ICML 2025 • arXiv:2501.06158
26 citations
#1735

R1-ShareVL: Incentivizing Reasoning Capabilities of Multimodal Large Language Models via Share-GRPO

Huanjin Yao, Qixiang Yin, Jingyi Zhang et al.

NeurIPS 2025 • arXiv:2505.16673
26 citations
#1736

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring

Yufei Zhan, Shurong Zheng, Yousong Zhu et al.

ICCV 2025 • arXiv:2403.09333
26 citations
#1737

PolyNet: Learning Diverse Solution Strategies for Neural Combinatorial Optimization

André Hottung, Mridul Mahajan, Kevin Tierney

ICLR 2025 • arXiv:2402.14048
26 citations
#1738

AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies

Jian Gao, Weidong Cao, Junyi Yang et al.

ICLR 2025 • arXiv:2503.00205
26 citations
#1739

Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning

Jinlong Pang, Na Di, Zhaowei Zhu et al.

ICML 2025 • arXiv:2502.01968
26 citations
#1740

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Alejandro Lozano, Min Woo Sun, James Burgess et al.

CVPR 2025 • arXiv:2501.07171
26 citations
#1741

FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error

Beilin Chu, Xuan Xu, Xin Wang et al.

CVPR 2025 • arXiv:2412.07140
26 citations
#1742

Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality

Guanyu Zhou, Yibo Yan, Xin Zou et al.

ICLR 2025 • arXiv:2410.04780
26 citations
#1743

Light3R-SfM: Towards Feed-forward Structure-from-Motion

Sven Elflein, Qunjie Zhou, Laura Leal-Taixe

CVPR 2025 • highlight • arXiv:2501.14914
26 citations
#1744

Discrete Copula Diffusion

Anji Liu, Oliver Broadrick, Mathias Niepert et al.

ICLR 2025 • arXiv:2410.01949
26 citations
#1745

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Hongxiang Li, Yaowei Li, Yuhang Yang et al.

ICLR 2025 • arXiv:2412.09349
26 citations
#1746

Robust Tracking via Mamba-based Context-aware Token Learning

Jinxia Xie, Bineng Zhong, Qihua Liang et al.

AAAI 2025 • paper • arXiv:2412.13611
26 citations
#1747

Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets

Guangqi Jiang, Yifei Sun, Tao Huang et al.

ICLR 2025 • arXiv:2410.22325
26 citations
#1748

Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo

João Loula, Benjamin LeBrun, Li Du et al.

ICLR 2025 • arXiv:2504.13139
26 citations
#1749

Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning

Mingyang Chen, Haoze Sun, Tianpeng Li et al.

ICLR 2025 • arXiv:2410.12952
26 citations
#1750

ResearchTown: Simulator of Human Research Community

Haofei Yu, Zhaochen Hong, Zirui Cheng et al.

ICML 2025 • arXiv:2412.17767
26 citations
#1751

Data Selection via Optimal Control for Language Models

Yuxian Gu, Li Dong, Hongning Wang et al.

ICLR 2025 • arXiv:2410.07064
26 citations
#1752

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Tianwei Xiong, Jun Hao Liew, Zilong Huang et al.

ICCV 2025 • arXiv:2504.08736
26 citations
#1753

Language Representations Can be What Recommenders Need: Findings and Potentials

Leheng Sheng, An Zhang, Yi Zhang et al.

ICLR 2025 • arXiv:2407.05441
26 citations
#1754

Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection

Le Yang, Ziwei Zheng, Boxu Chen et al.

CVPR 2025 • arXiv:2412.13817
26 citations
#1755

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

Renqiu Xia, Mingsheng Li, Hancheng Ye et al.

ICLR 2025 • arXiv:2412.11863
26 citations
#1756

Scaling Laws for Native Multimodal Models

Mustafa Shukor, Enrico Fini, Victor Guilherme Turrisi da Costa et al.

ICCV 2025 • arXiv:2504.07951
26 citations
#1757

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

Canyu Zhao, Yanlong Sun, Mingyu Liu et al.

NeurIPS 2025 • spotlight • arXiv:2502.17157
26 citations
#1758

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

CVPR 2025 • arXiv:2407.15811
26 citations
#1759

Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond

Guanyao Wu, Haoyu Liu, Hongming Fu et al.

CVPR 2025 • arXiv:2503.01210
26 citations
#1760

Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent

Yongxian Wei, Anke Tang, Li Shen et al.

ICML 2025 • arXiv:2501.01230
26 citations
#1761

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

Pengxiang Li, Lu Yin, Shiwei Liu

ICLR 2025 • arXiv:2412.13795
26 citations
#1762

Underdamped Diffusion Bridges with Applications to Sampling

Denis Blessing, Julius Berner, Lorenz Richter et al.

ICLR 2025 • arXiv:2503.01006
26 citations
#1763

ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs

Jiaru Zou, Ling Yang, Jingwen Gu et al.

NeurIPS 2025 • arXiv:2506.18896
26 citations
#1764

TAPIP3D: Tracking Any Point in Persistent 3D Geometry

Bowei Zhang, Lei Ke, Adam Harley et al.

NeurIPS 2025 • oral • arXiv:2504.14717
26 citations
#1765

BLADE: Enhancing Black-Box Large Language Models with Small Domain-Specific Models

Haitao Li, Qingyao Ai, Jia Chen et al.

AAAI 2025 • paper • arXiv:2403.18365
26 citations
#1766

Towards Neural Scaling Laws for Time Series Foundation Models

Qingren Yao, Chao-Han Huck Yang, Renhe Jiang et al.

ICLR 2025 • arXiv:2410.12360
26 citations
#1767

Debate on Graph: A Flexible and Reliable Reasoning Framework for Large Language Models

Jie Ma, Zhitao Gao, Qi Chai et al.

AAAI 2025 • paper • arXiv:2409.03155
26 citations
#1768

Can LLMs Solve Longer Math Word Problems Better?

Xin Xu, Tong Xiao, Zitong Chao et al.

ICLR 2025 • arXiv:2405.14804
26 citations
#1769

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation

Xiaofeng Wang, Kang Zhao, Feng Liu et al.

NeurIPS 2025 • arXiv:2411.08380
26 citations
#1770

How Do Large Language Monkeys Get Their Power (Laws)?

Rylan Schaeffer, Joshua Kazdan, John Hughes et al.

ICML 2025 • oral • arXiv:2502.17578
26 citations
#1771

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

Yun Liu, Chengwen Zhang, Ruofan Xing et al.

CVPR 2025 • arXiv:2406.19353
26 citations
#1772

AffordDP: Generalizable Diffusion Policy with Transferable Affordance

Shijie Wu, Yihang Zhu, Yunao Huang et al.

CVPR 2025 • arXiv:2412.03142
26 citations
#1773

Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift

Siyuan Liang, Jiawei Liang, Tianyu Pang et al.

CVPR 2025 • arXiv:2406.18844
26 citations
#1774

DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products

Julien Siems, Timur Carstensen, Arber Zela et al.

NeurIPS 2025 • arXiv:2502.10297
26 citations
#1775

DiffuseHigh: Training-Free Progressive High-Resolution Image Synthesis Through Structure Guidance

Younghyun Kim, Geunmin Hwang, Junyu Zhang et al.

AAAI 2025 • paper • arXiv:2406.18459
26 citations
#1776

MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems

Rui Ye, Shuo Tang, Rui Ge et al.

ICML 2025 • arXiv:2503.03686
26 citations
#1777

An Intelligent Agentic System for Complex Image Restoration Problems

Kaiwen Zhu, Jinjin Gu, Zhiyuan You et al.

ICLR 2025 • arXiv:2410.17809
26 citations
#1778

CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification

Wei Li, Renshan Zhang, Rui Shao et al.

NeurIPS 2025 • arXiv:2508.21046
26 citations
#1779

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

Weifeng Lin, Xinyu Wei, Renrui Zhang et al.

ICLR 2025 • arXiv:2409.15278
26 citations
#1780

Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

Feilong Tang, Chengzhi Liu, Zhongxing Xu et al.

CVPR 2025 • arXiv:2505.16652
25 citations
#1781

Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination

Leonardo Barcellona, Andrii Zadaianchuk, Davide Allegro et al.

ICLR 2025 • arXiv:2412.14957
25 citations
#1782

Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine

Xiaoshuang Huang, Lingdong Shen, Jia Liu et al.

AAAI 2025 • paper • arXiv:2412.09278
25 citations
#1783

A Unified Approach to Routing and Cascading for LLMs

Jasper Dekoninck, Maximilian Baader, Martin Vechev

ICML 2025 • arXiv:2410.10347
25 citations
#1784

Addressing Misspecification in Simulation-based Inference through Data-driven Calibration

Antoine Wehenkel, Juan L. Gamella, Ozan Sener et al.

ICML 2025 • oral • arXiv:2405.08719
25 citations
#1785

FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model

Chongkai Gao, Haozhuo Zhang, Zhixuan Xu et al.

ICLR 2025 • arXiv:2412.08261
25 citations
#1786

LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity

Walid Bousselham, Angie Boggust, Sofian Chaybouti et al.

ICCV 2025 • arXiv:2404.03214
25 citations
#1787

SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.

CVPR 2025 • arXiv:2411.17190
25 citations
#1788

HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models

Kazi Hasan Ibn Arif, JinYi Yoon, Dimitrios S. Nikolopoulos et al.

AAAI 2025 • paper • arXiv:2408.10945
25 citations
#1789

OmniSVG: A Unified Scalable Vector Graphics Generation Model

Yiying Yang, Wei Cheng, Sijin Chen et al.

NeurIPS 2025 • arXiv:2504.06263
25 citations
#1790

Artificial Kuramoto Oscillatory Neurons

Takeru Miyato, Sindy Löwe, Andreas Geiger et al.

ICLR 2025 • oral • arXiv:2410.13821
25 citations
#1791

LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation

Qidong Liu, Xian Wu, Wanyu Wang et al.

AAAI 2025 • paper • arXiv:2409.19925
25 citations
#1792

MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction

Zixuan Gong, Qi Zhang, Guangyin Bao et al.

AAAI 2025 • paper • arXiv:2404.12630
25 citations
#1793

OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents

Thomas Kuntz, Agatha Duzan, Hao Zhao et al.

NeurIPS 2025 • spotlight • arXiv:2506.14866
25 citations
#1794

Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces I: the compact case

Iskander Azangulov, Andrei Smolensky, Alexander Terenin et al.

NeurIPS 2025 • oral • arXiv:2208.14960
25 citations
#1795

Attention Distillation: A Unified Approach to Visual Characteristics Transfer

Yang Zhou, Xu Gao, Zichong Chen et al.

CVPR 2025 • arXiv:2502.20235
25 citations
#1796

Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives

Qinsi Wang, Jinghan Ke, Masayoshi Tomizuka et al.

ICLR 2025 • arXiv:2502.02723
25 citations
#1797

3D Vision-Language Gaussian Splatting

Qucheng Peng, Benjamin Planche, Zhongpai Gao et al.

ICLR 2025 • arXiv:2410.07577
25 citations
#1798

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

Yunxiang Fu, Meng Lou, Yizhou Yu

CVPR 2025 • arXiv:2412.11890
25 citations
#1799

Self-Adapting Language Models

Adam Zweiger, Jyo Pari, Han Guo et al.

NeurIPS 2025 • arXiv:2506.10943
25 citations
#1800

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Kai Wang, Mingjia Shi, Yukun Zhou et al.

CVPR 2025 • arXiv:2405.17403
25 citations