Most Cited 2025 "real-image dataset" Papers

22,274 papers found • Page 9 of 112

Filters:Most Cited 2025 real-image dataset Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

#1601

PersonalLLM: Tailoring LLMs to Individual Preferences

Thomas Zollo, Andrew Siah, Naimeng Ye et al.

ICLR 2025arXiv:2409.20296

citations

#1602

Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification

Yunzhen Feng, Elvis Dohmatob, Pu Yang et al.

ICLR 2025arXiv:2406.07515

citations

#1603

Erasing Undesirable Influence in Diffusion Models

Jing Wu, Trung Le, Munawar Hayat et al.

CVPR 2025arXiv:2401.05779

citations

#1604

Simple ReFlow: Improved Techniques for Fast Flow Models

Beomsu Kim, Yu-Guan Hsieh, Michal Klein et al.

ICLR 2025arXiv:2410.07815

citations

#1605

Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems

Christian Walder, Deep Tejas Karkhanis

NEURIPS 2025spotlightarXiv:2505.15201

citations

#1606

Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation

Ziyang Xie, Zhizheng Liu, Zhenghao Peng et al.

CVPR 2025arXiv:2501.06693

citations

#1607

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

Shuming Liu, Chen Zhao, Tianqi Xu et al.

CVPR 2025arXiv:2503.21483

citations

#1608

NoveltyBench: Evaluating Language Models for Humanlike Diversity

Yiming Zhang, Harshita Diddee, Susan Holm et al.

COLM 2025paperarXiv:2504.05228

citations

#1609

When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs

Xiaomin Li, Zhou Yu, Zhiwei Zhang et al.

NEURIPS 2025spotlightarXiv:2505.11423

citations

#1610

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

Zimu Lu, Aojun Zhou, Ke Wang et al.

ICLR 2025arXiv:2410.08196

citations

#1611

DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Jinyoung Park, Jeehye Na, Jinyoung Kim et al.

NEURIPS 2025arXiv:2506.07464

citations

#1612

TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

Liang Pan, Zeshi Yang, Zhiyang Dou et al.

CVPR 2025arXiv:2503.19901

citations

#1613

Uncovering Overfitting in Large Language Model Editing

Mengqi Zhang, Xiaotian Ye, Qiang Liu et al.

ICLR 2025arXiv:2410.07819

citations

#1614

ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization

Zechun Liu, Changsheng Zhao, Hanxian Huang et al.

NEURIPS 2025arXiv:2502.02631

citations

#1615

OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation

Jingjing Chang, Yixiao Fang, Peng Xing et al.

NEURIPS 2025arXiv:2506.07977

citations

#1616

Star Attention: Efficient LLM Inference over Long Sequences

Shantanu Acharya, Fei Jia, Boris Ginsburg

ICML 2025arXiv:2411.17116

citations

#1617

Chain-of-Retrieval Augmented Generation

Liang Wang, Haonan Chen, Nan Yang et al.

NEURIPS 2025arXiv:2501.14342

citations

#1618

NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning

Xin Yi, Shunfan Zheng, Linlin Wang et al.

AAAI 2025paperarXiv:2412.12497

citations

#1619

Towards Understanding Camera Motions in Any Video

Zhiqiu Lin, Siyuan Cen, Daniel Jiang et al.

NEURIPS 2025spotlightarXiv:2504.15376

citations

#1620

A Geometric Framework for Understanding Memorization in Generative Models

Brendan Ross, Hamidreza Kamkari, Tongzi Wu et al.

ICLR 2025arXiv:2411.00113

citations

#1621

The Superposition of Diffusion Models Using the Itô Density Estimator

Marta Skreta, Lazar Atanackovic, Joey Bose et al.

ICLR 2025arXiv:2412.17762

citations

#1622

Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads

Siqi Kou, Jiachun Jin, Zhihong Liu et al.

ICML 2025arXiv:2412.00127

citations

#1623

Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping

Zijian Liu, Zhengyuan Zhou

ICLR 2025arXiv:2412.19529

citations

#1624

LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

Parshin Shojaee, Ngoc Hieu Nguyen, Kazem Meidani et al.

ICML 2025oralarXiv:2504.10415

citations

#1625

Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory

Yuqi Wu, Wenzhao Zheng, Jie Zhou et al.

NEURIPS 2025arXiv:2507.02863

citations

#1626

DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors

Keon Lee, Dong Won Kim, Jaehyeon Kim et al.

ICLR 2025arXiv:2406.11427

citations

#1627

KGGen: Extracting Knowledge Graphs from Plain Text with Language Models

Belinda Mo, Kyssen Yu, Joshua Kazdan et al.

NEURIPS 2025arXiv:2502.09956

citations

#1628

MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing

Cong Wang, Di Kang, Heyi Sun et al.

CVPR 2025arXiv:2404.19026

citations

#1629

CREAM: Consistency Regularized Self-Rewarding Language Models

Zhaoyang Wang, Weilei He, Zhiyuan Liang et al.

ICLR 2025arXiv:2410.12735

citations

#1630

Subspace Optimization for Large Language Models with Convergence Guarantees

Yutong He, Pengrui Li, Yipeng Hu et al.

ICML 2025arXiv:2410.11289

citations

#1631

Improving Uncertainty Estimation through Semantically Diverse Language Generation

Lukas Aichberger, Kajetan Schweighofer, Mykyta Ielanskyi et al.

ICLR 2025arXiv:2406.04306

citations

#1632

AutoPresent: Designing Structured Visuals from Scratch

Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou et al.

CVPR 2025arXiv:2501.00912

citations

#1633

Diffusion-based Neural Network Weights Generation

Bedionita Soro, Bruno Andreis, Hayeon Lee et al.

ICLR 2025arXiv:2402.18153

citations

#1634

From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models

Etowah Adams, Liam Bai, Minji Lee et al.

ICML 2025spotlight

citations

#1635

ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

Haiyang SHEN, Yue Li, Desong Meng et al.

ICLR 2025arXiv:2407.00132

citations

#1636

Scalable Equilibrium Sampling with Sequential Boltzmann Generators

Charlie Tan, Joey Bose, Chen Lin et al.

ICML 2025arXiv:2502.18462

citations

#1637

EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality

Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim

CVPR 2025arXiv:2411.15241

citations

#1638

Contrastive Localized Language-Image Pre-Training

Hong-You Chen, Zhengfeng Lai, Haotian Zhang et al.

ICML 2025arXiv:2410.02746

citations

#1639

Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-Based Decision-Making Systems

Ruochen Jiao, Shaoyuan Xie, Justin Yue et al.

ICLR 2025arXiv:2405.20774

citations

#1640

Efficient Visual State Space Model for Image Deblurring

Lingshun Kong, Jiangxin Dong, Jinhui Tang et al.

CVPR 2025arXiv:2405.14343

citations

#1641

Language-Image Models with 3D Understanding

Jang Hyun Cho, Boris Ivanovic, Yulong Cao et al.

ICLR 2025arXiv:2405.03685

citations

#1642

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos

Qirui Chen, Shangzhe Di, Weidi Xie

AAAI 2025paperarXiv:2408.14469

citations

#1643

VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching

Siyu Xu, Yunke Wang, Chenghao Xia et al.

NEURIPS 2025oralarXiv:2502.02175

citations

#1644

VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models

Xiangdong Zhang, Jiaqi Liao, Shaofeng Zhang et al.

NEURIPS 2025oralarXiv:2505.23656

citations

#1645

Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction

Jarrid Rector-Brooks, Mohsin Hasan, Zhangzhi Peng et al.

ICLR 2025arXiv:2410.08134

citations

#1646

Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

Jinjin Zhang, qiuyu Huang, Junjie Liu et al.

CVPR 2025arXiv:2503.18352

citations

#1647

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

Qingming LIU, Yuan Liu, Jiepeng Wang et al.

ICLR 2025arXiv:2406.00434

citations

#1648

InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling

Muhammad Gohar Javed, chuan guo, Li Cheng et al.

ICLR 2025oralarXiv:2410.10010

citations

#1649

Frequency Dynamic Convolution for Dense Image Prediction

Linwei Chen, Lin Gu, Liang Li et al.

CVPR 2025arXiv:2503.18783

citations

#1650

Results of the Big ANN: NeurIPS’23 competition

Harsha Vardhan simhadri, Martin Aumüller, Matthijs Douze et al.

NEURIPS 2025arXiv:2409.17424

citations

#1651

CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

Xinhao Liu, Jintong Li, Yicheng Jiang et al.

CVPR 2025arXiv:2411.17820

citations

#1652

Multi-Robot Motion Planning with Diffusion Models

Yorai Shaoul, Itamar Mishani, Shivam Vats et al.

ICLR 2025arXiv:2410.03072

citations

#1653

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Rui Xie, Yinhong Liu, Penghao Zhou et al.

ICCV 2025arXiv:2501.02976

citations

#1654

OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.

CVPR 2025arXiv:2412.01169

citations

#1655

A Comprehensive Overhaul of Multimodal Assistant with Small Language Models

Minjie Zhu, Yichen Zhu, Ning Liu et al.

AAAI 2025paperarXiv:2403.06199

citations

#1656

MuPT: A Generative Symbolic Music Pretrained Transformer

Xingwei Qu, yuelin bai, Yinghao MA et al.

ICLR 2025arXiv:2404.06393

citations

#1657

Generating CAD Code with Vision-Language Models for 3D Designs

Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Haider Zaidi et al.

ICLR 2025arXiv:2410.05340

citations

#1658

When Hypergraph Meets Heterophily: New Benchmark Datasets and Baseline

Ming Li, Yongchun Gu, Yi Wang et al.

AAAI 2025paper

citations

#1659

PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts

Yiming Wang, Pei Zhang, Jialong Tang et al.

NEURIPS 2025arXiv:2504.18428

citations

#1660

Steering Large Language Models between Code Execution and Textual Reasoning

Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma et al.

ICLR 2025arXiv:2410.03524

citations

#1661

Rethinking Reward Modeling in Preference-based Large Language Model Alignment

Hao Sun, Yunyi Shen, Jean-Francois Ton

ICLR 2025

citations

#1662

AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion

Mingzhen Sun, Weining Wang, Li et al.

CVPR 2025arXiv:2503.07418

citations

#1663

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

CVPR 2025arXiv:2412.03324

citations

#1664

Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

Cong Lu, Shengran Hu, Jeff Clune

ICLR 2025arXiv:2405.15143

citations

#1665

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks

Muhammad Danish, Muhammad Akhtar Munir, Syed Shah et al.

ICCV 2025highlightarXiv:2411.19325

citations

#1666

TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting

Yifan Hu, Guibin Zhang, Peiyuan Liu et al.

ICML 2025oralarXiv:2501.13041

citations

#1667

CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs

Siyu Wang, Cailian Chen, Xinyi Le et al.

AAAI 2025paperarXiv:2412.19663

citations

#1668

SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation

Claudia Cuttano, Gabriele Trivigno, Gabriele Rosi et al.

CVPR 2025highlightarXiv:2411.17646

citations

#1669

Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

Kyle Sargent, Kyle Hsu, Justin Johnson et al.

ICCV 2025arXiv:2503.11056

citations

#1670

Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation within Slow Reasoning

Hao Chen, Jiaming Liu, Chenyang Gu et al.

NEURIPS 2025

citations

#1671

Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment

Harrish Thasarathan, Julian Forsyth, Thomas Fel et al.

ICML 2025arXiv:2502.03714

citations

#1672

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

Yun Zhu, Jia-Chen Gu, Caitlin Sikora et al.

ICLR 2025arXiv:2405.16178

citations

#1673

What Makes a Good Diffusion Planner for Decision Making?

Haofei Lu, Dongqi Han, Yifei Shen et al.

ICLR 2025arXiv:2503.00535

citations

#1674

DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification

Yuhao Wang, Yang Liu, Aihua Zheng et al.

AAAI 2025paperarXiv:2412.10650

citations

#1675

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Wenyi Hong, Yean Cheng, Zhuoyi Yang et al.

CVPR 2025arXiv:2501.02955

citations

#1676

ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation

Jiawen Yu, Hairuo Liu, Qiaojun Yu et al.

NEURIPS 2025arXiv:2505.22159

citations

#1677

Why Does Dropping Edges Usually Outperform Adding Edges in Graph Contrastive Learning?

Yanchen Xu, Siqi Huang, Hongyuan Zhang et al.

AAAI 2025paperarXiv:2412.08128

citations

#1678

MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding

Rongchang Xie, Chen Du, Ping Song et al.

ICCV 2025arXiv:2411.17762

citations

#1679

Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models

Jiacong Xu, Shao-Yuan Lo, Bardia Safaei et al.

CVPR 2025highlightarXiv:2502.07601

citations

#1680

VideoGigaGAN: Towards Detail-rich Video Super-Resolution

Yiran Xu, Taesung Park, Richard Zhang et al.

CVPR 2025arXiv:2404.12388

citations

#1681

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

Wei Shen, Guanlin Liu, Yu Yue et al.

NEURIPS 2025arXiv:2503.22230

citations

#1682

Perception-Guided Jailbreak Against Text-to-Image Models

Yihao Huang, Le Liang, Tianlin Li et al.

AAAI 2025paperarXiv:2408.10848

citations

#1683

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

Yuxuan Cai, Jiangning Zhang, Haoyang He et al.

ICCV 2025arXiv:2410.16236

citations

#1684

EEdit : Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing

Zexuan Yan, Yue Ma, Chang Zou et al.

ICCV 2025arXiv:2503.10270

citations

#1685

Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization Through Spare-Coding Transformer

Lei Su, Xiaochen Ma, Xuekang Zhu et al.

AAAI 2025paperarXiv:2412.14598

citations

#1686

CityNav: A Large-Scale Dataset for Real-World Aerial Navigation

Jungdae Lee, Taiki Miyanishi, Shuhei Kurita et al.

ICCV 2025arXiv:2406.14240

citations

#1687

The AdEMAMix Optimizer: Better, Faster, Older

Matteo Pagliardini, Pierre Ablin, David Grangier

ICLR 2025arXiv:2409.03137

citations

#1688

LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging

Ke Wang, Nikos Dimitriadis, Alessandro Favero et al.

ICLR 2025arXiv:2410.17146

citations

#1689

Estimating Body and Hand Motion in an Ego‑sensed World

Brent Yi, Vickie Ye, Maya Zheng et al.

CVPR 2025highlightarXiv:2410.03665

citations

#1690

Evaluating the Diversity and Quality of LLM Generated Content

Alexander Shypula, Shuo Li, Botong Zhang et al.

COLM 2025paperarXiv:2504.12522

citations

#1691

KV-Edit: Training-Free Image Editing for Precise Background Preservation

Tianrui Zhu, Shiyi Zhang, Jiawei Shao et al.

ICCV 2025arXiv:2502.17363

citations

#1692

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

Shenghai Yuan, Xianyi He, Yufan Deng et al.

NEURIPS 2025arXiv:2505.20292

citations

#1693

VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents

Ryota Tanaka, Taichi Iki, Taku Hasegawa et al.

CVPR 2025arXiv:2504.09795

citations

#1694

DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes

Hengwei Bian, Lingdong Kong, Haozhe Xie et al.

ICLR 2025arXiv:2410.18084

citations

#1695

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

Lijun Li, Zhelun Shi, Xuhao Hu et al.

CVPR 2025arXiv:2501.12612

citations

#1696

Reward Guided Latent Consistency Distillation

William Wang, Jiachen Li, Weixi Feng et al.

ICLR 2025arXiv:2403.11027

citations

#1697

Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions

Sarah Wiegreffe, Oyvind Tafjord, Yonatan Belinkov et al.

ICLR 2025arXiv:2407.15018

citations

#1698

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Xiaoshuai Song, Muxi Diao, Guanting Dong et al.

ICLR 2025arXiv:2406.08587

citations

#1699

Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists

Bojia Zi, Penghui Ruan, Marco Chen et al.

NEURIPS 2025arXiv:2502.06734

citations

#1700

Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering

Cheng Sun, Jaesung Choe, Charles Loop et al.

CVPR 2025arXiv:2412.04459

citations

#1701

Autoformulation of Mathematical Optimization Models Using LLMs

Nicolás Astorga, Tennison Liu, Yuanzhang Xiao et al.

ICML 2025arXiv:2411.01679

citations

#1702

EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration

Allen Nie, Yi Su, Bo Chang et al.

ICML 2025arXiv:2410.06238

citations

#1703

Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline

Junlong Cheng, Bin Fu, Jin Ye et al.

CVPR 2025arXiv:2411.12814

citations

#1704

Training a Scientific Reasoning Model for Chemistry

Siddharth Narayanan, James Braza, Ryan-Rhys Griffiths et al.

NEURIPS 2025arXiv:2506.17238

citations

#1705

Your ViT is Secretly an Image Segmentation Model

Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.

CVPR 2025highlightarXiv:2503.19108

citations

#1706

Diffusion Beats Autoregressive in Data-Constrained Settings

Mihir Prabhudesai, Mengning Wu, Amir Zadeh et al.

NEURIPS 2025arXiv:2507.15857

citations

#1707

LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics

Thomas Robert, Mher Safaryan, Ionut-Vlad Modoranu et al.

ICLR 2025arXiv:2410.16103

citations

#1708

XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?

Fengxiang Wang, hongzhen wang, Zonghao Guo et al.

CVPR 2025highlightarXiv:2503.23771

citations

#1709

Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection

Zhen Qu, Xian Tao, Xinyi Gong et al.

CVPR 2025arXiv:2503.10080

citations

#1710

Calibrated Multi-Preference Optimization for Aligning Diffusion Models

Kyungmin Lee, Xiaohang Li, Qifei Wang et al.

CVPR 2025arXiv:2502.02588

citations

#1711

FineVQ: Fine-Grained User Generated Content Video Quality Assessment

Huiyu Duan, Qiang Hu, Wang Jiarui et al.

CVPR 2025highlightarXiv:2412.19238

citations

#1712

Concept Bottleneck Large Language Models

Chung-En Sun, Tuomas Oikarinen, Berk Ustun et al.

ICLR 2025arXiv:2412.07992

citations

#1713

MagicQuill: An Intelligent Interactive Image Editing System

Zichen Liu, Yue Yu, Hao Ouyang et al.

CVPR 2025arXiv:2411.09703

citations

#1714

Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing

Xinghe Fu, Zhiyuan Yan, Taiping Yao et al.

AAAI 2025paperarXiv:2501.04376

citations

#1715

GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models

Zewei Zhang, Huan Liu, Jun Chen et al.

ICLR 2025arXiv:2404.07206

citations

#1716

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

Hanlin Wang, Hao Ouyang, Qiuyu Wang et al.

CVPR 2025highlightarXiv:2412.15214

citations

#1717

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.

NEURIPS 2025arXiv:2505.17017

citations

#1718

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

Lawrence Jang, Yinheng Li, Dan Zhao et al.

ICLR 2025arXiv:2410.19100

citations

#1719

Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation

Xinpeng Wang, Chengzhi (Martin) Hu, Paul Röttger et al.

ICLR 2025arXiv:2410.03415

citations

#1720

ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation

Guosheng Zhao, Xiaofeng Wang, Chaojun Ni et al.

ICCV 2025arXiv:2503.18438

citations

#1721

ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model

Shunlin Lu, Jingbo Wang, Zeyu Lu et al.

CVPR 2025arXiv:2412.14559

citations

#1722

Inverse Constitutional AI: Compressing Preferences into Principles

Arduin Findeis, Timo Kaufmann, Eyke Hüllermeier et al.

ICLR 2025arXiv:2406.06560

citations

#1723

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

Teng Xiao, Yige Yuan, Zhengyu Chen et al.

ICLR 2025arXiv:2502.00883

citations

#1724

Moral Alignment for LLM Agents

Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

ICLR 2025oralarXiv:2410.01639

citations

#1725

Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models

Lucio La Cava, Andrea Tagarelli

AAAI 2025paperarXiv:2401.07115

citations

#1726

CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution

Xin Liu, Jie Liu, Jie Tang et al.

CVPR 2025arXiv:2503.06896

citations

#1727

EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities

Talor Abramovich, Meet Udeshi, Minghao Shao et al.

ICML 2025arXiv:2409.16165

citations

#1728

DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

Yuxuan Luo, Zhengkun Rong, Lizhen Wang et al.

ICCV 2025arXiv:2504.01724

citations

#1729

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

Baichuan Zhou, Haote Yang, Dairong Chen et al.

AAAI 2025paperarXiv:2408.17267

citations

#1730

The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models

Ke Ji, Jiahao Xu, Tian Liang et al.

NEURIPS 2025arXiv:2503.02875

citations

#1731

REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment

Haonan Han, Rui Yang, Huan Liao et al.

ICCV 2025arXiv:2405.18525

citations

#1732

DisCo: Graph-Based Disentangled Contrastive Learning for Cold-Start Cross-Domain Recommendation

Hourun Li, Yifan Wang, Zhiping Xiao et al.

AAAI 2025paperarXiv:2412.15005

citations

#1733

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

Xiao Guo, Xiufeng Song, Yue Zhang et al.

CVPR 2025arXiv:2503.20188

citations

#1734

GenMol: A Drug Discovery Generalist with Discrete Diffusion

Seul Lee, Karsten Kreis, Srimukh Veccham et al.

ICML 2025arXiv:2501.06158

citations

#1735

R1-ShareVL: Incentivizing Reasoning Capabilities of Multimodal Large Language Models via Share-GRPO

Huanjin Yao, Qixiang Yin, Jingyi Zhang et al.

NEURIPS 2025arXiv:2505.16673

citations

#1736

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring

Yufei Zhan, Shurong Zheng, Yousong Zhu et al.

ICCV 2025arXiv:2403.09333

citations

#1737

PolyNet: Learning Diverse Solution Strategies for Neural Combinatorial Optimization

André Hottung, Mridul Mahajan, Kevin Tierney

ICLR 2025arXiv:2402.14048

citations

#1738

AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies

Jian Gao, Weidong Cao, Junyi Yang et al.

ICLR 2025arXiv:2503.00205

citations

#1739

Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning

Jinlong Pang, Na Di, Zhaowei Zhu et al.

ICML 2025arXiv:2502.01968

citations

#1740

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Alejandro Lozano, Min Woo Sun, James Burgess et al.

CVPR 2025arXiv:2501.07171

citations

#1741

FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error

Beilin Chu, Xuan Xu, Xin Wang et al.

CVPR 2025arXiv:2412.07140

citations

#1742

Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality

Guanyu Zhou, Yibo Yan, Xin Zou et al.

ICLR 2025arXiv:2410.04780

citations

#1743

Light3R-SfM: Towards Feed-forward Structure-from-Motion

Sven Elflein, Qunjie Zhou, Laura Leal-Taixe

CVPR 2025highlightarXiv:2501.14914

citations

#1744

Discrete Copula Diffusion

Anji Liu, Oliver Broadrick, Mathias Niepert et al.

ICLR 2025arXiv:2410.01949

citations

#1745

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Hongxiang Li, Yaowei Li, Yuhang Yang et al.

ICLR 2025arXiv:2412.09349

citations

#1746

Robust Tracking via Mamba-based Context-aware Token Learning

Jinxia Xie, Bineng Zhong, Qihua Liang et al.

AAAI 2025paperarXiv:2412.13611

citations

#1747

Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets

Guangqi Jiang, Yifei Sun, Tao Huang et al.

ICLR 2025arXiv:2410.22325

citations

#1748

Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo

João Loula, Benjamin LeBrun, Li Du et al.

ICLR 2025arXiv:2504.13139

citations

#1749

Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning

Mingyang Chen, sunhaoze, Tianpeng Li et al.

ICLR 2025arXiv:2410.12952

citations

#1750

ResearchTown: Simulator of Human Research Community

Haofei Yu, Zhaochen Hong, Zirui Cheng et al.

ICML 2025arXiv:2412.17767

citations

#1751

Data Selection via Optimal Control for Language Models

Yuxian Gu, Li Dong, Hongning Wang et al.

ICLR 2025arXiv:2410.07064

citations

#1752

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Tianwei Xiong, Jun Hao Liew, Zilong Huang et al.

ICCV 2025arXiv:2504.08736

citations

#1753

Language Representations Can be What Recommenders Need: Findings and Potentials

Leheng Sheng, An Zhang, Yi Zhang et al.

ICLR 2025arXiv:2407.05441

citations

#1754

Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection

Le Yang, Ziwei Zheng, Boxu Chen et al.

CVPR 2025arXiv:2412.13817

citations

#1755

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

Renqiu Xia, mingsheng li, Hancheng Ye et al.

ICLR 2025arXiv:2412.11863

citations

#1756

Scaling Laws for Native Multimodal Models

Mustafa Shukor, Enrico Fini, Victor Guilherme Turrisi da Costa et al.

ICCV 2025arXiv:2504.07951

citations

#1757

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

Canyu Zhao, Yanlong Sun, Mingyu Liu et al.

NEURIPS 2025spotlightarXiv:2502.17157

citations

#1758

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

CVPR 2025arXiv:2407.15811

citations

#1759

Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond

Guanyao Wu, Haoyu Liu, Hongming Fu et al.

CVPR 2025arXiv:2503.01210

citations

#1760

Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent

Yongxian Wei, Anke Tang, Li Shen et al.

ICML 2025arXiv:2501.01230

citations

#1761

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

Pengxiang Li, Lu Yin, Shiwei Liu

ICLR 2025arXiv:2412.13795

citations

#1762

Underdamped Diffusion Bridges with Applications to Sampling

Denis Blessing, Julius Berner, Lorenz Richter et al.

ICLR 2025arXiv:2503.01006

citations

#1763

ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs

Jiaru Zou, Ling Yang, Jingwen Gu et al.

NEURIPS 2025arXiv:2506.18896

citations

#1764

TAPIP3D: Tracking Any Point in Persistent 3D Geometry

Bowei Zhang, Lei Ke, Adam Harley et al.

NEURIPS 2025oralarXiv:2504.14717

citations

#1765

BLADE: Enhancing Black-Box Large Language Models with Small Domain-Specific Models

Haitao Li, Qingyao Ai, Jia Chen et al.

AAAI 2025paperarXiv:2403.18365

citations

#1766

Towards Neural Scaling Laws for Time Series Foundation Models

Qingren Yao, Chao-Han Huck Yang, Renhe Jiang et al.

ICLR 2025arXiv:2410.12360

citations

#1767

Debate on Graph: A Flexible and Reliable Reasoning Framework for Large Language Models

Jie Ma, Zhitao Gao, Qi Chai et al.

AAAI 2025paperarXiv:2409.03155

citations

#1768

Can LLMs Solve Longer Math Word Problems Better?

Xin Xu, Tong Xiao, Zitong Chao et al.

ICLR 2025arXiv:2405.14804

citations

#1769

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation

Xiaofeng Wang, Kang Zhao, Feng Liu et al.

NEURIPS 2025arXiv:2411.08380

citations

#1770

How Do Large Language Monkeys Get Their Power (Laws)?

Rylan Schaeffer, Joshua Kazdan, John Hughes et al.

ICML 2025oralarXiv:2502.17578

citations

#1771

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

Yun Liu, Chengwen Zhang, Ruofan Xing et al.

CVPR 2025arXiv:2406.19353

citations

#1772

AffordDP: Generalizable Diffusion Policy with Transferable Affordance

Shijie Wu, Yihang Zhu, Yunao Huang et al.

CVPR 2025arXiv:2412.03142

citations

#1773

Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift

Siyuan Liang, Jiawei Liang, Tianyu Pang et al.

CVPR 2025arXiv:2406.18844

citations

#1774

DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products

Julien Siems, Timur Carstensen, Arber Zela et al.

NEURIPS 2025arXiv:2502.10297

citations

#1775

DiffuseHigh: Training-Free Progressive High-Resolution Image Synthesis Through Structure Guidance

Younghyun Kim, Geunmin Hwang, Junyu Zhang et al.

AAAI 2025paperarXiv:2406.18459

citations

#1776

MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems

Rui Ye, shuo tang, Rui Ge et al.

ICML 2025arXiv:2503.03686

citations

#1777

An Intelligent Agentic System for Complex Image Restoration Problems

Kaiwen Zhu, Jinjin Gu, Zhiyuan You et al.

ICLR 2025arXiv:2410.17809

citations

#1778

CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification

Wei Li, Renshan Zhang, Rui Shao et al.

NEURIPS 2025arXiv:2508.21046

citations

#1779

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

Weifeng Lin, Xinyu Wei, Renrui Zhang et al.

ICLR 2025arXiv:2409.15278

citations

#1780

Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

feilong tang, Chengzhi Liu, Zhongxing Xu et al.

CVPR 2025arXiv:2505.16652

citations

#1781

Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination

Leonardo Barcellona, Andrii Zadaianchuk, Davide Allegro et al.

ICLR 2025arXiv:2412.14957

citations

#1782

Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine

Xiaoshuang Huang, Lingdong Shen, Jia Liu et al.

AAAI 2025paperarXiv:2412.09278

citations

#1783

A Unified Approach to Routing and Cascading for LLMs

Jasper Dekoninck, Maximilian Baader, Martin Vechev

ICML 2025arXiv:2410.10347

citations

#1784

Addressing Misspecification in Simulation-based Inference through Data-driven Calibration

Antoine Wehenkel, Juan L. Gamella, Ozan Sener et al.

ICML 2025oralarXiv:2405.08719

citations

#1785

FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model

Chongkai Gao, Haozhuo Zhang, Zhixuan Xu et al.

ICLR 2025arXiv:2412.08261

citations

#1786

LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity

Walid Bousselham, Angie Boggust, Sofian Chaybouti et al.

ICCV 2025arXiv:2404.03214

citations

#1787

SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.

CVPR 2025arXiv:2411.17190

citations

#1788

HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models

Kazi Hasan Ibn Arif, JinYi Yoon, Dimitrios S. Nikolopoulos et al.

AAAI 2025paperarXiv:2408.10945

citations

#1789

OmniSVG: A Unified Scalable Vector Graphics Generation Model

Yiying Yang, Wei Cheng, Sijin Chen et al.

NEURIPS 2025arXiv:2504.06263

citations

#1790

Artificial Kuramoto Oscillatory Neurons

Takeru Miyato, Sindy Löwe, Andreas Geiger et al.

ICLR 2025oralarXiv:2410.13821

citations

#1791

LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation

Qidong Liu, Xian Wu, Wanyu Wang et al.

AAAI 2025paperarXiv:2409.19925

citations

#1792

MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction

Zixuan Gong, Qi Zhang, Guangyin Bao et al.

AAAI 2025paperarXiv:2404.12630

citations

#1793

OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents

Thomas Kuntz, Agatha Duzan, Hao Zhao et al.

NEURIPS 2025spotlightarXiv:2506.14866

citations

#1794

Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces I: the compact case

Iskander Azangulov, Andrei Smolensky, Alexander Terenin et al.

NEURIPS 2025oralarXiv:2208.14960

citations

#1795

Attention Distillation: A Unified Approach to Visual Characteristics Transfer

Yang Zhou, Xu Gao, Zichong Chen et al.

CVPR 2025arXiv:2502.20235

citations

#1796

Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives

Qinsi Wang, Jinghan Ke, Masayoshi Tomizuka et al.

ICLR 2025arXiv:2502.02723

citations

#1797

3D Vision-Language Gaussian Splatting

Qucheng Peng, Benjamin Planche, Zhongpai Gao et al.

ICLR 2025arXiv:2410.07577

citations

#1798

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

Yunxiang Fu, Meng Lou, Yizhou Yu

CVPR 2025arXiv:2412.11890

citations

#1799

Self-Adapting Language Models

Adam Zweiger, Jyo Pari, Han Guo et al.

NEURIPS 2025arXiv:2506.10943

citations

#1800

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Kai Wang, Mingjia Shi, YuKun Zhou et al.

CVPR 2025arXiv:2405.17403

citations

← Previous

1...7 8 9 10 11...112