Most Cited 2025 "gradient domination assumptions" Papers

22,274 papers found

#3401

Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

Hao Zhong, Muzhi Zhu, Zongze Du et al.

NEURIPS 2025 • oral • arXiv:2505.20256
14 citations
#3402

To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning

Tian Qin, David Alvarez-Melis, Samy Jelassi et al.

COLM 2025 • paper • arXiv:2504.07052
14 citations
#3403

Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient

Jan Ludziejewski, Maciej Pióro, Jakub Krajewski et al.

ICML 2025 • arXiv:2502.05172
14 citations
#3404

Unleashing Vecset Diffusion Model for Fast Shape Generation

Zeqiang Lai, Zhao Yunfei, Zibo Zhao et al.

ICCV 2025 • highlight • arXiv:2503.16302
14 citations
#3405

RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning

Yuanhuiyi Lyu, Xu Zheng, Lutao Jiang et al.

ICML 2025 • arXiv:2502.00848
14 citations
#3406

REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints

Di Wu, Liu Liu, Zhou Linli et al.

NEURIPS 2025 • arXiv:2503.06677
14 citations
#3407

Learning to Generate Unit Tests for Automated Debugging

Archiki Prasad, Elias Stengel-Eskin, Justin Chen et al.

COLM 2025 • paper • arXiv:2502.01619
14 citations
#3408

Breaking Barriers in Physical-World Adversarial Examples: Improving Robustness and Transferability via Robust Feature

Yichen Wang, Yuxuan Chou, Ziqi Zhou et al.

AAAI 2025 • paper • arXiv:2412.16958
14 citations
#3409

Optimization with Access to Auxiliary Information

El Mahdi Chayti, Sai Karimireddy

ICLR 2025 • arXiv:2206.00395
14 citations
#3410

Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP

Junsung Park, Jungbeom Lee, Jongyoon Song et al.

ICCV 2025 • arXiv:2501.10913
14 citations
#3411

CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models

Song Wang, Peng Wang, Tong Zhou et al.

ICLR 2025 • arXiv:2407.02408
14 citations
#3412

Scaling Laws for Optimal Data Mixtures

Mustafa Shukor, Louis Bethune, Dan Busbridge et al.

NEURIPS 2025 • arXiv:2507.09404
14 citations
#3413

Improving Long-Text Alignment for Text-to-Image Diffusion Models

Luping Liu, Chao Du, Tianyu Pang et al.

ICLR 2025 • arXiv:2410.11817
14 citations
#3414

A Manifold Perspective on the Statistical Generalization of Graph Neural Networks

Zhiyang Wang, Juan Cervino, Alejandro Ribeiro

ICML 2025 • arXiv:2406.05225
14 citations
#3415

Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition

Changwei Wang, Shunpeng Chen, Yukun Song et al.

AAAI 2025 • paper • arXiv:2504.09881
14 citations
#3416

Explore In-Context Segmentation via Latent Diffusion Models

Chaoyang Wang, Xiangtai Li, Henghui Ding et al.

AAAI 2025 • paper • arXiv:2403.09616
14 citations
#3417

TextToucher: Fine-Grained Text-to-Touch Generation

Jiahang Tu, Hao Fu, Fengyu Yang et al.

AAAI 2025 • paper • arXiv:2409.05427
14 citations
#3418

The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions

Stefan Sylvius Wagner, Maike Behrendt, Marc Ziegele et al.

ICLR 2025 • arXiv:2406.12480
14 citations
#3419

Mixture of Attentions For Speculative Decoding

Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras et al.

ICLR 2025 • arXiv:2410.03804
14 citations
#3420

Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation

Taeyoung Yun, Dinghuai Zhang, Jinkyoo Park et al.

CVPR 2025 • arXiv:2502.11477
14 citations
#3421

Layout-your-3D: Controllable and Precise 3D Generation with 2D Blueprint

Junwei Zhou, Xueting Li, Lu Qi et al.

ICLR 2025 • arXiv:2410.15391
14 citations
#3422

Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo

Zachary Charles, Gabriel Teston, Lucio Dery et al.

NEURIPS 2025 • spotlight • arXiv:2503.09799
14 citations
#3423

Backdoor Attacks on Dense Retrieval via Public and Unintentional Triggers

Quanyu Long, Yue Deng, Leilei Gan et al.

COLM 2025 • paper • arXiv:2402.13532
14 citations
#3424

Discrete Diffusion Schrödinger Bridge Matching for Graph Transformation

Jun Hyeong Kim, Seonghwan Kim, Seokhyun Moon et al.

ICLR 2025 • arXiv:2410.01500
14 citations
#3425

Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling

Guiyu Zhang, Huan-ang Gao, Zijian Jiang et al.

ICLR 2025 • arXiv:2410.11236
14 citations
#3426

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Yana Wei, Liang Zhao, Jianjian Sun et al.

NEURIPS 2025 • arXiv:2507.05255
14 citations
#3427

CR-CTC: Consistency regularization on CTC for improved speech recognition

Zengwei Yao, Wei Kang, Xiaoyu Yang et al.

ICLR 2025 • oral • arXiv:2410.05101
14 citations
#3428

RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning

Kunming Su, Qiuxia Wu, Panpan Cai et al.

AAAI 2025 • paper • arXiv:2409.00353
14 citations
#3429

Pitfalls of Evidence-Based AI Policy

Stephen Casper, David Krueger, Dylan Hadfield-Menell

ICLR 2025 • arXiv:2502.09618
14 citations
#3430

Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production

Shengeng Tang, Jiayi He, Dan Guo et al.

AAAI 2025 • paper • arXiv:2412.13609
14 citations
#3431

Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions

Eray Erturk, Fahad Kamran, Salar Abbaspourazad et al.

ICML 2025 • oral • arXiv:2507.00191
14 citations
#3432

Fully-inductive Node Classification on Arbitrary Graphs

Jianan Zhao, Zhaocheng Zhu, Mikhail Galkin et al.

ICLR 2025 • arXiv:2405.20445
14 citations
#3433

Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking

Pengxiang Li, Shilin Yan, Jiayin Cai et al.

NEURIPS 2025 • arXiv:2505.20199
14 citations
#3434

Human Motion Instruction Tuning

Lei Li, Sen Jia, Jianhao Wang et al.

CVPR 2025 • arXiv:2411.16805
14 citations
#3435

Fairness-Accuracy Trade-Offs: A Causal Perspective

Drago Plecko, Elias Bareinboim

AAAI 2025 • paper • arXiv:2405.15443
14 citations
#3436

AWRaCLe: All-Weather Image Restoration Using Visual In-Context Learning

Sudarshan Rajagopalan, Vishal M. Patel

AAAI 2025 • paper • arXiv:2409.00263
14 citations
#3437

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Muzhi Zhu, Yuzhuo Tian, Hao Chen et al.

CVPR 2025 • arXiv:2503.08625
14 citations
#3438

FaceShot: Bring Any Character into Life

Junyao Gao, Yanan Sun, Fei Shen et al.

ICLR 2025 • arXiv:2503.00740
14 citations
#3439

Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis

Yanzuo Lu, Yuxi Ren, Xin Xia et al.

ICCV 2025 • highlight • arXiv:2507.18569
14 citations
#3440

Enhancing Time Series Forecasting through Selective Representation Spaces: A Patch Perspective

Xingjian Wu, Xiangfei Qiu, Hanyin Cheng et al.

NEURIPS 2025 • arXiv:2510.14510
14 citations
#3441

RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba

Andong Lu, Wanyu Wang, Chenglong Li et al.

AAAI 2025 • paper • arXiv:2408.08827
14 citations
#3442

Beyond Canonicalization: How Tensorial Messages Improve Equivariant Message Passing

Peter Lippmann, Gerrit Gerhartz, Roman Remme et al.

ICLR 2025 • arXiv:2405.15389
14 citations
#3443

The Power of Context: How Multimodality Improves Image Super-Resolution

Kangfu Mei, Vishal M. Patel, Mojtaba Sahraee-Ardakan et al.

CVPR 2025 • arXiv:2503.14503
14 citations
#3444

Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models

Qingni Wang, Tiantian Geng, Zhiyuan Wang et al.

ICLR 2025 • arXiv:2410.08174
14 citations
#3445

Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation

Hongwei Niu, Linhuang Xie, Jianghang Lin et al.

AAAI 2025 • paper • arXiv:2412.12050
14 citations
#3446

When does compositional structure yield compositional generalization? A kernel theory.

Samuel Lippl, Kimberly Stachenfeld

ICLR 2025 • arXiv:2405.16391
14 citations
#3447

Quantized Spike-driven Transformer

Xuerui Qiu, Malu Zhang, Jieyuan Zhang et al.

ICLR 2025 • arXiv:2501.13492
14 citations
#3448

MiniMax-Remover: Taming Bad Noise Helps Video Object Removal

Bojia Zi, Weixuan Peng, Xianbiao Qi et al.

NEURIPS 2025 • arXiv:2505.24873
14 citations
#3449

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

Tiehan Fan, Kepan Nan, Rui Xie et al.

CVPR 2025 • arXiv:2412.09283
14 citations
#3450

MiraGe: Editable 2D Images using Gaussian Splatting

Joanna Waczyńska, Tomasz Szczepanik, Piotr Borycki et al.

ICML 2025 • arXiv:2410.01521
14 citations
#3451

DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection

Li Li, Huixian Gong, Hao Dong et al.

CVPR 2025 • highlight • arXiv:2411.08227
14 citations
#3452

Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance

Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros Potamias et al.

CVPR 2025 • arXiv:2501.05379
14 citations
#3453

DeRainGS: Gaussian Splatting for Enhanced Scene Reconstruction in Rainy Environments

Shuhong Liu, Xiang Chen, Hongming Chen et al.

AAAI 2025 • paper • arXiv:2408.11540
14 citations
#3454

DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations

Ziqiao Peng, Yanbo Fan, Haoyu Wu et al.

CVPR 2025 • arXiv:2505.18096
14 citations
#3455

QuEST: Stable Training of LLMs with 1-Bit Weights and Activations

Andrei Panferov, Jiale Chen, Rush Tabesh et al.

ICML 2025 • arXiv:2502.05003
14 citations
#3456

Prototype-Based Image Prompting for Weakly Supervised Histopathological Image Segmentation

Qingchen Tang, Lei Fan, Maurice Pagnucco et al.

CVPR 2025 • arXiv:2503.12068
14 citations
#3457

ChatGarment: Garment Estimation, Generation and Editing via Large Language Models

Siyuan Bian, Chenghao Xu, Yuliang Xiu et al.

CVPR 2025 • arXiv:2412.17811
14 citations
#3458

Editable Concept Bottleneck Models

Lijie Hu, Chenyang Ren, Zhengyu Hu et al.

ICML 2025 • arXiv:2405.15476
14 citations
#3459

GRAM: A Generative Foundation Reward Model for Reward Generalization

Chenglong Wang, Yang Gan, Yifu Huo et al.

ICML 2025 • arXiv:2506.14175
14 citations
#3460

OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain

Wenzhen Yue, Yong Liu, Hao Wang et al.

NEURIPS 2025 • oral • arXiv:2505.08550
14 citations
#3461

Knowledge Editing with Dynamic Knowledge Graphs for Multi-Hop Question Answering

Yifan Lu, Yigeng Zhou, Jing Li et al.

AAAI 2025 • paper • arXiv:2412.13782
14 citations
#3462

Improving Equivariant Networks with Probabilistic Symmetry Breaking

Hannah Lawrence, Vasco Portilheiro, Yan Zhang et al.

ICLR 2025 • arXiv:2503.21985
14 citations
#3463

Referring to Any Person

Qing Jiang, Lin Wu, Zhaoyang Zeng et al.

ICCV 2025 • arXiv:2503.08507
14 citations
#3464

GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding

Yawen Shao, Wei Zhai, Yuhang Yang et al.

CVPR 2025 • arXiv:2411.19626
14 citations
#3465

MLIP Arena: Advancing Fairness and Transparency in Machine Learning Interatomic Potentials via an Open, Accessible Benchmark Platform

Yuan Chiang, Tobias Kreiman, Christine Zhang et al.

NEURIPS 2025 • spotlight • arXiv:2509.20630
14 citations
#3466

CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions

Matan Levi, Yair Allouche, Daniel Ohayon et al.

AAAI 2025 • paper • arXiv:2408.09304
14 citations
#3467

Hyperspherical Normalization for Scalable Deep Reinforcement Learning

Hojoon Lee, Youngdo Lee, Takuma Seno et al.

ICML 2025 • spotlight • arXiv:2502.15280
14 citations
#3468

Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups

Yuchen Zhu, Tianrong Chen, Lingkai Kong et al.

ICLR 2025 • arXiv:2405.16381
14 citations
#3469

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper

Xinyue Zhu, Binghao Huang, Yunzhu Li

NEURIPS 2025 • arXiv:2507.15062
14 citations
#3470

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Jian Yang, Dacheng Yin, Yizhou Zhou et al.

CVPR 2025 • arXiv:2410.10798
14 citations
#3471

ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness

Yijun Liang, Ming Li, Chenrui Fan et al.

NEURIPS 2025 • arXiv:2504.10514
14 citations
#3472

Meta-World+: An Improved, Standardized, RL Benchmark

Reginald McLean, Evangelos Chatzaroulas, Luc McCutcheon et al.

NEURIPS 2025 • arXiv:2505.11289
14 citations
#3473

Weighted-Reward Preference Optimization for Implicit Model Fusion

Ziyi Yang, Fanqi Wan, Longguang Zhong et al.

ICLR 2025 • arXiv:2412.03187
14 citations
#3474

Multi-Turn Jailbreaking Large Language Models via Attention Shifting

Xiaohu Du, Fan Mo, Ming Wen et al.

AAAI 2025 • paper
14 citations
#3475

Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging

Jinluan Yang, Dingnan Jin, Anke Tang et al.

NEURIPS 2025 • arXiv:2502.06876
14 citations
#3476

Mixture of Parrots: Experts improve memorization more than reasoning

Samy Jelassi, Clara Mohri, David Brandfonbrener et al.

ICLR 2025 • arXiv:2410.19034
14 citations
#3477

Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space

Yifan Zhou, Zeqi Xiao, Shuai Yang et al.

CVPR 2025 • arXiv:2503.09419
14 citations
#3478

Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video

David Yifan Yao, Albert J. Zhai, Shenlong Wang

CVPR 2025 • highlight • arXiv:2503.21761
14 citations
#3479

Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models

Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy et al.

NEURIPS 2025 • arXiv:2508.09968
14 citations
#3480

From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing

Jingxuan Wei, Cheng Tan, Qi Chen et al.

CVPR 2025 • highlight • arXiv:2411.11916
14 citations
#3481

Unseen Visual Anomaly Generation

Han Sun, Yunkang Cao, Hao Dong et al.

CVPR 2025 • arXiv:2406.01078
14 citations
#3482

LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models

Lukas Helff, Felix Friedrich, Manuel Brack et al.

ICML 2025 • arXiv:2406.05113
14 citations
#3483

Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis

Zikun Zhang, Zixiang Chen, Quanquan Gu

ICLR 2025 • arXiv:2410.02321
14 citations
#3484

MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation

Shuwei Shi, Biao Gong, Xi Chen et al.

CVPR 2025 • arXiv:2412.05848
14 citations
#3485

Context Steering: Controllable Personalization at Inference Time

Zhiyang He, Sashrika Pandey, Mariah Schrum et al.

ICLR 2025 • arXiv:2405.01768
14 citations
#3486

DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters

Mingze Sun, Junting Dong, Junhao Chen et al.

CVPR 2025 • arXiv:2411.17423
14 citations
#3487

Medical MLLM Is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

Xijie Huang, Xinyuan Wang, Hantao Zhang et al.

AAAI 2025 • paper • arXiv:2405.20775
14 citations
#3488

AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

Yuliang Liu, Junjie Lu, Chaofeng Qu et al.

ICML 2025 • arXiv:2502.13943
14 citations
#3489

CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding & Reasoning Capabilities of CodeLLMs

Dung Nguyen, Thang Phan, Nam Le Hai et al.

ICLR 2025 • arXiv:2410.01999
14 citations
#3490

Joint Velocity-Growth Flow Matching for Single-Cell Dynamics Modeling

Dongyi Wang, Yuanwei Jiang, Zhenyi Zhang et al.

NEURIPS 2025 • arXiv:2505.13413
14 citations
#3491

RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

Huacan Wang, Ziyi Ni, Shuo Zhang et al.

NEURIPS 2025 • spotlight • arXiv:2505.21577
14 citations
#3492

DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models

Ziyi Wu, Anil Kag, Ivan Skorokhodov et al.

NEURIPS 2025 • oral • arXiv:2506.03517
14 citations
#3493

Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts

Minh Le, Chau Nguyen, Huy Nguyen et al.

ICLR 2025 • arXiv:2410.02200
14 citations
#3494

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

Kun Liu, Qi Liu, Xinchen Liu et al.

CVPR 2025 • arXiv:2503.23715
14 citations
#3495

Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis

M. Hamza Mughal, Rishabh Dabral, Merel CJ Scholman et al.

CVPR 2025 • arXiv:2412.06786
14 citations
#3496

Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters

Kevin Li, Sachin Goyal, João D Semedo et al.

ICLR 2025 • arXiv:2411.03312
14 citations
#3497

Variational Rectified Flow Matching

Pengsheng Guo, Alex Schwing

ICML 2025 • arXiv:2502.09616
14 citations
#3498

DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness

Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.

ICCV 2025 • highlight • arXiv:2503.22677
14 citations
#3499

Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization

XiangCheng Zhang, Fang Kong, Baoxiang Wang et al.

ICLR 2025 • arXiv:2302.06834
14 citations
#3500

Deep Distributed Optimization for Large-Scale Quadratic Programming

Augustinos Saravanos, Hunter Kuperman, Alex Oshin et al.

ICLR 2025 • arXiv:2412.12156
14 citations
#3501

Benchmarking LLMs' Judgments with No Gold Standard

Shengwei Xu, Yuxuan Lu, Grant Schoenebeck et al.

ICLR 2025 • arXiv:2411.07127
14 citations
#3502

PNVC: Towards Practical INR-based Video Compression

Ge Gao, Ho Man Kwan, Fan Zhang et al.

AAAI 2025 • paper • arXiv:2409.00953
14 citations
#3503

LoLCATs: On Low-Rank Linearizing of Large Language Models

Michael Zhang, Simran Arora, Rahul Chalamala et al.

ICLR 2025 • arXiv:2410.10254
14 citations
#3504

Robust Self-Paced Hashing for Cross-Modal Retrieval with Noisy Labels

Ruitao Pu, Yuan Sun, Yang Qin et al.

AAAI 2025 • paper • arXiv:2501.01699
14 citations
#3505

Scaling Inference Time Compute for Diffusion Models

Nanye Ma, Shangyuan Tong, Haolin Jia et al.

CVPR 2025 • highlight
14 citations
#3506

Overcoming Lower-Level Constraints in Bilevel Optimization: A Novel Approach with Regularized Gap Functions

Wei Yao, Haian Yin, Shangzhi Zeng et al.

ICLR 2025 • arXiv:2406.01992
14 citations
#3507

LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes

Xiang Xu, Lingdong Kong, Hui Shuai et al.

CVPR 2025 • arXiv:2501.04004
14 citations
#3508

ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts

Samar Khanna, Medhanie Irgau, David Lobell et al.

ICML 2025 • arXiv:2406.10973
14 citations
#3509

Sum of Squares Circuits

Lorenzo Loconte, Stefan Mengel, Antonio Vergari

AAAI 2025 • paper • arXiv:2408.11778
14 citations
#3510

ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting

Guo Junfu, Yu Xin, Gaoyi Liu et al.

CVPR 2025 • arXiv:2503.08135
14 citations
#3511

Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment

Jun Liu, Zhenglun Kong, Pu Zhao et al.

AAAI 2025 • paper • arXiv:2403.10799
14 citations
#3512

How to Synthesize Text Data without Model Collapse?

Xuekai Zhu, Daixuan Cheng, Hengli Li et al.

ICML 2025 • arXiv:2412.14689
14 citations
#3513

Contextual Bandits for Unbounded Context Distributions

Puning Zhao, Rongfei Fan, Shaowei Wang et al.

ICML 2025 • arXiv:2408.09655
14 citations
#3514

Model-agnostic meta-learners for estimating heterogeneous treatment effects over time

Dennis Frauen, Konstantin Hess, Stefan Feuerriegel

ICLR 2025 • arXiv:2407.05287
14 citations
#3515

UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface

Hao Tang, Chen-Wei Xie, Haiyang Wang et al.

NEURIPS 2025 • spotlight • arXiv:2503.01342
14 citations
#3516

PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference

Jiarui Fang, Jinzhe Pan, Aoyu Li et al.

NEURIPS 2025 • arXiv:2405.14430
14 citations
#3517

MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow

Hanzhuo Huang, Yuan Liu, Ge Zheng et al.

ICLR 2025 • oral • arXiv:2502.11697
14 citations
#3518

A Hitchhiker's Guide to Scaling Law Estimation

Leshem Choshen, Yang Zhang, Jacob Andreas

ICML 2025 • arXiv:2410.11840
14 citations
#3519

SITCOM: Step-wise Triple-Consistent Diffusion Sampling For Inverse Problems

Ismail Alkhouri, Shijun Liang, Cheng-Han Huang et al.

ICML 2025 • arXiv:2410.04479
14 citations
#3520

Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis

Feng Zhou, Ruiyang Liu, Chen Liu et al.

CVPR 2025 • arXiv:2412.08603
14 citations
#3521

Pareto Set Learning for Multi-Objective Reinforcement Learning

Erlong Liu, Yu-Chang Wu, Xiaobin Huang et al.

AAAI 2025 • paper • arXiv:2501.06773
14 citations
#3522

FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing

Tianyi Wei, Yifan Zhou, Dongdong Chen et al.

ICCV 2025 • arXiv:2503.16153
14 citations
#3523

CofCA: A STEP-WISE Counterfactual Multi-hop QA benchmark

Jian Wu, Linyi Yang, Zhen Wang et al.

ICLR 2025 • arXiv:2402.11924
14 citations
#3524

Puppeteer: Rig and Animate Your 3D Models

Chaoyue Song, Xiu Li, Fan Yang et al.

NEURIPS 2025 • oral • arXiv:2508.10898
14 citations
#3525

Battling the Non-stationarity in Time Series Forecasting via Test-time Adaptation

HyunGi Kim, Siwon Kim, Jisoo Mok et al.

AAAI 2025 • paper • arXiv:2501.04970
14 citations
#3526

SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation

Aleksei Bokhovkin, Quan Meng, Shubham Tulsiani et al.

CVPR 2025 • arXiv:2412.01801
14 citations
#3527

Ultra-Sparse Memory Network

Zihao Huang, Qiyang Min, Hongzhi Huang et al.

ICLR 2025 • arXiv:2411.12364
14 citations
#3528

Conformal Prediction for Causal Effects of Continuous Treatments

Maresa Schröder, Dennis Frauen, Jonas Schweisthal et al.

NEURIPS 2025 • arXiv:2407.03094
14 citations
#3529

Online Reasoning Video Segmentation with Just-in-Time Digital Twins

Yiqing Shen, Bohan Liu, Chenjia Li et al.

ICCV 2025 • arXiv:2503.21056
14 citations
#3530

Open-Canopy: Towards Very High Resolution Forest Monitoring

Fajwel Fogel, Yohann Perron, Nikola Besic et al.

CVPR 2025 • highlight • arXiv:2407.09392
14 citations
#3531

SimpleTM: A Simple Baseline for Multivariate Time Series Forecasting

Hui Chen, Viet Luong, Lopamudra Mukherjee et al.

ICLR 2025 • oral
14 citations
#3532

Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints

Ming Dai, Jian Li, Jiedong Zhuang et al.

AAAI 2025 • paper • arXiv:2501.06710
14 citations
#3533

Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy

Yangsibo Huang, Daogao Liu, Lynn Chua et al.

ICLR 2025 • arXiv:2410.09591
13 citations
#3534

$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

Vlad Sobal, Mark Ibrahim, Randall Balestriero et al.

ICLR 2025 • arXiv:2407.18134
13 citations
#3535

ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments

Hojae Han, Seung-won Hwang, Rajhans Samdani et al.

ICLR 2025 • arXiv:2502.19852
13 citations
#3536

Grounding Continuous Representations in Geometry: Equivariant Neural Fields

David Wessels, David Knigge, Riccardo Valperga et al.

ICLR 2025 • arXiv:2406.05753
13 citations
#3537

Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks

Peng Xie, Yequan Bie, Jianda Mao et al.

CVPR 2025 • arXiv:2411.15720
13 citations
#3538

Pre-Training Graph Neural Networks on Molecules by Using Subgraph-Conditioned Graph Information Bottleneck

Van Thuy Hoang, O-Joun Lee

AAAI 2025 • paper • arXiv:2412.15589
13 citations
#3539

SecureGS: Boosting the Security and Fidelity of 3D Gaussian Splatting Steganography

Xuanyu Zhang, Jiarui Meng, Zhipei Xu et al.

ICLR 2025 • arXiv:2503.06118
13 citations
#3540

Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective

Duowang Zhu, Xiaohu Huang, Haiyan Huang et al.

CVPR 2025 • highlight • arXiv:2503.18803
13 citations
#3541

InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models

Cong Wei, Yujie Zhong, Yingsen Zeng et al.

ICCV 2025 • arXiv:2412.14006
13 citations
#3542

Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction

Weirong Chen, Ganlin Zhang, Felix Wimbauer et al.

ICCV 2025 • arXiv:2504.14516
13 citations
#3543

LBM: Latent Bridge Matching for Fast Image-to-Image Translation

Clément Chadebec, Onur Tasar, Sanjeev Sreetharan et al.

ICCV 2025 • highlight • arXiv:2503.07535
13 citations
#3544

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos

Dayal Singh Kalra, Tianyu He, Maissam Barkeshli

ICLR 2025 • arXiv:2311.02076
13 citations
#3545

Cost-efficient Collaboration between On-device and Cloud Language Models

Avanika Narayan, Dan Biderman, Sabri Eyuboglu et al.

ICML 2025 • arXiv:2502.15964
13 citations
#3546

Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching

Enshu Liu, Xuefei Ning, Yu Wang et al.

ICLR 2025 • arXiv:2412.17153
13 citations
#3547

Knowledge Is Power: Harnessing Large Language Models for Enhanced Cognitive Diagnosis

Zhiang Dong, Jingyuan Chen, Fei Wu

AAAI 2025 • paper • arXiv:2502.05556
13 citations
#3548

Pathways on the Image Manifold: Image Editing via Video Generation

Noam Rotstein, Gal Yona, Daniel Silver et al.

CVPR 2025 • arXiv:2411.16819
13 citations
#3549

The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

Shuze Daniel Liu, Shuhang Chen, Shangtong Zhang

NEURIPS 2025 • oral • arXiv:2401.07844
13 citations
#3550

SmartEraser: Remove Anything from Images using Masked-Region Guidance

Longtao Jiang, Zhendong Wang, Jianmin Bao et al.

CVPR 2025 • arXiv:2501.08279
13 citations
#3551

AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting

Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen et al.

CVPR 2025 • arXiv:2502.05176
13 citations
#3552

Simplifying DINO via Coding Rate Regularization

Ziyang Wu, Jingyuan Zhang, Druv Pai et al.

ICML 2025 • arXiv:2502.10385
13 citations
#3553

RORem: Training a Robust Object Remover with Human-in-the-Loop

Ruibin Li, Tao Yang, Song Guo et al.

CVPR 2025 • arXiv:2501.00740
13 citations
#3554

Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion

Eunji Kim, Siwon Kim, Minjun Park et al.

CVPR 2025 • arXiv:2408.12692
13 citations
#3555

Learning Transformer-based World Models with Contrastive Predictive Coding

Maxime Burchi, Radu Timofte

ICLR 2025 • oral • arXiv:2503.04416
13 citations
#3556

Imputation for prediction: beware of diminishing returns.

Marine Le Morvan, Gael Varoquaux

ICLR 2025 • arXiv:2407.19804
13 citations
#3557

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Jiarui Yao, Yifan Hao, Hanning Zhang et al.

NEURIPS 2025 • arXiv:2505.02391
13 citations
#3558

Imagine360: Immersive 360 Video Generation from Perspective Anchor

Jing Tan, Shuai Yang, Tong Wu et al.

NEURIPS 2025 • arXiv:2412.03552
13 citations
#3559

Learning Few-Step Diffusion Models by Trajectory Distribution Matching

Yihong Luo, Tianyang Hu, Jiacheng Sun et al.

ICCV 2025 • arXiv:2503.06674
13 citations
#3560

BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

Andy Zhang, Joey Ji, Celeste Menders et al.

NEURIPS 2025 • arXiv:2505.15216
13 citations
#3561

Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration

Zhixuan Shen, Haonan Luo, Kexun Chen et al.

AAAI 2025 • paper • arXiv:2412.18292
13 citations
#3562

Consistent and Controllable Image Animation with Motion Diffusion Models

Xin Ma, Yaohui Wang, Gengyun Jia et al.

CVPR 2025 • arXiv:2407.15642
13 citations
#3563

C-CLIP: Multimodal Continual Learning for Vision-Language Model

Wenzhuo Liu, Fei Zhu, Longhui Wei et al.

ICLR 2025
13 citations
#3564

Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning

Tim Lenz, Peter Neidlinger, Marta Ligero et al.

CVPR 2025 • arXiv:2411.13623
13 citations
#3565

ProAPO: Progressively Automatic Prompt Optimization for Visual Classification

Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang et al.

CVPR 2025 • arXiv:2502.19844
13 citations
#3566

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Yiren Song, Cheng Liu, Mike Zheng Shou

NEURIPS 2025 • arXiv:2505.18445
13 citations
#3567

Event-based Video Super-Resolution via State Space Models

Zeyu Xiao, Xinchao Wang

CVPR 2025
13 citations
#3568

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Jianzong Wu, Chao Tang, Jingbo Wang et al.

CVPR 2025 • arXiv:2412.07589
13 citations
#3569

Solving Inequality Proofs with Large Language Models

Jiayi Sheng, Luna Lyu, Jikai Jin et al.

NEURIPS 2025 • spotlight • arXiv:2506.07927
13 citations
#3570

BadJudge: Backdoor Vulnerabilities of LLM-As-A-Judge

Terry Tong, Fei Wang, Zhe Zhao et al.

ICLR 2025 • arXiv:2503.00596
13 citations
#3571

Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning

Tianle Xia, Liang Ding, Guojia Wan et al.

AAAI 2025 • paper • arXiv:2405.01649
13 citations
#3572

CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring

Benjamin Arnav, Pablo Bernabeu-Perez, Nathan Helm-Burger et al.

NEURIPS 2025 • arXiv:2505.23575
13 citations
#3573

A Unifying Framework for Representation Learning

Shaden Alshammari, John Hershey, Axel Feldmann et al.

ICLR 2025 • arXiv:2504.16929
13 citations
#3574

Large Language-Geometry Model: When LLM meets Equivariance

Zongzhao Li, Jiacheng Cen, Bing Su et al.

ICML 2025 • arXiv:2502.11149
13 citations
#3575

Learning Molecular Representation in a Cell

Gang Liu, Srijit Seal, John Arevalo et al.

ICLR 2025 • arXiv:2406.12056
13 citations
#3576

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

Yiran Qin, Li Kang, Xiufeng Song et al.

ICCV 2025 • arXiv:2503.16408
13 citations
#3577

MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments

Ege Özsoy, Chantal Pellegrini, Tobias Czempiel et al.

CVPR 2025 • arXiv:2503.02579
13 citations
#3578

CoA-VLA: Improving Vision-Language-Action Models via Visual-Text Chain-of-Affordance

Jinming Li, Yichen Zhu, Zhibin Tang et al.

ICCV 2025
13 citations
#3579

MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-text Decoding

Weikang Qiu, Zheng Huang, Haoyu Hu et al.

ICML 2025 • arXiv:2502.15786
13 citations
#3580

Image Generation Diversity Issues and How to Tame Them

Mischa Dombrowski, Weitong Zhang, Hadrien Reynaud et al.

CVPR 2025 • arXiv:2411.16171
13 citations
#3581

FairGP: A Scalable and Fair Graph Transformer Using Graph Partitioning

Renqiang Luo, Huafei Huang, Ivan Lee et al.

AAAI 2025 • paper • arXiv:2412.10669
13 citations
#3582

ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration

Andrew Estornell, Jean-Francois Ton, Yuanshun Yao et al.

ICLR 2025 • arXiv:2411.00053
13 citations
#3583

ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data

Xiaoyang Liu, Kangjie Bao, Jiashuo Zhang et al.

NEURIPS 2025 • arXiv:2502.05567
13 citations
#3584

Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis

Yu Yuan, Xijun Wang, Yichen Sheng et al.

CVPR 2025 • highlight • arXiv:2412.02168
13 citations
#3585

CoRA: Collaborative Information Perception by Large Language Model’s Weights for Recommendation

Yuting Liu, Jinghao Zhang, Yizhou Dang et al.

AAAI 2025 • paper • arXiv:2408.10645
13 citations
#3586

Ambient Diffusion Omni: Training Good Models with Bad Data

Giannis Daras, Adrian Rodriguez-Munoz, Adam Klivans et al.

NEURIPS 2025 • spotlight • arXiv:2506.10038
13 citations
#3587

Growing a Twig to Accelerate Large Vision-Language Models

Zhenwei Shao, Mingyang Wang, Zhou Yu et al.

ICCV 2025 • arXiv:2503.14075
13 citations
#3588

Automatic Curriculum Expert Iteration for Reliable LLM Reasoning

Zirui Zhao, Hanze Dong, Amrita Saha et al.

ICLR 2025 • arXiv:2410.07627
13 citations
#3589

ProSec: Fortifying Code LLMs with Proactive Security Alignment

Xiangzhe Xu, Zian Su, Jinyao Guo et al.

ICML 2025 • arXiv:2411.12882
13 citations
#3590

Distilling LLM Agent into Small Models with Retrieval and Code Tools

Minki Kang, Jongwon Jeong, Seanie Lee et al.

NEURIPS 2025 • spotlight • arXiv:2505.17612
13 citations
#3591

Large language models can learn and generalize steganographic chain-of-thought under process supervision

Robert McCarthy, Joey Skaf, Luis Ibanez-Lissen et al.

NEURIPS 2025 • arXiv:2506.01926
13 citations
#3592

ETTA: Elucidating the Design Space of Text-to-Audio Models

Sang-gil Lee, Zhifeng Kong, Arushi Goel et al.

ICML 2025 • arXiv:2412.19351
13 citations
#3593

Segment Any Motion in Videos

Nan Huang, Wenzhao Zheng, Chenfeng Xu et al.

CVPR 2025 • arXiv:2503.22268
13 citations
#3594

Can Classic GNNs Be Strong Baselines for Graph-level Tasks? Simple Architectures Meet Excellence

Yuankai Luo, Lei Shi, Xiao-Ming Wu

ICML 2025 • arXiv:2502.09263
13 citations
#3595

MaterialMVP: Illumination-Invariant Material Generation via Multi-view PBR Diffusion

Zebin He, Mx Yang, Shuhui Yang et al.

ICCV 2025 • highlight • arXiv:2503.10289
13 citations
#3596

Post-hoc Reward Calibration: A Case Study on Length Bias

Zeyu Huang, Zihan Qiu, Zili Wang et al.

ICLR 2025 • arXiv:2409.17407
13 citations
#3597

NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models

Sung-Yeon Park, Can Cui, Yunsheng Ma et al.

ICCV 2025 • arXiv:2503.12772
13 citations
#3598

The Last Iterate Advantage: Empirical Auditing and Principled Heuristic Analysis of Differentially Private SGD

Milad Nasr, Thomas Steinke, Borja Balle et al.

ICLR 2025 • arXiv:2410.06186
13 citations
#3599

Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace

Jinluan Yang, Anke Tang, Didi Zhu et al.

ICLR 2025 • arXiv:2410.13910
13 citations
#3600

Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting

Jingru Fei, Kun Yi, Wei Fan et al.

AAAI 2025 • paper • arXiv:2501.17216
13 citations