Most Cited 2025 "constrained online convex optimization" Papers

22,274 papers found • Page 8 of 112

#1401

PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering

Yifan Gao, Zihang Lin, Chuanbin Liu et al.

CVPR 2025 • poster • arXiv:2504.06632
21 citations
#1402

PromptHMR: Promptable Human Mesh Recovery

Yufu Wang, Yu Sun, Priyanka Patel et al.

CVPR 2025 • poster • arXiv:2504.06397
21 citations
#1403

3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering

Qingyuan Zhou, Weidong Yang, Ben Fei et al.

AAAI 2025 • paper • arXiv:2404.05522
21 citations
#1404

SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

Yue Li, Qi Ma, Runyi Yang et al.

ICCV 2025 • poster • arXiv:2503.18052
21 citations
#1405

Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks

Mario Lino, Tobias Pfaff, Nils Thuerey

ICLR 2025 • poster • arXiv:2504.02843
21 citations
#1406

Variational Diffusion Posterior Sampling with Midpoint Guidance

Badr Moufad, Yazid Janati El Idrissi, Lisa Bedin et al.

ICLR 2025 • poster • arXiv:2410.09945
21 citations
#1407

Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems

Christian Walder, Deep Tejas Karkhanis

NEURIPS 2025 • spotlight • arXiv:2505.15201
21 citations
#1408

Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models

Ce Zhang, Zifu Wan, Zhehan Kan et al.

ICLR 2025 • poster • arXiv:2502.06130
21 citations
#1409

OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation

Jingjing Chang, Yixiao Fang, Peng Xing et al.

NEURIPS 2025 • poster • arXiv:2506.07977
21 citations
#1410

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Zihan Liu, Shuangrui Ding, Zhixiong Zhang et al.

ICML 2025 • poster • arXiv:2502.13128
21 citations
#1411

Flow: Modularized Agentic Workflow Automation

Boye Niu, Yiliao Song, Kai Lian et al.

ICLR 2025 • poster • arXiv:2501.07834
21 citations
#1412

SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

Yongwei Chen, Yushi Lan, Shangchen Zhou et al.

CVPR 2025 • poster • arXiv:2411.16856
21 citations
#1413

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.

ICCV 2025 • poster • arXiv:2504.07960
21 citations
#1414

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

Jiaqi Chen, Bang Zhang, Ruotian Ma et al.

NEURIPS 2025 • poster • arXiv:2504.19162
21 citations
#1415

Monitoring Latent World States in Language Models with Propositional Probes

Jiahai Feng, Stuart Russell, Jacob Steinhardt

ICLR 2025 • poster • arXiv:2406.19501
21 citations
#1416

Agent-Oriented Planning in Multi-Agent Systems

Ao Li, Yuexiang Xie, Songze Li et al.

ICLR 2025 • poster • arXiv:2410.02189
21 citations
#1417

FonTS: Text Rendering With Typography and Style Controls

Wenda Shi, Yiren Song, Dengming Zhang et al.

ICCV 2025 • poster • arXiv:2412.00136
21 citations
#1418

Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians

Ishan Amin, Sanjeev Raja, Aditi Krishnapriyan

ICLR 2025 • poster • arXiv:2501.09009
21 citations
#1419

Scalable Ranked Preference Optimization for Text-to-Image Generation

Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata et al.

ICCV 2025 • poster • arXiv:2410.18013
21 citations
#1420

Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning

Gabriele Dominici, Pietro Barbiero, Mateo Espinosa Zarlenga et al.

ICLR 2025 • poster • arXiv:2405.16507
21 citations
#1421

ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

Shuwei Shi, Wenbo Li, Yuechen Zhang et al.

AAAI 2025 • paper • arXiv:2406.16476
21 citations
#1422

Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models

Yuanzhao Zhai, Tingkai Yang, Kele Xu et al.

AAAI 2025 • paper • arXiv:2409.09345
21 citations
#1423

GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering

Hongze Chen, Zehong Lin, Jun Zhang

ICLR 2025 • poster • arXiv:2410.02619
21 citations
#1424

TabPFN Unleashed: A Scalable and Effective Solution to Tabular Classification Problems

Si-Yang Liu, Han-Jia Ye

ICML 2025 • poster • arXiv:2502.02527
21 citations
#1425

GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion

Jiapeng Tang, Davide Davoli, Tobias Kirschstein et al.

CVPR 2025 • poster • arXiv:2412.10209
21 citations
#1426

ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning

Jiaqi Liao, Zhengyuan Yang, Linjie Li et al.

ICCV 2025 • poster • arXiv:2503.19312
21 citations
#1427

Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models?

Ben Yao, Yazhou Zhang, Qiuchi Li et al.

AAAI 2025 • paper • arXiv:2407.12725
21 citations
#1428

Understanding Optimization in Deep Learning with Central Flows

Jeremy Cohen, Alex Damian, Ameet Talwalkar et al.

ICLR 2025 • poster • arXiv:2410.24206
21 citations
#1429

The Loss Landscape of Deep Linear Neural Networks: a Second-order Analysis

El Mehdi Achour, Francois Malgouyres, Sebastien Gerchinovitz

ICLR 2025 • poster • arXiv:2107.13289
21 citations
#1430

Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models

Ma Teng, Xiaojun Jia, Ranjie Duan et al.

ICCV 2025 • poster • arXiv:2412.05934
21 citations
#1431

Diverse Preference Learning for Capabilities and Alignment

Stewart Slocum, Asher Parker-Sartori, Dylan Hadfield-Menell

ICLR 2025 • poster • arXiv:2511.08594
21 citations
#1432

A Transfer Attack to Image Watermarks

Yuepeng Hu, Zhengyuan Jiang, Moyang Guo et al.

ICLR 2025 • poster • arXiv:2403.15365
21 citations
#1433

Matrix3D: Large Photogrammetry Model All-in-One

Yuanxun Lu, Jingyang Zhang, Tian Fang et al.

CVPR 2025 • highlight • arXiv:2502.07685
21 citations
#1434

VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Runjia Li, Philip Torr, Andrea Vedaldi et al.

ICCV 2025 • highlight • arXiv:2506.18903
21 citations
#1435

Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation

Jiaqi Huang, Zunnan Xu, Ting Liu et al.

AAAI 2025 • paper • arXiv:2501.08580
21 citations
#1436

Improving Semantic Understanding in Speech Language Models via Brain-tuning

Omer Moussa, Dietrich Klakow, Mariya Toneva

ICLR 2025 • poster • arXiv:2410.09230
21 citations
#1437

Training on the Benchmark Is Not All You Need

Shiwen Ni, Xiangtao Kong, Chengming Li et al.

AAAI 2025 • paper • arXiv:2409.01790
21 citations
#1438

Reinforced Lifelong Editing for Language Models

Zherui Li, Houcheng Jiang, Hao Chen et al.

ICML 2025 • poster • arXiv:2502.05759
21 citations
#1439

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning

Yiwu Zhong, Zhuoming Liu, Yin Li et al.

ICCV 2025 • poster • arXiv:2412.03248
21 citations
#1440

Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation

Mohamed El Amine Boudjoghra, Angela Dai, Jean Lahoud et al.

ICLR 2025 • poster • arXiv:2406.02548
21 citations
#1441

How do language models learn facts? Dynamics, curricula and hallucinations

Nicolas Zucchet, Jorg Bornschein, Stephanie C.Y. Chan et al.

COLM 2025 • paper
21 citations
#1442

SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering

Zouying Cao, Yifei Yang, Hai Zhao

AAAI 2025 • paper • arXiv:2408.11491
21 citations
#1443

Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation

Yuanbo Yang, Jiahao Shao, Xinyang Li et al.

CVPR 2025 • poster • arXiv:2412.21117
21 citations
#1444

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

Xingrun Xing, Boyan Gao, Zheng Liu et al.

ICLR 2025 • poster • arXiv:2407.04752
21 citations
#1445

Is In-Context Learning Sufficient for Instruction Following in LLMs?

Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.

ICLR 2025 • poster • arXiv:2405.19874
21 citations
#1446

Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders

Qichao Shentu, Beibu Li, Kai Zhao et al.

ICLR 2025 • poster • arXiv:2405.15273
21 citations
#1447

HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery

Jingtao Li, Yingyi Liu, Xinyu Wang et al.

CVPR 2025 • poster • arXiv:2503.21841
20 citations
#1448

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Gleb Rodionov, Roman Garipov, Alina Shutova et al.

NEURIPS 2025 • spotlight • arXiv:2504.06261
20 citations
#1449

Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge

Boyu Gou, Zanming Huang, Yuting Ning et al.

NEURIPS 2025 • poster • arXiv:2506.21506
20 citations
#1450

Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing

Jaihoon Kim, Taehoon Yoon, Jisung Hwang et al.

NEURIPS 2025 • poster • arXiv:2503.19385
20 citations
#1451

Establishing Task Scaling Laws via Compute-Efficient Model Ladders

Akshita Bhagia, Jiacheng Liu, Alexander Wettig et al.

COLM 2025 • paper • arXiv:2412.04403
20 citations
#1452

Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties

Wenqiao Li, Bozhong Zheng, Xiaohao Xu et al.

CVPR 2025 • poster • arXiv:2412.14592
20 citations
#1453

Hierarchical World Models as Visual Whole-Body Humanoid Controllers

Nick Hansen, Jyothir S V, Vlad Sobal et al.

ICLR 2025 • poster • arXiv:2405.18418
20 citations
#1454

MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models

Mohammad Shahab Sepehri, Zalan Fabian, Maryam Soltanolkotabi et al.

ICLR 2025 • poster • arXiv:2409.15477
20 citations
#1455

SynCity: Training-Free Generation of 3D Cities

Paul Engstler, Aleksandar Shtedritski, Iro Laina et al.

ICCV 2025 • poster
20 citations
#1456

Trusted Unified Feature-Neighborhood Dynamics for Multi-View Classification

Haojian Huang, Chuanyu Qin, Zhe Liu et al.

AAAI 2025 • paper • arXiv:2409.00755
20 citations
#1457

Parallel Scaling Law for Language Models

Mouxiang Chen, Binyuan Hui, Zeyu Cui et al.

NEURIPS 2025 • poster • arXiv:2505.10475
20 citations
#1458

Rope to Nope and Back Again: A New Hybrid Attention Strategy

Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.

NEURIPS 2025 • poster • arXiv:2501.18795
20 citations
#1459

Video Depth without Video Models

Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.

CVPR 2025 • poster • arXiv:2411.19189
20 citations
#1460

Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression

Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin et al.

ICLR 2025 • poster • arXiv:2410.03765
20 citations
#1461

Streaming DiLoCo with overlapping communication

Arthur Douillard, Yani Donchev, J Keith Rush et al.

COLM 2025 • paper • arXiv:2501.18512
20 citations
#1462

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

Songhua Liu, Zhenxiong Tan, Xinchao Wang

NEURIPS 2025 • poster • arXiv:2412.16112
20 citations
#1463

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Yushu Wu, Zhixing Zhang, Yanyu Li et al.

CVPR 2025 • poster • arXiv:2412.10494
20 citations
#1464

V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction

Zewei Zhou, Hao Xiang, Zhaoliang Zheng et al.

ICCV 2025 • poster • arXiv:2412.01812
20 citations
#1465

SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video

Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello et al.

CVPR 2025 • poster • arXiv:2412.09982
20 citations
#1466

TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model

Cheng Yang, Yang Sui, Jinqi Xiao et al.

CVPR 2025 • poster • arXiv:2503.18278
20 citations
#1467

Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting

Jinbo Yan, Rui Peng, Zhiyan Wang et al.

CVPR 2025 • highlight • arXiv:2503.16979
20 citations
#1468

How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension

Xinnan Dai, Haohao Qu, Yifei Shen et al.

ICLR 2025 • poster • arXiv:2410.05298
20 citations
#1469

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Yifan Pu, Yiming Zhao, Zhicong Tang et al.

CVPR 2025 • poster • arXiv:2502.18364
20 citations
#1470

DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness

Yiming Zhong, Qi Jiang, Jingyi Yu et al.

CVPR 2025 • highlight • arXiv:2503.08257
20 citations
#1471

FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities

Jin Wang, Yao Lai, Aoxue Li et al.

NEURIPS 2025 • spotlight • arXiv:2505.20147
20 citations
#1472

OccMamba: Semantic Occupancy Prediction with State Space Models

Heng Li, Yuenan Hou, Xiaohan Xing et al.

CVPR 2025 • poster • arXiv:2408.09859
20 citations
#1473

WonderTurbo: Generating Interactive 3D World in 0.72 Seconds

Chaojun Ni, Xiaofeng Wang, Zheng Zhu et al.

ICCV 2025 • poster • arXiv:2504.02261
20 citations
#1474

τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Shunyu Yao, Noah Shinn, Pedram Razavi et al.

ICLR 2025 • poster
20 citations
#1475

Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.

CVPR 2025 • poster • arXiv:2412.03283
20 citations
#1476

Text2PDE: Latent Diffusion Models for Accessible Physics Simulation

Anthony Zhou, Zijie Li, Michael Schneier et al.

ICLR 2025 • oral • arXiv:2410.01153
20 citations
#1477

Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation

Kang Liu, Zhuoqi Ma, Xiaolu Kang et al.

CVPR 2025 • poster • arXiv:2502.20056
20 citations
#1478

Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting

Marcel Kollovieh, Marten Lienen, David Lüdke et al.

ICLR 2025 • oral • arXiv:2410.03024
20 citations
#1479

OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation

Ding Zhong, Xu Zheng, Chenfei Liao et al.

ICCV 2025 • highlight • arXiv:2503.07098
20 citations
#1480

AdaDiff: Adaptive Step Selection for Fast Diffusion Models

Hui Zhang, Zuxuan Wu, Zhen Xing et al.

AAAI 2025 • paper • arXiv:2311.14768
20 citations
#1481

Modeling Complex System Dynamics with Flow Matching Across Time and Conditions

Martin Rohbeck, Edward De Brouwer, Charlotte Bunne et al.

ICLR 2025 • oral
20 citations
#1482

P(all-atom) Is Unlocking New Path For Protein Design

Wei Qu, Jiawei Guan, Rui Ma et al.

ICML 2025 • spotlight
20 citations
#1483

Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs

Barrett Tang, Zile Huang, Chengzhi Liu et al.

ICLR 2025 • poster
20 citations
#1484

Generative Video Propagation

Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.

CVPR 2025 • poster • arXiv:2412.19761
20 citations
#1485

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation

Nikolai Kalischek, Michael Oechsle, Fabian Manhardt et al.

ICLR 2025 • poster • arXiv:2501.17162
20 citations
#1486

Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning

Minheng Ni, Yutao Fan, Lei Zhang et al.

ICLR 2025 • poster • arXiv:2410.03321
20 citations
#1487

Generative Image Layer Decomposition with Visual Effects

Jinrui Yang, Qing Liu, Yijun Li et al.

CVPR 2025 • poster • arXiv:2411.17864
20 citations
#1488

Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

Chengqian Gao, Haonan Li, Liu Liu et al.

ICML 2025 • poster • arXiv:2502.09650
20 citations
#1489

D^3: Scaling Up Deepfake Detection by Learning from Discrepancy

Yongqi Yang, Zhihao Qian, Ye Zhu et al.

CVPR 2025 • poster • arXiv:2404.04584
20 citations
#1490

Reflective Gaussian Splatting

Yuxuan Yao, Zixuan Zeng, Chun Gu et al.

ICLR 2025 • poster • arXiv:2412.19282
20 citations
#1491

Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks

Michael Matthews, Michael Beukman, Chris Lu et al.

ICLR 2025 • poster • arXiv:2410.23208
20 citations
#1492

BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning

Jianyang Gu, Sam Stevens, Elizabeth Campolongo et al.

NEURIPS 2025 • spotlight • arXiv:2505.23883
20 citations
#1493

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Weixian Lei, Jiacong Wang, Haochen Wang et al.

ICCV 2025 • highlight • arXiv:2504.10462
20 citations
#1494

Occlusion-Embedded Hybrid Transformer for Light Field Super-Resolution

Zeyu Xiao, Zhuoyuan Li, Wei Jia

AAAI 2025 • paper
20 citations
#1495

First-Person Fairness in Chatbots

Tyna Eloundou, Alex Beutel, David Robinson et al.

ICLR 2025 • poster • arXiv:2410.19803
20 citations
#1496

STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning

Marius Memmel, Jacob Berg, Bingqing Chen et al.

ICLR 2025 • poster • arXiv:2412.15182
20 citations
#1497

MV-VTON: Multi-View Virtual Try-On with Diffusion Models

Haoyu Wang, Zhilu Zhang, Donglin Di et al.

AAAI 2025 • paper • arXiv:2404.17364
20 citations
#1498

ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning

Hongyin Zhang, Zifeng Zhuang, Han Zhao et al.

ICML 2025 • poster • arXiv:2505.07395
20 citations
#1499

Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search

Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo et al.

NEURIPS 2025 • poster • arXiv:2501.19252
20 citations
#1500

LSNet: See Large, Focus Small

Ao Wang, Hui Chen, Zijia Lin et al.

CVPR 2025 • poster • arXiv:2503.23135
20 citations
#1501

Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks

Junying Wang, Hongyuan Zhang, Yuan Yuan

CVPR 2025 • poster • arXiv:2503.08269
20 citations
#1502

Is Your Multimodal Language Model Oversensitive to Safe Queries?

Xirui Li, Hengguang Zhou, Ruochen Wang et al.

ICLR 2025 • poster • arXiv:2406.17806
20 citations
#1503

Temporal Reasoning Transfer from Text to Video

Lei Li, Yuanxin Liu, Linli Yao et al.

ICLR 2025 • oral • arXiv:2410.06166
20 citations
#1504

STIV: Scalable Text and Image Conditioned Video Generation

Zongyu Lin, Wei Liu, Chen Chen et al.

ICCV 2025 • poster • arXiv:2412.07730
20 citations
#1505

Efficient Reinforcement Learning with Large Language Model Priors

Xue Yan, Yan Song, Xidong Feng et al.

ICLR 2025 • poster • arXiv:2410.07927
20 citations
#1506

2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification

Jingwei Zhang, Anh Tien Nguyen, Xi Han et al.

CVPR 2025 • poster • arXiv:2412.00678
20 citations
#1507

Self-Challenging Language Model Agents

Yifei Zhou, Sergey Levine, Jason Weston et al.

NEURIPS 2025 • poster • arXiv:2506.01716
20 citations
#1508

Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion

Massimiliano Viola, Kevin Qu, Nando Metzger et al.

ICCV 2025 • poster • arXiv:2412.13389
20 citations
#1509

Controlling Large Language Models Through Concept Activation Vectors

Hanyu Zhang, Xiting Wang, Chengao Li et al.

AAAI 2025 • paper • arXiv:2501.05764
20 citations
#1510

ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models

Jeonghoon Shim, Gyuhyeon Seo, Cheongsu Lim et al.

ICLR 2025 • poster • arXiv:2503.00564
20 citations
#1511

Online Preference Alignment for Language Models via Count-based Exploration

Chenjia Bai, Yang Zhang, Shuang Qiu et al.

ICLR 2025 • poster • arXiv:2501.12735
20 citations
#1512

Enhancing Chain of Thought Prompting in Large Language Models via Reasoning Patterns

Yufeng Zhang, Xuepeng Wang, Lingxiang Wu et al.

AAAI 2025 • paper • arXiv:2404.14812
20 citations
#1513

PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology

Fatemeh Ghezloo, Saygin Seyfioglu, Rustin Soraki et al.

ICCV 2025 • poster • arXiv:2502.08916
20 citations
#1514

CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification

Wei Li, Renshan Zhang, Rui Shao et al.

NEURIPS 2025 • poster • arXiv:2508.21046
20 citations
#1515

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching

Jixun Yao, Yang Yuguang, Yu Pan et al.

AAAI 2025 • paper • arXiv:2412.04724
20 citations
#1516

BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion

Huafeng Li, Dayong Su, Qing Cai et al.

AAAI 2025 • paper • arXiv:2412.08050
20 citations
#1517

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

Xue Zhucun, Jiangning Zhang, Teng Hu et al.

NEURIPS 2025 • poster • arXiv:2506.13691
20 citations
#1518

DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory

Yutong Wang, Jiali Zeng, Xuebo Liu et al.

ICLR 2025 • poster • arXiv:2410.08143
20 citations
#1519

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

Zigeng Chen, Xinyin Ma, Gongfan Fang et al.

CVPR 2025 • poster • arXiv:2411.17787
20 citations
#1520

Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Zhaowei Zhang, Fengshuo Bai, Qizhi Chen et al.

ICLR 2025 • poster • arXiv:2502.19148
20 citations
#1521

Nemotron-CLIMB: Clustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Shizhe Diao, Yu Yang, Yonggan Fu et al.

NEURIPS 2025 • spotlight • arXiv:2504.13161
20 citations
#1522

Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models

Uladzislau Sobal, Wancong Zhang, Kyunghyun Cho et al.

NEURIPS 2025 • poster • arXiv:2502.14819
20 citations
#1523

Framer: Interactive Frame Interpolation

Wen Wang, Qiuyu Wang, Kecheng Zheng et al.

ICLR 2025 • poster • arXiv:2410.18978
20 citations
#1524

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

Weiyu Huang, Yuezhou Hu, Guohao Jian et al.

AAAI 2025 • paper • arXiv:2407.20584
20 citations
#1525

Taming Teacher Forcing for Masked Autoregressive Video Generation

Deyu Zhou, Quan Sun, Yuang Peng et al.

CVPR 2025 • poster • arXiv:2501.12389
20 citations
#1526

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Xudong Lu, Yinghao Chen, Chencheng Chen et al.

CVPR 2025 • poster • arXiv:2411.10640
20 citations
#1527

V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception

Lei Yang, Xinyu Zhang, Jun Li et al.

NEURIPS 2025 • spotlight • arXiv:2411.10962
20 citations
#1528

Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?

Letitia Parcalabescu, Anette Frank

ICLR 2025 • poster • arXiv:2404.18624
20 citations
#1529

Any-Resolution AI-Generated Image Detection by Spectral Learning

Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris et al.

CVPR 2025 • poster • arXiv:2411.19417
20 citations
#1530

LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction

Er Jin, Qihui Feng, Yongli Mou et al.

AAAI 2025 • paper • arXiv:2501.01767
20 citations
#1531

SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis

Hyojun Go, Byeongjun Park, Jiho Jang et al.

CVPR 2025 • poster • arXiv:2411.16443
19 citations
#1532

Learning to Reason for Long-Form Story Generation

Alexander Gurung, Mirella Lapata

COLM 2025 • paper
19 citations
#1533

Towards a Unified Copernicus Foundation Model for Earth Vision

Yi Wang, Zhitong Xiong, Chenying Liu et al.

ICCV 2025 • poster • arXiv:2503.11849
19 citations
#1534

Emergence of meta-stable clustering in mean-field transformer models

Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi

ICLR 2025 • poster • arXiv:2410.23228
19 citations
#1535

E(n) Equivariant Topological Neural Networks

Claudio Battiloro, Ege Karaismailoglu, Mauricio Tec et al.

ICLR 2025 • poster • arXiv:2405.15429
19 citations
#1536

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation

Sitong Gong, Yunzhi Zhuge, Lu Zhang et al.

CVPR 2025 • poster • arXiv:2501.08549
19 citations
#1537

MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls

Yuxuan Bian, Ailing Zeng, Xuan Ju et al.

AAAI 2025 • paper • arXiv:2407.21136
19 citations
#1538

Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model

Jiarui Jin, Haoyu Wang, Hongyan Li et al.

ICLR 2025 • poster • arXiv:2502.10707
19 citations
#1539

Improved Video VAE for Latent Video Diffusion Model

Pingyu Wu, Kai Zhu, Yu Liu et al.

CVPR 2025 • poster • arXiv:2411.06449
19 citations
#1540

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Junmo Kang, Leonid Karlinsky, Hongyin Luo et al.

ICLR 2025 • poster • arXiv:2406.12034
19 citations
#1541

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

Yang Liu, Ming Ma, Xiaomin Yu et al.

NEURIPS 2025 • poster • arXiv:2505.12448
19 citations
#1542

Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment

Pritam Sarkar, Sayna Ebrahimi, Ali Etemad et al.

ICLR 2025 • poster • arXiv:2405.18654
19 citations
#1543

Feat2GS: Probing Visual Foundation Models with Gaussian Splatting

Yue Chen, Xingyu Chen, Anpei Chen et al.

CVPR 2025 • poster • arXiv:2412.09606
19 citations
#1544

CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting

Atin Pothiraj, Jaemin Cho, Elias Stengel-Eskin et al.

ICCV 2025 • poster • arXiv:2504.15485
19 citations
#1545

Design Principles and Challenges for Gaze + Pinch Interaction in XR

Ken Pfeuffer, Hans Gellersen, Mar Gonzalez-Franco

ISMAR 2025 • paper
19 citations
#1546

Perturbation-Restrained Sequential Model Editing

Jun-Yu Ma, Hong Wang, Hao-Xiang Xu et al.

ICLR 2025 • poster • arXiv:2405.16821
19 citations
#1547

Design Principle Transfer in Neural Architecture Search via Large Language Models

Xun Zhou, Xingyu Wu, Liang Feng et al.

AAAI 2025 • paper • arXiv:2408.11330
19 citations
#1548

LaVin-DiT: Large Vision Diffusion Transformer

Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.

CVPR 2025 • poster • arXiv:2411.11505
19 citations
#1549

Mechanism Design for LLM Fine-tuning with Multiple Reward Models

Haoran Sun, Yurong Chen, Siwei Wang et al.

NEURIPS 2025 • poster • arXiv:2405.16276
19 citations
#1550

Zero-shot forecasting of chaotic systems

Yuanzhao Zhang, William Gilpin

ICLR 2025 • poster • arXiv:2409.15771
19 citations
#1551

SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model

Yue Zhang, Zhiyang Xu, Ying Shen et al.

ICLR 2025 • poster • arXiv:2410.03878
19 citations
#1552

GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

Tian-Xing Xu, Xiangjun Gao, Wenbo Hu et al.

ICCV 2025 • poster • arXiv:2504.01016
19 citations
#1553

Efficiently Scaling LLM Reasoning Programs with Certaindex

Yichao Fu, Junda Chen, Siqi Zhu et al.

NEURIPS 2025 • poster
19 citations
#1554

Universal Length Generalization with Turing Programs

Kaiying Hou, David Brandfonbrener, Sham Kakade et al.

ICML 2025 • poster • arXiv:2407.03310
19 citations
#1555

FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction

Siyu Jiao, Gengwei Zhang, Yinlong Qian et al.

NEURIPS 2025 • poster • arXiv:2502.20313
19 citations
#1556

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

Yicheng Xiao, Lin Song, Yukang Chen et al.

NEURIPS 2025 • poster • arXiv:2505.13031
19 citations
#1557

Investigating Non-Transitivity in LLM-as-a-Judge

Yi Xu, Laura Ruis, Tim Rocktäschel et al.

ICML 2025 • spotlight • arXiv:2502.14074
19 citations
#1558

Does Thinking More Always Help? Mirage of Test-Time Scaling in Reasoning Models

Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy et al.

NEURIPS 2025 • poster • arXiv:2506.04210
19 citations
#1559

SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data

Wenkai Fang, Shunyu Liu, Yang Zhou et al.

NEURIPS 2025 • poster • arXiv:2505.20347
19 citations
#1560

COAT: Compressing Optimizer states and Activations for Memory-Efficient FP8 Training

Haocheng Xi, Han Cai, Ligeng Zhu et al.

ICLR 2025 • poster • arXiv:2410.19313
19 citations
#1561

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

Priyanshu Kumar, Devansh Jain, Akhila Yerukola et al.

COLM 2025 • paper • arXiv:2504.04377
19 citations
#1562

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards

Zijing Hu, Fengda Zhang, Long Chen et al.

CVPR 2025 • poster • arXiv:2503.11240
19 citations
#1563

CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs

Jinlan Fu, Shenzhen Huangfu, Hao Fei et al.

ICLR 2025 • poster • arXiv:2501.16629
19 citations
#1564

Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics

Siddhant Arora, Zhiyun Lu, Chung-Cheng Chiu et al.

ICLR 2025 • poster • arXiv:2503.01174
19 citations
#1565

CRANE: Reasoning with constrained LLM generation

Debangshu Banerjee, Tarun Suresh, Shubham Ugare et al.

ICML 2025 • poster • arXiv:2502.09061
19 citations
#1566

GameArena: Evaluating LLM Reasoning through Live Computer Games

Lanxiang Hu, Qiyu Li, Anze Xie et al.

ICLR 2025 • poster • arXiv:2412.06394
19 citations
#1567

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Ziang Yan, Zhilin Li, Yinan He et al.

CVPR 2025 • poster • arXiv:2412.19326
19 citations
#1568

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models

Eunice Yiu, Maan Qraitem, Anisa Majhi et al.

ICLR 2025 • poster • arXiv:2407.17773
19 citations
#1569

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.

CVPR 2025 • poster • arXiv:2411.18499
19 citations
#1570

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models

Haiwen Diao, Xiaotong Li, Yufeng Cui et al.

ICCV 2025 • highlight • arXiv:2502.06788
19 citations
#1571

Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking

You Wu, Xucheng Wang, Xiangyang Yang et al.

CVPR 2025 • poster • arXiv:2504.09228
19 citations
#1572

KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA

Xiaorui Su, Yibo Wang, Shanghua Gao et al.

ICLR 2025 • poster • arXiv:2410.04660
19 citations
#1573

A Rainbow in Deep Network Black Boxes

Florentin Guth, Brice Ménard, Gaspar Rochette et al.

ICLR 2025 • poster • arXiv:2305.18512
19 citations
#1574

Influence-Guided Diffusion for Dataset Distillation

Mingyang Chen, Jiawei Du, Bo Huang et al.

ICLR 2025 • poster
19 citations
#1575

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

Yuchi Wang, Junliang Guo, Jianhong Bai et al.

AAAI 2025 • paper • arXiv:2405.15758
19 citations
#1576

Pre-training Auto-regressive Robotic Models with 4D Representations

Dantong Niu, Yuvan Sharma, Haoru Xue et al.

ICML 2025 • poster • arXiv:2502.13142
19 citations
#1577

Adaptive Message Passing: A General Framework to Mitigate Oversmoothing, Oversquashing, and Underreaching

Federico Errica, Henrik Christiansen, Viktor Zaverkin et al.

ICML 2025 • poster • arXiv:2312.16560
19 citations
#1578

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly

Yexin Liu, Zhengyang Liang, Yueze Wang et al.

CVPR 2025 • poster • arXiv:2406.10638
19 citations
#1579

Spectral Motion Alignment for Video Motion Transfer Using Diffusion Models

Geon Yeong Park, Hyeonho Jeong, Sang Wan Lee et al.

AAAI 2025 • paper • arXiv:2403.15249
19 citations
#1580

Self-Evolved Reward Learning for LLMs

Chenghua Huang, Zhizhen Fan, Lu Wang et al.

ICLR 2025 • poster • arXiv:2411.00418
19 citations
#1581

From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data

Zheyang Xiong, Vasilis Papageorgiou, Kangwook Lee et al.

ICLR 2025 • poster • arXiv:2406.19292
19 citations
#1582

Is Noise Conditioning Necessary for Denoising Generative Models?

Qiao Sun, Zhicheng Jiang, Hanhong Zhao et al.

ICML 2025 • poster • arXiv:2502.13129
19 citations
#1583

Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning

Gang Liu, Michael Sun, Wojciech Matusik et al.

ICLR 2025 • poster • arXiv:2410.04223
19 citations
#1584

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

Qizhe Zhang, Mengzhen Liu, Lichen Li et al.

NEURIPS 2025 • poster • arXiv:2506.10967
19 citations
#1585

Look Inside for More: Internal Spatial Modality Perception for 3D Anomaly Detection

Hanzhe Liang, Guoyang Xie, Chengbin Hou et al.

AAAI 2025 • paper • arXiv:2412.13461
19 citations
#1586

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

Wei Cheng, Juncheng Mu, Xianfang Zeng et al.

CVPR 2025 • poster • arXiv:2411.02336
19 citations
#1587

Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection

Kaiqing Lin, Yuzhen Lin, Weixiang Li et al.

AAAI 2025 • paper • arXiv:2409.02664
19 citations
#1588

Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains

Wenhui Tan, Jiaze Li, Jianzhong Ju et al.

NEURIPS 2025 • poster • arXiv:2505.16552
19 citations
#1589

Segmenting Maxillofacial Structures in CBCT Volumes

Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij et al.

CVPR 2025 • poster
19 citations
#1590

Reducing Tool Hallucination via Reliability Alignment

Hongshen Xu, Zichen Zhu, Lei Pan et al.

ICML 2025 • poster • arXiv:2412.04141
19 citations
#1591

Language Models Need Inductive Biases to Count Inductively

Yingshan Chang, Yonatan Bisk

ICLR 2025 • poster • arXiv:2405.20131
19 citations
#1592

Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets

Zhen Liu, Tim Xiao, Weiyang Liu et al.

ICLR 2025 • poster • arXiv:2412.07775
19 citations
#1593

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.

ICLR 2025 • poster • arXiv:2410.17637
19 citations
#1594

PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation

Hengjia Li, Haonan Qiu, Shiwei Zhang et al.

ICCV 2025 • poster • arXiv:2411.17048
19 citations
#1595

EmoEdit: Evoking Emotions through Image Manipulation

Jingyuan Yang, Jiawei Feng, Weibin Luo et al.

CVPR 2025 • poster • arXiv:2405.12661
19 citations
#1596

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Xinyu Yang, Yuwei An, Hongyi Liu et al.

NEURIPS 2025 • spotlight • arXiv:2506.09991
19 citations
#1597

Rethinking Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising

Junyi Li, Zhilu Zhang, Wangmeng Zuo

AAAI 2025 • paper • arXiv:2404.07846
19 citations
#1598

Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation

Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.

CVPR 2025 • poster • arXiv:2412.01243
19 citations
#1599

Progress or Regress? Self-Improvement Reversal in Post-training

Ting Wu, Xuefeng Li, Pengfei Liu

ICLR 2025 • poster • arXiv:2407.05013
19 citations
#1600

Task-driven Image Fusion with Learnable Fusion Loss

Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.

CVPR 2025 • highlight • arXiv:2412.03240
19 citations