Most Cited 2025 "model weight vectors" Papers

22,274 papers found • Page 8 of 112

#1401

Matrix3D: Large Photogrammetry Model All-in-One

Yuanxun Lu, Jingyang Zhang, Tian Fang et al.

CVPR 2025highlightarXiv:2502.07685
21
citations
#1402

VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Runjia Li, Philip Torr, Andrea Vedaldi et al.

ICCV 2025highlightarXiv:2506.18903
21
citations
#1403

Training on the Benchmark Is Not All You Need

Shiwen Ni, Xiangtao Kong, Chengming Li et al.

AAAI 2025paperarXiv:2409.01790
21
citations
#1404

PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering

Yifan Gao, Zihang Lin, Chuanbin Liu et al.

CVPR 2025posterarXiv:2504.06632
21
citations
#1405

Reinforced Lifelong Editing for Language Models

Zherui Li, Houcheng Jiang, Hao Chen et al.

ICML 2025posterarXiv:2502.05759
21
citations
#1406

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning

Yiwu Zhong, Zhuoming Liu, Yin Li et al.

ICCV 2025posterarXiv:2412.03248
21
citations
#1407

Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation

Jiaqi Huang, Zunnan Xu, Ting Liu et al.

AAAI 2025paperarXiv:2501.08580
21
citations
#1408

Improving Semantic Understanding in Speech Language Models via Brain-tuning

Omer Moussa, Dietrich Klakow, Mariya Toneva

ICLR 2025posterarXiv:2410.09230
21
citations
#1409

Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation

Mohamed el amine Boudjoghra, Angela Dai, Jean Lahoud et al.

ICLR 2025posterarXiv:2406.02548
21
citations
#1410

Is In-Context Learning Sufficient for Instruction Following in LLMs?

Hao Zhao, Maksym Andriushchenko, francesco croce et al.

ICLR 2025posterarXiv:2405.19874
21
citations
#1411

Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation

Yuanbo Yang, Jiahao Shao, Xinyang Li et al.

CVPR 2025posterarXiv:2412.21117
21
citations
#1412

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

Xingrun Xing, Boyan Gao, Zheng Liu et al.

ICLR 2025posterarXiv:2407.04752
21
citations
#1413

Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders

Qichao Shentu, Beibu Li, Kai Zhao et al.

ICLR 2025posterarXiv:2405.15273
21
citations
#1414

A Transfer Attack to Image Watermarks

Yuepeng Hu, Zhengyuan Jiang, Moyang Guo et al.

ICLR 2025posterarXiv:2403.15365
21
citations
#1415

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Gleb Rodionov, Roman Garipov, Alina Shutova et al.

NEURIPS 2025spotlightarXiv:2504.06261
20
citations
#1416

HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery

Jingtao Li, Yingyi Liu, XINYU WANG et al.

CVPR 2025posterarXiv:2503.21841
20
citations
#1417

Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing

Jaihoon Kim, Taehoon Yoon, Jisung Hwang et al.

NEURIPS 2025posterarXiv:2503.19385
20
citations
#1418

Hierarchical World Models as Visual Whole-Body Humanoid Controllers

Nick Hansen, Jyothir S V, Vlad Sobal et al.

ICLR 2025posterarXiv:2405.18418
20
citations
#1419

Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge

Boyu Gou, Zanming Huang, Yuting Ning et al.

NEURIPS 2025posterarXiv:2506.21506
20
citations
#1420

Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties

wenqiao Li, BoZhong Zheng, Xiaohao Xu et al.

CVPR 2025posterarXiv:2412.14592
20
citations
#1421

How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension

Xinnan Dai, Haohao QU, Yifei Shen et al.

ICLR 2025posterarXiv:2410.05298
20
citations
#1422

MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models

Mohammad Shahab Sepehri, Zalan Fabian, Maryam Soltanolkotabi et al.

ICLR 2025posterarXiv:2409.15477
20
citations
#1423

{$\tau$}-bench: A Benchmark for \underline{T}ool-\underline{A}gent-\underline{U}ser Interaction in Real-World Domains

Shunyu Yao, Noah Shinn, Pedram Razavi et al.

ICLR 2025poster
20
citations
#1424

Parallel Scaling Law for Language Models

Mouxiang Chen, Binyuan Hui, Zeyu Cui et al.

NEURIPS 2025posterarXiv:2505.10475
20
citations
#1425

Establishing Task Scaling Laws via Compute-Efficient Model Ladders

Akshita Bhagia, Jiacheng Liu, Alexander Wettig et al.

COLM 2025paperarXiv:2412.04403
20
citations
#1426

Video Depth without Video Models

Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.

CVPR 2025posterarXiv:2411.19189
20
citations
#1427

Rope to Nope and Back Again: A New Hybrid Attention Strategy

Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.

NEURIPS 2025posterarXiv:2501.18795
20
citations
#1428

WonderTurbo: Generating Interactive 3D World in 0.72 Seconds

Chaojun Ni, Xiaofeng Wang, Zheng Zhu et al.

ICCV 2025posterarXiv:2504.02261
20
citations
#1429

Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression

Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin et al.

ICLR 2025posterarXiv:2410.03765
20
citations
#1430

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

Songhua Liu, Zhenxiong Tan, Xinchao Wang

NEURIPS 2025posterarXiv:2412.16112
20
citations
#1431

Trusted Unified Feature-Neighborhood Dynamics for Multi-View Classification

Haojian Huang, Chuanyu Qin, Zhe Liu et al.

AAAI 2025paperarXiv:2409.00755
20
citations
#1432

SynCity: Training-Free Generation of 3D Cities

Paul Engstler, Aleksandar Shtedritski, Iro Laina et al.

ICCV 2025poster
20
citations
#1433

Streaming DiLoCo with overlapping communication

Arthur Douillard, Yani Donchev, J Keith Rush et al.

COLM 2025paperarXiv:2501.18512
20
citations
#1434

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Yushu Wu, Zhixing Zhang, Yanyu Li et al.

CVPR 2025posterarXiv:2412.10494
20
citations
#1435

FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities

Jin Wang, Yao Lai, Aoxue Li et al.

NEURIPS 2025spotlightarXiv:2505.20147
20
citations
#1436

SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video

Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello et al.

CVPR 2025posterarXiv:2412.09982
20
citations
#1437

TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model

Cheng Yang, Yang Sui, Jinqi Xiao et al.

CVPR 2025posterarXiv:2503.18278
20
citations
#1438

Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting

Jinbo Yan, Rui Peng, Zhiyan Wang et al.

CVPR 2025highlightarXiv:2503.16979
20
citations
#1439

V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction

Zewei Zhou, Hao Xiang, Zhaoliang Zheng et al.

ICCV 2025posterarXiv:2412.01812
20
citations
#1440

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Yifan Pu, Yiming Zhao, Zhicong Tang et al.

CVPR 2025posterarXiv:2502.18364
20
citations
#1441

DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness

Yiming Zhong, Qi Jiang, Jingyi Yu et al.

CVPR 2025highlightarXiv:2503.08257
20
citations
#1442

OccMamba: Semantic Occupancy Prediction with State Space Models

Heng Li, Yuenan Hou, Xiaohan Xing et al.

CVPR 2025posterarXiv:2408.09859
20
citations
#1443

Text2PDE: Latent Diffusion Models for Accessible Physics Simulation

Anthony Zhou, Zijie Li, Michael Schneier et al.

ICLR 2025oralarXiv:2410.01153
20
citations
#1444

Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.

CVPR 2025posterarXiv:2412.03283
20
citations
#1445

Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation

Kang Liu, Zhuoqi Ma, Xiaolu Kang et al.

CVPR 2025posterarXiv:2502.20056
20
citations
#1446

AdaDiff: Adaptive Step Selection for Fast Diffusion Models

Hui Zhang, Zuxuan Wu, Zhen Xing et al.

AAAI 2025paperarXiv:2311.14768
20
citations
#1447

OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation

Ding Zhong, Xu Zheng, Chenfei Liao et al.

ICCV 2025highlightarXiv:2503.07098
20
citations
#1448

Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

chengqian gao, Haonan Li, Liu Liu et al.

ICML 2025posterarXiv:2502.09650
20
citations
#1449

Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting

Marcel Kollovieh, Marten Lienen, David Lüdke et al.

ICLR 2025oralarXiv:2410.03024
20
citations
#1450

Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs

Barrett Tang, Zile Huang, Chengzhi Liu et al.

ICLR 2025poster
20
citations
#1451

Modeling Complex System Dynamics with Flow Matching Across Time and Conditions

Martin Rohbeck, Edward De Brouwer, Charlotte Bunne et al.

ICLR 2025oral
20
citations
#1452

Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning

Minheng Ni, YuTao Fan, Lei Zhang et al.

ICLR 2025posterarXiv:2410.03321
20
citations
#1453

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation

Nikolai Kalischek, Michael Oechsle, Fabian Manhardt et al.

ICLR 2025posterarXiv:2501.17162
20
citations
#1454

Generative Video Propagation

Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.

CVPR 2025posterarXiv:2412.19761
20
citations
#1455

Generative Image Layer Decomposition with Visual Effects

Jinrui Yang, Qing Liu, Yijun Li et al.

CVPR 2025posterarXiv:2411.17864
20
citations
#1456

D^3: Scaling Up Deepfake Detection by Learning from Discrepancy

Yongqi Yang, Zhihao Qian, Ye Zhu et al.

CVPR 2025posterarXiv:2404.04584
20
citations
#1457

BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning

Jianyang Gu, Sam Stevens, Elizabeth Campolongo et al.

NEURIPS 2025spotlightarXiv:2505.23883
20
citations
#1458

Reflective Gaussian Splatting

Yuxuan Yao, Zixuan Zeng, Chun Gu et al.

ICLR 2025posterarXiv:2412.19282
20
citations
#1459

Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks

Michael Matthews, Michael Beukman, Chris Lu et al.

ICLR 2025posterarXiv:2410.23208
20
citations
#1460

P(all-atom) Is Unlocking New Path For Protein Design

Wei Qu, Jiawei Guan, Rui Ma et al.

ICML 2025spotlight
20
citations
#1461

Occlusion-Embedded Hybrid Transformer for Light Field Super-Resolution

Zeyu Xiao, Zhuoyuan Li, Wei Jia

AAAI 2025paper
20
citations
#1462

STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning

Marius Memmel, Jacob Berg, Bingqing Chen et al.

ICLR 2025posterarXiv:2412.15182
20
citations
#1463

ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning

Hongyin Zhang, Zifeng Zhuang, Han Zhao et al.

ICML 2025posterarXiv:2505.07395
20
citations
#1464

LSNet: See Large, Focus Small

Ao Wang, Hui Chen, Zijia Lin et al.

CVPR 2025posterarXiv:2503.23135
20
citations
#1465

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Weixian Lei, Jiacong Wang, Haochen Wang et al.

ICCV 2025highlightarXiv:2504.10462
20
citations
#1466

Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search

Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo et al.

NEURIPS 2025posterarXiv:2501.19252
20
citations
#1467

First-Person Fairness in Chatbots

Tyna Eloundou, Alex Beutel, David Robinson et al.

ICLR 2025posterarXiv:2410.19803
20
citations
#1468

Is Your Multimodal Language Model Oversensitive to Safe Queries?

Xirui Li, Hengguang Zhou, Ruochen Wang et al.

ICLR 2025posterarXiv:2406.17806
20
citations
#1469

Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks

Junying Wang, Hongyuan Zhang, Yuan Yuan

CVPR 2025posterarXiv:2503.08269
20
citations
#1470

MV-VTON: Multi-View Virtual Try-On with Diffusion Models

Haoyu Wang, Zhilu Zhang, Donglin Di et al.

AAAI 2025paperarXiv:2404.17364
20
citations
#1471

2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification

Jingwei Zhang, Anh Tien Nguyen, Xi Han et al.

CVPR 2025posterarXiv:2412.00678
20
citations
#1472

Self-Challenging Language Model Agents

Yifei Zhou, Sergey Levine, Jason Weston et al.

NEURIPS 2025posterarXiv:2506.01716
20
citations
#1473

Controlling Large Language Models Through Concept Activation Vectors

Hanyu Zhang, Xiting Wang, Chengao Li et al.

AAAI 2025paperarXiv:2501.05764
20
citations
#1474

STIV: Scalable Text and Image Conditioned Video Generation

Zongyu Lin, Wei Liu, Chen Chen et al.

ICCV 2025posterarXiv:2412.07730
20
citations
#1475

Efficient Reinforcement Learning with Large Language Model Priors

Xue Yan, Yan Song, Xidong Feng et al.

ICLR 2025posterarXiv:2410.07927
20
citations
#1476

Enhancing Chain of Thought Prompting in Large Language Models via Reasoning Patterns

Yufeng Zhang, Xuepeng Wang, Lingxiang Wu et al.

AAAI 2025paperarXiv:2404.14812
20
citations
#1477

ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models

Jeonghoon Shim, Gyuhyeon Seo, Cheongsu Lim et al.

ICLR 2025posterarXiv:2503.00564
20
citations
#1478

Online Preference Alignment for Language Models via Count-based Exploration

Chenjia Bai, Yang Zhang, Shuang Qiu et al.

ICLR 2025posterarXiv:2501.12735
20
citations
#1479

Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion

Massimiliano Viola, Kevin Qu, Nando Metzger et al.

ICCV 2025posterarXiv:2412.13389
20
citations
#1480

PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology

Fatemeh Ghezloo, Saygin Seyfioglu, Rustin Soraki et al.

ICCV 2025posterarXiv:2502.08916
20
citations
#1481

Temporal Reasoning Transfer from Text to Video

Lei Li, Yuanxin Liu, Linli Yao et al.

ICLR 2025oralarXiv:2410.06166
20
citations
#1482

BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion

Huafeng Li, Dayong Su, Qing Cai et al.

AAAI 2025paperarXiv:2412.08050
20
citations
#1483

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching

Jixun Yao, Yang Yuguang, Yu Pan et al.

AAAI 2025paperarXiv:2412.04724
20
citations
#1484

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

Xue zhucun, Jiangning Zhang, Teng Hu et al.

NEURIPS 2025posterarXiv:2506.13691
20
citations
#1485

CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification

Wei Li, Renshan Zhang, Rui Shao et al.

NEURIPS 2025posterarXiv:2508.21046
20
citations
#1486

LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction

Er Jin, Qihui Feng, Yongli Mou et al.

AAAI 2025paperarXiv:2501.01767
20
citations
#1487

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

Zigeng Chen, Xinyin Ma, Gongfan Fang et al.

CVPR 2025posterarXiv:2411.17787
20
citations
#1488

Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Zhaowei Zhang, Fengshuo Bai, Qizhi Chen et al.

ICLR 2025posterarXiv:2502.19148
20
citations
#1489

DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory

Yutong Wang, Jiali Zeng, Xuebo Liu et al.

ICLR 2025posterarXiv:2410.08143
20
citations
#1490

Nemotron-CLIMB: Clustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Shizhe Diao, Yu Yang, Yonggan Fu et al.

NEURIPS 2025spotlightarXiv:2504.13161
20
citations
#1491

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

Weiyu Huang, Yuezhou Hu, Guohao Jian et al.

AAAI 2025paperarXiv:2407.20584
20
citations
#1492

Taming Teacher Forcing for Masked Autoregressive Video Generation

Deyu Zhou, Quan Sun, Yuang Peng et al.

CVPR 2025posterarXiv:2501.12389
20
citations
#1493

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Xudong LU, Yinghao Chen, chencheng Chen et al.

CVPR 2025posterarXiv:2411.10640
20
citations
#1494

Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models

Uladzislau Sobal, Wancong Zhang, Kyunghyun Cho et al.

NEURIPS 2025posterarXiv:2502.14819
20
citations
#1495

Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?

Letitia Parcalabescu, Anette Frank

ICLR 2025posterarXiv:2404.18624
20
citations
#1496

V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception

Lei Yang, Xinyu Zhang, Jun Li et al.

NEURIPS 2025spotlightarXiv:2411.10962
20
citations
#1497

Any-Resolution AI-Generated Image Detection by Spectral Learning

Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris et al.

CVPR 2025posterarXiv:2411.19417
20
citations
#1498

Framer: Interactive Frame Interpolation

Wen Wang, Qiuyu Wang, Kecheng Zheng et al.

ICLR 2025posterarXiv:2410.18978
20
citations
#1499

SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis

Hyojun Go, byeongjun park, Jiho Jang et al.

CVPR 2025posterarXiv:2411.16443
19
citations
#1500

Emergence of meta-stable clustering in mean-field transformer models

Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi

ICLR 2025posterarXiv:2410.23228
19
citations
#1501

E(n) Equivariant Topological Neural Networks

Claudio Battiloro, Ege Karaismailoglu, Mauricio Tec et al.

ICLR 2025posterarXiv:2405.15429
19
citations
#1502

Towards a Unified Copernicus Foundation Model for Earth Vision

Yi Wang, Zhitong Xiong, Chenying Liu et al.

ICCV 2025posterarXiv:2503.11849
19
citations
#1503

MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls

Yuxuan Bian, Ailing Zeng, Xuan Ju et al.

AAAI 2025paperarXiv:2407.21136
19
citations
#1504

Learning to Reason for Long-Form Story Generation

Alexander Gurung, Mirella Lapata

COLM 2025paper
19
citations
#1505

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation

Sitong Gong, Yunzhi Zhuge, Lu Zhang et al.

CVPR 2025posterarXiv:2501.08549
19
citations
#1506

Improved Video VAE for Latent Video Diffusion Model

Pingyu Wu, Kai Zhu, Yu Liu et al.

CVPR 2025posterarXiv:2411.06449
19
citations
#1507

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

Yang Liu, Ming Ma, Xiaomin Yu et al.

NEURIPS 2025posterarXiv:2505.12448
19
citations
#1508

Design Principles and Challenges for Gaze + Pinch Interaction in XR

Ken Pfeuffer, Hans Gellersen, Mar Gonzalez-Franco

ISMAR 2025paper
19
citations
#1509

Feat2GS: Probing Visual Foundation Models with Gaussian Splatting

Yue Chen, Xingyu Chen, Anpei Chen et al.

CVPR 2025posterarXiv:2412.09606
19
citations
#1510

CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting

Atin Pothiraj, Jaemin Cho, Elias Stengel-Eskin et al.

ICCV 2025posterarXiv:2504.15485
19
citations
#1511

Perturbation-Restrained Sequential Model Editing

Jun-Yu Ma, Hong Wang, Hao-Xiang Xu et al.

ICLR 2025posterarXiv:2405.16821
19
citations
#1512

Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment

Pritam Sarkar, Sayna Ebrahimi, Ali Etemad et al.

ICLR 2025posterarXiv:2405.18654
19
citations
#1513

LaVin-DiT: Large Vision Diffusion Transformer

Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.

CVPR 2025posterarXiv:2411.11505
19
citations
#1514

Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model

Jiarui Jin, Haoyu Wang, Hongyan Li et al.

ICLR 2025posterarXiv:2502.10707
19
citations
#1515

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Junmo Kang, Leonid Karlinsky, Hongyin Luo et al.

ICLR 2025posterarXiv:2406.12034
19
citations
#1516

SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model

Yue Zhang, Zhiyang Xu, Ying Shen et al.

ICLR 2025posterarXiv:2410.03878
19
citations
#1517

Mechanism Design for LLM Fine-tuning with Multiple Reward Models

Haoran Sun, Yurong Chen, Siwei Wang et al.

NEURIPS 2025posterarXiv:2405.16276
19
citations
#1518

Does Thinking More Always Help? Mirage of Test-Time Scaling in Reasoning Models

Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy et al.

NEURIPS 2025posterarXiv:2506.04210
19
citations
#1519

GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

Tian-Xing Xu, Xiangjun Gao, Wenbo Hu et al.

ICCV 2025posterarXiv:2504.01016
19
citations
#1520

Efficiently Scaling LLM Reasoning Programs with Certaindex

Yichao Fu, Junda Chen, Siqi Zhu et al.

NEURIPS 2025poster
19
citations
#1521

Universal Length Generalization with Turing Programs

Kaiying Hou, David Brandfonbrener, Sham Kakade et al.

ICML 2025posterarXiv:2407.03310
19
citations
#1522

FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction

Siyu Jiao, Gengwei Zhang, Yinlong Qian et al.

NEURIPS 2025posterarXiv:2502.20313
19
citations
#1523

Zero-shot forecasting of chaotic systems

Yuanzhao Zhang, William Gilpin

ICLR 2025posterarXiv:2409.15771
19
citations
#1524

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

Yicheng Xiao, Lin Song, Yukang Chen et al.

NEURIPS 2025posterarXiv:2505.13031
19
citations
#1525

SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data

Wenkai Fang, Shunyu Liu, Yang Zhou et al.

NEURIPS 2025posterarXiv:2505.20347
19
citations
#1526

Investigating Non-Transitivity in LLM-as-a-Judge

Yi Xu, Laura Ruis, Tim Rocktäschel et al.

ICML 2025spotlightarXiv:2502.14074
19
citations
#1527

COAT: Compressing Optimizer states and Activations for Memory-Efficient FP8 Training

Haocheng Xi, Han Cai, Ligeng Zhu et al.

ICLR 2025posterarXiv:2410.19313
19
citations
#1528

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards

Zijing Hu, Fengda Zhang, Long Chen et al.

CVPR 2025posterarXiv:2503.11240
19
citations
#1529

Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics

Siddhant Arora, Zhiyun Lu, Chung-Cheng Chiu et al.

ICLR 2025posterarXiv:2503.01174
19
citations
#1530

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

Priyanshu Kumar, Devansh Jain, Akhila Yerukola et al.

COLM 2025paperarXiv:2504.04377
19
citations
#1531

CRANE: Reasoning with constrained LLM generation

Debangshu Banerjee, Tarun Suresh, Shubham Ugare et al.

ICML 2025posterarXiv:2502.09061
19
citations
#1532

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

ziang yan, Zhilin Li, Yinan He et al.

CVPR 2025posterarXiv:2412.19326
19
citations
#1533

GameArena: Evaluating LLM Reasoning through Live Computer Games

Lanxiang Hu, Qiyu Li, Anze Xie et al.

ICLR 2025posterarXiv:2412.06394
19
citations
#1534

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models

Eunice Yiu, Maan Qraitem, Anisa Majhi et al.

ICLR 2025posterarXiv:2407.17773
19
citations
#1535

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.

CVPR 2025posterarXiv:2411.18499
19
citations
#1536

KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA

Xiaorui Su, Yibo Wang, Shanghua Gao et al.

ICLR 2025posterarXiv:2410.04660
19
citations
#1537

Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking

You Wu, Xucheng Wang, Xiangyang Yang et al.

CVPR 2025posterarXiv:2504.09228
19
citations
#1538

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models

Haiwen Diao, Xiaotong Li, Yufeng Cui et al.

ICCV 2025highlightarXiv:2502.06788
19
citations
#1539

Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning

Gang Liu, Michael Sun, Wojciech Matusik et al.

ICLR 2025posterarXiv:2410.04223
19
citations
#1540

Influence-Guided Diffusion for Dataset Distillation

Mingyang Chen, Jiawei Du, Bo Huang et al.

ICLR 2025poster
19
citations
#1541

A Rainbow in Deep Network Black Boxes

Florentin Guth, Brice Ménard, Gaspar Rochette et al.

ICLR 2025posterarXiv:2305.18512
19
citations
#1542

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

Yuchi Wang, Junliang Guo, Jianhong Bai et al.

AAAI 2025paperarXiv:2405.15758
19
citations
#1543

Pre-training Auto-regressive Robotic Models with 4D Representations

Dantong Niu, Yuvan Sharma, Haoru Xue et al.

ICML 2025posterarXiv:2502.13142
19
citations
#1544

Is Noise Conditioning Necessary for Denoising Generative Models?

Qiao Sun, Zhicheng Jiang, Hanhong Zhao et al.

ICML 2025posterarXiv:2502.13129
19
citations
#1545

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly

Yexin Liu, Zhengyang Liang, Yueze Wang et al.

CVPR 2025posterarXiv:2406.10638
19
citations
#1546

Adaptive Message Passing: A General Framework to Mitigate Oversmoothing, Oversquashing, and Underreaching

Federico Errica, Henrik Christiansen, Viktor Zaverkin et al.

ICML 2025posterarXiv:2312.16560
19
citations
#1547

From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data

Zheyang Xiong, Vasilis Papageorgiou, Kangwook Lee et al.

ICLR 2025posterarXiv:2406.19292
19
citations
#1548

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

Qizhe Zhang, Mengzhen Liu, Lichen Li et al.

NEURIPS 2025posterarXiv:2506.10967
19
citations
#1549

CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs

Jinlan Fu, Shenzhen Huangfu, Hao Fei et al.

ICLR 2025posterarXiv:2501.16629
19
citations
#1550

Look Inside for More: Internal Spatial Modality Perception for 3D Anomaly Detection

Hanzhe Liang, Guoyang Xie, Chengbin Hou et al.

AAAI 2025paperarXiv:2412.13461
19
citations
#1551

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

Wei Cheng, Juncheng Mu, Xianfang Zeng et al.

CVPR 2025posterarXiv:2411.02336
19
citations
#1552

Language Models Need Inductive Biases to Count Inductively

Yingshan Chang, Yonatan Bisk

ICLR 2025posterarXiv:2405.20131
19
citations
#1553

Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection

Kaiqing Lin, Yuzhen Lin, Weixiang Li et al.

AAAI 2025paperarXiv:2409.02664
19
citations
#1554

Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains

Wenhui Tan, Jiaze Li, Jianzhong Ju et al.

NEURIPS 2025posterarXiv:2505.16552
19
citations
#1555

Segmenting Maxillofacial Structures in CBCT Volumes

Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij et al.

CVPR 2025poster
19
citations
#1556

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Xinyu Yang, Yuwei An, Hongyi Liu et al.

NEURIPS 2025spotlightarXiv:2506.09991
19
citations
#1557

Rethinking Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising

Junyi Li, Zhilu Zhang, Wangmeng Zuo

AAAI 2025paperarXiv:2404.07846
19
citations
#1558

PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation

Hengjia Li, Haonan Qiu, Shiwei Zhang et al.

ICCV 2025posterarXiv:2411.17048
19
citations
#1559

EmoEdit: Evoking Emotions through Image Manipulation

Jingyuan Yang, Jiawei Feng, Weibin Luo et al.

CVPR 2025posterarXiv:2405.12661
19
citations
#1560

Reducing Tool Hallucination via Reliability Alignment

Hongshen Xu, Zichen Zhu, Lei Pan et al.

ICML 2025posterarXiv:2412.04141
19
citations
#1561

Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation

Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.

CVPR 2025posterarXiv:2412.01243
19
citations
#1562

Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets

Zhen Liu, Tim Xiao, Weiyang Liu et al.

ICLR 2025posterarXiv:2412.07775
19
citations
#1563

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.

ICLR 2025posterarXiv:2410.17637
19
citations
#1564

Benchmarking Agentic Workflow Generation

Shuofei Qiao, Runnan Fang, Zhisong Qiu et al.

ICLR 2025posterarXiv:2410.07869
19
citations
#1565

Task-driven Image Fusion with Learnable Fusion Loss

Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.

CVPR 2025highlightarXiv:2412.03240
19
citations
#1566

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation

Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.

CVPR 2025posterarXiv:2503.11423
19
citations
#1567

On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity

Quentin Bertrand, Anne Gagneux, Mathurin Massias et al.

NEURIPS 2025oralarXiv:2506.03719
19
citations
#1568

Textured Gaussians for Enhanced 3D Scene Appearance Modeling

Brian Chao, Hung-Yu Tseng, Lorenzo Porzi et al.

CVPR 2025posterarXiv:2411.18625
19
citations
#1569

Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

Michael-Andrei Panaitescu-Liess, Zora Che, Bang An et al.

AAAI 2025paperarXiv:2407.17417
19
citations
#1570

Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

Fiona Ryan, Ajay Bati, Sangmin Lee et al.

CVPR 2025highlightarXiv:2412.09586
19
citations
#1571

QMambaBSR: Burst Image Super-Resolution with Query State Space Model

Xin Di, Long Peng, Peizhe Xia et al.

CVPR 2025posterarXiv:2408.08665
19
citations
#1572

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

Yunze Man, De-An Huang, Guilin Liu et al.

CVPR 2025posterarXiv:2505.23766
19
citations
#1573

Automated Proof Generation for Rust Code via Self-Evolution

Tianyu Chen, Shuai Lu, Shan Lu et al.

ICLR 2025posterarXiv:2410.15756
19
citations
#1574

Robotouille: An Asynchronous Planning Benchmark for LLM Agents

Gonzalo Gonzalez-Pumariega, Leong Yean, Neha Sunkara et al.

ICLR 2025posterarXiv:2502.05227
19
citations
#1575

Progress or Regress? Self-Improvement Reversal in Post-training

Ting Wu, Xuefeng Li, Pengfei Liu

ICLR 2025posterarXiv:2407.05013
19
citations
#1576

REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites

Div Garg, Diego Caples, Andis Draguns et al.

NEURIPS 2025posterarXiv:2504.11543
19
citations
#1577

DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving

Zhenhua Xu, Yan Bai, Yujia Zhang et al.

CVPR 2025highlight
19
citations
#1578

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

Xiao Wang, Fuling Wang, Yuehang Li et al.

CVPR 2025posterarXiv:2410.00379
19
citations
#1579

A Unified Approach to Routing and Cascading for LLMs

Jasper Dekoninck, Maximilian Baader, Martin Vechev

ICML 2025posterarXiv:2410.10347
19
citations
#1580

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

Quanhao Li, Zhen Xing, Rui Wang et al.

ICCV 2025posterarXiv:2503.16421
19
citations
#1581

Cut Your Losses in Large-Vocabulary Language Models

Erik Wijmans, Brody Huval, Alexander Hertzberg et al.

ICLR 2025posterarXiv:2411.09009
19
citations
#1582

TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets

Yuzhe YANG, Yifei Zhang, Minghao Wu et al.

NEURIPS 2025oralarXiv:2502.01506
19
citations
#1583

DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models

Zhendong Wang, Jianmin Bao, Shuyang Gu et al.

CVPR 2025posterarXiv:2503.01645
19
citations
#1584

CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG

Boyi Deng, Wenjie Wang, Fengbin Zhu et al.

AAAI 2025paperarXiv:2406.11497
19
citations
#1585

Distraction is All You Need for Multimodal Large Language Model Jailbreaking

Zuopeng Yang, Jiluan Fan, Anli Yan et al.

CVPR 2025highlightarXiv:2502.10794
19
citations
#1586

NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments

Xuan Yao, Junyu Gao, Changsheng Xu

ICCV 2025posterarXiv:2506.23468
19
citations
#1587

DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model

Yi Liu, Changran Xu, Yunhao Zhou et al.

ICLR 2025posterarXiv:2502.15832
19
citations
#1588

Effective Interplay between Sparsity and Quantization: From Theory to Practice

Simla Harma, Ayan Chakraborty, Elizaveta Kostenok et al.

ICLR 2025posterarXiv:2405.20935
19
citations
#1589

UniK3D: Universal Camera Monocular 3D Estimation

Luigi Piccinelli, Christos Sakaridis, Mattia Segu et al.

CVPR 2025posterarXiv:2503.16591
19
citations
#1590

Learning Long Range Dependencies on Graphs via Random Walks

Dexiong Chen, Till Schulz, Karsten Borgwardt

ICLR 2025posterarXiv:2406.03386
19
citations
#1591

Spectral Motion Alignment for Video Motion Transfer Using Diffusion Models

Geon Yeong Park, Hyeonho Jeong, Sang Wan Lee et al.

AAAI 2025paperarXiv:2403.15249
19
citations
#1592

SELF-EVOLVED REWARD LEARNING FOR LLMS

Chenghua Huang, Zhizhen Fan, Lu Wang et al.

ICLR 2025posterarXiv:2411.00418
19
citations
#1593

TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba

Xiaowen Ma, Zhen-Liang Ni, Xinghao Chen

ICCV 2025posterarXiv:2411.17473
19
citations
#1594

CAD-Recode: Reverse Engineering CAD Code from Point Clouds

Danila Rukhovich, Elona Dupont, Dimitrios Mallis et al.

ICCV 2025posterarXiv:2412.14042
18
citations
#1595

Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding

Weiyu Guo, Ziyang Chen, Shaoguang WANG et al.

NEURIPS 2025oralarXiv:2503.13139
18
citations
#1596

You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs

Yihong Luo, Xiaolong Chen, Xinghua Qu et al.

ICLR 2025posterarXiv:2403.12931
18
citations
#1597

SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models

Shuaijie Shen, Chao Wang, Renzhuo Huang et al.

AAAI 2025paperarXiv:2408.14909
18
citations
#1598

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

Kaifeng Gao, Jiaxin Shi, Hanwang Zhang et al.

ICML 2025posterarXiv:2411.16375
18
citations
#1599

TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting

Bojun Xiong, Jialun Liu, JiaKui Hu et al.

CVPR 2025posterarXiv:2411.19654
18
citations
#1600

Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory

Nikola Zubic, Federico Soldà, Aurelio Sulser et al.

ICLR 2025posterarXiv:2405.16674
18
citations