Most Cited 2025 "federated vision-language benchmarks" Papers

22,274 papers found • Page 9 of 112

#1601

SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data

Wenkai Fang, Shunyu Liu, Yang Zhou et al.

NEURIPS 2025posterarXiv:2505.20347
19
citations
#1602

Forking Paths in Neural Text Generation

Eric Bigelow, Ari Holtzman, Hidenori Tanaka et al.

ICLR 2025posterarXiv:2412.07961
18
citations
#1603

Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

zhenwei Wang, Tengfei Wang, Zexin He et al.

ICLR 2025posterarXiv:2409.11406
18
citations
#1604

Do as We Do, Not as You Think: the Conformity of Large Language Models

Zhiyuan Weng, Guikun Chen, Wenguan Wang

ICLR 2025posterarXiv:2501.13381
18
citations
#1605

TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting

Bojun Xiong, Jialun Liu, JiaKui Hu et al.

CVPR 2025posterarXiv:2411.19654
18
citations
#1606

Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration

Ziheng Zhou, Jinxing Zhou, Wei Qian et al.

AAAI 2025paperarXiv:2412.12628
18
citations
#1607

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach

Jing Bi, Lianggong Bruce Wen, Zhang Liu et al.

CVPR 2025posterarXiv:2412.18108
18
citations
#1608

Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization

Zhitong Xu, Haitao Wang, Jeff Phillips et al.

ICLR 2025posterarXiv:2402.02746
18
citations
#1609

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

Kaifeng Gao, Jiaxin Shi, Hanwang Zhang et al.

ICML 2025posterarXiv:2411.16375
18
citations
#1610

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

Zhiyuan Zeng, Yizhong Wang, Hannaneh Hajishirzi et al.

COLM 2025paperarXiv:2503.08893
18
citations
#1611

EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition

Issar Tzachor, Boaz Lerner, Matan Levy et al.

ICLR 2025posterarXiv:2405.18065
18
citations
#1612

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior

Junfeng Ni, Yu Liu, Ruijie Lu et al.

CVPR 2025posterarXiv:2503.14830
18
citations
#1613

SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints

Miruna Cretu, Charles Harris, Ilia Igashov et al.

ICLR 2025posterarXiv:2405.01155
18
citations
#1614

AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation

Yukang Cao, Liang Pan, Kai Han et al.

ICLR 2025posterarXiv:2410.07164
18
citations
#1615

Cubify Anything: Scaling Indoor 3D Object Detection

Justin Lazarow, David Griffiths, Gefen Kohavi et al.

CVPR 2025highlightarXiv:2412.04458
18
citations
#1616

Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models

Rui Ye, Jingyi Chai, Xiangrui Liu et al.

ICLR 2025posterarXiv:2406.10630
18
citations
#1617

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.

NEURIPS 2025posterarXiv:2505.16394
18
citations
#1618

Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning

Simran Kaur, Simon Park, Anirudh Goyal et al.

ICLR 2025posterarXiv:2408.14774
18
citations
#1619

SLMRec: Distilling Large Language Models into Small for Sequential Recommendation

Wujiang Xu, Qitian Wu, Zujie Liang et al.

ICLR 2025oralarXiv:2405.17890
18
citations
#1620

Adaptive Rectangular Convolution for Remote Sensing Pansharpening

Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.

CVPR 2025posterarXiv:2503.00467
18
citations
#1621

Video Motion Transfer with Diffusion Transformers

Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.

CVPR 2025posterarXiv:2412.07776
18
citations
#1622

VSP: Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for MLLMs

Qiucheng Wu, Handong Zhao, Michael Saxon et al.

ICCV 2025poster
18
citations
#1623

MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo

Zhenlong Yuan, Cong Liu, Fei Shen et al.

AAAI 2025paperarXiv:2407.19323
18
citations
#1624

Generalization through variance: how noise shapes inductive biases in diffusion models

John Vastola

ICLR 2025posterarXiv:2504.12532
18
citations
#1625

Towards Adversarially Robust Dataset Distillation by Curvature Regularization

Eric Xue, Yijiang Li, Haoyang Liu et al.

AAAI 2025paperarXiv:2403.10045
18
citations
#1626

You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs

Yihong Luo, Xiaolong Chen, Xinghua Qu et al.

ICLR 2025posterarXiv:2403.12931
18
citations
#1627

Diversity-Aware Policy Optimization for Large Language Model Reasoning

Jian Yao, Ran Cheng, Xingyu Wu et al.

NEURIPS 2025spotlightarXiv:2505.23433
18
citations
#1628

Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models

Hulingxiao He, Geng Li, Zijun Geng et al.

ICLR 2025posterarXiv:2501.15140
18
citations
#1629

Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training

Zhenxin Li, Shihao Wang, Shiyi Lan et al.

ICCV 2025posterarXiv:2503.12030
18
citations
#1630

A Controlled Study on Long Context Extension and Generalization in LLMs

Yi Lu, Jing Nathan Yan, Songlin Yang et al.

COLM 2025paperarXiv:2409.12181
18
citations
#1631

Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory

Nikola Zubic, Federico Soldà, Aurelio Sulser et al.

ICLR 2025posterarXiv:2405.16674
18
citations
#1632

Commit0: Library Generation from Scratch

Wenting Zhao, Nan Jiang, Celine Lee et al.

ICLR 2025posterarXiv:2412.01769
18
citations
#1633

Ref-GS: Directional Factorization for 2D Gaussian Splatting

Youjia Zhang, Anpei Chen, Yumin Wan et al.

CVPR 2025posterarXiv:2412.00905
18
citations
#1634

HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment

YOUHE JIANG, Ran Yan, Binhang Yuan

ICLR 2025posterarXiv:2502.07903
18
citations
#1635

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

Jiangyong Huang, Baoxiong Jia, Yan Wang et al.

CVPR 2025posterarXiv:2503.22420
18
citations
#1636

Perm: A Parametric Representation for Multi-Style 3D Hair Modeling

Chengan He, Xin Sun, Zhixin Shu et al.

ICLR 2025posterarXiv:2407.19451
18
citations
#1637

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Yiheng Li, RuiBing Hou, Hong Chang et al.

CVPR 2025highlightarXiv:2411.16781
18
citations
#1638

Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior

Chen Guo, Junxuan Li, Yash Kant et al.

CVPR 2025posterarXiv:2503.01610
18
citations
#1639

MoonCast: High-Quality Zero-Shot Podcast Generation

Zeqian Ju, Dongchao Yang, Shen Kai et al.

NEURIPS 2025oralarXiv:2503.14345
18
citations
#1640

DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs

Jongwoo Ko, Tianyi Chen, Sungnyun Kim et al.

ICML 2025oralarXiv:2503.07067
18
citations
#1641

Block Verification Accelerates Speculative Decoding

Ziteng Sun, Uri Mendlovic, Yaniv Leviathan et al.

ICLR 2025posterarXiv:2403.10444
18
citations
#1642

SketchAgent: Language-Driven Sequential Sketch Generation

Yael Vinker, Tamar Rott Shaham, Kristine Zheng et al.

CVPR 2025posterarXiv:2411.17673
18
citations
#1643

Delta Decompression for MoE-based LLMs Compression

Hao Gu, Wei Li, Lujun Li et al.

ICML 2025posterarXiv:2502.17298
18
citations
#1644

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

Fan Lu, Wei Wu, Kecheng Zheng et al.

CVPR 2025posterarXiv:2412.08614
18
citations
#1645

Merging on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging

Anke Tang, Enneng Yang, Li Shen et al.

NEURIPS 2025poster
18
citations
#1646

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

Li Hao, He CAO, Bin Feng et al.

NEURIPS 2025posterarXiv:2505.21318
18
citations
#1647

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Kaihang Pan, Wang Lin, Zhongqi Yue et al.

CVPR 2025posterarXiv:2504.14666
18
citations
#1648

Palu: KV-Cache Compression with Low-Rank Projection

Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin et al.

ICLR 2025poster
18
citations
#1649

What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph

Yutao Jiang, Qiong Wu, Wenhao Lin et al.

AAAI 2025paperarXiv:2501.02268
18
citations
#1650

BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics

Lukas Rauch, Raphael Schwinger, Moritz Wirth et al.

ICLR 2025posterarXiv:2403.10380
18
citations
#1651

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds

Hao Liang, Zhiquan Luo

NEURIPS 2025posterarXiv:2210.14051
18
citations
#1652

Scaling Optimal LR Across Token Horizons

Johan Bjorck, Alon Benhaim, Vishrav Chaudhary et al.

ICLR 2025posterarXiv:2409.19913
18
citations
#1653

Cross-Embodiment Dexterous Grasping with Reinforcement Learning

Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.

ICLR 2025posterarXiv:2410.02479
18
citations
#1654

CrossMPT: Cross-attention Message-passing Transformer for Error Correcting Codes

Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim et al.

ICLR 2025posterarXiv:2405.01033
18
citations
#1655

Controlling Language and Diffusion Models by Transporting Activations

Pau Rodriguez, Arno Blaas, Michal Klein et al.

ICLR 2025posterarXiv:2410.23054
18
citations
#1656

FatesGS: Fast and Accurate Sparse-View Surface Reconstruction Using Gaussian Splatting with Depth-Feature Consistency

Han Huang, Yulun Wu, Chao Deng et al.

AAAI 2025paperarXiv:2501.04628
18
citations
#1657

CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

ZeMing Gong, Austin Wang, Xiaoliang Huo et al.

ICLR 2025posterarXiv:2405.17537
18
citations
#1658

HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation

Boyuan Wang, Xiaofeng Wang, Chaojun Ni et al.

CVPR 2025posterarXiv:2503.24026
18
citations
#1659

Benchmarking Predictive Coding Networks -- Made Simple

Luca Pinchetti, Chang Qi, Oleh Lokshyn et al.

ICLR 2025posterarXiv:2407.01163
18
citations
#1660

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Tonghe Zhang, Chao Yu, Sichang Su et al.

NEURIPS 2025posterarXiv:2505.22094
18
citations
#1661

VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Chaoya Jiang, Yongrui Heng, Wei Ye et al.

NEURIPS 2025poster
18
citations
#1662

TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing

Stefan Lionar, Jiabin Liang, Gim Hee Lee

CVPR 2025posterarXiv:2503.11629
18
citations
#1663

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.

CVPR 2025highlightarXiv:2412.04462
18
citations
#1664

HoliTom: Holistic Token Merging for Fast Video Large Language Models

Kele Shao, Keda TAO, Can Qin et al.

NEURIPS 2025oralarXiv:2505.21334
18
citations
#1665

Encryption-Friendly LLM Architecture

Donghwan Rho, Taeseong Kim, Minje Park et al.

ICLR 2025posterarXiv:2410.02486
18
citations
#1666

GenFusion: Closing the Loop between Reconstruction and Generation via Videos

Sibo Wu, Congrong Xu, Binbin Huang et al.

CVPR 2025posterarXiv:2503.21219
18
citations
#1667

Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos

Yi Chen, Yuying Ge, Weiliang Tang et al.

ICCV 2025posterarXiv:2412.04445
18
citations
#1668

Discretization-invariance? On the Discretization Mismatch Errors in Neural Operators

Wenhan Gao, Ruichen Xu, Yuefan Deng et al.

ICLR 2025poster
18
citations
#1669

Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models

Zeyu Yang, Zijie Pan, Chun Gu et al.

ICLR 2025oralarXiv:2404.02148
18
citations
#1670

SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models

Shuaijie Shen, Chao Wang, Renzhuo Huang et al.

AAAI 2025paperarXiv:2408.14909
18
citations
#1671

Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search

Yuichi Inoue, Kou Misaki, Yuki Imajuku et al.

NEURIPS 2025spotlightarXiv:2503.04412
18
citations
#1672

Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset

Xiao Wang, Yu Jin, Wentao Wu et al.

CVPR 2025posterarXiv:2412.06647
18
citations
#1673

Boosting Neural Combinatorial Optimization for Large-Scale Vehicle Routing Problems

Fu Luo, Xi Lin, Yaoxin Wu et al.

ICLR 2025poster
18
citations
#1674

MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model

Junjie Li, Yang Liu, Weiqing Liu et al.

ICLR 2025posterarXiv:2409.07486
18
citations
#1675

X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention

XiaoChen Zhao, Hongyi Xu, Guoxian Song et al.

ICLR 2025posterarXiv:2507.23143
18
citations
#1676

Gradient-Free Generation for Hard-Constrained Systems

Chaoran Cheng, Boran Han, Danielle Maddix et al.

ICLR 2025posterarXiv:2412.01786
18
citations
#1677

Structure-Adaptive Multi-View Graph Clustering for Remote Sensing Data

Renxiang Guan, Wenxuan Tu, Siwei Wang et al.

AAAI 2025paper
18
citations
#1678

TIME-FS: Joint Learning of Tensorial Incomplete Multi-View Unsupervised Feature Selection and Missing-View Imputation

Yanyong Huang, Minghui Lu, Wei Huang et al.

AAAI 2025paper
18
citations
#1679

Hash3D: Training-free Acceleration for 3D Generation

Xingyi Yang, Songhua Liu, Xinchao Wang

CVPR 2025posterarXiv:2404.06091
18
citations
#1680

CAD-Recode: Reverse Engineering CAD Code from Point Clouds

Danila Rukhovich, Elona Dupont, Dimitrios Mallis et al.

ICCV 2025posterarXiv:2412.14042
18
citations
#1681

Air Quality Prediction with Physics-Guided Dual Neural ODEs in Open Systems

jindong tian, Yuxuan Liang, Ronghui Xu et al.

ICLR 2025oralarXiv:2410.19892
18
citations
#1682

Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding

Weiyu Guo, Ziyang Chen, Shaoguang WANG et al.

NEURIPS 2025oralarXiv:2503.13139
18
citations
#1683

The Illusion of Empathy: How AI Chatbots Shape Conversation Perception

Tingting Liu, Salvatore Giorgi, Ankit Aich et al.

AAAI 2025paperarXiv:2411.12877
18
citations
#1684

Learning to Discretize Denoising Diffusion ODEs

Vinh Tong, Trung-Dung Hoang, Anji Liu et al.

ICLR 2025posterarXiv:2405.15506
18
citations
#1685

OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking

Xuanyu Zhang, Zecheng Tang, Zhipei Xu et al.

CVPR 2025posterarXiv:2412.01615
18
citations
#1686

Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?

Seth Aycock, David Stap, Di Wu et al.

ICLR 2025posterarXiv:2409.19151
18
citations
#1687

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy

Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.

CVPR 2025posterarXiv:2503.20823
18
citations
#1688

MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Bhavya, Stelian Coros, Andreas Krause et al.

ICLR 2025posterarXiv:2412.12098
18
citations
#1689

FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

Jiale Xu, Shenghua Gao, Ying Shan

ICCV 2025posterarXiv:2412.09573
18
citations
#1690

A Label-free Heterophily-guided Approach for Unsupervised Graph Fraud Detection

Junjun Pan, Yixin Liu, Xin Zheng et al.

AAAI 2025paperarXiv:2502.13308
18
citations
#1691

Non-myopic Generation of Language Models for Reasoning and Planning

Chang Ma, Haiteng Zhao, Junlei Zhang et al.

ICLR 2025posterarXiv:2410.17195
18
citations
#1692

Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency

Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani et al.

ICLR 2025posterarXiv:2411.16525
18
citations
#1693

A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning

Chen-Yu Liu, Chao-Han Huck Yang, Hsi-Sheng Goan et al.

ICLR 2025posterarXiv:2410.09846
17
citations
#1694

Graph Sparsification via Mixture of Graphs

Guibin Zhang, Xiangguo SUN, Yanwei Yue et al.

ICLR 2025posterarXiv:2405.14260
17
citations
#1695

MetaMetrics: Calibrating Metrics for Generation Tasks Using Human Preferences

Genta Winata, David Anugraha, Lucky Susanto et al.

ICLR 2025posterarXiv:2410.02381
17
citations
#1696

Learning 3D Persistent Embodied World Models

Siyuan Zhou, Yilun Du, Yuncong Yang et al.

NEURIPS 2025posterarXiv:2505.05495
17
citations
#1697

Large Images Are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting

Lingting Zhu, Guying Lin, Jinnan Chen et al.

AAAI 2025paperarXiv:2502.09039
17
citations
#1698

ViLLa: Video Reasoning Segmentation with Large Language Model

rongkun Zheng, Lu Qi, Xi Chen et al.

ICCV 2025posterarXiv:2407.14500
17
citations
#1699

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Xiaojun Jia, Sensen Gao, Simeng Qin et al.

NEURIPS 2025posterarXiv:2505.21494
17
citations
#1700

Optimal Transport for Time Series Imputation

Hao Wang, zhengnan li, Haoxuan Li et al.

ICLR 2025oral
17
citations
#1701

No Preference Left Behind: Group Distributional Preference Optimization

Binwei Yao, Zefan Cai, Yun-Shiuan Chuang et al.

ICLR 2025posterarXiv:2412.20299
17
citations
#1702

DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds

Youyu Chen, Junjun Jiang, Kui Jiang et al.

CVPR 2025highlightarXiv:2503.18402
17
citations
#1703

SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs

Aashiq Muhamed, Jacopo Bonato, Mona T. Diab et al.

COLM 2025paper
17
citations
#1704

DarkBench: Benchmarking Dark Patterns in Large Language Models

Esben Kran, Hieu Minh Nguyen, Akash Kundu et al.

ICLR 2025posterarXiv:2503.10728
17
citations
#1705

Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation

Xin Zhang, Robby T. Tan

CVPR 2025highlightarXiv:2504.03193
17
citations
#1706

VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model

Zuwei Long, Yunhang Shen, Chaoyou Fu et al.

NEURIPS 2025poster
17
citations
#1707

USP: Unified Self-Supervised Pretraining for Image Generation and Understanding

Xiangxiang Chu, Renda Li, Yong Wang

ICCV 2025posterarXiv:2503.06132
17
citations
#1708

Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise

Enea Monzio Compagnoni, Tianlin Liu, Rustem Islamov et al.

ICLR 2025posterarXiv:2411.15958
17
citations
#1709

Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera

Yuliang Guo, Sparsh Garg, S. Mahdi H. Miangoleh et al.

CVPR 2025posterarXiv:2501.02464
17
citations
#1710

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

Jiaming Zhou, Teli Ma, Kun-Yu Lin et al.

CVPR 2025posterarXiv:2406.14235
17
citations
#1711

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats

Léo Boisvert, Abhay Puri, Gabriel Huang et al.

COLM 2025paperarXiv:2504.14064
17
citations
#1712

V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

Junqi Ge, Ziyi Chen, Jintao Lin et al.

ICCV 2025posterarXiv:2412.09616
17
citations
#1713

Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling

Junha Hyung, Kinam Kim, Susung Hong et al.

CVPR 2025posterarXiv:2411.18664
17
citations
#1714

QiMeng-CodeV-R1: Reasoning-Enhanced Verilog Generation

Yaoyu Zhu, Di Huang, Hanqi Lyu et al.

NEURIPS 2025posterarXiv:2505.24183
17
citations
#1715

From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation

Nikita Kotelevskii, Vladimir Kondratyev, Martin Takáč et al.

ICLR 2025posterarXiv:2402.10727
17
citations
#1716

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping

Yue Yang, Shuibo Zhang, Kaipeng Zhang et al.

ICLR 2025posterarXiv:2410.08695
17
citations
#1717

INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations

Yongming Zhu, Longhao Zhang, Zhengkun Rong et al.

CVPR 2025posterarXiv:2412.04037
17
citations
#1718

Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures

Junxuan Wang, Xuyang Ge, Wentao Shu et al.

ICLR 2025posterarXiv:2410.06672
17
citations
#1719

Spiking Transformer with Spatial-Temporal Attention

Donghyun Lee, Yuhang Li, Youngeun Kim et al.

CVPR 2025posterarXiv:2409.19764
17
citations
#1720

Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jue Zhang, Xiaoting Qin et al.

CVPR 2025highlightarXiv:2412.11077
17
citations
#1721

Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

Yili Wang, Yixin Liu, Xu Shen et al.

ICLR 2025posterarXiv:2406.15523
17
citations
#1722

Learning Clustering-based Prototypes for Compositional Zero-Shot Learning

Hongyu Qu, Jianan Wei, Xiangbo Shu et al.

ICLR 2025posterarXiv:2502.06501
17
citations
#1723

PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection

Jianan Ye, Weiguang Zhao, Xi Yang et al.

CVPR 2025posterarXiv:2412.12617
17
citations
#1724

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Yuqing Wang, Zhijie Lin, Yao Teng et al.

ICCV 2025posterarXiv:2503.16430
17
citations
#1725

Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving

Yuhang Lu, Yichen Yao, Jiadong Tu et al.

AAAI 2025paperarXiv:2409.02914
17
citations
#1726

GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

Yue Liu, Shengfang Zhai, Mingzhe Du et al.

NEURIPS 2025posterarXiv:2505.11049
17
citations
#1727

Neighboring Autoregressive Modeling for Efficient Visual Generation

Yefei He, Yuanyu He, Shaoxuan He et al.

ICCV 2025posterarXiv:2503.10696
17
citations
#1728

Learning Interpretable Hierarchical Dynamical Systems Models from Time Series Data

Manuel Brenner, Elias Weber, Georgia Koppe et al.

ICLR 2025posterarXiv:2410.04814
17
citations
#1729

CyberHost: A One-stage Diffusion Framework for Audio-driven Talking Body Generation

Gaojie Lin, Jianwen Jiang, Chao Liang et al.

ICLR 2025poster
17
citations
#1730

OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs

Jitai Hao, Yuke Zhu, Tian Wang et al.

ICLR 2025poster
17
citations
#1731

Generative Flows on Synthetic Pathway for Drug Design

Seonghwan Seo, Minsu Kim, Tony Shen et al.

ICLR 2025posterarXiv:2410.04542
17
citations
#1732

Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models

Matvei Popov, Peter Robicheaux, Anish Madan et al.

NEURIPS 2025posterarXiv:2505.20612
17
citations
#1733

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Hanyang Zhao, Genta Winata, Anirban Das et al.

ICLR 2025posterarXiv:2410.04203
17
citations
#1734

Reasoning of Large Language Models over Knowledge Graphs with Super-Relations

Song Wang, Junhong Lin, Xiaojie Guo et al.

ICLR 2025posterarXiv:2503.22166
17
citations
#1735

LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement

Jieming Bian, Lei Wang, Letian Zhang et al.

ICCV 2025posterarXiv:2411.14961
17
citations
#1736

TULIP: Token-length Upgraded CLIP

Ivona Najdenkoska, Mohammad Mahdi Derakhshani, Yuki Asano et al.

ICLR 2025posterarXiv:2410.10034
17
citations
#1737

A Probabilistic Perspective on Unlearning and Alignment for Large Language Models

Yan Scholten, Stephan Günnemann, Leo Schwinn

ICLR 2025posterarXiv:2410.03523
17
citations
#1738

VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning

Yongshuo Zong, Ondrej Bohdal, Timothy Hospedales

ICLR 2025posterarXiv:2403.13164
17
citations
#1739

Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration

Ran Xu, Wenqi Shi, Yuchen Zhuang et al.

COLM 2025paperarXiv:2504.04915
17
citations
#1740

Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization

Jiaming Zhou, Ke Ye, Jiayi Liu et al.

NEURIPS 2025posterarXiv:2505.15660
17
citations
#1741

Mellow: a small audio language model for reasoning

Soham Deshmukh, Satvik Dixit, Rita Singh et al.

NEURIPS 2025posterarXiv:2503.08540
17
citations
#1742

GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation

Hongyin Zhang, Pengxiang Ding, Shangke Lyu et al.

ICLR 2025posterarXiv:2502.09268
17
citations
#1743

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

Yining Hong, Beide Liu, Maxine Wu et al.

ICLR 2025oralarXiv:2410.23277
17
citations
#1744

Accelerating neural network training: An analysis of the AlgoPerf competition

Priya Kasimbeg, Frank Schneider, Runa Eschenhagen et al.

ICLR 2025posterarXiv:2502.15015
17
citations
#1745

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Mutimodal Models

Xingrui Wang, Wufei Ma, Tiezheng Zhang et al.

CVPR 2025highlight
17
citations
#1746

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

Mingze Xu, Mingfei Gao, Shiyu Li et al.

COLM 2025paperarXiv:2503.18943
17
citations
#1747

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Yunhan Zhao, Xiang Zheng, Lin Luo et al.

ICLR 2025posterarXiv:2410.20971
17
citations
#1748

M²IV: Towards Efficient and Fine-grained Multimodal In-Context Learning via Representation Engineering

Yanshu Li, Yi Cao, Hongyang He et al.

COLM 2025paper
17
citations
#1749

DeLLMa: Decision Making Under Uncertainty with Large Language Models

Ollie Liu, Deqing Fu, Dani Yogatama et al.

ICLR 2025posterarXiv:2402.02392
17
citations
#1750

Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families

Felipe Maia Polo, Seamus Somerstep, Leshem Choshen et al.

NEURIPS 2025posterarXiv:2412.06540
17
citations
#1751

Video-T1: Test-time Scaling for Video Generation

Fangfu Liu, Hanyang Wang, Yimo Cai et al.

ICCV 2025posterarXiv:2503.18942
17
citations
#1752

UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior

I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.

CVPR 2025highlightarXiv:2501.13134
17
citations
#1753

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding

Xiaoyi Zhang, Zhaoyang Jia, Zongyu Guo et al.

NEURIPS 2025oralarXiv:2505.18079
17
citations
#1754

Detecting Out-of-Distribution Through the Lens of Neural Collapse

Litian Liu, Yao Qin

CVPR 2025posterarXiv:2311.01479
17
citations
#1755

Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts

Hongcheng Gao, Tianyu Pang, Chao Du et al.

ICCV 2025posterarXiv:2410.12777
17
citations
#1756

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

Kim Sung-Bin, Oh Hyun-Bin, Lee Jung-Mok et al.

ICLR 2025posterarXiv:2410.18325
17
citations
#1757

Improving Reasoning Performance in Large Language Models via Representation Engineering

Bertram Højer, Oliver Jarvis, Stefan Heinrich

ICLR 2025posterarXiv:2504.19483
17
citations
#1758

3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

Xiaobiao Du, Yida Wang, Haiyang Sun et al.

ICCV 2025posterarXiv:2406.04875
17
citations
#1759

Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods

Oussama Zekri, Nicolas Boulle

NEURIPS 2025posterarXiv:2502.01384
17
citations
#1760

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

Xindi Yang, Baolu Li, Yiming Zhang et al.

ICCV 2025posterarXiv:2503.23368
17
citations
#1761

Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene

Jiahao Wu, Rui Peng, Zhiyan Wang et al.

ICLR 2025poster
17
citations
#1762

Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks

Lehan Wang, Haonan Wang, Honglong Yang et al.

ICLR 2025posterarXiv:2410.18387
17
citations
#1763

Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling

Wei Guo, Molei Tao, Yongxin Chen

ICLR 2025posterarXiv:2407.16936
17
citations
#1764

Controllable Context Sensitivity and the Knob Behind It

Julian Minder, Kevin Du, Niklas Stoehr et al.

ICLR 2025posterarXiv:2411.07404
17
citations
#1765

Design Principle Transfer in Neural Architecture Search via Large Language Models

Xun Zhou, Xingyu Wu, Liang Feng et al.

AAAI 2025paperarXiv:2408.11330
17
citations
#1766

u-$\mu$P: The Unit-Scaled Maximal Update Parametrization

Charles Blake, Constantin Eichenberg, Josef Dean et al.

ICLR 2025poster
17
citations
#1767

PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection

Botao Ren, Xue Yang, Yi Yu et al.

ICLR 2025posterarXiv:2410.08210
17
citations
#1768

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Orr Zohar, Xiaohan Wang, Yonatan Bitton et al.

ICLR 2025posterarXiv:2407.06189
17
citations
#1769

GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images

Xiang Lan, Feng Wu, Kai He et al.

NEURIPS 2025posterarXiv:2503.06073
17
citations
#1770

Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs

Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar et al.

ICLR 2025posterarXiv:2405.15683
17
citations
#1771

GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments

Enjun Du, Xunkai Li, Tian Jin et al.

NEURIPS 2025spotlightarXiv:2504.00711
17
citations
#1772

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

Hang Hua, Qing Liu, Lingzhi Zhang et al.

CVPR 2025posterarXiv:2411.15411
17
citations
#1773

Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

Li, Nikolaos Tsagkas, Jifei Song et al.

ICCV 2025posterarXiv:2408.10123
17
citations
#1774

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

Yan Li, Yifei Xing, Xiangyuan Lan et al.

CVPR 2025posterarXiv:2412.00833
17
citations
#1775

MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation

Donggon Jang, Yucheol Cho, Suin Lee et al.

ICLR 2025posterarXiv:2503.13881
17
citations
#1776

Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters

Yuan Wang, Ouxiang Li, Tingting Mu et al.

CVPR 2025posterarXiv:2412.06143
17
citations
#1777

Idiosyncrasies in Large Language Models

Mingjie Sun, Yida Yin, Zhiqiu (Oscar) Xu et al.

ICML 2025posterarXiv:2502.12150
17
citations
#1778

Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset

Yiqun Mei, Mingming He, Li Ma et al.

CVPR 2025posterarXiv:2503.14485
17
citations
#1779

On the Guidance of Flow Matching

Ruiqi Feng, Chenglei Yu, Wenhao Deng et al.

ICML 2025spotlightarXiv:2502.02150
17
citations
#1780

Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation

Zhuoman Liu, Weicai Ye, Yan Luximon et al.

CVPR 2025posterarXiv:2411.14423
17
citations
#1781

HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos

Jinglei Zhang, Jiankang Deng, Chao Ma et al.

CVPR 2025highlightarXiv:2501.02973
17
citations
#1782

MLLMs Need 3D-Aware Representation Supervision for Scene Understanding

Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.

NEURIPS 2025posterarXiv:2506.01946
17
citations
#1783

CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph

Haitao Lin, Guojiang Zhao, Odin Zhang et al.

ICLR 2025posterarXiv:2406.10840
17
citations
#1784

ThinkBot: Embodied Instruction Following with Thought Chain Reasoning

Guanxing Lu, Ziwei Wang, Changliu Liu et al.

ICLR 2025posterarXiv:2312.07062
17
citations
#1785

EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing

Haotian Sun, Tao Lei, Bowen Zhang et al.

ICLR 2025posterarXiv:2410.02098
17
citations
#1786

Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models

Dvir Samuel, Barak Meiri, Haggai Maron et al.

ICLR 2025posterarXiv:2312.12540
17
citations
#1787

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

Tianqi Liu, Zihao Huang, Zhaoxi Chen et al.

ICCV 2025posterarXiv:2503.20785
17
citations
#1788

Monte Carlo Tree Diffusion for System 2 Planning

Jaesik Yoon, Hyeonseo Cho, Doojin Baek et al.

ICML 2025spotlightarXiv:2502.07202
17
citations
#1789

Grokking at the Edge of Numerical Stability

Lucas Prieto, Melih Barsbey, Pedro Mediano et al.

ICLR 2025posterarXiv:2501.04697
17
citations
#1790

Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach

Zhiwei Li, Guodong Long, Tianyi Zhou et al.

AAAI 2025paperarXiv:2408.08931
17
citations
#1791

VeriThinker: Learning to Verify Makes Reasoning Model Efficient

Zigeng Chen, Xinyin Ma, Gongfan Fang et al.

NEURIPS 2025posterarXiv:2505.17941
16
citations
#1792

Model Equality Testing: Which Model is this API Serving?

Irena Gao, Percy Liang, Carlos Guestrin

ICLR 2025posterarXiv:2410.20247
16
citations
#1793

OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?

Zijian Chen, tingzhu chen, Wenjun Zhang et al.

ICLR 2025posterarXiv:2412.01175
16
citations
#1794

xLSTM-Mixer: Multivariate Time Series Forecasting by Mixing via Scalar Memories

Maurice Kraus, Felix Divo, Devendra Singh Dhami et al.

NEURIPS 2025oralarXiv:2410.16928
16
citations
#1795

Reinforce LLM Reasoning through Multi-Agent Reflection

Yurun Yuan, Tengyang Xie

ICML 2025posterarXiv:2506.08379
16
citations
#1796

The Scene Language: Representing Scenes with Programs, Words, and Embeddings

Yunzhi Zhang, Zizhang Li, Matt Zhou et al.

CVPR 2025highlightarXiv:2410.16770
16
citations
#1797

MLLM-as-a-Judge for Image Safety without Human Labeling

Zhenting Wang, Shuming Hu, Shiyu Zhao et al.

CVPR 2025highlightarXiv:2501.00192
16
citations
#1798

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

Xiaoyuan Liu, Tian Liang, Zhiwei He et al.

NEURIPS 2025posterarXiv:2505.13445
16
citations
#1799

ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object

Zhe Shan, Yang Liu, Lei Zhou et al.

CVPR 2025posterarXiv:2503.12006
16
citations
#1800

Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction

Shiyu Zhao, Zhenting Wang, Felix Juefei-Xu et al.

CVPR 2025posterarXiv:2412.00556
16
citations