Most Cited 2025 "joint segmentation decoders" Papers

22,274 papers found • Page 9 of 112

#1601

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

Kaifeng Gao, Jiaxin Shi, Hanwang Zhang et al.

ICML 2025posterarXiv:2411.16375
18
citations
#1602

EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition

Issar Tzachor, Boaz Lerner, Matan Levy et al.

ICLR 2025posterarXiv:2405.18065
18
citations
#1603

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior

Junfeng Ni, Yu Liu, Ruijie Lu et al.

CVPR 2025posterarXiv:2503.14830
18
citations
#1604

SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints

Miruna Cretu, Charles Harris, Ilia Igashov et al.

ICLR 2025posterarXiv:2405.01155
18
citations
#1605

AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation

Yukang Cao, Liang Pan, Kai Han et al.

ICLR 2025posterarXiv:2410.07164
18
citations
#1606

Cubify Anything: Scaling Indoor 3D Object Detection

Justin Lazarow, David Griffiths, Gefen Kohavi et al.

CVPR 2025highlightarXiv:2412.04458
18
citations
#1607

Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models

Rui Ye, Jingyi Chai, Xiangrui Liu et al.

ICLR 2025posterarXiv:2406.10630
18
citations
#1608

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.

NEURIPS 2025posterarXiv:2505.16394
18
citations
#1609

Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning

Simran Kaur, Simon Park, Anirudh Goyal et al.

ICLR 2025posterarXiv:2408.14774
18
citations
#1610

SLMRec: Distilling Large Language Models into Small for Sequential Recommendation

Wujiang Xu, Qitian Wu, Zujie Liang et al.

ICLR 2025oralarXiv:2405.17890
18
citations
#1611

Adaptive Rectangular Convolution for Remote Sensing Pansharpening

Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.

CVPR 2025posterarXiv:2503.00467
18
citations
#1612

Video Motion Transfer with Diffusion Transformers

Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.

CVPR 2025posterarXiv:2412.07776
18
citations
#1613

VSP: Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for MLLMs

Qiucheng Wu, Handong Zhao, Michael Saxon et al.

ICCV 2025poster
18
citations
#1614

MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo

Zhenlong Yuan, Cong Liu, Fei Shen et al.

AAAI 2025paperarXiv:2407.19323
18
citations
#1615

Generalization through variance: how noise shapes inductive biases in diffusion models

John Vastola

ICLR 2025posterarXiv:2504.12532
18
citations
#1616

Towards Adversarially Robust Dataset Distillation by Curvature Regularization

Eric Xue, Yijiang Li, Haoyang Liu et al.

AAAI 2025paperarXiv:2403.10045
18
citations
#1617

You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs

Yihong Luo, Xiaolong Chen, Xinghua Qu et al.

ICLR 2025posterarXiv:2403.12931
18
citations
#1618

Diversity-Aware Policy Optimization for Large Language Model Reasoning

Jian Yao, Ran Cheng, Xingyu Wu et al.

NEURIPS 2025spotlightarXiv:2505.23433
18
citations
#1619

Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models

Hulingxiao He, Geng Li, Zijun Geng et al.

ICLR 2025posterarXiv:2501.15140
18
citations
#1620

Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training

Zhenxin Li, Shihao Wang, Shiyi Lan et al.

ICCV 2025posterarXiv:2503.12030
18
citations
#1621

A Controlled Study on Long Context Extension and Generalization in LLMs

Yi Lu, Jing Nathan Yan, Songlin Yang et al.

COLM 2025paperarXiv:2409.12181
18
citations
#1622

Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory

Nikola Zubic, Federico Soldà, Aurelio Sulser et al.

ICLR 2025posterarXiv:2405.16674
18
citations
#1623

Commit0: Library Generation from Scratch

Wenting Zhao, Nan Jiang, Celine Lee et al.

ICLR 2025posterarXiv:2412.01769
18
citations
#1624

Ref-GS: Directional Factorization for 2D Gaussian Splatting

Youjia Zhang, Anpei Chen, Yumin Wan et al.

CVPR 2025posterarXiv:2412.00905
18
citations
#1625

HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment

YOUHE JIANG, Ran Yan, Binhang Yuan

ICLR 2025posterarXiv:2502.07903
18
citations
#1626

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

Jiangyong Huang, Baoxiong Jia, Yan Wang et al.

CVPR 2025posterarXiv:2503.22420
18
citations
#1627

Perm: A Parametric Representation for Multi-Style 3D Hair Modeling

Chengan He, Xin Sun, Zhixin Shu et al.

ICLR 2025posterarXiv:2407.19451
18
citations
#1628

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Yiheng Li, RuiBing Hou, Hong Chang et al.

CVPR 2025highlightarXiv:2411.16781
18
citations
#1629

Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior

Chen Guo, Junxuan Li, Yash Kant et al.

CVPR 2025posterarXiv:2503.01610
18
citations
#1630

MoonCast: High-Quality Zero-Shot Podcast Generation

Zeqian Ju, Dongchao Yang, Shen Kai et al.

NEURIPS 2025oralarXiv:2503.14345
18
citations
#1631

DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs

Jongwoo Ko, Tianyi Chen, Sungnyun Kim et al.

ICML 2025oralarXiv:2503.07067
18
citations
#1632

Block Verification Accelerates Speculative Decoding

Ziteng Sun, Uri Mendlovic, Yaniv Leviathan et al.

ICLR 2025posterarXiv:2403.10444
18
citations
#1633

SketchAgent: Language-Driven Sequential Sketch Generation

Yael Vinker, Tamar Rott Shaham, Kristine Zheng et al.

CVPR 2025posterarXiv:2411.17673
18
citations
#1634

Delta Decompression for MoE-based LLMs Compression

Hao Gu, Wei Li, Lujun Li et al.

ICML 2025posterarXiv:2502.17298
18
citations
#1635

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

Fan Lu, Wei Wu, Kecheng Zheng et al.

CVPR 2025posterarXiv:2412.08614
18
citations
#1636

Merging on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging

Anke Tang, Enneng Yang, Li Shen et al.

NEURIPS 2025poster
18
citations
#1637

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

Li Hao, He CAO, Bin Feng et al.

NEURIPS 2025posterarXiv:2505.21318
18
citations
#1638

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Kaihang Pan, Wang Lin, Zhongqi Yue et al.

CVPR 2025posterarXiv:2504.14666
18
citations
#1639

Palu: KV-Cache Compression with Low-Rank Projection

Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin et al.

ICLR 2025poster
18
citations
#1640

What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph

Yutao Jiang, Qiong Wu, Wenhao Lin et al.

AAAI 2025paperarXiv:2501.02268
18
citations
#1641

BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics

Lukas Rauch, Raphael Schwinger, Moritz Wirth et al.

ICLR 2025posterarXiv:2403.10380
18
citations
#1642

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds

Hao Liang, Zhiquan Luo

NEURIPS 2025posterarXiv:2210.14051
18
citations
#1643

Scaling Optimal LR Across Token Horizons

Johan Bjorck, Alon Benhaim, Vishrav Chaudhary et al.

ICLR 2025posterarXiv:2409.19913
18
citations
#1644

Cross-Embodiment Dexterous Grasping with Reinforcement Learning

Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.

ICLR 2025posterarXiv:2410.02479
18
citations
#1645

CrossMPT: Cross-attention Message-passing Transformer for Error Correcting Codes

Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim et al.

ICLR 2025posterarXiv:2405.01033
18
citations
#1646

Controlling Language and Diffusion Models by Transporting Activations

Pau Rodriguez, Arno Blaas, Michal Klein et al.

ICLR 2025posterarXiv:2410.23054
18
citations
#1647

FatesGS: Fast and Accurate Sparse-View Surface Reconstruction Using Gaussian Splatting with Depth-Feature Consistency

Han Huang, Yulun Wu, Chao Deng et al.

AAAI 2025paperarXiv:2501.04628
18
citations
#1648

CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

ZeMing Gong, Austin Wang, Xiaoliang Huo et al.

ICLR 2025posterarXiv:2405.17537
18
citations
#1649

HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation

Boyuan Wang, Xiaofeng Wang, Chaojun Ni et al.

CVPR 2025posterarXiv:2503.24026
18
citations
#1650

Benchmarking Predictive Coding Networks -- Made Simple

Luca Pinchetti, Chang Qi, Oleh Lokshyn et al.

ICLR 2025posterarXiv:2407.01163
18
citations
#1651

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Tonghe Zhang, Chao Yu, Sichang Su et al.

NEURIPS 2025posterarXiv:2505.22094
18
citations
#1652

VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Chaoya Jiang, Yongrui Heng, Wei Ye et al.

NEURIPS 2025poster
18
citations
#1653

TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing

Stefan Lionar, Jiabin Liang, Gim Hee Lee

CVPR 2025posterarXiv:2503.11629
18
citations
#1654

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.

CVPR 2025highlightarXiv:2412.04462
18
citations
#1655

HoliTom: Holistic Token Merging for Fast Video Large Language Models

Kele Shao, Keda TAO, Can Qin et al.

NEURIPS 2025oralarXiv:2505.21334
18
citations
#1656

Encryption-Friendly LLM Architecture

Donghwan Rho, Taeseong Kim, Minje Park et al.

ICLR 2025posterarXiv:2410.02486
18
citations
#1657

GenFusion: Closing the Loop between Reconstruction and Generation via Videos

Sibo Wu, Congrong Xu, Binbin Huang et al.

CVPR 2025posterarXiv:2503.21219
18
citations
#1658

Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos

Yi Chen, Yuying Ge, Weiliang Tang et al.

ICCV 2025posterarXiv:2412.04445
18
citations
#1659

Discretization-invariance? On the Discretization Mismatch Errors in Neural Operators

Wenhan Gao, Ruichen Xu, Yuefan Deng et al.

ICLR 2025poster
18
citations
#1660

Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models

Zeyu Yang, Zijie Pan, Chun Gu et al.

ICLR 2025oralarXiv:2404.02148
18
citations
#1661

SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models

Shuaijie Shen, Chao Wang, Renzhuo Huang et al.

AAAI 2025paperarXiv:2408.14909
18
citations
#1662

Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search

Yuichi Inoue, Kou Misaki, Yuki Imajuku et al.

NEURIPS 2025spotlightarXiv:2503.04412
18
citations
#1663

Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset

Xiao Wang, Yu Jin, Wentao Wu et al.

CVPR 2025posterarXiv:2412.06647
18
citations
#1664

Boosting Neural Combinatorial Optimization for Large-Scale Vehicle Routing Problems

Fu Luo, Xi Lin, Yaoxin Wu et al.

ICLR 2025poster
18
citations
#1665

MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model

Junjie Li, Yang Liu, Weiqing Liu et al.

ICLR 2025posterarXiv:2409.07486
18
citations
#1666

X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention

XiaoChen Zhao, Hongyi Xu, Guoxian Song et al.

ICLR 2025posterarXiv:2507.23143
18
citations
#1667

Gradient-Free Generation for Hard-Constrained Systems

Chaoran Cheng, Boran Han, Danielle Maddix et al.

ICLR 2025posterarXiv:2412.01786
18
citations
#1668

Structure-Adaptive Multi-View Graph Clustering for Remote Sensing Data

Renxiang Guan, Wenxuan Tu, Siwei Wang et al.

AAAI 2025paper
18
citations
#1669

TIME-FS: Joint Learning of Tensorial Incomplete Multi-View Unsupervised Feature Selection and Missing-View Imputation

Yanyong Huang, Minghui Lu, Wei Huang et al.

AAAI 2025paper
18
citations
#1670

Hash3D: Training-free Acceleration for 3D Generation

Xingyi Yang, Songhua Liu, Xinchao Wang

CVPR 2025posterarXiv:2404.06091
18
citations
#1671

CAD-Recode: Reverse Engineering CAD Code from Point Clouds

Danila Rukhovich, Elona Dupont, Dimitrios Mallis et al.

ICCV 2025posterarXiv:2412.14042
18
citations
#1672

Air Quality Prediction with Physics-Guided Dual Neural ODEs in Open Systems

jindong tian, Yuxuan Liang, Ronghui Xu et al.

ICLR 2025oralarXiv:2410.19892
18
citations
#1673

Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding

Weiyu Guo, Ziyang Chen, Shaoguang WANG et al.

NEURIPS 2025oralarXiv:2503.13139
18
citations
#1674

The Illusion of Empathy: How AI Chatbots Shape Conversation Perception

Tingting Liu, Salvatore Giorgi, Ankit Aich et al.

AAAI 2025paperarXiv:2411.12877
18
citations
#1675

Learning to Discretize Denoising Diffusion ODEs

Vinh Tong, Trung-Dung Hoang, Anji Liu et al.

ICLR 2025posterarXiv:2405.15506
18
citations
#1676

OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking

Xuanyu Zhang, Zecheng Tang, Zhipei Xu et al.

CVPR 2025posterarXiv:2412.01615
18
citations
#1677

Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?

Seth Aycock, David Stap, Di Wu et al.

ICLR 2025posterarXiv:2409.19151
18
citations
#1678

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy

Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.

CVPR 2025posterarXiv:2503.20823
18
citations
#1679

MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Bhavya, Stelian Coros, Andreas Krause et al.

ICLR 2025posterarXiv:2412.12098
18
citations
#1680

FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

Jiale Xu, Shenghua Gao, Ying Shan

ICCV 2025posterarXiv:2412.09573
18
citations
#1681

A Label-free Heterophily-guided Approach for Unsupervised Graph Fraud Detection

Junjun Pan, Yixin Liu, Xin Zheng et al.

AAAI 2025paperarXiv:2502.13308
18
citations
#1682

Non-myopic Generation of Language Models for Reasoning and Planning

Chang Ma, Haiteng Zhao, Junlei Zhang et al.

ICLR 2025posterarXiv:2410.17195
18
citations
#1683

Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency

Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani et al.

ICLR 2025posterarXiv:2411.16525
18
citations
#1684

A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning

Chen-Yu Liu, Chao-Han Huck Yang, Hsi-Sheng Goan et al.

ICLR 2025posterarXiv:2410.09846
17
citations
#1685

Graph Sparsification via Mixture of Graphs

Guibin Zhang, Xiangguo SUN, Yanwei Yue et al.

ICLR 2025posterarXiv:2405.14260
17
citations
#1686

MetaMetrics: Calibrating Metrics for Generation Tasks Using Human Preferences

Genta Winata, David Anugraha, Lucky Susanto et al.

ICLR 2025posterarXiv:2410.02381
17
citations
#1687

Learning 3D Persistent Embodied World Models

Siyuan Zhou, Yilun Du, Yuncong Yang et al.

NEURIPS 2025posterarXiv:2505.05495
17
citations
#1688

Large Images Are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting

Lingting Zhu, Guying Lin, Jinnan Chen et al.

AAAI 2025paperarXiv:2502.09039
17
citations
#1689

ViLLa: Video Reasoning Segmentation with Large Language Model

rongkun Zheng, Lu Qi, Xi Chen et al.

ICCV 2025posterarXiv:2407.14500
17
citations
#1690

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Xiaojun Jia, Sensen Gao, Simeng Qin et al.

NEURIPS 2025posterarXiv:2505.21494
17
citations
#1691

Optimal Transport for Time Series Imputation

Hao Wang, zhengnan li, Haoxuan Li et al.

ICLR 2025oral
17
citations
#1692

No Preference Left Behind: Group Distributional Preference Optimization

Binwei Yao, Zefan Cai, Yun-Shiuan Chuang et al.

ICLR 2025posterarXiv:2412.20299
17
citations
#1693

DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds

Youyu Chen, Junjun Jiang, Kui Jiang et al.

CVPR 2025highlightarXiv:2503.18402
17
citations
#1694

DarkBench: Benchmarking Dark Patterns in Large Language Models

Esben Kran, Hieu Minh Nguyen, Akash Kundu et al.

ICLR 2025posterarXiv:2503.10728
17
citations
#1695

Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation

Xin Zhang, Robby T. Tan

CVPR 2025highlightarXiv:2504.03193
17
citations
#1696

VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model

Zuwei Long, Yunhang Shen, Chaoyou Fu et al.

NEURIPS 2025poster
17
citations
#1697

USP: Unified Self-Supervised Pretraining for Image Generation and Understanding

Xiangxiang Chu, Renda Li, Yong Wang

ICCV 2025posterarXiv:2503.06132
17
citations
#1698

Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise

Enea Monzio Compagnoni, Tianlin Liu, Rustem Islamov et al.

ICLR 2025posterarXiv:2411.15958
17
citations
#1699

Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera

Yuliang Guo, Sparsh Garg, S. Mahdi H. Miangoleh et al.

CVPR 2025posterarXiv:2501.02464
17
citations
#1700

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

Jiaming Zhou, Teli Ma, Kun-Yu Lin et al.

CVPR 2025posterarXiv:2406.14235
17
citations
#1701

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats

Léo Boisvert, Abhay Puri, Gabriel Huang et al.

COLM 2025paperarXiv:2504.14064
17
citations
#1702

V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

Junqi Ge, Ziyi Chen, Jintao Lin et al.

ICCV 2025posterarXiv:2412.09616
17
citations
#1703

Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling

Junha Hyung, Kinam Kim, Susung Hong et al.

CVPR 2025posterarXiv:2411.18664
17
citations
#1704

QiMeng-CodeV-R1: Reasoning-Enhanced Verilog Generation

Yaoyu Zhu, Di Huang, Hanqi Lyu et al.

NEURIPS 2025posterarXiv:2505.24183
17
citations
#1705

From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation

Nikita Kotelevskii, Vladimir Kondratyev, Martin Takáč et al.

ICLR 2025posterarXiv:2402.10727
17
citations
#1706

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping

Yue Yang, Shuibo Zhang, Kaipeng Zhang et al.

ICLR 2025posterarXiv:2410.08695
17
citations
#1707

INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations

Yongming Zhu, Longhao Zhang, Zhengkun Rong et al.

CVPR 2025posterarXiv:2412.04037
17
citations
#1708

Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures

Junxuan Wang, Xuyang Ge, Wentao Shu et al.

ICLR 2025posterarXiv:2410.06672
17
citations
#1709

Spiking Transformer with Spatial-Temporal Attention

Donghyun Lee, Yuhang Li, Youngeun Kim et al.

CVPR 2025posterarXiv:2409.19764
17
citations
#1710

Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jue Zhang, Xiaoting Qin et al.

CVPR 2025highlightarXiv:2412.11077
17
citations
#1711

Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

Yili Wang, Yixin Liu, Xu Shen et al.

ICLR 2025posterarXiv:2406.15523
17
citations
#1712

Learning Clustering-based Prototypes for Compositional Zero-Shot Learning

Hongyu Qu, Jianan Wei, Xiangbo Shu et al.

ICLR 2025posterarXiv:2502.06501
17
citations
#1713

PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection

Jianan Ye, Weiguang Zhao, Xi Yang et al.

CVPR 2025posterarXiv:2412.12617
17
citations
#1714

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Yuqing Wang, Zhijie Lin, Yao Teng et al.

ICCV 2025posterarXiv:2503.16430
17
citations
#1715

Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving

Yuhang Lu, Yichen Yao, Jiadong Tu et al.

AAAI 2025paperarXiv:2409.02914
17
citations
#1716

GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

Yue Liu, Shengfang Zhai, Mingzhe Du et al.

NEURIPS 2025posterarXiv:2505.11049
17
citations
#1717

Neighboring Autoregressive Modeling for Efficient Visual Generation

Yefei He, Yuanyu He, Shaoxuan He et al.

ICCV 2025posterarXiv:2503.10696
17
citations
#1718

Learning Interpretable Hierarchical Dynamical Systems Models from Time Series Data

Manuel Brenner, Elias Weber, Georgia Koppe et al.

ICLR 2025posterarXiv:2410.04814
17
citations
#1719

CyberHost: A One-stage Diffusion Framework for Audio-driven Talking Body Generation

Gaojie Lin, Jianwen Jiang, Chao Liang et al.

ICLR 2025poster
17
citations
#1720

OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs

Jitai Hao, Yuke Zhu, Tian Wang et al.

ICLR 2025poster
17
citations
#1721

Generative Flows on Synthetic Pathway for Drug Design

Seonghwan Seo, Minsu Kim, Tony Shen et al.

ICLR 2025posterarXiv:2410.04542
17
citations
#1722

Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models

Matvei Popov, Peter Robicheaux, Anish Madan et al.

NEURIPS 2025posterarXiv:2505.20612
17
citations
#1723

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Hanyang Zhao, Genta Winata, Anirban Das et al.

ICLR 2025posterarXiv:2410.04203
17
citations
#1724

Reasoning of Large Language Models over Knowledge Graphs with Super-Relations

Song Wang, Junhong Lin, Xiaojie Guo et al.

ICLR 2025posterarXiv:2503.22166
17
citations
#1725

LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement

Jieming Bian, Lei Wang, Letian Zhang et al.

ICCV 2025posterarXiv:2411.14961
17
citations
#1726

TULIP: Token-length Upgraded CLIP

Ivona Najdenkoska, Mohammad Mahdi Derakhshani, Yuki Asano et al.

ICLR 2025posterarXiv:2410.10034
17
citations
#1727

A Probabilistic Perspective on Unlearning and Alignment for Large Language Models

Yan Scholten, Stephan Günnemann, Leo Schwinn

ICLR 2025posterarXiv:2410.03523
17
citations
#1728

VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning

Yongshuo Zong, Ondrej Bohdal, Timothy Hospedales

ICLR 2025posterarXiv:2403.13164
17
citations
#1729

Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration

Ran Xu, Wenqi Shi, Yuchen Zhuang et al.

COLM 2025paperarXiv:2504.04915
17
citations
#1730

Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization

Jiaming Zhou, Ke Ye, Jiayi Liu et al.

NEURIPS 2025posterarXiv:2505.15660
17
citations
#1731

Mellow: a small audio language model for reasoning

Soham Deshmukh, Satvik Dixit, Rita Singh et al.

NEURIPS 2025posterarXiv:2503.08540
17
citations
#1732

GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation

Hongyin Zhang, Pengxiang Ding, Shangke Lyu et al.

ICLR 2025posterarXiv:2502.09268
17
citations
#1733

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

Yining Hong, Beide Liu, Maxine Wu et al.

ICLR 2025oralarXiv:2410.23277
17
citations
#1734

Accelerating neural network training: An analysis of the AlgoPerf competition

Priya Kasimbeg, Frank Schneider, Runa Eschenhagen et al.

ICLR 2025posterarXiv:2502.15015
17
citations
#1735

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Mutimodal Models

Xingrui Wang, Wufei Ma, Tiezheng Zhang et al.

CVPR 2025highlight
17
citations
#1736

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

Mingze Xu, Mingfei Gao, Shiyu Li et al.

COLM 2025paperarXiv:2503.18943
17
citations
#1737

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Yunhan Zhao, Xiang Zheng, Lin Luo et al.

ICLR 2025posterarXiv:2410.20971
17
citations
#1738

M²IV: Towards Efficient and Fine-grained Multimodal In-Context Learning via Representation Engineering

Yanshu Li, Yi Cao, Hongyang He et al.

COLM 2025paper
17
citations
#1739

DeLLMa: Decision Making Under Uncertainty with Large Language Models

Ollie Liu, Deqing Fu, Dani Yogatama et al.

ICLR 2025posterarXiv:2402.02392
17
citations
#1740

Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families

Felipe Maia Polo, Seamus Somerstep, Leshem Choshen et al.

NEURIPS 2025posterarXiv:2412.06540
17
citations
#1741

Video-T1: Test-time Scaling for Video Generation

Fangfu Liu, Hanyang Wang, Yimo Cai et al.

ICCV 2025posterarXiv:2503.18942
17
citations
#1742

UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior

I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.

CVPR 2025highlightarXiv:2501.13134
17
citations
#1743

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding

Xiaoyi Zhang, Zhaoyang Jia, Zongyu Guo et al.

NEURIPS 2025oralarXiv:2505.18079
17
citations
#1744

Detecting Out-of-Distribution Through the Lens of Neural Collapse

Litian Liu, Yao Qin

CVPR 2025posterarXiv:2311.01479
17
citations
#1745

Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts

Hongcheng Gao, Tianyu Pang, Chao Du et al.

ICCV 2025posterarXiv:2410.12777
17
citations
#1746

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

Kim Sung-Bin, Oh Hyun-Bin, Lee Jung-Mok et al.

ICLR 2025posterarXiv:2410.18325
17
citations
#1747

Improving Reasoning Performance in Large Language Models via Representation Engineering

Bertram Højer, Oliver Jarvis, Stefan Heinrich

ICLR 2025posterarXiv:2504.19483
17
citations
#1748

3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

Xiaobiao Du, Yida Wang, Haiyang Sun et al.

ICCV 2025posterarXiv:2406.04875
17
citations
#1749

Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods

Oussama Zekri, Nicolas Boulle

NEURIPS 2025posterarXiv:2502.01384
17
citations
#1750

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

Xindi Yang, Baolu Li, Yiming Zhang et al.

ICCV 2025posterarXiv:2503.23368
17
citations
#1751

Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene

Jiahao Wu, Rui Peng, Zhiyan Wang et al.

ICLR 2025poster
17
citations
#1752

Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks

Lehan Wang, Haonan Wang, Honglong Yang et al.

ICLR 2025posterarXiv:2410.18387
17
citations
#1753

Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling

Wei Guo, Molei Tao, Yongxin Chen

ICLR 2025posterarXiv:2407.16936
17
citations
#1754

Controllable Context Sensitivity and the Knob Behind It

Julian Minder, Kevin Du, Niklas Stoehr et al.

ICLR 2025posterarXiv:2411.07404
17
citations
#1755

Design Principle Transfer in Neural Architecture Search via Large Language Models

Xun Zhou, Xingyu Wu, Liang Feng et al.

AAAI 2025paperarXiv:2408.11330
17
citations
#1756

u-$\mu$P: The Unit-Scaled Maximal Update Parametrization

Charles Blake, Constantin Eichenberg, Josef Dean et al.

ICLR 2025poster
17
citations
#1757

PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection

Botao Ren, Xue Yang, Yi Yu et al.

ICLR 2025posterarXiv:2410.08210
17
citations
#1758

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Orr Zohar, Xiaohan Wang, Yonatan Bitton et al.

ICLR 2025posterarXiv:2407.06189
17
citations
#1759

GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images

Xiang Lan, Feng Wu, Kai He et al.

NEURIPS 2025posterarXiv:2503.06073
17
citations
#1760

Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs

Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar et al.

ICLR 2025posterarXiv:2405.15683
17
citations
#1761

GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments

Enjun Du, Xunkai Li, Tian Jin et al.

NEURIPS 2025spotlightarXiv:2504.00711
17
citations
#1762

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

Hang Hua, Qing Liu, Lingzhi Zhang et al.

CVPR 2025posterarXiv:2411.15411
17
citations
#1763

Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

Li, Nikolaos Tsagkas, Jifei Song et al.

ICCV 2025posterarXiv:2408.10123
17
citations
#1764

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

Yan Li, Yifei Xing, Xiangyuan Lan et al.

CVPR 2025posterarXiv:2412.00833
17
citations
#1765

MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation

Donggon Jang, Yucheol Cho, Suin Lee et al.

ICLR 2025posterarXiv:2503.13881
17
citations
#1766

Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters

Yuan Wang, Ouxiang Li, Tingting Mu et al.

CVPR 2025posterarXiv:2412.06143
17
citations
#1767

Idiosyncrasies in Large Language Models

Mingjie Sun, Yida Yin, Zhiqiu (Oscar) Xu et al.

ICML 2025posterarXiv:2502.12150
17
citations
#1768

Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset

Yiqun Mei, Mingming He, Li Ma et al.

CVPR 2025posterarXiv:2503.14485
17
citations
#1769

On the Guidance of Flow Matching

Ruiqi Feng, Chenglei Yu, Wenhao Deng et al.

ICML 2025spotlightarXiv:2502.02150
17
citations
#1770

Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation

Zhuoman Liu, Weicai Ye, Yan Luximon et al.

CVPR 2025posterarXiv:2411.14423
17
citations
#1771

HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos

Jinglei Zhang, Jiankang Deng, Chao Ma et al.

CVPR 2025highlightarXiv:2501.02973
17
citations
#1772

MLLMs Need 3D-Aware Representation Supervision for Scene Understanding

Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.

NEURIPS 2025posterarXiv:2506.01946
17
citations
#1773

CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph

Haitao Lin, Guojiang Zhao, Odin Zhang et al.

ICLR 2025posterarXiv:2406.10840
17
citations
#1774

ThinkBot: Embodied Instruction Following with Thought Chain Reasoning

Guanxing Lu, Ziwei Wang, Changliu Liu et al.

ICLR 2025posterarXiv:2312.07062
17
citations
#1775

EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing

Haotian Sun, Tao Lei, Bowen Zhang et al.

ICLR 2025posterarXiv:2410.02098
17
citations
#1776

Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models

Dvir Samuel, Barak Meiri, Haggai Maron et al.

ICLR 2025posterarXiv:2312.12540
17
citations
#1777

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

Tianqi Liu, Zihao Huang, Zhaoxi Chen et al.

ICCV 2025posterarXiv:2503.20785
17
citations
#1778

Monte Carlo Tree Diffusion for System 2 Planning

Jaesik Yoon, Hyeonseo Cho, Doojin Baek et al.

ICML 2025spotlightarXiv:2502.07202
17
citations
#1779

Grokking at the Edge of Numerical Stability

Lucas Prieto, Melih Barsbey, Pedro Mediano et al.

ICLR 2025posterarXiv:2501.04697
17
citations
#1780

Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach

Zhiwei Li, Guodong Long, Tianyi Zhou et al.

AAAI 2025paperarXiv:2408.08931
17
citations
#1781

VeriThinker: Learning to Verify Makes Reasoning Model Efficient

Zigeng Chen, Xinyin Ma, Gongfan Fang et al.

NEURIPS 2025posterarXiv:2505.17941
16
citations
#1782

Model Equality Testing: Which Model is this API Serving?

Irena Gao, Percy Liang, Carlos Guestrin

ICLR 2025posterarXiv:2410.20247
16
citations
#1783

OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?

Zijian Chen, tingzhu chen, Wenjun Zhang et al.

ICLR 2025posterarXiv:2412.01175
16
citations
#1784

xLSTM-Mixer: Multivariate Time Series Forecasting by Mixing via Scalar Memories

Maurice Kraus, Felix Divo, Devendra Singh Dhami et al.

NEURIPS 2025oralarXiv:2410.16928
16
citations
#1785

Reinforce LLM Reasoning through Multi-Agent Reflection

Yurun Yuan, Tengyang Xie

ICML 2025posterarXiv:2506.08379
16
citations
#1786

The Scene Language: Representing Scenes with Programs, Words, and Embeddings

Yunzhi Zhang, Zizhang Li, Matt Zhou et al.

CVPR 2025highlightarXiv:2410.16770
16
citations
#1787

MLLM-as-a-Judge for Image Safety without Human Labeling

Zhenting Wang, Shuming Hu, Shiyu Zhao et al.

CVPR 2025highlightarXiv:2501.00192
16
citations
#1788

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

Xiaoyuan Liu, Tian Liang, Zhiwei He et al.

NEURIPS 2025posterarXiv:2505.13445
16
citations
#1789

ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object

Zhe Shan, Yang Liu, Lei Zhou et al.

CVPR 2025posterarXiv:2503.12006
16
citations
#1790

Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction

Shiyu Zhao, Zhenting Wang, Felix Juefei-Xu et al.

CVPR 2025posterarXiv:2412.00556
16
citations
#1791

Mimir: Improving Video Diffusion Models for Precise Text Understanding

Shuai Tan, Biao Gong, Yutong Feng et al.

CVPR 2025posterarXiv:2412.03085
16
citations
#1792

Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction

Yuanhao Cai, He Zhang, Kai Zhang et al.

ICCV 2025posterarXiv:2411.14384
16
citations
#1793

Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects

Amir Barda, Matheus Gadelha, Vladimir G. Kim et al.

CVPR 2025posterarXiv:2412.00518
16
citations
#1794

Does SGD really happen in tiny subspaces?

Minhak Song, Kwangjun Ahn, Chulhee Yun

ICLR 2025posterarXiv:2405.16002
16
citations
#1795

RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code

Dhruv Gautam, Spandan Garg, Jinu Jang et al.

ICLR 2025posterarXiv:2503.07832
16
citations
#1796

MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation

Akio Hayakawa, Masato Ishii, Takashi Shibuya et al.

ICLR 2025posterarXiv:2405.17842
16
citations
#1797

SensorLM: Learning the Language of Wearable Sensors

Yuwei Zhang, Kumar Ayush, Siyuan Qiao et al.

NEURIPS 2025posterarXiv:2506.09108
16
citations
#1798

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF

Heyang Zhao, Chenlu Ye, Quanquan Gu et al.

NEURIPS 2025posterarXiv:2411.04625
16
citations
#1799

Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model

Tudor Cebere, Aurélien Bellet, Nicolas Papernot

ICLR 2025posterarXiv:2405.14457
16
citations
#1800

ContextGNN: Beyond Two-Tower Recommendation Systems

Yiwen Yuan, Zecheng Zhang, Xinwei He et al.

ICLR 2025posterarXiv:2411.19513
16
citations