Most Cited 2025 "masked autoencoder paradigm" Papers

22,274 papers found • Page 16 of 112

#3001

Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation

Yichi Zhang, Zhuo Chen, Lingbing Guo et al.

AAAI 2025 • paper • arXiv:2404.09468
16 citations
#3002

The Jailbreak Tax: How Useful are Your Jailbreak Outputs?

Kristina Nikolić, Luze Sun, Jie Zhang et al.

ICML 2025 • spotlight • arXiv:2504.10694
16 citations
#3003

SIGMA: Selective Gated Mamba for Sequential Recommendation

Ziwei Liu, Qidong Liu, Yejing Wang et al.

AAAI 2025 • paper • arXiv:2408.11451
16 citations
#3004

Personalized Federated Learning for Spatio-Temporal Forecasting: A Dual Semantic Alignment-Based Contrastive Approach

Qingxiang Liu, Sheng Sun, Yuxuan Liang et al.

AAAI 2025 • paper • arXiv:2404.03702
16 citations
#3005

Semantic Convergence: Harmonizing Recommender Systems via Two-Stage Alignment and Behavioral Semantic Tokenization

Guanghan Li, Xun Zhang, Yufei Zhang et al.

AAAI 2025 • paper • arXiv:2412.13771
16 citations
#3006

How to Merge Your Multimodal Models Over Time?

Sebastian Dziadzio, Vishaal Udandarao, Karsten Roth et al.

CVPR 2025 • arXiv:2412.06712
16 citations
#3007

Scaling Vision Pre-Training to 4K Resolution

Baifeng Shi, Boyi Li, Han Cai et al.

CVPR 2025 • highlight • arXiv:2503.19903
16 citations
#3008

PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs

Oskar van der Wal, Pietro Lesci, Max Müller-Eberstein et al.

ICLR 2025 • arXiv:2503.09543
16 citations
#3009

OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition

Stephen Zhang, Vardan Papyan

ICLR 2025 • arXiv:2409.13652
16 citations
#3010

COME: Test-time Adaption by Conservatively Minimizing Entropy

Qingyang Zhang, Yatao Bian, Xinke Kong et al.

ICLR 2025 • arXiv:2410.10894
16 citations
#3011

HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection

Zijian Gu, Jianwei Ma, Yan Huang et al.

AAAI 2025 • paper • arXiv:2412.11489
16 citations
#3012

InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding

Minsoo Kim, Kyuhong Shim, Jungwook Choi et al.

NEURIPS 2025 • oral • arXiv:2506.15745
16 citations
#3013

FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models

Zhanwei Zhang, Shizhao Sun, Wenxiao Wang et al.

ICLR 2025 • arXiv:2411.05823
16 citations
#3014

Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding

Wei Suo, Lijun Zhang, Mengyang Sun et al.

CVPR 2025 • highlight • arXiv:2503.00361
16 citations
#3015

ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy

Chenrui Tie, Yue Chen, Ruihai Wu et al.

ICLR 2025 • arXiv:2411.03990
16 citations
#3016

ThinkSound: Chain-of-Thought Reasoning in Multimodal LLMs for Audio Generation and Editing

Huadai Liu, Kaicheng Luo, Jialei Wang et al.

NEURIPS 2025 • oral
16 citations
#3017

Degradation-Aware Feature Perturbation for All-in-One Image Restoration

Xiangpeng Tian, Xiangyu Liao, Xiao Liu et al.

CVPR 2025 • arXiv:2505.12630
16 citations
#3018

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models

Shijie Zhou, Alexander Vilesov, Xuehai He et al.

ICCV 2025 • arXiv:2508.02095
16 citations
#3019

DRoC: Elevating Large Language Models for Complex Vehicle Routing via Decomposed Retrieval of Constraints

Xia Jiang, Yaoxin Wu, Chenhao Zhang et al.

ICLR 2025
16 citations
#3020

MOS: Model Surgery for Pre-Trained Model-Based Class-Incremental Learning

Hai-Long Sun, Da-Wei Zhou, Hanbin Zhao et al.

AAAI 2025 • paper • arXiv:2412.09441
16 citations
#3021

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Ronghao Dang, Yuqian Yuan, Wenqi Zhang et al.

CVPR 2025 • arXiv:2501.05031
16 citations
#3022

SkillMimic: Learning Basketball Interaction Skills from Demonstrations

Yinhuai Wang, Qihan Zhao, Runyi Yu et al.

CVPR 2025 • highlight • arXiv:2408.15270
16 citations
#3023

DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation

Changdae Oh, Yixuan Li, Kyungwoo Song et al.

ICLR 2025 • arXiv:2410.03782
16 citations
#3024

GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling

Pinxin Liu, Luchuan Song, Junhua Huang et al.

ICCV 2025 • arXiv:2501.18898
16 citations
#3025

Neural Encoding and Decoding at Scale

Yizi Zhang, Yanchen Wang, Mehdi Azabou et al.

ICML 2025 • oral • arXiv:2504.08201
16 citations
#3026

OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting

Yongsheng Yu, Ziyun Zeng, Haitian Zheng et al.

ICCV 2025 • arXiv:2503.08677
16 citations
#3027

Revisiting MAE Pre-training for 3D Medical Image Segmentation

Tassilo Wald, Constantin Ulrich, Stanislav Lukyanenko et al.

CVPR 2025 • highlight • arXiv:2410.23132
16 citations
#3028

LLM Unlearning via Neural Activation Redirection

William Shen, Xinchi Qiu, Meghdad Kurmanji et al.

NEURIPS 2025 • arXiv:2502.07218
16 citations
#3029

LLMs Can Plan Only If We Tell Them

Bilgehan Sel, Ruoxi Jia, Ming Jin

ICLR 2025 • arXiv:2501.13545
16 citations
#3030

Global-Local Tree Search in VLMs for 3D Indoor Scene Generation

Wei Deng, Mengshi Qi, Huadong Ma

CVPR 2025 • arXiv:2503.18476
16 citations
#3031

Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning

Linjiajie Fang, Ruoxue Liu, Jing Zhang et al.

ICLR 2025 • arXiv:2405.20555
16 citations
#3032

Metadata Conditioning Accelerates Language Model Pre-training

Tianyu Gao, Alexander Wettig, Luxi He et al.

ICML 2025 • arXiv:2501.01956
16 citations
#3033

The Same but Different: Structural Similarities and Differences in Multilingual Language Modeling

Ruochen Zhang, Qinan Yu, Matianyu Zang et al.

ICLR 2025 • arXiv:2410.09223
16 citations
#3034

Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs

Sagnik Mukherjee, Abhinav Chinta, Takyoung Kim et al.

ICML 2025 • arXiv:2502.02362
16 citations
#3035

SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers

Yanzheng Xiang, Hanqi Yan, Shuyin Ouyang et al.

COLM 2025 • paper • arXiv:2504.00255
16 citations
#3036

MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization

Yougang Lyu, Lingyong Yan, Zihan Wang et al.

ICLR 2025 • oral • arXiv:2410.07672
16 citations
#3037

The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning

Dulhan Jayalath, Gilad Landau, Brendan Shillingford et al.

ICML 2025 • arXiv:2406.04328
16 citations
#3038

FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model

Jun Zhou, Jiahao Li, Zunnan Xu et al.

CVPR 2025 • arXiv:2503.19839
16 citations
#3039

Law of Vision Representation in MLLMs

Shijia Yang, Bohan Zhai, Quanzeng You et al.

COLM 2025 • paper • arXiv:2408.16357
16 citations
#3040

OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?

Zijian Chen, Tingzhu Chen, Wenjun Zhang et al.

ICLR 2025 • arXiv:2412.01175
16 citations
#3041

Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Dynamic Scenes

Isabella Liu, Hao Su, Xiaolong Wang

ICLR 2025 • oral • arXiv:2404.12379
16 citations
#3042

TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

Lijie Yang, Zhihao Zhang, Zhuofu Chen et al.

ICLR 2025 • arXiv:2410.05076
16 citations
#3043

Logically Consistent Language Models via Neuro-Symbolic Integration

Diego Calanzone, Stefano Teso, Antonio Vergari

ICLR 2025 • arXiv:2409.13724
16 citations
#3044

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

Ziang Yan, Yinan He, Xinhao Li et al.

NEURIPS 2025 • oral • arXiv:2509.21100
16 citations
#3045

Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics

Hamed Mahdavi, Alireza Hashemi, Majid Daliri et al.

COLM 2025 • paper • arXiv:2504.01995
16 citations
#3046

LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid

Tianyi Zhang, Anshumali Shrivastava

ICLR 2025 • arXiv:2407.10032
16 citations
#3047

POMATO: Marrying Pointmap Matching with Temporal Motions for Dynamic 3D Reconstruction

Songyan Zhang, Yongtao Ge, Jinyuan Tian et al.

ICCV 2025 • arXiv:2504.05692
16 citations
#3048

Track-On: Transformer-based Online Point Tracking with Memory

Görkay Aydemir, Xiongyi Cai, Weidi Xie et al.

ICLR 2025 • oral • arXiv:2501.18487
16 citations
#3049

On Calibration of LLM-based Guard Models for Reliable Content Moderation

Hongfu Liu, Hengguan Huang, Xiangming Gu et al.

ICLR 2025 • arXiv:2410.10414
16 citations
#3050

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning

Xiao Liang, Zhong-Zhi Li, Yeyun Gong et al.

NEURIPS 2025 • arXiv:2506.08989
16 citations
#3051

CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs

Sijia Chen, Xiaomin Li, Mengxue Zhang et al.

NEURIPS 2025 • arXiv:2505.11413
16 citations
#3052

Let LRMs Break Free from Overthinking via Self-Braking Tuning

Haoran Zhao, Yuchen Yan, Yongliang Shen et al.

NEURIPS 2025 • arXiv:2505.14604
16 citations
#3053

Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations

Ji-An Li, Huadong Xiong, Robert Wilson et al.

NEURIPS 2025 • arXiv:2505.13763
16 citations
#3054

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing

Gaoxiang Cong, Jiadong Pan, Liang Li et al.

CVPR 2025 • highlight • arXiv:2412.08988
16 citations
#3055

DynaSaur: Large Language Agents Beyond Predefined Actions

Dang Nguyen, Viet Dac Lai, Seunghyun Yoon et al.

COLM 2025 • paper • arXiv:2411.01747
16 citations
#3056

IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis

Yuji Wang, Jingchen Ni, Yong Liu et al.

AAAI 2025 • paper • arXiv:2503.00936
16 citations
#3057

GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models

Mianchu Wang, Rui Yang, Xi Chen et al.

ICLR 2025 • arXiv:2310.20025
16 citations
#3058

An Empirical Analysis of Uncertainty in Large Language Model Evaluations

Qiujie Xie, Qingqiu Li, Zhuohao Yu et al.

ICLR 2025 • arXiv:2502.10709
16 citations
#3059

CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection

Xiaolei Wang, Xiaoyang Wang, Huihui Bai et al.

AAAI 2025 • paper • arXiv:2501.00346
16 citations
#3060

Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining

Raghuveer Thirukovalluru, Rui Meng, Ye Liu et al.

NEURIPS 2025 • spotlight • arXiv:2505.11293
16 citations
#3061

A Many-Objective Problem Where Crossover Is Provably Indispensable

Andre Opris

AAAI 2025 • paper
16 citations
#3062

Improved Bounds for Online Facility Location with Predictions

Dimitris Fotakis, Evangelia Gergatsouli, Themistoklis Gouleakis et al.

AAAI 2025 • paper • arXiv:2107.08277
16 citations
#3063

Speeding Up the NSGA-II with a Simple Tie-Breaking Rule

Benjamin Doerr, Tudor Ivan, Martin S. Krejca

AAAI 2025 • paper • arXiv:2412.11931
16 citations
#3064

HEROS-GAN: Honed-Energy Regularized and Optimal Supervised GAN for Enhancing Accuracy and Range of Low-Cost Accelerometers

Yifeng Wang, Yi Zhao

AAAI 2025 • paper • arXiv:2502.18064
16 citations
#3065

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

Julie Kallini, Shikhar Murty, Christopher Manning et al.

ICLR 2025 • arXiv:2410.20771
16 citations
#3066

Black-Box Detection of Language Model Watermarks

Thibaud Gloaguen, Nikola Jovanović, Robin Staab et al.

ICLR 2025 • arXiv:2405.20777
16 citations
#3067

Beyond Log-Concavity and Score Regularity: Improved Convergence Bounds for Score-Based Generative Models in W2-distance

Marta Gentiloni Silveri, Antonio Ocello

ICML 2025 • arXiv:2501.02298
16 citations
#3068

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent

Yuheng Zhang, Dian Yu, Tao Ge et al.

NEURIPS 2025 • spotlight • arXiv:2502.16852
16 citations
#3069

Where am I? Cross-View Geo-localization with Natural Language Descriptions

Junyan Ye, Honglin Lin, Leyan Ou et al.

ICCV 2025 • arXiv:2412.17007
16 citations
#3070

Task Vectors in In-Context Learning: Emergence, Formation, and Benefits

Liu Yang, Ziqian Lin, Kangwook Lee et al.

COLM 2025 • paper • arXiv:2501.09240
16 citations
#3071

Don’t Stop Me Now: Embedding Based Scheduling for LLMs

Rana Shahout, Eran Malach, Chunwei Liu et al.

ICLR 2025
15 citations
#3072

Ultra-High Resolution Segmentation via Boundary-Enhanced Patch-Merging Transformer

Haopeng Sun, Yingwei Zhang, Lumin Xu et al.

AAAI 2025 • paper • arXiv:2412.10181
15 citations
#3073

Establishing Best Practices in Building Rigorous Agentic Benchmarks

Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun et al.

NEURIPS 2025 • arXiv:2507.02825
15 citations
#3074

MoE-LPR: Multilingual Extension of Large Language Models Through Mixture-of-Experts with Language Priors Routing

Hao Zhou, Zhijun Wang, Shujian Huang et al.

AAAI 2025 • paper • arXiv:2408.11396
15 citations
#3075

Language Guided Skill Discovery

Seungeun Rho, Laura Smith, Tianyu Li et al.

ICLR 2025 • arXiv:2406.06615
15 citations
#3076

MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks

Yinghao Zhu, Ziyi He, Haoran Hu et al.

NEURIPS 2025 • arXiv:2505.12371
15 citations
#3077

SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

Ling Yang, Zhaochen Yu, Tianjun Zhang et al.

ICLR 2025 • arXiv:2410.09008
15 citations
#3078

Detail Matters: Mamba-Inspired Joint Unfolding Network for Snapshot Spectral Compressive Imaging

Mengjie Qin, Yuchao Feng, Zongliang Wu et al.

AAAI 2025 • paper • arXiv:2501.01262
15 citations
#3079

Bridging the Data Provenance Gap Across Text, Speech, and Video

Shayne Longpre, Nikhil Singh, Manuel Cherep et al.

ICLR 2025 • arXiv:2412.17847
15 citations
#3080

Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning

Tian Liu, Huixin Zhang, Shubham Parashar et al.

CVPR 2025 • arXiv:2406.11148
15 citations
#3081

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

Runhui Huang, Xinpeng Ding, Chunwei Wang et al.

CVPR 2025 • arXiv:2407.08706
15 citations
#3082

VisionArena: 230k Real World User-VLM Conversations with Preference Labels

Christopher Chou, Lisa Dunlap, Wei-Lin Chiang et al.

CVPR 2025 • arXiv:2412.08687
15 citations
#3083

BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

Artem Zholus, Maksim Kuznetsov, Roman Schutski et al.

AAAI 2025 • paper • arXiv:2406.03686
15 citations
#3084

Toward Understanding In-context vs. In-weight Learning

Bryan Chan, Xinyi Chen, Andras Gyorgy et al.

ICLR 2025 • arXiv:2410.23042
15 citations
#3085

Position: Don't Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints

Sam Bowyer, Laurence Aitchison, Desi Ivanova

ICML 2025 • spotlight • arXiv:2503.01747
15 citations
#3086

Scaling Properties of Diffusion Models For Perceptual Tasks

Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran et al.

CVPR 2025 • arXiv:2411.08034
15 citations
#3087

Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning

Jiaru Zou, Yikun Ban, Zihao Li et al.

NEURIPS 2025 • spotlight • arXiv:2505.16270
15 citations
#3088

1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities

Kevin Wang, Ishaan Javali, Michał Bortkiewicz et al.

NEURIPS 2025 • oral • arXiv:2503.14858
15 citations
#3089

STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

Jiatao Gu, Tianrong Chen, David Berthelot et al.

NEURIPS 2025 • spotlight • arXiv:2506.06276
15 citations
#3090

Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators

Dingkang Yang, Dongling Xiao, Jinjie Wei et al.

AAAI 2025 • paper • arXiv:2408.12325
15 citations
#3091

On Reasoning Strength Planning in Large Reasoning Models

Leheng Sheng, An Zhang, Zijian Wu et al.

NEURIPS 2025 • arXiv:2506.08390
15 citations
#3092

MVSAnywhere: Zero-Shot Multi-View Stereo

Sergio Izquierdo, Mohamed Sayed, Michael Firman et al.

CVPR 2025 • arXiv:2503.22430
15 citations
#3093

AutoPartGen: Autoregressive 3D Part Generation and Discovery

Minghao Chen, Jianyuan Wang, Roman Shapovalov et al.

NEURIPS 2025
15 citations
#3094

R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning

Lijun Sheng, Jian Liang, Zilei Wang et al.

CVPR 2025 • arXiv:2504.11195
15 citations
#3095

Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms

Zhihai Wang, Zijie Geng, Zhaojie Tu et al.

NEURIPS 2025 • arXiv:2407.15026
15 citations
#3096

PILAF: Optimal Human Preference Sampling for Reward Modeling

Yunzhen Feng, Ariel Kwiatkowski, Kunhao Zheng et al.

ICML 2025 • arXiv:2502.04270
15 citations
#3097

Generating Multi-Image Synthetic Data for Text-to-Image Customization

Nupur Kumari, Xi Yin, Jun-Yan Zhu et al.

ICCV 2025 • arXiv:2502.01720
15 citations
#3098

Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jing Yu, Keke Gai et al.

CVPR 2025 • arXiv:2503.17109
15 citations
#3099

AniDoc: Animation Creation Made Easier

Yihao Meng, Hao Ouyang, Hanlin Wang et al.

CVPR 2025 • arXiv:2412.14173
15 citations
#3100

Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs

Zijia Zhao, Haoyu Lu, Yuqi Huo et al.

ICLR 2025 • oral • arXiv:2406.09367
15 citations
#3101

Beyond Message Passing: Neural Graph Pattern Machine

Zehong Wang, Zheyuan Zhang, Tianyi Ma et al.

ICML 2025 • arXiv:2501.18739
15 citations
#3102

SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget

Zihao Wang, Bin Cui, Shaoduo Gan

ICLR 2025 • arXiv:2404.04793
15 citations
#3103

Simultaneous Multi-Robot Motion Planning with Projected Diffusion Models

Jinhao Liang, Jacob Christopher, Sven Koenig et al.

ICML 2025 • arXiv:2502.03607
15 citations
#3104

Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks

Huanxuan Liao, Shizhu He, Yao Xu et al.

AAAI 2025 • paper • arXiv:2409.13203
15 citations
#3105

Truncated Consistency Models

Sangyun Lee, Yilun Xu, Tomas Geffner et al.

ICLR 2025 • arXiv:2410.14895
15 citations
#3106

DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving

Hao Lu, Tianshuo Xu, Wenzhao Zheng et al.

NEURIPS 2025 • arXiv:2412.09043
15 citations
#3107

Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning

Jiuqi Wang, Ethan Blaser, Hadi Daneshmand et al.

ICLR 2025 • oral • arXiv:2405.13861
15 citations
#3108

FrameBridge: Improving Image-to-Video Generation with Bridge Models

Yuji Wang, Zehua Chen, Chen Xiaoyu et al.

ICML 2025 • arXiv:2410.15371
15 citations
#3109

Vision Transformers Don't Need Trained Registers

Nicholas Jiang, Amil Dravid, Alexei Efros et al.

NEURIPS 2025 • spotlight • arXiv:2506.08010
15 citations
#3110

PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model

Yunlong Huang, Junshuo Liu, Ke Xian et al.

AAAI 2025 • paper • arXiv:2408.03540
15 citations
#3111

BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis

David Svitov, Pietro Morerio, Lourdes Agapito et al.

ICCV 2025 • arXiv:2411.08508
15 citations
#3112

Citations and Trust in LLM Generated Responses

Yifan Ding, Matthew Facciani, Ellen Joyce et al.

AAAI 2025 • paper • arXiv:2501.01303
15 citations
#3113

CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization

Junhao Xu, Yanan Zhang, Zhi Cai et al.

CVPR 2025 • arXiv:2503.03430
15 citations
#3114

ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems

Xiangyuan Xue, Zeyu Lu, Di Huang et al.

CVPR 2025 • arXiv:2409.01392
15 citations
#3115

LightGTS: A Lightweight General Time Series Forecasting Model

Yihang Wang, Yuying Qiu, Peng Chen et al.

ICML 2025 • arXiv:2506.06005
15 citations
#3116

Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models

Qirui Jiao, Daoyuan Chen, Yilun Huang et al.

CVPR 2025 • arXiv:2408.04594
15 citations
#3117

A Unified Theory of Quantum Neural Network Loss Landscapes

Eric Anschuetz

ICLR 2025 • arXiv:2408.11901
15 citations
#3118

GOLD: Graph Out-of-Distribution Detection via Implicit Adversarial Latent Generation

Danny Wang, Ruihong Qiu, Guangdong Bai et al.

ICLR 2025 • arXiv:2502.05780
15 citations
#3119

Sylber: Syllabic Embedding Representation of Speech from Raw Audio

Cheol Jun Cho, Nicholas Lee, Akshat Gupta et al.

ICLR 2025 • arXiv:2410.07168
15 citations
#3120

Does Graph Prompt Work? A Data Operation Perspective with Theoretical Analysis

Qunzhong Wang, Xiangguo Sun, Hong Cheng

ICML 2025 • arXiv:2410.01635
15 citations
#3121

TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees

Weibin Liao, Xu Chu, Yasha Wang

ICLR 2025 • arXiv:2410.12854
15 citations
#3122

Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models

Hongyang Wei, Shuaizheng Liu, Chun Yuan et al.

ICCV 2025 • arXiv:2503.11073
15 citations
#3123

Personalized Preference Fine-tuning of Diffusion Models

Meihua Dang, Anikait Singh, Linqi Zhou et al.

CVPR 2025 • arXiv:2501.06655
15 citations
#3124

Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift

Kaizheng Wang

NEURIPS 2025 • arXiv:2302.10160
15 citations
#3125

FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs

Mothilal Asokan, Kebin Wu, Fatima Albreiki

CVPR 2025 • arXiv:2504.01916
15 citations
#3126

Mitigate the Gap: Improving Cross-Modal Alignment in CLIP

Sedigheh Eslami, Gerard de Melo

ICLR 2025
15 citations
#3127

IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation

Yiren Song, Pei Yang, Hai Ci et al.

CVPR 2025 • arXiv:2412.11638
15 citations
#3128

Assessing and Learning Alignment of Unimodal Vision and Language Models

Le Zhang, Qian Yang, Aishwarya Agrawal

CVPR 2025 • highlight • arXiv:2412.04616
15 citations
#3129

CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation

Yifeng Xu, Zhenliang He, Shiguang Shan et al.

ICLR 2025 • arXiv:2410.09400
15 citations
#3130

ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling

Dongchao Yang, Songxiang Liu, Haohan Guo et al.

ICML 2025 • arXiv:2504.10344
15 citations
#3131

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Seanie Lee, Haebin Seong, Dong Bok Lee et al.

ICLR 2025 • arXiv:2410.01524
15 citations
#3132

X-Dyna: Expressive Dynamic Human Image Animation

Di Chang, Hongyi Xu, You Xie et al.

CVPR 2025 • highlight • arXiv:2501.10021
15 citations
#3133

DRAWER: Digital Reconstruction and Articulation With Environment Realism

Hongchi Xia, Entong Su, Marius Memmel et al.

CVPR 2025 • arXiv:2504.15278
15 citations
#3134

LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph

Tu Ao, Yanhua Yu, Yuling Wang et al.

AAAI 2025 • paper • arXiv:2504.03137
15 citations
#3135

ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks

Arth Shukla, Stone Tao, Hao Su

ICLR 2025 • arXiv:2412.13211
15 citations
#3136

VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models

Jiacheng Ruan, Wenzhen Yuan, Xian Gao et al.

ICCV 2025 • arXiv:2503.07478
15 citations
#3137

V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video

Jianqi Chen, Biao Zhang, Xiangjun Tang et al.

ICCV 2025 • arXiv:2503.09631
15 citations
#3138

I2VControl: Disentangled and Unified Video Motion Synthesis Control

Wanquan Feng, Tianhao Qi, Jiawei Liu et al.

ICCV 2025 • arXiv:2411.17765
15 citations
#3139

Retrieval Augmented Time Series Forecasting

Sungwon Han, Seungeon Lee, Meeyoung Cha et al.

ICML 2025 • oral • arXiv:2505.04163
15 citations
#3140

MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale

Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov et al.

AAAI 2025 • paper • arXiv:2409.00134
15 citations
#3141

Guided Diffusion Sampling on Function Spaces with Applications to PDEs

Jiachen Yao, Abbas Mammadov, Julius Berner et al.

NEURIPS 2025 • arXiv:2505.17004
15 citations
#3142

Improving Generalization in Federated Learning with Highly Heterogeneous Data via Momentum-Based Stochastic Controlled Weight Averaging

Junkang Liu, Yuanyuan Liu, Fanhua Shang et al.

ICML 2025 • arXiv:2507.20016
15 citations
#3143

LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently

Yuanhe Zhang, Fanghui Liu, Yudong Chen

ICML 2025 • oral • arXiv:2502.01235
15 citations
#3144

xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

Qingchen Yu, Zifan Zheng, Shichao Song et al.

ICLR 2025 • arXiv:2405.11874
15 citations
#3145

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes

Jinxiu Liu, Shaoheng Lin, Yinxiao Li et al.

CVPR 2025 • arXiv:2412.11100
15 citations
#3146

AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks

Zekang Yang, Wang Zeng, Sheng Jin et al.

AAAI 2025 • paper • arXiv:2402.15351
15 citations
#3147

Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens

Xixian Yong, Xiao Zhou, Yingying Zhang et al.

NEURIPS 2025 • spotlight • arXiv:2505.18237
15 citations
#3148

Equivariant Neural Functional Networks for Transformers

Viet-Hoang Tran, Thieu Vo, An Nguyen et al.

ICLR 2025 • arXiv:2410.04209
15 citations
#3149

F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI

Xu Zheng, Farhad Shirani, Zhuomin Chen et al.

ICLR 2025 • arXiv:2410.02970
15 citations
#3150

ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Yarden As, Bhavya, Lenart Treven et al.

ICLR 2025 • arXiv:2410.09486
15 citations
#3151

MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models

Mahir Labib Dihan, Tanvir Hassan, Md Tanvir Parvez et al.

ICML 2025 • spotlight • arXiv:2501.00316
15 citations
#3152

LOCATE 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Paul McVay, Sergio Arnaud, Ada Martin et al.

ICML 2025 • spotlight • arXiv:2504.14151
15 citations
#3153

LongRoPE2: Near-Lossless LLM Context Window Scaling

Ning Shang, Li Lyna Zhang, Siyuan Wang et al.

ICML 2025 • arXiv:2502.20082
15 citations
#3154

Apollo-MILP: An Alternating Prediction-Correction Neural Solving Framework for Mixed-Integer Linear Programming

Haoyang Liu, Jie Wang, Zijie Geng et al.

ICLR 2025 • arXiv:2503.01129
15 citations
#3155

Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation

Yuheng Shi, Minjing Dong, Chang Xu

ICCV 2025 • arXiv:2411.09219
15 citations
#3156

Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition

Aliyah Hsu, Georgia Zhou, Yeshwanth Cherapanamjeri et al.

ICLR 2025 • arXiv:2407.00886
15 citations
#3157

Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models

Jiaming Zhang, Junhong Ye, Xingjun Ma et al.

CVPR 2025 • arXiv:2410.05346
15 citations
#3158

GotenNet: Rethinking Efficient 3D Equivariant Graph Neural Networks

Sarp Aykent, Tian Xia

ICLR 2025
15 citations
#3159

Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models

Fu-Yun Wang, Yunhao Shui, Jingtan Piao et al.

ICLR 2025 • arXiv:2505.11245
15 citations
#3160

Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections

Bo Wang, Qinyuan Cheng, Runyu Peng et al.

NEURIPS 2025 • arXiv:2507.00018
15 citations
#3161

S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting

Yecong Wan, Mingwen Shao, Yuanshuo Cheng et al.

CVPR 2025 • arXiv:2503.04314
15 citations
#3162

Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective

Zeyu Gan, Yong Liu

ICLR 2025 • arXiv:2410.01720
15 citations
#3163

TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models

Xin Wang, Kai Chen, Jiaming Zhang et al.

CVPR 2025 • arXiv:2411.13136
15 citations
#3164

GenEx: Generating an Explorable World

TaiMing Lu, Tianmin Shu, Alan Yuille et al.

ICLR 2025 • arXiv:2412.09624
15 citations
#3165

AGENTIF: Benchmarking Large Language Models Instruction Following Ability in Agentic Scenarios

Yunjia Qi, Hao Peng, Xiaozhi Wang et al.

NEURIPS 2025 • spotlight
15 citations
#3166

Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information

Yi Chen, Jian Xu, Xu-Yao Zhang et al.

AAAI 2025 • paper • arXiv:2409.01179
15 citations
#3167

UnZipLoRA: Separating Content and Style from a Single Image

Chang Liu, Viraj Shah, Aiyu Cui et al.

ICCV 2025 • highlight • arXiv:2412.04465
15 citations
#3168

Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning

Jiyuan Shi, Xinzhe Liu, Dewei Wang et al.

NEURIPS 2025 • arXiv:2504.14305
15 citations
#3169

MallowsPO: Fine-Tune Your LLM with Preference Dispersions

Haoxian Chen, Hanyang Zhao, Henry Lam et al.

ICLR 2025 • arXiv:2405.14953
15 citations
#3170

Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models

Jean Park, Kuk Jin Jang, Basam Alasaly et al.

AAAI 2025 • paper • arXiv:2408.12763
15 citations
#3171

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training

Tianjin Huang, Ziquan Zhu, Gaojie Jin et al.

ICLR 2025 • arXiv:2501.06842
15 citations
#3172

Efficient Track Anything

Yunyang Xiong, Chong Zhou, Xiaoyu Xiang et al.

ICCV 2025 • arXiv:2411.18933
15 citations
#3173

Ensembling Diffusion Models via Adaptive Feature Aggregation

Cong Wang, Kuan Tian, Yonghang Guan et al.

ICLR 2025 • arXiv:2405.17082
15 citations
#3174

WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models

Shengda Fan, Xin Cong, Yuepeng Fu et al.

ICLR 2025 • arXiv:2411.05451
15 citations
#3175

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

James Burgess, Jeffrey J Nirschl, Laura Bravo-Sánchez et al.

CVPR 2025 • arXiv:2503.13399
15 citations
#3176

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation

Gaurav Sahu, Abhay Puri, Juan A. Rodriguez et al.

ICLR 2025 • arXiv:2407.06423
15 citations
#3177

Peri-LN: Revisiting Normalization Layer in the Transformer Architecture

Jeonghoon Kim, Byeongchan Lee, Cheonbok Park et al.

ICML 2025 • spotlight • arXiv:2502.02732
15 citations
#3178

Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM

Han Wang, Yuxiang Nie, Yongjie Ye et al.

ICCV 2025 • arXiv:2412.09530
15 citations
#3179

HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking

Runquan Gui, Zhihai Wang, Jie Wang et al.

ICML 2025 • arXiv:2505.02322
15 citations
#3180

Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning

Jian Lang, Zhangtao Cheng, Ting Zhong et al.

AAAI 2025 • paper • arXiv:2501.01120
15 citations
#3181

Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability

Yingdong Shi, Changming Li, Yifan Wang et al.

CVPR 2025 • arXiv:2503.20483
15 citations
#3182

Docopilot: Improving Multimodal Models for Document-Level Understanding

Yuchen Duan, Zhe Chen, Yusong Hu et al.

CVPR 2025 • arXiv:2507.14675
15 citations
#3183

TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster

Kanghui Ning, Zijie Pan, Yu Liu et al.

NEURIPS 2025 • arXiv:2503.07649
15 citations
#3184

TimeDP: Learning to Generate Multi-Domain Time Series with Domain Prompts

Yu-Hao Huang, Chang Xu, Yueying Wu et al.

AAAI 2025 • paper • arXiv:2501.05403
15 citations
#3185

Probing Visual Language Priors in VLMs

Tiange Luo, Ang Cao, Gunhee Lee et al.

ICML 2025 • arXiv:2501.00569
15 citations
#3186

GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery

Enguang Wang, Zhimao Peng, Zhengyuan Xie et al.

CVPR 2025 • arXiv:2403.09974
15 citations
#3187

FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction

Yifan Wang, Peishan Yang, Zhen Xu et al.

CVPR 2025
15 citations
#3188

RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds

Kang You, Tong Chen, Dandan Ding et al.

CVPR 2025 • arXiv:2503.12382
15 citations
#3189

Geolocation Representation from Large Language Models Are Generic Enhancers for Spatio-Temporal Learning

Junlin He, Tong Nie, Wei Ma

AAAI 2025 • paper • arXiv:2408.12116
15 citations
#3190

RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers

Yan Gong, Yiren Song, Yicheng Li et al.

NEURIPS 2025 • arXiv:2506.02528
15 citations
#3191

Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis

Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng et al.

CVPR 2025 • arXiv:2309.03904
15 citations
#3192

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding

Zongxia Li, Xiyang Wu, Guangyao Shi et al.

NEURIPS 2025 • arXiv:2505.01481
15 citations
#3193

EPIC: Efficient Position-Independent Caching for Serving Large Language Models

Junhao Hu, Wenrui Huang, Weidong Wang et al.

ICML 2025 • arXiv:2410.15332
15 citations
#3194

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

Valérie Costa, Thomas Fel, Ekdeep S Lubana et al.

NEURIPS 2025 • arXiv:2506.03093
15 citations
#3195

Transformers Struggle to Learn to Search

Abulhair Saparov, Srushti Ajay Pawar, Shreyas Pimpalgaonkar et al.

ICLR 2025 • arXiv:2412.04703
15 citations
#3196

Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models

Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj et al.

ICLR 2025 • arXiv:2410.01335
15 citations
#3197

SQLFixAgent: Towards Semantic-Accurate Text-to-SQL Parsing via Consistency-Enhanced Multi-Agent Collaboration

Jipeng Cen, Jiaxin Liu, Zhixu Li et al.

AAAI 2025 • paper • arXiv:2406.13408
15 citations
#3198

MobileUse: A Hierarchical Reflection-Driven GUI Agent for Autonomous Mobile Operation

Ning Li, Xiangmou Qu, Jiamu Zhou et al.

NEURIPS 2025 • oral
15 citations
#3199

Maximum Entropy Reinforcement Learning with Diffusion Policy

Xiaoyi Dong, Jian Cheng, Xi Zhang

ICML 2025 • arXiv:2502.11612
15 citations
#3200

Each Fake News Is Fake in Its Own Way: An Attribution Multi-Granularity Benchmark for Multimodal Fake News Detection

Hao Guo, Zihan Ma, Zhi Zeng et al.

AAAI 2025 • paper • arXiv:2412.14686
15 citations