Most Cited 2025 "masked autoencoder paradigm" Papers

22,274 papers found • Page 16 of 112


#3001

Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation

Yichi Zhang, Zhuo Chen, Lingbing Guo et al.

AAAI 2025 • paper • arXiv:2404.09468

#3002

The Jailbreak Tax: How Useful are Your Jailbreak Outputs?

Kristina Nikolić, Luze Sun, Jie Zhang et al.

ICML 2025 • spotlight • arXiv:2504.10694

#3003

SIGMA: Selective Gated Mamba for Sequential Recommendation

Ziwei Liu, Qidong Liu, Yejing Wang et al.

AAAI 2025 • paper • arXiv:2408.11451

#3004

Personalized Federated Learning for Spatio-Temporal Forecasting: A Dual Semantic Alignment-Based Contrastive Approach

Qingxiang Liu, Sheng Sun, Yuxuan Liang et al.

AAAI 2025 • paper • arXiv:2404.03702

#3005

Semantic Convergence: Harmonizing Recommender Systems via Two-Stage Alignment and Behavioral Semantic Tokenization

Guanghan Li, Xun Zhang, Yufei Zhang et al.

AAAI 2025 • paper • arXiv:2412.13771

#3006

How to Merge Your Multimodal Models Over Time?

Sebastian Dziadzio, Vishaal Udandarao, Karsten Roth et al.

CVPR 2025 • arXiv:2412.06712

#3007

Scaling Vision Pre-Training to 4K Resolution

Baifeng Shi, Boyi Li, Han Cai et al.

CVPR 2025 • highlight • arXiv:2503.19903

#3008

PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs

Oskar van der Wal, Pietro Lesci, Max Müller-Eberstein et al.

ICLR 2025 • arXiv:2503.09543

#3009

OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition

Stephen Zhang, Vardan Papyan

ICLR 2025 • arXiv:2409.13652

#3010

COME: Test-time Adaption by Conservatively Minimizing Entropy

Qingyang Zhang, Yatao Bian, Xinke Kong et al.

ICLR 2025 • arXiv:2410.10894

#3011

HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection

Zijian Gu, Jianwei Ma, Yan Huang et al.

AAAI 2025 • paper • arXiv:2412.11489

#3012

InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding

Minsoo Kim, Kyuhong Shim, Jungwook Choi et al.

NEURIPS 2025 • oral • arXiv:2506.15745

#3013

FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models

Zhanwei Zhang, Shizhao Sun, Wenxiao Wang et al.

ICLR 2025 • arXiv:2411.05823

#3014

Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding

Wei Suo, Lijun Zhang, Mengyang Sun et al.

CVPR 2025 • highlight • arXiv:2503.00361

#3015

ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy

Chenrui Tie, Yue Chen, Ruihai Wu et al.

ICLR 2025 • arXiv:2411.03990

#3016

ThinkSound: Chain-of-Thought Reasoning in Multimodal LLMs for Audio Generation and Editing

Huadai Liu, Kaicheng Luo, Jialei Wang et al.

NEURIPS 2025 • oral

#3017

Degradation-Aware Feature Perturbation for All-in-One Image Restoration

Xiangpeng Tian, Xiangyu Liao, Xiao Liu et al.

CVPR 2025 • arXiv:2505.12630

#3018

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models

Shijie Zhou, Alexander Vilesov, Xuehai He et al.

ICCV 2025 • arXiv:2508.02095

#3019

DRoC: Elevating Large Language Models for Complex Vehicle Routing via Decomposed Retrieval of Constraints

Xia Jiang, Yaoxin Wu, Chenhao Zhang et al.

ICLR 2025

#3020

MOS: Model Surgery for Pre-Trained Model-Based Class-Incremental Learning

Hai-Long Sun, Da-Wei Zhou, Hanbin Zhao et al.

AAAI 2025 • paper • arXiv:2412.09441

#3021

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Ronghao Dang, Yuqian Yuan, Wenqi Zhang et al.

CVPR 2025 • arXiv:2501.05031

#3022

SkillMimic: Learning Basketball Interaction Skills from Demonstrations

Yinhuai Wang, Qihan Zhao, Runyi Yu et al.

CVPR 2025 • highlight • arXiv:2408.15270

#3023

DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation

Changdae Oh, Yixuan Li, Kyungwoo Song et al.

ICLR 2025 • arXiv:2410.03782

#3024

GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling

Pinxin Liu, Luchuan Song, Junhua Huang et al.

ICCV 2025 • arXiv:2501.18898

#3025

Neural Encoding and Decoding at Scale

Yizi Zhang, Yanchen Wang, Mehdi Azabou et al.

ICML 2025 • oral • arXiv:2504.08201

#3026

OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting

Yongsheng Yu, Ziyun Zeng, Haitian Zheng et al.

ICCV 2025 • arXiv:2503.08677

#3027

Revisiting MAE Pre-training for 3D Medical Image Segmentation

Tassilo Wald, Constantin Ulrich, Stanislav Lukyanenko et al.

CVPR 2025 • highlight • arXiv:2410.23132

#3028

LLM Unlearning via Neural Activation Redirection

William Shen, Xinchi Qiu, Meghdad Kurmanji et al.

NEURIPS 2025 • arXiv:2502.07218

#3029

LLMs Can Plan Only If We Tell Them

Bilgehan Sel, Ruoxi Jia, Ming Jin

ICLR 2025 • arXiv:2501.13545

#3030

Global-Local Tree Search in VLMs for 3D Indoor Scene Generation

Wei Deng, Mengshi Qi, Huadong Ma

CVPR 2025 • arXiv:2503.18476

#3031

Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning

Linjiajie Fang, Ruoxue Liu, Jing Zhang et al.

ICLR 2025 • arXiv:2405.20555

#3032

Metadata Conditioning Accelerates Language Model Pre-training

Tianyu Gao, Alexander Wettig, Luxi He et al.

ICML 2025 • arXiv:2501.01956

#3033

The Same but Different: Structural Similarities and Differences in Multilingual Language Modeling

Ruochen Zhang, Qinan Yu, Matianyu Zang et al.

ICLR 2025 • arXiv:2410.09223

#3034

Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs

Sagnik Mukherjee, Abhinav Chinta, Takyoung Kim et al.

ICML 2025 • arXiv:2502.02362

#3035

SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers

Yanzheng Xiang, Hanqi Yan, Shuyin Ouyang et al.

COLM 2025 • paper • arXiv:2504.00255

#3036

MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization

Yougang Lyu, Lingyong Yan, Zihan Wang et al.

ICLR 2025 • oral • arXiv:2410.07672

#3037

The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning

Dulhan Jayalath, Gilad Landau, Brendan Shillingford et al.

ICML 2025 • arXiv:2406.04328

#3038

FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model

Jun Zhou, Jiahao Li, Zunnan Xu et al.

CVPR 2025 • arXiv:2503.19839

#3039

Law of Vision Representation in MLLMs

Shijia Yang, Bohan Zhai, Quanzeng You et al.

COLM 2025 • paper • arXiv:2408.16357

#3040

OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?

Zijian Chen, tingzhu chen, Wenjun Zhang et al.

ICLR 2025 • arXiv:2412.01175

#3041

Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Dynamic Scenes

Isabella Liu, Hao Su, Xiaolong Wang

ICLR 2025 • oral • arXiv:2404.12379

#3042

TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

Lijie Yang, Zhihao Zhang, Zhuofu Chen et al.

ICLR 2025 • arXiv:2410.05076

#3043

Logically Consistent Language Models via Neuro-Symbolic Integration

Diego Calanzone, Stefano Teso, Antonio Vergari

ICLR 2025 • arXiv:2409.13724

#3044

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

Ziang Yan, Yinan He, Xinhao Li et al.

NEURIPS 2025 • oral • arXiv:2509.21100

#3045

Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics

Hamed Mahdavi, Alireza Hashemi, Majid Daliri et al.

COLM 2025 • paper • arXiv:2504.01995

#3046

LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid

Tianyi Zhang, Anshumali Shrivastava

ICLR 2025 • arXiv:2407.10032

#3047

POMATO: Marrying Pointmap Matching with Temporal Motions for Dynamic 3D Reconstruction

Songyan Zhang, Yongtao Ge, Jinyuan Tian et al.

ICCV 2025 • arXiv:2504.05692

#3048

Track-On: Transformer-based Online Point Tracking with Memory

Görkay Aydemir, Xiongyi Cai, Weidi Xie et al.

ICLR 2025 • oral • arXiv:2501.18487

#3049

On Calibration of LLM-based Guard Models for Reliable Content Moderation

Hongfu Liu, Hengguan Huang, Xiangming Gu et al.

ICLR 2025 • arXiv:2410.10414

#3050

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning

Xiao Liang, Zhong-Zhi Li, Yeyun Gong et al.

NEURIPS 2025 • arXiv:2506.08989

#3051

CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs

Sijia Chen, Xiaomin Li, mengxue zhang et al.

NEURIPS 2025 • arXiv:2505.11413

#3052

Let LRMs Break Free from Overthinking via Self-Braking Tuning

Haoran Zhao, Yuchen Yan, Yongliang Shen et al.

NEURIPS 2025 • arXiv:2505.14604

#3053

Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations

Ji-An Li, Huadong Xiong, Robert Wilson et al.

NEURIPS 2025 • arXiv:2505.13763

#3054

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing

Gaoxiang Cong, Jiadong Pan, Liang Li et al.

CVPR 2025 • highlight • arXiv:2412.08988

#3055

DynaSaur: Large Language Agents Beyond Predefined Actions

Dang Nguyen, Viet Dac Lai, Seunghyun Yoon et al.

COLM 2025 • paper • arXiv:2411.01747

#3056

IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis

Yuji Wang, Jingchen Ni, Yong Liu et al.

AAAI 2025 • paper • arXiv:2503.00936

#3057

GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models

Mianchu Wang, Rui Yang, Xi Chen et al.

ICLR 2025 • arXiv:2310.20025

#3058

An Empirical Analysis of Uncertainty in Large Language Model Evaluations

Qiujie Xie, Qingqiu Li, Zhuohao Yu et al.

ICLR 2025 • arXiv:2502.10709

#3059

CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection

Xiaolei Wang, Xiaoyang Wang, Huihui Bai et al.

AAAI 2025 • paper • arXiv:2501.00346

#3060

Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining

Raghuveer Thirukovalluru, Rui Meng, Ye Liu et al.

NEURIPS 2025 • spotlight • arXiv:2505.11293

#3061

A Many-Objective Problem Where Crossover Is Provably Indispensable

Andre Opris

AAAI 2025 • paper

#3062

Improved Bounds for Online Facility Location with Predictions

Dimitris Fotakis, Evangelia Gergatsouli, Themistoklis Gouleakis et al.

AAAI 2025 • paper • arXiv:2107.08277

#3063

Speeding Up the NSGA-II with a Simple Tie-Breaking Rule

Benjamin Doerr, Tudor Ivan, Martin S. Krejca

AAAI 2025 • paper • arXiv:2412.11931

#3064

HEROS-GAN: Honed-Energy Regularized and Optimal Supervised GAN for Enhancing Accuracy and Range of Low-Cost Accelerometers

Yifeng Wang, Yi Zhao

AAAI 2025 • paper • arXiv:2502.18064

#3065

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

Julie Kallini, Shikhar Murty, Christopher Manning et al.

ICLR 2025 • arXiv:2410.20771

#3066

Black-Box Detection of Language Model Watermarks

Thibaud Gloaguen, Nikola Jovanović, Robin Staab et al.

ICLR 2025 • arXiv:2405.20777

#3067

Beyond Log-Concavity and Score Regularity: Improved Convergence Bounds for Score-Based Generative Models in W2-distance

Marta Gentiloni Silveri, Antonio Ocello

ICML 2025 • arXiv:2501.02298

#3068

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent

Yuheng Zhang, Dian Yu, Tao Ge et al.

NEURIPS 2025 • spotlight • arXiv:2502.16852

#3069

Where am I? Cross-View Geo-localization with Natural Language Descriptions

Junyan Ye, Honglin Lin, Leyan Ou et al.

ICCV 2025 • arXiv:2412.17007

#3070

Task Vectors in In-Context Learning: Emergence, Formation, and Benefits

Liu Yang, Ziqian Lin, Kangwook Lee et al.

COLM 2025 • paper • arXiv:2501.09240

#3071

Don’t Stop Me Now: Embedding Based Scheduling for LLMs

Rana Shahout, Eran Malach, Chunwei Liu et al.

ICLR 2025

#3072

Ultra-High Resolution Segmentation via Boundary-Enhanced Patch-Merging Transformer

Haopeng Sun, Yingwei Zhang, Lumin Xu et al.

AAAI 2025 • paper • arXiv:2412.10181

#3073

Establishing Best Practices in Building Rigorous Agentic Benchmarks

Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun et al.

NEURIPS 2025 • arXiv:2507.02825

#3074

MoE-LPR: Multilingual Extension of Large Language Models Through Mixture-of-Experts with Language Priors Routing

Hao Zhou, Zhijun Wang, Shujian Huang et al.

AAAI 2025 • paper • arXiv:2408.11396

#3075

Language Guided Skill Discovery

Seungeun Rho, Laura Smith, Tianyu Li et al.

ICLR 2025 • arXiv:2406.06615

#3076

MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks

Yinghao Zhu, Ziyi He, Haoran Hu et al.

NEURIPS 2025 • arXiv:2505.12371

#3077

SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

Ling Yang, Zhaochen Yu, Tianjun Zhang et al.

ICLR 2025 • arXiv:2410.09008

#3078

Detail Matters: Mamba-Inspired Joint Unfolding Network for Snapshot Spectral Compressive Imaging

Mengjie Qin, Yuchao Feng, Zongliang Wu et al.

AAAI 2025 • paper • arXiv:2501.01262

#3079

Bridging the Data Provenance Gap Across Text, Speech, and Video

Shayne Longpre, Nikhil Singh, Manuel Cherep et al.

ICLR 2025 • arXiv:2412.17847

#3080

Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning

Tian Liu, Huixin Zhang, Shubham Parashar et al.

CVPR 2025 • arXiv:2406.11148

#3081

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

Runhui Huang, Xinpeng Ding, Chunwei Wang et al.

CVPR 2025 • arXiv:2407.08706

#3082

VisionArena: 230k Real World User-VLM Conversations with Preference Labels

Christopher Chou, Lisa Dunlap, Wei-Lin Chiang et al.

CVPR 2025 • arXiv:2412.08687

#3083

BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

Artem Zholus, Maksim Kuznetsov, Roman Schutski et al.

AAAI 2025 • paper • arXiv:2406.03686

#3084

Toward Understanding In-context vs. In-weight Learning

Bryan Chan, Xinyi Chen, Andras Gyorgy et al.

ICLR 2025 • arXiv:2410.23042

#3085

Position: Don't Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints

Sam Bowyer, Laurence Aitchison, Desi Ivanova

ICML 2025 • spotlight • arXiv:2503.01747

#3086

Scaling Properties of Diffusion Models For Perceptual Tasks

Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran et al.

CVPR 2025 • arXiv:2411.08034

#3087

Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning

Jiaru Zou, Yikun Ban, Zihao Li et al.

NEURIPS 2025 • spotlight • arXiv:2505.16270

#3088

1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities

Kevin Wang, Ishaan Javali, Michał Bortkiewicz et al.

NEURIPS 2025 • oral • arXiv:2503.14858

#3089

STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

Jiatao Gu, Tianrong Chen, David Berthelot et al.

NEURIPS 2025 • spotlight • arXiv:2506.06276

#3090

Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators

Dingkang Yang, Dongling Xiao, Jinjie Wei et al.

AAAI 2025 • paper • arXiv:2408.12325

#3091

On Reasoning Strength Planning in Large Reasoning Models

Leheng Sheng, An Zhang, Zijian Wu et al.

NEURIPS 2025 • arXiv:2506.08390

#3092

MVSAnywhere: Zero-Shot Multi-View Stereo

Sergio Izquierdo, Mohamed Sayed, Michael Firman et al.

CVPR 2025 • arXiv:2503.22430

#3093

AutoPartGen: Autoregressive 3D Part Generation and Discovery

Minghao Chen, Jianyuan Wang, Roman Shapovalov et al.

NEURIPS 2025

#3094

R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning

Lijun Sheng, Jian Liang, Zilei Wang et al.

CVPR 2025 • arXiv:2504.11195

#3095

Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms

Zhihai Wang, Zijie Geng, Zhaojie Tu et al.

NEURIPS 2025 • arXiv:2407.15026

#3096

PILAF: Optimal Human Preference Sampling for Reward Modeling

Yunzhen Feng, Ariel Kwiatkowski, Kunhao Zheng et al.

ICML 2025 • arXiv:2502.04270

#3097

Generating Multi-Image Synthetic Data for Text-to-Image Customization

Nupur Kumari, Xi Yin, Jun-Yan Zhu et al.

ICCV 2025 • arXiv:2502.01720

#3098

Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jing Yu, Keke Gai et al.

CVPR 2025 • arXiv:2503.17109

#3099

AniDoc: Animation Creation Made Easier

Yihao Meng, Hao Ouyang, Hanlin Wang et al.

CVPR 2025 • arXiv:2412.14173

#3100

Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs

Zijia Zhao, Haoyu Lu, Yuqi Huo et al.

ICLR 2025 • oral • arXiv:2406.09367

#3101

Beyond Message Passing: Neural Graph Pattern Machine

Zehong Wang, Zheyuan Zhang, Tianyi MA et al.

ICML 2025 • arXiv:2501.18739

#3102

SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget

Zihao Wang, Bin CUI, Shaoduo Gan

ICLR 2025 • arXiv:2404.04793

#3103

Simultaneous Multi-Robot Motion Planning with Projected Diffusion Models

JINHAO LIANG, Jacob Christopher, Sven Koenig et al.

ICML 2025 • arXiv:2502.03607

#3104

Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks

Huanxuan Liao, Shizhu He, Yao Xu et al.

AAAI 2025 • paper • arXiv:2409.13203

#3105

Truncated Consistency Models

Sangyun Lee, Yilun Xu, Tomas Geffner et al.

ICLR 2025 • arXiv:2410.14895

#3106

DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving

Hao LU, Tianshuo Xu, Wenzhao Zheng et al.

NEURIPS 2025 • arXiv:2412.09043

#3107

Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning

Jiuqi Wang, Ethan Blaser, Hadi Daneshmand et al.

ICLR 2025 • oral • arXiv:2405.13861

#3108

FrameBridge: Improving Image-to-Video Generation with Bridge Models

Yuji Wang, Zehua Chen, Chen Xiaoyu et al.

ICML 2025 • arXiv:2410.15371

#3109

Vision Transformers Don't Need Trained Registers

Nicholas Jiang, Amil Dravid, Alexei Efros et al.

NEURIPS 2025 • spotlight • arXiv:2506.08010

#3110

PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model

Yunlong Huang, Junshuo Liu, Ke Xian et al.

AAAI 2025 • paper • arXiv:2408.03540

#3111

BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis

David Svitov, Pietro Morerio, Lourdes Agapito et al.

ICCV 2025 • arXiv:2411.08508

#3112

Citations and Trust in LLM Generated Responses

Yifan Ding, Matthew Facciani, Ellen Joyce et al.

AAAI 2025 • paper • arXiv:2501.01303

#3113

CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization

Junhao Xu, Yanan Zhang, Zhi Cai et al.

CVPR 2025 • arXiv:2503.03430

#3114

ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems

Xiangyuan Xue, Zeyu Lu, Di Huang et al.

CVPR 2025 • arXiv:2409.01392

#3115

LightGTS: A Lightweight General Time Series Forecasting Model

Yihang Wang, Yuying Qiu, Peng Chen et al.

ICML 2025 • arXiv:2506.06005

#3116

Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models

Qirui Jiao, Daoyuan Chen, Yilun Huang et al.

CVPR 2025 • arXiv:2408.04594

#3117

A Unified Theory of Quantum Neural Network Loss Landscapes

Eric Anschuetz

ICLR 2025 • arXiv:2408.11901

#3118

GOLD: Graph Out-of-Distribution Detection via Implicit Adversarial Latent Generation

Danny Wang, Ruihong Qiu, Guangdong Bai et al.

ICLR 2025 • arXiv:2502.05780

#3119

Sylber: Syllabic Embedding Representation of Speech from Raw Audio

Cheol Jun Cho, Nicholas Lee, Akshat Gupta et al.

ICLR 2025 • arXiv:2410.07168

#3120

Does Graph Prompt Work? A Data Operation Perspective with Theoretical Analysis

Qunzhong WANG, Xiangguo Sun, Hong Cheng

ICML 2025 • arXiv:2410.01635

#3121

TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees

Weibin Liao, Xu Chu, Yasha Wang

ICLR 2025 • arXiv:2410.12854

#3122

Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models

Hongyang Wei, Shuaizheng Liu, Chun Yuan et al.

ICCV 2025 • arXiv:2503.11073

#3123

Personalized Preference Fine-tuning of Diffusion Models

Meihua Dang, Anikait Singh, Linqi Zhou et al.

CVPR 2025 • arXiv:2501.06655

#3124

Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift

Kaizheng Wang

NEURIPS 2025 • arXiv:2302.10160

#3125

FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs

Mothilal Asokan, Kebin wu, Fatima Albreiki

CVPR 2025 • arXiv:2504.01916

#3126

Mitigate the Gap: Improving Cross-Modal Alignment in CLIP

Sedigheh Eslami, Gerard de Melo

ICLR 2025

#3127

IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation

Yiren Song, Pei Yang, Hai Ci et al.

CVPR 2025 • arXiv:2412.11638

#3128

Assessing and Learning Alignment of Unimodal Vision and Language Models

Le Zhang, Qian Yang, Aishwarya Agrawal

CVPR 2025 • highlight • arXiv:2412.04616

#3129

CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation

Yifeng Xu, Zhenliang He, Shiguang Shan et al.

ICLR 2025 • arXiv:2410.09400

#3130

ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling

Dongchao Yang, Songxiang Liu, Haohan Guo et al.

ICML 2025 • arXiv:2504.10344

#3131

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Seanie Lee, Haebin Seong, Dong Bok Lee et al.

ICLR 2025 • arXiv:2410.01524

#3132

X-Dyna: Expressive Dynamic Human Image Animation

Di Chang, Hongyi Xu, You Xie et al.

CVPR 2025 • highlight • arXiv:2501.10021

#3133

DRAWER: Digital Reconstruction and Articulation With Environment Realism

Hongchi Xia, Entong Su, Marius Memmel et al.

CVPR 2025 • arXiv:2504.15278

#3134

LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph

Tu Ao, Yanhua Yu, Yuling Wang et al.

AAAI 2025 • paper • arXiv:2504.03137

#3135

ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks

Arth Shukla, Stone Tao, Hao Su

ICLR 2025 • arXiv:2412.13211

#3136

VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models

JIACHENG RUAN, Wenzhen Yuan, Xian Gao et al.

ICCV 2025 • arXiv:2503.07478

#3137

V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video

Jianqi Chen, Biao Zhang, Xiangjun Tang et al.

ICCV 2025 • arXiv:2503.09631

#3138

I2VControl: Disentangled and Unified Video Motion Synthesis Control

Wanquan Feng, Tianhao Qi, Jiawei Liu et al.

ICCV 2025 • arXiv:2411.17765

#3139

Retrieval Augmented Time Series Forecasting

Sungwon Han, Seungeon Lee, MEEYOUNG CHA et al.

ICML 2025 • oral • arXiv:2505.04163

#3140

MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale

Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov et al.

AAAI 2025 • paper • arXiv:2409.00134

#3141

Guided Diffusion Sampling on Function Spaces with Applications to PDEs

Jiachen Yao, Abbas Mammadov, Julius Berner et al.

NEURIPS 2025 • arXiv:2505.17004

#3142

Improving Generalization in Federated Learning with Highly Heterogeneous Data via Momentum-Based Stochastic Controlled Weight Averaging

Junkang Liu, Yuanyuan Liu, Fanhua Shang et al.

ICML 2025 • arXiv:2507.20016

#3143

LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently

Yuanhe Zhang, Fanghui Liu, Yudong Chen

ICML 2025 • oral • arXiv:2502.01235

#3144

xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

Qingchen Yu, Zifan Zheng, Shichao Song et al.

ICLR 2025 • arXiv:2405.11874

#3145

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes

Jinxiu Liu, Shaoheng Lin, Yinxiao Li et al.

CVPR 2025 • arXiv:2412.11100

#3146

AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks

Zekang Yang, Wang Zeng, Sheng Jin et al.

AAAI 2025 • paper • arXiv:2402.15351

#3147

Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens

Xixian Yong, Xiao Zhou, Yingying Zhang et al.

NEURIPS 2025 • spotlight • arXiv:2505.18237

#3148

Equivariant Neural Functional Networks for Transformers

Viet-Hoang Tran, Thieu Vo, An Nguyen et al.

ICLR 2025 • arXiv:2410.04209

#3149

F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI

Xu Zheng, Farhad Shirani, Zhuomin Chen et al.

ICLR 2025 • arXiv:2410.02970

#3150

ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Yarden As, Bhavya, Lenart Treven et al.

ICLR 2025 • arXiv:2410.09486

#3151

MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models

Mahir Labib Dihan, Tanvir Hassan, Md Tanvir Parvez et al.

ICML 2025 • spotlight • arXiv:2501.00316

#3152

LOCATE 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Paul McVay, Sergio Arnaud, Ada Martin et al.

ICML 2025 • spotlight • arXiv:2504.14151

#3153

LongRoPE2: Near-Lossless LLM Context Window Scaling

Ning Shang, Li Lyna Zhang, Siyuan Wang et al.

ICML 2025 • arXiv:2502.20082

#3154

Apollo-MILP: An Alternating Prediction-Correction Neural Solving Framework for Mixed-Integer Linear Programming

Haoyang Liu, Jie Wang, Zijie Geng et al.

ICLR 2025 • arXiv:2503.01129

#3155

Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation

Yuheng Shi, Minjing Dong, Chang Xu

ICCV 2025 • arXiv:2411.09219

#3156

Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition

Aliyah Hsu, Georgia Zhou, Yeshwanth Cherapanamjeri et al.

ICLR 2025 • arXiv:2407.00886

#3157

Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models

Jiaming Zhang, Junhong Ye, Xingjun Ma et al.

CVPR 2025 • arXiv:2410.05346

#3158

GotenNet: Rethinking Efficient 3D Equivariant Graph Neural Networks

Sarp Aykent, Tian Xia

ICLR 2025

#3159

Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models

Fu-Yun Wang, Yunhao Shui, Jingtan Piao et al.

ICLR 2025 • arXiv:2505.11245

#3160

Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections

Bo Wang, Qinyuan Cheng, Runyu Peng et al.

NEURIPS 2025 • arXiv:2507.00018

#3161

S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting

Yecong Wan, Mingwen Shao, Yuanshuo Cheng et al.

CVPR 2025 • arXiv:2503.04314

#3162

Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective

Zeyu Gan, Yong Liu

ICLR 2025 • arXiv:2410.01720

#3163

TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models

Xin Wang, Kai Chen, Jiaming Zhang et al.

CVPR 2025 • arXiv:2411.13136

#3164

GenEx: Generating an Explorable World

TaiMing Lu, Tianmin Shu, Alan Yuille et al.

ICLR 2025 • arXiv:2412.09624

#3165

AGENTIF: Benchmarking Large Language Models Instruction Following Ability in Agentic Scenarios

Yunjia Qi, Hao Peng, Xiaozhi Wang et al.

NEURIPS 2025 • spotlight

#3166

Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information

Yi Chen, Jian Xu, Xu-Yao Zhang et al.

AAAI 2025 • paper • arXiv:2409.01179

#3167

UnZipLoRA: Separating Content and Style from a Single Image

Chang Liu, Viraj Shah, Aiyu Cui et al.

ICCV 2025 • highlight • arXiv:2412.04465

#3168

Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning

Jiyuan Shi, Xinzhe Liu, Dewei Wang et al.

NEURIPS 2025 • arXiv:2504.14305

#3169

MallowsPO: Fine-Tune Your LLM with Preference Dispersions

Haoxian Chen, Hanyang Zhao, Henry Lam et al.

ICLR 2025 • arXiv:2405.14953

#3170

Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models

Jean Park, Kuk Jin Jang, Basam Alasaly et al.

AAAI 2025 • paper • arXiv:2408.12763

#3171

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training

Tianjin Huang, Ziquan Zhu, Gaojie Jin et al.

ICLR 2025 • arXiv:2501.06842

#3172

Efficient Track Anything

Yunyang Xiong, Chong Zhou, Xiaoyu Xiang et al.

ICCV 2025 • arXiv:2411.18933

#3173

Ensembling Diffusion Models via Adaptive Feature Aggregation

Cong Wang, kuan tian, Yonghang Guan et al.

ICLR 2025 • arXiv:2405.17082

#3174

WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models

Shengda Fan, Xin Cong, Yuepeng Fu et al.

ICLR 2025 • arXiv:2411.05451

#3175

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

James Burgess, Jeffrey J Nirschl, Laura Bravo-Sánchez et al.

CVPR 2025 • arXiv:2503.13399

#3176

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation

Gaurav Sahu, Abhay Puri, Juan A. Rodriguez et al.

ICLR 2025 • arXiv:2407.06423

#3177

Peri-LN: Revisiting Normalization Layer in the Transformer Architecture

Jeonghoon Kim, Byeongchan Lee, Cheonbok Park et al.

ICML 2025 • spotlight • arXiv:2502.02732

#3178

Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM

Han Wang, Yuxiang Nie, Yongjie Ye et al.

ICCV 2025 • arXiv:2412.09530

#3179

HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking

Runquan Gui, Zhihai Wang, Jie Wang et al.

ICML 2025 • arXiv:2505.02322

#3180

Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning

Jian Lang, Zhangtao Cheng, Ting Zhong et al.

AAAI 2025 • paper • arXiv:2501.01120

#3181

Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability

Yingdong Shi, Changming Li, Yifan Wang et al.

CVPR 2025 • arXiv:2503.20483

#3182

Docopilot: Improving Multimodal Models for Document-Level Understanding

Yuchen Duan, Zhe Chen, Yusong Hu et al.

CVPR 2025 • arXiv:2507.14675

#3183

TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster

Kanghui Ning, Zijie Pan, Yu Liu et al.

NEURIPS 2025 • arXiv:2503.07649

#3184

TimeDP: Learning to Generate Multi-Domain Time Series with Domain Prompts

Yu-Hao Huang, Chang Xu, Yueying Wu et al.

AAAI 2025 • paper • arXiv:2501.05403

#3185

Probing Visual Language Priors in VLMs

Tiange Luo, Ang Cao, Gunhee Lee et al.

ICML 2025 • arXiv:2501.00569

#3186

GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery

Enguang Wang, Zhimao Peng, Zhengyuan Xie et al.

CVPR 2025 • arXiv:2403.09974

#3187

FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction

Yifan Wang, Peishan Yang, Zhen Xu et al.

CVPR 2025

#3188

RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds

Kang You, Tong Chen, Dandan Ding et al.

CVPR 2025 • arXiv:2503.12382

#3189

Geolocation Representation from Large Language Models Are Generic Enhancers for Spatio-Temporal Learning

Junlin He, Tong Nie, Wei Ma

AAAI 2025 • paper • arXiv:2408.12116

#3190

RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers

Yan Gong, Yiren Song, Yicheng Li et al.

NEURIPS 2025 • arXiv:2506.02528

#3191

Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis

Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng et al.

CVPR 2025 • arXiv:2309.03904

#3192

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding

Zongxia Li, Xiyang Wu, Guangyao Shi et al.

NEURIPS 2025 • arXiv:2505.01481

#3193

EPIC: Efficient Position-Independent Caching for Serving Large Language Models

JUNHAO HU, Wenrui Huang, Weidong Wang et al.

ICML 2025 • arXiv:2410.15332

#3194

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

Valérie Costa, Thomas Fel, Ekdeep S Lubana et al.

NEURIPS 2025 • arXiv:2506.03093

#3195

Transformers Struggle to Learn to Search

Abulhair Saparov, Srushti Ajay Pawar, Shreyas Pimpalgaonkar et al.

ICLR 2025 • arXiv:2412.04703

#3196

Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models

Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj et al.

ICLR 2025 • arXiv:2410.01335

#3197

SQLFixAgent: Towards Semantic-Accurate Text-to-SQL Parsing via Consistency-Enhanced Multi-Agent Collaboration

Jipeng Cen, Jiaxin Liu, Zhixu Li et al.

AAAI 2025 • paper • arXiv:2406.13408

#3198

MobileUse: A Hierarchical Reflection-Driven GUI Agent for Autonomous Mobile Operation

Ning Li, Xiangmou Qu, Jiamu Zhou et al.

NEURIPS 2025 • oral

#3199

Maximum Entropy Reinforcement Learning with Diffusion Policy

Xiaoyi Dong, Jian Cheng, Xi Zhang

ICML 2025 • arXiv:2502.11612

#3200

Each Fake News Is Fake in Its Own Way: An Attribution Multi-Granularity Benchmark for Multimodal Fake News Detection

Hao Guo, Zihan Ma, Zhi Zeng et al.

AAAI 2025 • paper • arXiv:2412.14686
