Most Cited 2025 "frequency domain features" Papers

22,274 papers found • Page 11 of 112

#2001

Diffusion on Language Model Encodings for Protein Sequence Generation

Viacheslav Meshchaninov, Pavel Strashnov, Andrey Shevtsov et al.

ICML 2025 • poster • arXiv:2403.03726
14 citations
#2002

SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining

Mingjin Zhang, Xiaolong Li, Fei Gao et al.

CVPR 2025 • poster
14 citations
#2003

Mitigate the Gap: Improving Cross-Modal Alignment in CLIP

Sedigheh Eslami, Gerard de Melo

ICLR 2025 • poster
14 citations
#2004

Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting

Runsong Zhu, Shi Qiu, Zhengzhe Liu et al.

CVPR 2025 • poster • arXiv:2503.14029
14 citations
#2005

ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement

Xiangyu Peng, Congying Xia, Xinyi Yang et al.

ICLR 2025 • poster • arXiv:2410.02108
14 citations
#2006

Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models

Shicheng Xu, Liang Pang, Yunchang Zhu et al.

ICLR 2025 • poster • arXiv:2410.12662
14 citations
#2007

Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective

Zeyu Gan, Yong Liu

ICLR 2025 • poster • arXiv:2410.01720
14 citations
#2008

Quantization without Tears

Minghao Fu, Hao Yu, Jie Shao et al.

CVPR 2025 • poster • arXiv:2411.13918
14 citations
#2009

Pippo: High-Resolution Multi-View Humans from a Single Image

Yash Kant, Ethan Weber, Jin Kyu Kim et al.

CVPR 2025 • highlight • arXiv:2502.07785
14 citations
#2010

Provably Accurate Shapley Value Estimation via Leverage Score Sampling

Christopher Musco, R. Teal Witter

ICLR 2025 • poster • arXiv:2410.01917
14 citations
#2011

Inference-Time Hyper-Scaling with KV Cache Compression

Adrian Łańcucki, Konrad Staniszewski, Piotr Nawrot et al.

NEURIPS 2025 • poster • arXiv:2506.05345
14 citations
#2012

Video Diffusion Models Are Strong Video Inpainter

Minhyeok Lee, Suhwan Cho, Chajin Shin et al.

AAAI 2025 • paper • arXiv:2408.11402
14 citations
#2013

Can Transformers Learn Full Bayesian Inference in Context?

Arik Reuter, Tim G. J. Rudner, Vincent Fortuin et al.

ICML 2025 • poster • arXiv:2501.16825
14 citations
#2014

Refine Knowledge of Large Language Models via Adaptive Contrastive Learning

Yinghui Li, Haojing Huang, Jiayi Kuang et al.

ICLR 2025 • poster • arXiv:2502.07184
14 citations
#2015

Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model

Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.

ICLR 2025 • poster • arXiv:2410.18640
14 citations
#2016

NEST: A Neuromodulated Small-world Hypergraph Trajectory Prediction Model for Autonomous Driving

Chengyue Wang, Haicheng Liao, Bonan Wang et al.

AAAI 2025 • paper • arXiv:2412.11682
14 citations
#2017

ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning

Shulin Huang, Linyi Yang, Yan Song et al.

NEURIPS 2025 • poster • arXiv:2502.16268
14 citations
#2018

IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation

Yiren Song, Pei Yang, Hai Ci et al.

CVPR 2025 • poster • arXiv:2412.11638
14 citations
#2019

Optimization with Access to Auxiliary Information

El Mahdi Chayti, Sai Karimireddy

ICLR 2025 • poster • arXiv:2206.00395
14 citations
#2020

Weak-to-Strong Generalization Through the Data-Centric Lens

Changho Shin, John Cooper, Frederic Sala

ICLR 2025 • poster • arXiv:2412.03881
14 citations
#2021

Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors

Weilong Yan, Ming Li, Li Haipeng et al.

CVPR 2025 • poster • arXiv:2503.20211
14 citations
#2022

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Yiheng Li, RuiBing Hou, Hong Chang et al.

CVPR 2025 • highlight • arXiv:2411.16781
14 citations
#2023

Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation

Zhi Cen, Huaijin Pi, Sida Peng et al.

ICLR 2025 • poster • arXiv:2502.20370
14 citations
#2024

Let LRMs Break Free from Overthinking via Self-Braking Tuning

Haoran Zhao, Yuchen Yan, Yongliang Shen et al.

NEURIPS 2025 • poster • arXiv:2505.14604
13 citations
#2025

Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations

Ji-An Li, Huadong Xiong, Robert Wilson et al.

NEURIPS 2025 • poster • arXiv:2505.13763
13 citations
#2026

AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification

Huy Nguyen, Kien Nguyen Thanh, Akila Pemasiri et al.

CVPR 2025 • poster • arXiv:2503.08121
13 citations
#2027

SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs

Mohammad Mozaffari, Amir Yazdanbakhsh, Zhao Zhang et al.

ICLR 2025 • poster • arXiv:2405.16325
13 citations
#2028

Conformal Prediction for Causal Effects of Continuous Treatments

Maresa Schröder, Dennis Frauen, Jonas Schweisthal et al.

NEURIPS 2025 • poster • arXiv:2407.03094
13 citations
#2029

MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation

Shuwei Shi, Biao Gong, Xi Chen et al.

CVPR 2025 • poster • arXiv:2412.05848
13 citations
#2030

Benchmarking LLMs' Judgments with No Gold Standard

Shengwei Xu, Yuxuan Lu, Grant Schoenebeck et al.

ICLR 2025 • poster • arXiv:2411.07127
13 citations
#2031

Concept Bottleneck Language Models For Protein Design

Aya Ismail, Tuomas Oikarinen, Amy Wang et al.

ICLR 2025 • poster • arXiv:2411.06090
13 citations
#2032

Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling

Guiyu Zhang, Huan-ang Gao, Zijian Jiang et al.

ICLR 2025 • poster • arXiv:2410.11236
13 citations
#2033

An Engorgio Prompt Makes Large Language Model Babble on

Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang et al.

ICLR 2025 • poster • arXiv:2412.19394
13 citations
#2034

A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules

Kairong Luo, Haodong Wen, Shengding Hu et al.

ICLR 2025 • poster • arXiv:2503.12811
13 citations
#2035

Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties

Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.

NEURIPS 2025 • poster • arXiv:2506.05744
13 citations
#2036

ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models

Hao Yin, Guangzong Si, Zilei Wang

CVPR 2025 • poster • arXiv:2503.13107
13 citations
#2037

The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

Shuze Daniel Liu, Shuhang Chen, Shangtong Zhang

NEURIPS 2025 • oral • arXiv:2401.07844
13 citations
#2038

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

Ziang Yan, Yinan He, Xinhao Li et al.

NEURIPS 2025 • oral • arXiv:2509.21100
13 citations
#2039

ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics

Junchao Zhu, Ruining Deng, Tianyuan Yao et al.

CVPR 2025 • poster • arXiv:2412.03026
13 citations
#2040

MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models

Jingwei Xu, Junyu Lai, Yunpeng Huang

ICLR 2025 • poster • arXiv:2405.13053
13 citations
#2041

Ward: Provable RAG Dataset Inference via LLM Watermarks

Nikola Jovanović, Robin Staab, Maximilian Baader et al.

ICLR 2025 • poster • arXiv:2410.03537
13 citations
#2042

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation is Wasteful

Martin Marek, Sanae Lotfi, Aditya Somasundaram et al.

NEURIPS 2025 • poster • arXiv:2507.07101
13 citations
#2043

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation

Gaurav Sahu, Abhay Puri, Juan A. Rodriguez et al.

ICLR 2025 • poster • arXiv:2407.06423
13 citations
#2044

Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging

Jinluan Yang, Dingnan Jin, Anke Tang et al.

NEURIPS 2025 • poster • arXiv:2502.06876
13 citations
#2045

LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs

Jiarui Wang, Huiyu Duan, Yu Zhao et al.

ICCV 2025 • highlight • arXiv:2504.08358
13 citations
#2046

Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning

Jiyuan Shi, Xinzhe Liu, Dewei Wang et al.

NEURIPS 2025 • poster • arXiv:2504.14305
13 citations
#2047

Emergence and scaling laws in SGD learning of shallow neural networks

Yunwei Ren, Eshaan Nichani, Denny Wu et al.

NEURIPS 2025 • poster • arXiv:2504.19983
13 citations
#2048

Overcoming Lower-Level Constraints in Bilevel Optimization: A Novel Approach with Regularized Gap Functions

Wei Yao, Haian Yin, Shangzhi Zeng et al.

ICLR 2025 • poster • arXiv:2406.01992
13 citations
#2049

Sum of Squares Circuits

Lorenzo Loconte, Stefan Mengel, Antonio Vergari

AAAI 2025 • paper • arXiv:2408.11778
13 citations
#2050

Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later

Han-Jia Ye, Huai-Hong Yin, De-Chuan Zhan et al.

ICLR 2025 • poster • arXiv:2407.03257
13 citations
#2051

EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code

Yuhao Qing, Boyu Zhu, Mingzhe Du et al.

NEURIPS 2025 • poster • arXiv:2505.13004
13 citations
#2052

TANGO: Training-free Embodied AI Agents for Open-world Tasks

Filippo Ziliotto, Tommaso Campari, Luciano Serafini et al.

CVPR 2025 • poster • arXiv:2412.10402
13 citations
#2053

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Jian Yang, Dacheng Yin, Yizhou Zhou et al.

CVPR 2025 • poster • arXiv:2410.10798
13 citations
#2054

MapExpert: Online HD Map Construction with Simple and Efficient Sparse Map Element Expert

Dapeng Zhang, Dayu Chen, Peng Zhi et al.

AAAI 2025 • paper • arXiv:2412.12704
13 citations
#2055

BotSim: LLM-Powered Malicious Social Botnet Simulation

Boyu Qiao, Kun Li, Wei Zhou et al.

AAAI 2025 • paper • arXiv:2412.13420
13 citations
#2056

MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models

Jiachun Li, Pengfei Cao, Zhuoran Jin et al.

ICLR 2025 • poster • arXiv:2410.09542
13 citations
#2057

KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy

Qianxiong Xu, Cheng Long, Ziyue Li et al.

AAAI 2025 • paper • arXiv:2311.02565
13 citations
#2058

Surprising Effectiveness of pretraining Ternary Language Model at Scale

Ayush Kaushal, Tejas Vaidhya, Arnab Mondal et al.

ICLR 2025 • poster • arXiv:2407.12327
13 citations
#2059

ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data

Xiaoyang Liu, Kangjie Bao, Jiashuo Zhang et al.

NEURIPS 2025 • poster • arXiv:2502.05567
13 citations
#2060

Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding

Yixiong Fang, Ziran Yang, Zhaorun Chen et al.

NEURIPS 2025 • poster • arXiv:2412.06474
13 citations
#2061

MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants

Zeyu Zhang, Quanyu Dai, Luyu Chen et al.

NEURIPS 2025 • poster • arXiv:2409.20163
13 citations
#2062

Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards

Charles Arnal, Gaëtan Narozniak, Vivien Cabannes et al.

NEURIPS 2025 • poster • arXiv:2506.20520
13 citations
#2063

Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code

Augusto B. Corrêa, André G. Pereira, Jendrik Seipp

NEURIPS 2025 • poster • arXiv:2503.18809
13 citations
#2064

Measuring Non-Adversarial Reproduction of Training Data in Large Language Models

Michael Aerni, Javier Rando, Edoardo Debenedetti et al.

ICLR 2025 • poster • arXiv:2411.10242
13 citations
#2065

Robust Self-Paced Hashing for Cross-Modal Retrieval with Noisy Labels

Ruitao Pu, Yuan Sun, Yang Qin et al.

AAAI 2025 • paper • arXiv:2501.01699
13 citations
#2066

MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks

Yinghao Zhu, Ziyi He, Haoran Hu et al.

NEURIPS 2025 • poster • arXiv:2505.12371
13 citations
#2067

Efficient Inference for Large Language Model-based Generative Recommendation

Xinyu Lin, Chaoqun Yang, Wenjie Wang et al.

ICLR 2025 • poster • arXiv:2410.05165
13 citations
#2068

Personalized Federated Learning for Spatio-Temporal Forecasting: A Dual Semantic Alignment-Based Contrastive Approach

Qingxiang Liu, Sheng Sun, Yuxuan Liang et al.

AAAI 2025 • paper • arXiv:2404.03702
13 citations
#2069

Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation

Abdelrahman Eldesokey, Peter Wonka

ICLR 2025 • poster • arXiv:2408.14819
13 citations
#2070

Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation

Yingjie Chen, Yifang Men, Yuan Yao et al.

ICCV 2025 • poster • arXiv:2501.05020
13 citations
#2071

Detecting High-Stakes Interactions with Activation Probes

Alex McKenzie, Urja Pawar, Phil Blandfort et al.

NEURIPS 2025 • poster • arXiv:2506.10805
13 citations
#2072

Scaling Inference Time Compute for Diffusion Models

Nanye Ma, Shangyuan Tong, Haolin Jia et al.

CVPR 2025 • highlight
13 citations
#2073

Local Conditional Controlling for Text-to-Image Diffusion Models

Yibo Zhao, Liang Peng, Yang Yang et al.

AAAI 2025 • paper • arXiv:2312.08768
13 citations
#2074

Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models

Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj et al.

ICLR 2025 • poster • arXiv:2410.01335
13 citations
#2075

DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo

Zhenlong Yuan, Jinguo Luo, Fei Shen et al.

AAAI 2025 • paper • arXiv:2412.11578
13 citations
#2076

Referring to Any Person

Qing Jiang, Lin Wu, Zhaoyang Zeng et al.

ICCV 2025 • poster • arXiv:2503.08507
13 citations
#2077

UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models

Xin Xu, Jiaxin Zhang, Tianhao Chen et al.

ICLR 2025 • poster • arXiv:2501.13766
13 citations
#2078

Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs

Soonbin Lee, Fangwen Shu, Yago Sanchez de la Fuente et al.

ICCV 2025 • poster • arXiv:2501.03399
13 citations
#2079

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization

Jiale Cheng, Ruiliang Lyu, Xiaotao Gu et al.

ICCV 2025 • poster • arXiv:2503.20491
13 citations
#2080

PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference

Jiarui Fang, Jinzhe Pan, Aoyu Li et al.

NEURIPS 2025 • poster • arXiv:2405.14430
13 citations
#2081

FlowDec: A flow-based full-band general audio codec with high perceptual quality

Simon Welker, Matthew Le, Ricky T. Q. Chen et al.

ICLR 2025 • poster • arXiv:2503.01485
13 citations
#2082

RoboTron-Mani: All-in-One Multimodal Large Model for Robotic Manipulation

Feng Yan, Fanfan Liu, Yiyang Huang et al.

ICCV 2025 • poster • arXiv:2412.07215
13 citations
#2083

Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups

Yuchen Zhu, Tianrong Chen, Lingkai Kong et al.

ICLR 2025 • poster • arXiv:2405.16381
13 citations
#2084

HRAvatar: High-Quality and Relightable Gaussian Head Avatar

Dongbin Zhang, Yunfei Liu, Lijian Lin et al.

CVPR 2025 • poster • arXiv:2503.08224
13 citations
#2085

Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts

Miao Rang, Zhenni Bi, Chuanjian Liu et al.

AAAI 2025 • paper • arXiv:2501.04322
13 citations
#2086

AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models

Ziyin Zhou, Yunpeng Luo, Yuanchen Wu et al.

ICCV 2025 • poster • arXiv:2507.02664
13 citations
#2087

Neuroplastic Expansion in Deep Reinforcement Learning

Jiashun Liu, Johan S Obando Ceron, Aaron Courville et al.

ICLR 2025 • poster • arXiv:2410.07994
13 citations
#2088

RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning

Kunming Su, Qiuxia Wu, Panpan Cai et al.

AAAI 2025 • paper • arXiv:2409.00353
13 citations
#2089

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Tonghe Zhang, Chao Yu, Sichang Su et al.

NEURIPS 2025 • poster • arXiv:2505.22094
13 citations
#2090

C-CLIP: Multimodal Continual Learning for Vision-Language Model

Wenzhuo Liu, Fei Zhu, Longhui Wei et al.

ICLR 2025 • poster
13 citations
#2091

AdaWM: Adaptive World Model based Planning for Autonomous Driving

Hang Wang, Xin Ye, Feng Tao et al.

ICLR 2025 • poster • arXiv:2501.13072
13 citations
#2092

Detect Anything 3D in the Wild

Hanxue Zhang, Haoran Jiang, Qingsong Yao et al.

ICCV 2025 • poster • arXiv:2504.07958
13 citations
#2093

ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression

Wei Jiang, Junru Li, Kai Zhang et al.

CVPR 2025 • poster • arXiv:2410.09706
13 citations
#2094

Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models

Fu-Yun Wang, Yunhao Shui, Jingtan Piao et al.

ICLR 2025 • poster • arXiv:2505.11245
13 citations
#2095

What Makes a Maze Look Like a Maze?

Joy Hsu, Jiayuan Mao, Joshua B Tenenbaum et al.

ICLR 2025 • poster • arXiv:2409.08202
13 citations
#2096

OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models

William Chen, Jinchuan Tian, Yifan Peng et al.

ICML 2025 • poster • arXiv:2502.10373
13 citations
#2097

Are Large Vision Language Models Good Game Players?

Xinyu Wang, Bohan Zhuang, Qi Wu

ICLR 2025 • poster • arXiv:2503.02358
13 citations
#2098

MVSAnywhere: Zero-Shot Multi-View Stereo

Sergio Izquierdo, Mohamed Sayed, Michael Firman et al.

CVPR 2025 • poster • arXiv:2503.22430
13 citations
#2099

UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection

Shun Wei, Jielin Jiang, Xiaolong Xu

CVPR 2025 • poster
13 citations
#2100

UFM: A Simple Path towards Unified Dense Correspondence with Flow

Yuchen Zhang, Nikhil Keetha, Chenwei Lyu et al.

NEURIPS 2025 • poster • arXiv:2506.09278
13 citations
#2101

Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction

Vaishnavh Nagarajan, Chen Wu, Charles Ding et al.

ICML 2025 • oral • arXiv:2504.15266
13 citations
#2102

Faster Algorithms for Structured Linear and Kernel Support Vector Machines

Yuzhou Gu, Zhao Song, Lichen Zhang

ICLR 2025 • poster • arXiv:2307.07735
13 citations
#2103

Spike2Former: Efficient Spiking Transformer for High-performance Image Segmentation

Zhenxin Lei, Man Yao, Jiakui Hu et al.

AAAI 2025 • paper • arXiv:2412.14587
13 citations
#2104

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Seanie Lee, Haebin Seong, Dong Bok Lee et al.

ICLR 2025 • poster • arXiv:2410.01524
13 citations
#2105

ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models

Ozgur Kara, Krishna Kumar Singh, Feng Liu et al.

CVPR 2025 • poster • arXiv:2505.07652
13 citations
#2106

On the Feature Learning in Diffusion Models

Andi Han, Wei Huang, Yuan Cao et al.

ICLR 2025 • poster • arXiv:2412.01021
13 citations
#2107

Scaling Autonomous Agents via Automatic Reward Modeling And Planning

Zhenfang Chen, Delin Chen, Rui Sun et al.

ICLR 2025 • poster • arXiv:2502.12130
13 citations
#2108

PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis

Xinlei Huang, Zhiqi Ma, Dian Meng et al.

AAAI 2025 • paper • arXiv:2409.12728
13 citations
#2109

R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning

Lijun Sheng, Jian Liang, Zilei Wang et al.

CVPR 2025 • poster • arXiv:2504.11195
13 citations
#2110

FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models

Haokun Chen, Hang Li, Yao Zhang et al.

CVPR 2025 • poster • arXiv:2410.04810
13 citations
#2111

Temporal Query Network for Efficient Multivariate Time Series Forecasting

Shengsheng Lin, Haojun Chen, Haijie Wu et al.

ICML 2025 • oral • arXiv:2505.12917
13 citations
#2112

Efficient Track Anything

Yunyang Xiong, Chong Zhou, Xiaoyu Xiang et al.

ICCV 2025 • poster • arXiv:2411.18933
13 citations
#2113

Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation

Lokesh Veeramacheneni, Moritz Wolter, Hilde Kuehne et al.

ICLR 2025 • poster • arXiv:2312.15289
13 citations
#2114

CoA-VLA: Improving Vision-Language-Action Models via Visual-Text Chain-of-Affordance

Jinming Li, Yichen Zhu, Zhibin Tang et al.

ICCV 2025 • poster
13 citations
#2115

Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability

Yingdong Shi, Changming Li, Yifan Wang et al.

CVPR 2025 • poster • arXiv:2503.20483
13 citations
#2116

Truncated Consistency Models

Sangyun Lee, Yilun Xu, Tomas Geffner et al.

ICLR 2025 • poster • arXiv:2410.14895
13 citations
#2117

Event-based Video Super-Resolution via State Space Models

Zeyu Xiao, Xinchao Wang

CVPR 2025 • poster
13 citations
#2118

On a Connection Between Imitation Learning and RLHF

Teng Xiao, Yige Yuan, Mingxiao Li et al.

ICLR 2025 • poster • arXiv:2503.05079
13 citations
#2119

Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing

Pengcheng Zhao, Jinxing Zhou, Yang Zhao et al.

AAAI 2025 • paper • arXiv:2412.11248
13 citations
#2120

xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

Qingchen Yu, Zifan Zheng, Shichao Song et al.

ICLR 2025 • poster • arXiv:2405.11874
13 citations
#2121

CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer

Yang Liu, Zinan Zheng, Jiashun Cheng et al.

ICLR 2025 • oral • arXiv:2502.19750
13 citations
#2122

Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining

Daouda Sow, Herbert Woisetschläger, Saikiran Bulusu et al.

ICLR 2025 • poster • arXiv:2502.06733
13 citations
#2123

UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving

Yuping Wang, Xiangyu Huang, Xiaokang Sun et al.

ICCV 2025 • poster • arXiv:2503.24381
13 citations
#2124

Personalized Preference Fine-tuning of Diffusion Models

Meihua Dang, Anikait Singh, Linqi Zhou et al.

CVPR 2025 • poster • arXiv:2501.06655
13 citations
#2125

Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset

Yingzi Ma, Jiongxiao Wang, Fei Wang et al.

ICLR 2025 • poster • arXiv:2411.03554
13 citations
#2126

CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models

Song Wang, Peng Wang, Tong Zhou et al.

ICLR 2025 • poster • arXiv:2407.02408
13 citations
#2127

Unlearning through Knowledge Overwriting: Reversible Federated Unlearning via Selective Sparse Adapter

Zhengyi Zhong, Weidong Bao, Ji Wang et al.

CVPR 2025 • poster • arXiv:2502.20709
13 citations
#2128

Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

Tao Zhang, Cheng Da, Kun Ding et al.

NEURIPS 2025 • poster • arXiv:2502.01051
13 citations
#2129

Synthetic Data is an Elegant GIFT for Continual Vision-Language Models

Bin Wu, Wuxuan Shi, Jinqiao Wang et al.

CVPR 2025 • poster • arXiv:2503.04229
13 citations
#2130

PILAF: Optimal Human Preference Sampling for Reward Modeling

Yunzhen Feng, Ariel Kwiatkowski, Kunhao Zheng et al.

ICML 2025 • poster • arXiv:2502.04270
13 citations
#2131

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

Runhui Huang, Xinpeng Ding, Chunwei Wang et al.

CVPR 2025 • poster • arXiv:2407.08706
13 citations
#2132

Implicit Search via Discrete Diffusion: A Study on Chess

Jiacheng Ye, Zhenyu Wu, Jiahui Gao et al.

ICLR 2025 • poster • arXiv:2502.19805
13 citations
#2133

Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production

Shengeng Tang, Jiayi He, Dan Guo et al.

AAAI 2025 • paper • arXiv:2412.13609
13 citations
#2134

CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion

Yunlong Tang, Gen Zhan, Li Yang et al.

AAAI 2025 • paper • arXiv:2408.12009
13 citations
#2135

Force Prompting: Video Generation Models Can Learn And Generalize Physics-based Control Signals

Nate Gillman, Charles Herrmann, Michael Freeman et al.

NEURIPS 2025 • poster • arXiv:2505.19386
13 citations
#2136

Grounding Continuous Representations in Geometry: Equivariant Neural Fields

David Wessels, David Knigge, Riccardo Valperga et al.

ICLR 2025 • poster • arXiv:2406.05753
13 citations
#2137

Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping

Ziye Huang, Haoqi Yuan, Yuhui Fu et al.

ICLR 2025 • poster • arXiv:2410.02475
13 citations
#2138

ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration

Andrew Estornell, Jean-Francois Ton, Yuanshun Yao et al.

ICLR 2025 • poster • arXiv:2411.00053
13 citations
#2139

High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws

Muhammed Ildiz, Halil Gozeten, Ege Taga et al.

ICLR 2025 • poster • arXiv:2410.18837
13 citations
#2140

Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments

Luke Rowe, Roger Girgis, Anthony Gosselin et al.

CVPR 2025 • poster • arXiv:2503.22496
13 citations
#2141

How Contaminated Is Your Benchmark? Measuring Dataset Leakage in Large Language Models with Kernel Divergence

Hyeong Kyu Choi, Maxim Khanov, Hongxin Wei et al.

ICML 2025 • poster
13 citations
#2142

Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis

Yu Yuan, Xijun Wang, Yichen Sheng et al.

CVPR 2025 • highlight • arXiv:2412.02168
13 citations
#2143

Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning

Patrik Reizinger, Siyuan Guo, Ferenc Huszar et al.

ICLR 2025 • poster • arXiv:2406.14302
13 citations
#2144

Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching

Bin Wang, Fan Wu, Linke Ouyang et al.

CVPR 2025 • poster • arXiv:2409.03643
13 citations
#2145

Exploring More from Multiple Gait Modalities for Human Identification

Dongyang Jin, Chao Fan, Weihua Chen et al.

AAAI 2025 • paper • arXiv:2412.11495
13 citations
#2146

Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis

Zikun Zhang, Zixiang Chen, Quanquan Gu

ICLR 2025 • poster • arXiv:2410.02321
13 citations
#2147

Do LLMs estimate uncertainty well in instruction-following?

Juyeon Heo, Miao Xiong, Christina Heinze-Deml et al.

ICLR 2025 • poster • arXiv:2410.14582
13 citations
#2148

MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow

Hanzhuo Huang, Yuan Liu, Ge Zheng et al.

ICLR 2025 • oral • arXiv:2502.11697
13 citations
#2149

Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints

Ming Dai, Jian Li, Jiedong Zhuang et al.

AAAI 2025 • paper • arXiv:2501.06710
13 citations
#2150

HEROS-GAN: Honed-Energy Regularized and Optimal Supervised GAN for Enhancing Accuracy and Range of Low-Cost Accelerometers

Yifeng Wang, Yi Zhao

AAAI 2025 • paper • arXiv:2502.18064
13 citations
#2151

MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning

Yaming Yang, Dilxat Muhtar, Yelong Shen et al.

AAAI 2025 • paper • arXiv:2410.09437
13 citations
#2152

VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning

Xueqing Wu, Yuheng Ding, Bingxuan Li et al.

CVPR 2025 • poster • arXiv:2412.02172
13 citations
#2153

SkillMimic: Learning Basketball Interaction Skills from Demonstrations

Yinhuai Wang, Qihan Zhao, Runyi Yu et al.

CVPR 2025 • highlight • arXiv:2408.15270
13 citations
#2154

Adding Conditional Control to Diffusion Models with Reinforcement Learning

Yulai Zhao, Masatoshi Uehara, Gabriele Scalia et al.

ICLR 2025 • poster • arXiv:2406.12120
13 citations
#2155

DRAWER: Digital Reconstruction and Articulation With Environment Realism

Hongchi Xia, Entong Su, Marius Memmel et al.

CVPR 2025 • poster • arXiv:2504.15278
13 citations
#2156

On the Relationship Between Monotone and Squared Probabilistic Circuits

Benjie Wang, Guy Van den Broeck

AAAI 2025 • paper • arXiv:2408.00876
13 citations
#2157

A Periodic Bayesian Flow for Material Generation

Hanlin Wu, Yuxuan Song, Jingjing Gong et al.

ICLR 2025 • poster • arXiv:2502.02016
13 citations
#2158

ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Yarden As, Bhavya, Lenart Treven et al.

ICLR 2025 • poster • arXiv:2410.09486
13 citations
#2159

MindJourney: Test-Time Scaling with World Models for Spatial Reasoning

Yuncong Yang, Jiageng Liu, Zheyuan Zhang et al.

NEURIPS 2025 • poster • arXiv:2507.12508
13 citations
#2160

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding

Zongxia Li, Xiyang Wu, Guangyao Shi et al.

NEURIPS 2025 • poster • arXiv:2505.01481
13 citations
#2161

Contextual Bandits for Unbounded Context Distributions

Puning Zhao, Rongfei Fan, Shaowei Wang et al.

ICML 2025 • poster • arXiv:2408.09655
13 citations
#2162

Standardizing Structural Causal Models

Weronika Ormaniec, Scott Sussex, Lars Lorch et al.

ICLR 2025 • poster • arXiv:2406.11601
13 citations
#2163

SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models

Haotian Xia, Zhengbang Yang, Junbo Zou et al.

ICLR 2025 • poster • arXiv:2410.08474
13 citations
#2164

Decision Information Meets Large Language Models: The Future of Explainable Operations Research

Yansen Zhang, Qingcan Kang, Wing Yin Yu et al.

ICLR 2025 • poster • arXiv:2502.09994
12 citations
#2165

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

Jiale Cheng, Xiao Liu, Cunxiang Wang et al.

ICLR 2025 • poster • arXiv:2412.11605
12 citations
#2166

MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm

Ziyan Guo, Zeyu Hu, Na Zhao et al.

ICCV 2025 • poster • arXiv:2502.02358
12 citations
#2167

Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization

Yue Zhang, Liqiang Jing, Vibhav Gogate

AAAI 2025 • paper • arXiv:2412.16232
12 citations
#2168

SecureGS: Boosting the Security and Fidelity of 3D Gaussian Splatting Steganography

Xuanyu Zhang, Jiarui Meng, Zhipei Xu et al.

ICLR 2025 • poster • arXiv:2503.06118
12 citations
#2169

UniMuMo: Unified Text, Music, and Motion Generation

Han Yang, Kun Su, Yutong Zhang et al.

AAAI 2025 • paper • arXiv:2410.04534
12 citations
#2170

NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting

Yulong Zheng, Zicheng Jiang, Shengfeng He et al.

CVPR 2025 • highlight • arXiv:2503.18794
12 citations
#2171

Establishing Best Practices in Building Rigorous Agentic Benchmarks

Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun et al.

NEURIPS 2025 • poster • arXiv:2507.02825
12 citations
#2172

Mr. DETR: Instructive Multi-Route Training for Detection Transformers

Chang-Bin Zhang, Yujie Zhong, Kai Han

CVPR 2025 • poster
12 citations
#2173

Citations and Trust in LLM Generated Responses

Yifan Ding, Matthew Facciani, Ellen Joyce et al.

AAAI 2025 • paper • arXiv:2501.01303
12 citations
#2174

Prototype-Based Image Prompting for Weakly Supervised Histopathological Image Segmentation

Qingchen Tang, Lei Fan, Maurice Pagnucco et al.

CVPR 2025 • poster • arXiv:2503.12068
12 citations
#2175

HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding

Shehreen Azad, Vibhav Vineet, Yogesh S. Rawat

CVPR 2025 • poster • arXiv:2503.08585
12 citations
#2176

Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts

Lihu Chen, Adam Dejl, Francesca Toni

AAAI 2025 • paper • arXiv:2406.10868
12 citations
#2177

Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing a Posterior

Tongda Xu, Xiyan Cai, Xinjie Zhang et al.

ICLR 2025 • poster • arXiv:2501.18913
12 citations
#2178

MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting

Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong et al.

CVPR 2025 • poster • arXiv:2501.03714
12 citations
#2179

Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo

Zachary Charles, Gabriel Teston, Lucio Dery et al.

NEURIPS 2025 • spotlight • arXiv:2503.09799
12 citations
#2180

RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models

Yijing Lin, Mengqi Huang, Shuhan Zhuang et al.

ICCV 2025 • poster • arXiv:2503.10406
12 citations
#2181

Fully-inductive Node Classification on Arbitrary Graphs

Jianan Zhao, Zhaocheng Zhu, Mikhail Galkin et al.

ICLR 2025 • poster • arXiv:2405.20445
12 citations
#2182

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

Kun Liu, Qi Liu, Xinchen Liu et al.

CVPR 2025 • poster • arXiv:2503.23715
12 citations
#2183

Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts

Minh Le, Chau Nguyen, Huy Nguyen et al.

ICLR 2025 • poster • arXiv:2410.02200
12 citations
#2184

CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions

Matan Levi, Yair Allouche, Daniel Ohayon et al.

AAAI 2025 • paper • arXiv:2408.09304
12 citations
#2185

Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction

Weirong Chen, Ganlin Zhang, Felix Wimbauer et al.

ICCV 2025 • poster • arXiv:2504.14516
12 citations
#2186

ReSim: Reliable World Simulation for Autonomous Driving

Jiazhi Yang, Kashyap Chitta, Shenyuan Gao et al.

NEURIPS 2025 • spotlight • arXiv:2506.09981
12 citations
#2187

VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation

Saksham Singh Kushwaha, Yapeng Tian

CVPR 2025 • poster • arXiv:2412.10768
12 citations
#2188

LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos

Chin-Yang Lin, Cheng Sun, Fu-En Yang et al.

ICCV 2025 • poster • arXiv:2508.14041
12 citations
#2189

Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs

Jeongseok Hyun, Sukjun Hwang, Su Ho Han et al.

ICCV 2025 • poster • arXiv:2507.07990
12 citations
#2190

Improving Equivariant Networks with Probabilistic Symmetry Breaking

Hannah Lawrence, Vasco Portilheiro, Yan Zhang et al.

ICLR 2025 • poster • arXiv:2503.21985
12 citations
#2191

Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation

Jiyuan Wang, Chunyu Lin, Cheng Guan et al.

NEURIPS 2025 • poster • arXiv:2503.15905
12 citations
#2192

ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments

Hojae Han, Seung-won Hwang, Rajhans Samdani et al.

ICLR 2025 • poster • arXiv:2502.19852
12 citations
#2193

VisionArena: 230k Real World User-VLM Conversations with Preference Labels

Christopher Chou, Lisa Dunlap, Wei-Lin Chiang et al.

CVPR 2025 • poster • arXiv:2412.08687
12 citations
#2194

Ambient Diffusion Omni: Training Good Models with Bad Data

Giannis Daras, Adrian Rodriguez-Munoz, Adam Klivans et al.

NEURIPS 2025 • spotlight • arXiv:2506.10038
12 citations
#2195

Imputation for prediction: beware of diminishing returns.

Marine Le Morvan, Gael Varoquaux

ICLR 2025 • poster • arXiv:2407.19804
12 citations
#2196

$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

Vlad Sobal, Mark Ibrahim, Randall Balestriero et al.

ICLR 2025 • poster • arXiv:2407.18134
12 citations
#2197

GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs

Advik Basani, Xiao Zhang

NEURIPS 2025 • poster • arXiv:2411.14133
12 citations
#2198

Stable Segment Anything Model

Qi Fan, Xin Tao, Lei Ke et al.

ICLR 2025 • poster • arXiv:2311.15776
12 citations
#2199

Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration

Zhixuan Shen, Haonan Luo, Kexun Chen et al.

AAAI 2025 • paper • arXiv:2412.18292
12 citations
#2200

Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace

Jinluan Yang, Anke Tang, Didi Zhu et al.

ICLR 2025 • poster • arXiv:2410.13910
12 citations