Most Cited 2025 "hierarchical joint embedding" Papers

22,274 papers found • Page 15 of 112

#2801

Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits

Zihan Zhang, Xiangyang Ji, Yuan Zhou

ICLR 2025 • poster • arXiv:2110.08057
11 citations
#2802

DELTA: Pre-Train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment

Haitao Li, Qingyao Ai, Xinyan Han et al.

AAAI 2025 • paper • arXiv:2403.18435
11 citations
#2803

Federated Learning with Sample-level Client Drift Mitigation

Haoran Xu, Jiaze Li, Wanyi Wu et al.

AAAI 2025 • paper • arXiv:2501.11360
11 citations
#2804

SLIP: Spoof-Aware One-Class Face Anti-Spoofing with Language Image Pretraining

Pei-Kai Huang, Jun-Xiong Chong, Cheng-Hsuan Chiang et al.

AAAI 2025 • paper • arXiv:2503.19982
11 citations
#2805

VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning

Qingtao Liu, Yu Cui, Zhengnan Sun et al.

ICLR 2025 • poster
11 citations
#2806

Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding

Xianqiang Gao, Pingrui Zhang, Delin Qu et al.

AAAI 2025 • paper • arXiv:2408.13024
11 citations
#2807

Large language models can learn and generalize steganographic chain-of-thought under process supervision

Robert McCarthy, Joey Skaf, Luis Ibanez-Lissen et al.

NEURIPS 2025 • poster • arXiv:2506.01926
11 citations
#2808

Planning in the Dark: LLM-Symbolic Planning Pipeline Without Experts

Sukai Huang, Nir Lipovetzky, Trevor Cohn

AAAI 2025 • paper • arXiv:2409.15915
11 citations
#2809

Preference Optimization on Pareto Sets: On a Theory of Multi-Objective Optimization

Abhishek Roy, Geelon So, Yian Ma

NEURIPS 2025 • poster
11 citations
#2810

Transformer-Squared: Self-adaptive LLMs

Qi Sun, Edoardo Cetin, Yujin Tang

ICLR 2025 • poster • arXiv:2501.06252
11 citations
#2811

Efficient Model Editing with Task-Localized Sparse Fine-tuning

Leonardo Iurada, Marco Ciccone, Tatiana Tommasi

ICLR 2025 • poster • arXiv:2504.02620
10 citations
#2812

LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

Minqian Liu, Zhiyang Xu, Xinyi Zhang et al.

COLM 2025 • paper • arXiv:2504.10430
10 citations
#2813

Consistency Checks for Language Model Forecasters

Daniel Paleka, Abhimanyu Pallavi Sudhir, Alejandro Alvarez et al.

ICLR 2025 • poster • arXiv:2412.18544
10 citations
#2814

Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search

Shuyu Yang, Yaxiong Wang, Li Zhu et al.

ICCV 2025 • highlight • arXiv:2411.17776
10 citations
#2815

Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks

Peng Xie, Yequan Bie, Jianda Mao et al.

CVPR 2025 • poster • arXiv:2411.15720
10 citations
#2816

Emergent Temporal Correspondences from Video Diffusion Transformers

Jisu Nam, Soowon Son, Dahyun Chung et al.

NEURIPS 2025 • oral • arXiv:2506.17220
10 citations
#2817

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Dongyoung Kim, Huiwon Jang, Sumin Park et al.

NEURIPS 2025 • poster • arXiv:2506.00070
10 citations
#2818

DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation

Jiangran Lyu, Ziming Li, Xuesong Shi et al.

ICCV 2025 • poster • arXiv:2503.16806
10 citations
#2819

RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing

Fengxiang Wang, Yulin Wang, Mingshuo Chen et al.

NEURIPS 2025 • poster • arXiv:2503.10392
10 citations
#2820

MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output

Yanyuan Chen, Dexuan Xu, Yu Huang et al.

CVPR 2025 • poster • arXiv:2510.10011
10 citations
#2821

VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving

Haiming Zhang, Wending Zhou et al.

CVPR 2025 • poster • arXiv:2411.14716
10 citations
#2822

Neural Exploratory Landscape Analysis for Meta-Black-Box-Optimization

Zeyuan Ma, Jiacheng Chen, Hongshu Guo et al.

ICLR 2025 • poster • arXiv:2408.10672
10 citations
#2823

Deep Linear Probe Generators for Weight Space Learning

Jonathan Kahana, Eliahu Horwitz, Imri Shuval et al.

ICLR 2025 • poster • arXiv:2410.10811
10 citations
#2824

Unifying Autoregressive and Diffusion-Based Sequence Generation

Nima Fathi, Torsten Scholak, Pierre-Andre Noel

COLM 2025 • paper • arXiv:2504.06416
10 citations
#2825

Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion

Yan Rong, Li Liu

AAAI 2025 • paper • arXiv:2409.00700
10 citations
#2826

Confidence Estimation for Error Detection in Text-to-SQL Systems

Oleg Somov, Elena Tutubalina

AAAI 2025 • paper • arXiv:2501.09527
10 citations
#2827

Towards Training-free Anomaly Detection with Vision and Language Foundation Models

Jinjin Zhang, Guodong Wang, Yizhou Jin et al.

CVPR 2025 • poster • arXiv:2503.18325
10 citations
#2828

SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes

Tony Alex, Sara Atito, Armin Mustafa et al.

ICLR 2025 • poster • arXiv:2506.12222
10 citations
#2829

AdvPrefix: An Objective for Nuanced LLM Jailbreaks

Sicheng Zhu, Brandon Amos, Yuandong Tian et al.

NEURIPS 2025 • poster • arXiv:2412.10321
10 citations
#2830

Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion

Jingyuan Chen, Fuchen Long, Jie An et al.

AAAI 2025 • paper • arXiv:2501.09019
10 citations
#2831

SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving

Su Sun, Cheng Zhao, Zhuoyang Sun et al.

CVPR 2025 • highlight • arXiv:2411.15482
10 citations
#2832

Flexible Frame Selection for Efficient Video Reasoning

Shyamal Buch, Arsha Nagrani, Anurag Arnab et al.

CVPR 2025 • poster
10 citations
#2833

S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation

Yichen Xie, Runsheng Xu, Tong He et al.

CVPR 2025 • poster
10 citations
#2834

GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement

Peiye Zhuang, Songfang Han, Chaoyang Wang et al.

ICLR 2025 • poster • arXiv:2406.05649
10 citations
#2835

Vector-ICL: In-context Learning with Continuous Vector Representations

Yufan Zhuang, Chandan Singh, Liyuan Liu et al.

ICLR 2025 • poster • arXiv:2410.05629
10 citations
#2836

Di[M]O: Distilling Masked Diffusion Models into One-step Generator

Yuanzhi Zhu, Xi Wang, Stéphane Lathuilière et al.

ICCV 2025 • poster
10 citations
#2837

Scalable and Certifiable Graph Unlearning: Overcoming the Approximation Error Barrier

Lu Yi, Zhewei Wei

ICLR 2025 • poster • arXiv:2408.09212
10 citations
#2838

EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing

Yizhang Zhu, Runzhi Jiang, Boyan Li et al.

COLM 2025 • paper • arXiv:2503.22402
10 citations
#2839

LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks

Soumyadeep Pal, Changsheng Wang, James Diffenderfer et al.

COLM 2025 • paper • arXiv:2504.10185
10 citations
#2840

AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference

Zhuomin He, Yizhen Yao, Pengfei Zuo et al.

AAAI 2025 • paper • arXiv:2501.02336
10 citations
#2841

WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image

Yuci Liang, Xinheng Lyu, Meidan Ding et al.

ICCV 2025 • poster • arXiv:2412.02141
10 citations
#2842

Unleashing Hour-Scale Video Training for Long Video-Language Understanding

Jingyang Lin, Jialian Wu, Ximeng Sun et al.

NEURIPS 2025 • oral • arXiv:2506.05332
10 citations
#2843

EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space

Jianrong Zhang, Hehe Fan, Yi Yang

CVPR 2025 • highlight • arXiv:2412.14706
10 citations
#2844

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

Avinandan Bose, Zhihan Xiong, Yuejie Chi et al.

COLM 2025 • paper • arXiv:2504.14439
10 citations
#2845

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Jin Wang, Chenghui Lv, Xian Li et al.

CVPR 2025 • poster • arXiv:2503.15024
10 citations
#2846

Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation

Wenhui Tan, Boyuan Li, Chuhao Jin et al.

ICLR 2025 • poster
10 citations
#2847

Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control

Hannah Cyberey, David Evans

COLM 2025 • paper
10 citations
#2848

Law of the Weakest Link: Cross Capabilities of Large Language Models

Ming Zhong, Aston Zhang, Xuewei Wang et al.

ICLR 2025 • poster • arXiv:2409.19951
10 citations
#2849

ID-Patch: Robust ID Association for Group Photo Personalization

Yimeng Zhang, Tiancheng Zhi, Jing Liu et al.

CVPR 2025 • poster • arXiv:2411.13632
10 citations
#2850

Scalable Surrogate Verification of Image-Based Neural Network Control Systems Using Composition and Unrolling

Feiyang Cai, Chuchu Fan, Stanley Bak

AAAI 2025 • paper • arXiv:2405.18554
10 citations
#2851

Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data

Phillip Si, Peng Chen

ICLR 2025 • poster • arXiv:2409.00127
10 citations
#2852

CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP

Songlong Xing, Zhengyu Zhao, Nicu Sebe

CVPR 2025 • poster • arXiv:2503.03613
10 citations
#2853

Data-Driven Performance Guarantees for Classical and Learned Optimizers

Rajiv Sambharya, Bartolomeo Stellato

NEURIPS 2025 • poster • arXiv:2404.13831
10 citations
#2854

BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation

Yulu Pan, Ce Zhang, Gedas Bertasius

CVPR 2025 • poster • arXiv:2503.20781
10 citations
#2855

Sample Efficient Preference Alignment in LLMs via Active Exploration

Viraj Mehta, Syrine Belakaria, Vikramjeet Das et al.

COLM 2025 • paper • arXiv:2312.00267
10 citations
#2856

3D Mesh Editing using Masked LRMs

William Gao, Dilin Wang, Yuchen Fan et al.

ICCV 2025 • poster • arXiv:2412.08641
10 citations
#2857

ADAM: An Embodied Causal Agent in Open-World Environments

Shu Yu, Chaochao Lu

ICLR 2025 • poster • arXiv:2410.22194
10 citations
#2858

General Scene Adaptation for Vision-and-Language Navigation

Haodong Hong, Yanyuan Qiao, Sen Wang et al.

ICLR 2025 • poster • arXiv:2501.17403
10 citations
#2859

ConfTuner: Training Large Language Models to Express Their Confidence Verbally

Yibo Li, Miao Xiong, Jiaying Wu et al.

NEURIPS 2025 • poster • arXiv:2508.18847
10 citations
#2860

Label-Free Backdoor Attacks in Vertical Federated Learning

Wei Shen, Wenke Huang, Guancheng Wan et al.

AAAI 2025 • paper
10 citations
#2861

Fluid Language Model Benchmarking

Valentin Hofmann, David Heineman, Ian Magnusson et al.

COLM 2025 • paper • arXiv:2509.11106
10 citations
#2862

Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations

Yiyou Sun, Yu Gai, Lijie Chen et al.

NEURIPS 2025 • poster • arXiv:2504.12691
10 citations
#2863

Advancing Language Multi-Agent Learning with Credit Re-Assignment for Interactive Environment Generalization

Zhitao He, Zijun Liu, Peng Li et al.

COLM 2025 • paper • arXiv:2502.14496
10 citations
#2864

Aligning Human Motion Generation with Human Perceptions

Haoru Wang, Wentao Zhu, Luyi Miao et al.

ICLR 2025 • poster • arXiv:2407.02272
10 citations
#2865

STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction

Zhimin Liao, Ping Wei, Shuaijia Chen et al.

CVPR 2025 • poster • arXiv:2504.19749
10 citations
#2866

Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data

Seiji Maekawa, Hayate Iso, Nikita Bhutani

ICLR 2025 • poster • arXiv:2410.11996
10 citations
#2867

Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment

Johannes Schusterbauer, Ming Gui, Frank Fundel et al.

CVPR 2025 • poster • arXiv:2506.02221
10 citations
#2868

More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness

Aaron J. Li, Satyapriya Krishna, Hima Lakkaraju

ICLR 2025 • poster • arXiv:2404.18870
10 citations
#2869

Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise

Brayan Monroy, Jorge Bacca, Julián Tachella

CVPR 2025 • poster • arXiv:2412.04648
10 citations
#2870

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Nikhil Kandpal, Brian Lester, Colin Raffel et al.

NEURIPS 2025 • poster • arXiv:2506.05209
10 citations
#2871

Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling

Jingyun Xue, Hongfa Wang, Qi Tian et al.

ICLR 2025 • poster • arXiv:2406.03035
10 citations
#2872

First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training

Lai Wei, Yuting Li, Chen Wang et al.

NEURIPS 2025 • poster • arXiv:2505.22453
10 citations
#2873

EqNIO: Subequivariant Neural Inertial Odometry

Royina Karegoudra Jayanth, Yinshuang Xu, Ziyun Wang et al.

ICLR 2025 • poster • arXiv:2408.06321
10 citations
#2874

A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals

Grace Liu, Michael Tang, Benjamin Eysenbach

ICLR 2025 • poster • arXiv:2408.05804
10 citations
#2875

ResCLIP: Residual Attention for Training-free Dense Vision-language Inference

Jinhong Deng, Yuhang Yang, Wen Li et al.

CVPR 2025 • poster • arXiv:2411.15851
10 citations
#2876

MagicColor: Multi-instance Sketch Colorization

Yinhan Zhang, Yue Ma, Bingyuan Wang et al.

ICCV 2025 • poster
10 citations
#2877

Gaussian Eigen Models for Human Heads

Wojciech Zielonka, Timo Bolkart, Thabo Beeler et al.

CVPR 2025 • poster • arXiv:2407.04545
10 citations
#2878

Deep MMD Gradient Flow without adversarial training

Alexandre Galashov, Valentin De Bortoli, Arthur Gretton

ICLR 2025 • poster • arXiv:2405.06780
10 citations
#2879

Breaking the Data Barrier -- Building GUI Agents Through Task Generalization

Junlei Zhang, Zichen Ding, Chang Ma et al.

COLM 2025 • paper • arXiv:2504.10127
10 citations
#2880

Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance

Sachin Goyal, Christina Baek, Zico Kolter et al.

ICLR 2025 • poster
10 citations
#2881

Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program

Minghe Gao, Xuqi Liu, Zhongqi Yue et al.

ICCV 2025 • poster • arXiv:2504.06606
10 citations
#2882

Causal Inference over Visual-Semantic-Aligned Graph for Image Classification

Lei Meng, Xiangxian Li, Xiaoshuo Yan et al.

AAAI 2025 • paper
10 citations
#2883

Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses

David Glukhov, Ziwen Han, I Shumailov et al.

ICLR 2025 • poster • arXiv:2407.02551
10 citations
#2884

Rethinking Visual Counterfactual Explanations Through Region Constraint

Bartlomiej Sobieski, Jakub Grzywaczewski, Bartłomiej Sadlej et al.

ICLR 2025 • poster • arXiv:2410.12591
10 citations
#2885

Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning

Yichi Zhang, Zhuo Chen, Lingbing Guo et al.

ICLR 2025 • poster • arXiv:2405.16869
10 citations
#2886

Pareto Set Learning for Multi-Objective Reinforcement Learning

Erlong Liu, Yu-Chang Wu, Xiaobin Huang et al.

AAAI 2025 • paper • arXiv:2501.06773
10 citations
#2887

The 3D-PC: a benchmark for visual perspective taking in humans and machines

Drew Linsley, Peisen Zhou, Alekh Ashok et al.

ICLR 2025 • poster • arXiv:2406.04138
10 citations
#2888

EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation

Zihao Zhang, Haoran Chen, Haoyu Zhao et al.

CVPR 2025 • poster • arXiv:2503.15831
10 citations
#2889

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

Federico Cocchi, Nicholas Moratelli, Marcella Cornia et al.

CVPR 2025 • poster • arXiv:2411.16863
10 citations
#2890

Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models

Huajie Tan, Yuheng Ji, Xiaoshuai Hao et al.

NEURIPS 2025 • poster • arXiv:2503.20752
10 citations
#2891

Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks

Kairong Yu, Chengting Yu, Tianqing Zhang et al.

CVPR 2025 • poster • arXiv:2503.03144
10 citations
#2892

Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis

Weikai Li, Ding Wang, Zijian Ding et al.

AAAI 2025 • paper • arXiv:2410.19225
10 citations
#2893

DocVLM: Make Your VLM an Efficient Reader

Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz et al.

CVPR 2025 • poster • arXiv:2412.08746
10 citations
#2894

RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars

Linzhou Li, Yumeng Li, Yanlin Weng et al.

CVPR 2025 • highlight • arXiv:2503.12886
10 citations
#2895

V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection

Xun Huang, Jinlong Wang, Qiming Xia et al.

CVPR 2025 • poster • arXiv:2411.08402
10 citations
#2896

Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting

Suraj Anand, Michael Lepori, Jack Merullo et al.

ICLR 2025 • poster • arXiv:2406.00053
10 citations
#2897

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

Yikang Zhou, Tao Zhang, Shilin Xu et al.

ICCV 2025 • poster • arXiv:2501.04670
10 citations
#2898

MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL

Claas Voelcker, Marcel Hussing, Eric Eaton et al.

ICLR 2025 • poster • arXiv:2410.08896
10 citations
#2899

PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors

Guangshun Wei, Yuan Feng, Long Ma et al.

CVPR 2025 • poster • arXiv:2411.19036
10 citations
#2900

Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling

Fengxiang Wang, Hongzhen Wang, Di Wang et al.

ICCV 2025 • poster • arXiv:2406.11933
10 citations
#2901

MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge

Yuntao Du, Kailin Jiang, Zhi Gao et al.

ICLR 2025 • poster • arXiv:2502.19870
10 citations
#2902

Visual Generation Without Guidance

Huayu Chen, Kai Jiang, Kaiwen Zheng et al.

ICML 2025 • poster • arXiv:2501.15420
10 citations
#2903

Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach

Haotian Ju, Hongyang Zhang, Dongyue Li

ICLR 2025 • poster • arXiv:2306.08553
10 citations
#2904

WildSAT: Learning Satellite Image Representations from Wildlife Observations

Rangel Daroya, Elijah Cole, Oisin Mac Aodha et al.

ICCV 2025 • poster • arXiv:2412.14428
10 citations
#2905

ParZC: Parametric Zero-Cost Proxies for Efficient NAS

Peijie Dong, Lujun Li, Zhenheng Tang et al.

AAAI 2025 • paper • arXiv:2402.02105
10 citations
#2906

VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction

Zijian He, Yuwei Ning, Yipeng Qin et al.

CVPR 2025 • poster • arXiv:2503.12165
10 citations
#2907

SuperDec: 3D Scene Decomposition with Superquadrics Primitives

Elisabetta Fedele, Boyang Sun, Francis Engelmann et al.

ICCV 2025 • poster • arXiv:2504.00992
10 citations
#2908

Q♯: Provably Optimal Distributional RL for LLM Post-Training

Jin Zhou, Kaiwen Wang, Jonathan Chang et al.

NEURIPS 2025 • poster • arXiv:2502.20548
10 citations
#2909

LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models

Haiwen Huang, Anpei Chen, Volodymyr Havrylov et al.

ICCV 2025 • poster • arXiv:2504.14032
10 citations
#2910

Knowledge Graph Completion with Relation-Aware Anchor Enhancement

Duanyang Yuan, Sihang Zhou, Xiaoshu Chen et al.

AAAI 2025 • paper • arXiv:2504.06129
10 citations
#2911

Generative Monoculture in Large Language Models

Fan Wu, Emily Black, Varun Chandrasekaran

ICLR 2025 • poster • arXiv:2407.02209
10 citations
#2912

DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization

Gang Li, Ming Lin, Tomer Galanti et al.

NEURIPS 2025 • poster • arXiv:2505.12366
10 citations
#2913

Video Summarization with Large Language Models

Min Jung Lee, Dayoung Gong, Minsu Cho

CVPR 2025 • poster • arXiv:2504.11199
10 citations
#2914

Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

Zhaochong An, Guolei Sun, Yun Liu et al.

CVPR 2025 • poster • arXiv:2503.16282
10 citations
#2915

Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective

Kaifang Long, Guoyang Xie, Lianbo Ma et al.

AAAI 2025 • paper • arXiv:2412.17297
10 citations
#2916

MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-text Decoding

Weikang Qiu, Zheng Huang, Haoyu Hu et al.

ICML 2025 • poster • arXiv:2502.15786
10 citations
#2917

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)

Tianyi Zhang, Mohsen Hariri, Shaochen (Henry) Zhong et al.

NEURIPS 2025 • poster • arXiv:2504.11651
10 citations
#2918

Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution

Du Chen, Liyi Chen, Zhengqiang Zhang et al.

ICCV 2025 • poster • arXiv:2501.06838
10 citations
#2919

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Yiren Song, Cheng Liu, Mike Zheng Shou

NEURIPS 2025 • poster • arXiv:2505.18445
10 citations
#2920

ReCap: Better Gaussian Relighting with Cross-Environment Captures

Jingzhi Li, Zongwei Wu, Eduard Zamfir et al.

CVPR 2025 • poster • arXiv:2412.07534
10 citations
#2921

DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers

Xuanlei Zhao, Shenggan Cheng, Chang Chen et al.

ICML 2025 • poster • arXiv:2403.10266
10 citations
#2922

Anyprefer: An Agentic Framework for Preference Data Synthesis

Yiyang Zhou, Zhaoyang Wang, Tianle Wang et al.

ICLR 2025 • poster • arXiv:2504.19276
10 citations
#2923

Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling

Minhyuk Seo, Hyunseo Koh, Jonghyun Choi

ICLR 2025 • poster • arXiv:2410.15143
10 citations
#2924

Open-World Amodal Appearance Completion

Jiayang Ao, Yanbei Jiang, Qiuhong Ke et al.

CVPR 2025 • poster • arXiv:2411.13019
10 citations
#2925

The Double-Ellipsoid Geometry of CLIP

Meir Yossef Levi, Guy Gilboa

ICML 2025 • poster • arXiv:2411.14517
10 citations
#2926

Efficiently Parameterized Neural Metriplectic Systems

Anthony Gruber, Kookjin Lee, Haksoo Lim et al.

ICLR 2025 • poster • arXiv:2405.16305
10 citations
#2927

Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models

Yingqing Guo, Yukang Yang, Hui Yuan et al.

NEURIPS 2025 • poster • arXiv:2502.11420
10 citations
#2928

A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks

Thomas Schmied, Thomas Adler, Vihang Patil et al.

ICML 2025 • poster • arXiv:2410.22391
10 citations
#2929

DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution

Zheng Chen, Zichen Zou, Kewei Zhang et al.

NEURIPS 2025 • poster • arXiv:2505.16239
10 citations
#2930

MINERVA: Evaluating Complex Video Reasoning

Arsha Nagrani, Sachit Menon, Ahmet Iscen et al.

ICCV 2025 • poster • arXiv:2505.00681
10 citations
#2931

Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems

Jie Chen

ICLR 2025 • poster • arXiv:2406.00809
10 citations
#2932

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

Tsung-Han (Patrick) Wu, Heekyung Lee, Jiaxin Ge et al.

NEURIPS 2025 • poster • arXiv:2504.13169
10 citations
#2933

HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution

Yuxuan Jiang, Ho Man Kwan, Jasmine Peng et al.

CVPR 2025 • poster • arXiv:2412.03748
10 citations
#2934

MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards

Sheng Wang, Liheng Chen, Pengan Chen et al.

ICLR 2025 • poster • arXiv:2410.00938
10 citations
#2935

Probing the Latent Hierarchical Structure of Data via Diffusion Models

Antonio Sclocchi, Alessandro Favero, Noam Levi et al.

ICLR 2025 • poster • arXiv:2410.13770
10 citations
#2936

Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation

Ling-An Zeng, Guohong Huang, Gaojie Wu et al.

AAAI 2025 • paper • arXiv:2412.11193
10 citations
#2937

SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input

Zhen Lv, Yangqi Long, Congzhentao Huang et al.

CVPR 2025 • poster • arXiv:2411.11934
10 citations
#2938

DOTA: Distributional Test-time Adaptation of Vision-Language Models

Zongbo Han, Jialong Yang, Guangyu Wang et al.

NEURIPS 2025 • poster • arXiv:2409.19375
10 citations
#2939

DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation

Jing He, Haodong Li, Yongzhe Hu et al.

ICLR 2025 • poster • arXiv:2410.02067
10 citations
#2940

SlerpFace: Face Template Protection via Spherical Linear Interpolation

Zhizhou Zhong, Yuxi Mi, Yuge Huang et al.

AAAI 2025 • paper • arXiv:2407.03043
10 citations
#2941

CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering

Tianyu Huai, Jie Zhou, Xingjiao Wu et al.

CVPR 2025 • highlight • arXiv:2503.00413
10 citations
#2942

Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models

Ruiyu Wang, Yu Yuan, Shizhao Sun et al.

ICML 2025 • poster • arXiv:2501.19054
10 citations
#2943

X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing

Xinyan Chen, Jianfei Yang

ICLR 2025 • poster • arXiv:2410.10167
10 citations
#2944

Measuring memorization in RLHF for code completion

Jamie Hayes, I Shumailov, Billy Porter et al.

ICLR 2025 • poster • arXiv:2406.11715
10 citations
#2945

GRPose: Learning Graph Relations for Human Image Generation with Pose Priors

Xiangchen Yin, Donglin Di, Lei Fan et al.

AAAI 2025 • paper • arXiv:2408.16540
10 citations
#2946

Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM

Yizhou Huang, Yihua Cheng, Kezhi Wang

CVPR 2025 • poster • arXiv:2503.10898
10 citations
#2947

PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems

Bocheng Zeng, Qi Wang, Mengtao Yan et al.

ICLR 2025 • oral • arXiv:2410.01337
10 citations
#2948

Visual Test-time Scaling for GUI Agent Grounding

Tiange Luo, Lajanugen Logeswaran, Justin Johnson et al.

ICCV 2025 • highlight • arXiv:2505.00684
10 citations
#2949

Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation

Ting Liu, Siyuan Li

CVPR 2025 • poster • arXiv:2504.00356
10 citations
#2950

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem et al.

ICLR 2025 • poster • arXiv:2410.23168
10 citations
#2951

HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models

Mingzhen Huang, Fu-Jen Chu, Bugra Tekin et al.

CVPR 2025 • poster • arXiv:2503.19157
10 citations
#2952

Layered Image Vectorization via Semantic Simplification

Zhenyu Wang, Jianxi Huang, Zhida Sun et al.

CVPR 2025 • poster • arXiv:2406.05404
10 citations
#2953

GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration

Yuchen Sun, Shanhui Zhao, Tao Yu et al.

CVPR 2025 • poster • arXiv:2503.17709
10 citations
#2954

Task-Agnostic Guided Feature Expansion for Class-Incremental Learning

Bowen Zheng, Da-Wei Zhou, Han-Jia Ye et al.

CVPR 2025 • poster • arXiv:2503.00823
10 citations
#2955

Radiology Report Generation via Multi-objective Preference Optimization

Ting Xiao, Lei Shi, Peng Liu et al.

AAAI 2025 • paper • arXiv:2412.08901
10 citations
#2956

CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models

Qinsi Wang, Hancheng Ye, Ming-Yu Chung et al.

ICML 2025 • poster • arXiv:2505.19235
10 citations
#2957

VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding

Yujie Liang, Xiaobin Hu, Boyuan Jiang et al.

CVPR 2025 • poster • arXiv:2408.12340
10 citations
#2958

Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM

Yatai Ji, Jiacheng Zhang, Jie Wu et al.

ICCV 2025 • poster • arXiv:2412.15156
10 citations
#2959

CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval

Likai Tian, Jian Zhao, Zechao Hu et al.

CVPR 2025 • highlight
10 citations
#2960

Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions

Siqiao Mu, Diego Klabjan

NEURIPS 2025 • poster • arXiv:2409.09778
10 citations
#2961

Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think

Zhenyi Lu, Xiaoye Qu, Zhenyi Lu et al.

CVPR 2025 • highlight • arXiv:2503.00948
10 citations
#2962

Reliable and Efficient Amortized Model-based Evaluation

Sang Truong, Yuheng Tu, Percy Liang et al.

ICML 2025 • poster • arXiv:2503.13335
10 citations
#2963

Node-Time Conditional Prompt Learning in Dynamic Graphs

Xingtong Yu, Zhenghao Liu, Xinming Zhang et al.

ICLR 2025 • oral • arXiv:2405.13937
10 citations
#2964

Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model

Dongki Kim, Wonbin Lee, Sung Ju Hwang

NEURIPS 2025 • poster • arXiv:2502.13449
10 citations
#2965

Enhancing Multilingual LLM Pretraining with Model-Based Data Selection

Bettina Messmer, Vinko Sabolčec, Martin Jaggi

NEURIPS 2025 • poster • arXiv:2502.10361
10 citations
#2966

Spectral Image Tokenizer

Carlos Esteves, Mohammed Suhail, Ameesh Makadia

ICCV 2025 • poster • arXiv:2412.09607
10 citations
#2967

RomanTex: Decoupling 3D-aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis

Yifei Feng, Mx Yang, Shuhui Yang et al.

ICCV 2025 • poster • arXiv:2503.19011
10 citations
#2968

Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances

Yi Yu, Botao Ren, Peiyuan Zhang et al.

CVPR 2025 • poster • arXiv:2502.04268
10 citations
#2969

AlphaPO: Reward Shape Matters for LLM Alignment

Aman Gupta, Shao Tang, Qingquan Song et al.

ICML 2025 • poster • arXiv:2501.03884
10 citations
#2970

Attention as a Hypernetwork

Simon Schug, Seijin Kobayashi, Yassir Akram et al.

ICLR 2025 • poster • arXiv:2406.05816
10 citations
#2971

Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?

Yifan Feng, Chengwu Yang, Xingliang Hou et al.

ICLR 2025 • poster • arXiv:2410.10083
10 citations
#2972

GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians

Xiaobao Wei, Peng Chen, Ming Lu et al.

AAAI 2025 • paper • arXiv:2412.13983
10 citations
#2973

DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents

Hao Li, Xiaogeng Liu, CHIU Chun et al.

NEURIPS 2025 • poster • arXiv:2506.12104
10 citations
#2974

Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning

Qitao Tan, Jun Liu, Zheng Zhan et al.

NEURIPS 2025 • poster • arXiv:2502.03304
10 citations
#2975

VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

Wenhao Wang, Yi Yang

NEURIPS 2025 • poster • arXiv:2503.01739
10 citations
#2976

CLEVER: A Curated Benchmark for Formally Verified Code Generation

Amitayush Thakur, Jasper Lee, George Tsoukalas et al.

NEURIPS 2025 • poster • arXiv:2505.13938
10 citations
#2977

Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks

Tiago Novello, Diana Aldana Moreno, André Araujo et al.

CVPR 2025 • highlight • arXiv:2407.21121
10 citations
#2978

Unifying 2D and 3D Vision-Language Understanding

Ayush Jain, Alexander Swerdlow, Yuzhou Wang et al.

ICML 2025 • poster • arXiv:2503.10745
10 citations
#2979

Taylor Series-Inspired Local Structure Fitting Network for Few-shot Point Cloud Semantic Segmentation

Changshuo Wang, Shuting He, Xiang Fang et al.

AAAI 2025 • paper • arXiv:2504.02454
10 citations
#2980

Needle Threading: Can LLMs Follow Threads Through Near-Million-Scale Haystacks?

Jonathan Roberts, Kai Han, Samuel Albanie

ICLR 2025 • poster • arXiv:2411.05000
10 citations
#2981

ADBA: Approximation Decision Boundary Approach for Black-Box Adversarial Attacks

Feiyang Wang, Xingquan Zuo, Hai Huang et al.

AAAI 2025 • paper
10 citations
#2982

The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization

Jae-Won Chung, Jeff J. Ma, Ruofan Wu et al.

NEURIPS 2025 • spotlight • arXiv:2505.06371
10 citations
#2983

A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities

Han-Jia Ye, Si-Yang Liu, Wei-Lun (Harry) Chao

NEURIPS 2025 • poster • arXiv:2502.17361
10 citations
#2984

Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy

Yangsibo Huang, Daogao Liu, Lynn Chua et al.

ICLR 2025 • poster • arXiv:2410.09591
10 citations
#2985

FormalAlign: Automated Alignment Evaluation for Autoformalization

Jianqiao Lu, Yingjia Wan, Yinya Huang et al.

ICLR 2025 • poster • arXiv:2410.10135
10 citations
#2986

CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models

Hao He, Ceyuan Yang, Shanchuan Lin et al.

ICCV 2025 • poster • arXiv:2503.10592
10 citations
#2987

Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning

Jiaru Zou, Yikun Ban, Zihao Li et al.

NEURIPS 2025 • spotlight • arXiv:2505.16270
10 citations
#2988

DreamRelation: Bridging Customization and Relation Generation

Qingyu Shi, Lu Qi, Jianzong Wu et al.

CVPR 2025 • poster • arXiv:2410.23280
10 citations
#2989

Periodic Materials Generation using Text-Guided Joint Diffusion Model

Kishalay Das, Subhojyoti Khastagir, Pawan Goyal et al.

ICLR 2025 • poster • arXiv:2503.00522
10 citations
#2990

QERA: an Analytical Framework for Quantization Error Reconstruction

Cheng Zhang, Jeffrey T. H. Wong, Can Xiao et al.

ICLR 2025 • poster • arXiv:2410.06040
10 citations
#2991

PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation

Qihan Huang, Weilong Dai, Jinlong Liu et al.

CVPR 2025 • poster • arXiv:2412.03177
10 citations
#2992

Attention layers provably solve single-location regression

Pierre Marion, Raphaël Berthier, Gérard Biau et al.

ICLR 2025 • poster • arXiv:2410.01537
10 citations
#2993

Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution

Karam Park, Jae Woong Soh, Nam Ik Cho

AAAI 2025 • paper • arXiv:2501.15774
10 citations
#2994

Scaling Laws for Optimal Data Mixtures

Mustafa Shukor, Louis Bethune, Dan Busbridge et al.

NEURIPS 2025 • poster • arXiv:2507.09404
10 citations
#2995

Boosting Latent Diffusion with Perceptual Objectives

Tariq Berrada, Pietro Astolfi, Melissa Hall et al.

ICLR 2025 • poster • arXiv:2411.04873
10 citations
#2996

Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation

Yuxuan Wang, Xuanyu Yi, Haohan Weng et al.

ICCV 2025 • poster • arXiv:2501.14317
10 citations
#2997

X-Dancer: Expressive Music to Human Dance Video Generation

Zeyuan Chen, Hongyi Xu, Guoxian Song et al.

ICCV 2025 • highlight • arXiv:2502.17414
10 citations
#2998

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

Yumeng Li, William H Beluch, Margret Keuper et al.

ICLR 2025 • oral • arXiv:2403.13501
10 citations
#2999

DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval

Yating Liu, Zimo Liu, Xiangyuan Lan et al.

AAAI 2025 • paper • arXiv:2503.04144
10 citations
#3000

Topological Blindspots: Understanding and Extending Topological Deep Learning Through the Lens of Expressivity

Yam Eitan, Yoav Gelberg, Guy Bar-Shalom et al.

ICLR 2025 • poster • arXiv:2408.05486
10 citations