Most Cited 2025 "cascaded denoising diffusion" Papers

22,274 papers found • Page 14 of 112

#2601

BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers

Hui Zhang, Tingwei Gao, Jie Shao et al.

CVPR 2025 • poster • arXiv:2503.15927
11
citations
#2602

LBM: Latent Bridge Matching for Fast Image-to-Image Translation

Clément Chadebec, Onur Tasar, Sanjeev Sreetharan et al.

ICCV 2025 • highlight • arXiv:2503.07535
11
citations
#2603

NoT: Federated Unlearning via Weight Negation

Yasser Khalil, Leo Maxime Brunswic, Soufiane Lamghari et al.

CVPR 2025 • poster • arXiv:2503.05657
11
citations
#2604

Locality Alignment Improves Vision-Language Models

Ian Covert, Tony Sun, James Y Zou et al.

ICLR 2025 • poster • arXiv:2410.11087
11
citations
#2605

Zero-Shot Low-Light Image Enhancement via Latent Diffusion Models

Yan Huang, Xiaoshan Liao, Jinxiu Liang et al.

AAAI 2025 • paper
11
citations
#2606

KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning

Wei Sun, Wen Yang, Pu Jian et al.

NEURIPS 2025 • poster • arXiv:2505.16826
11
citations
#2607

Enhancing Trustworthiness of Graph Neural Networks with Rank-Based Conformal Training

Ting Wang, Zhixin Zhou, Rui Luo

AAAI 2025 • paper • arXiv:2501.02767
11
citations
#2608

Federated Learning with Sample-level Client Drift Mitigation

Haoran Xu, Jiaze Li, Wanyi Wu et al.

AAAI 2025 • paper • arXiv:2501.11360
11
citations
#2609

SLIP: Spoof-Aware One-Class Face Anti-Spoofing with Language Image Pretraining

Pei-Kai Huang, Jun-Xiong Chong, Cheng-Hsuan Chiang et al.

AAAI 2025 • paper • arXiv:2503.19982
11
citations
#2610

Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes

Lihan Jiang, Kerui Ren, Mulin Yu et al.

CVPR 2025 • poster • arXiv:2412.01745
11
citations
#2611

RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction

Peng Liu, Dongyang Dai, Zhiyong Wu

ICLR 2025 • poster • arXiv:2403.05010
11
citations
#2612

DELTA: Pre-Train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment

Haitao Li, Qingyao Ai, Xinyan Han et al.

AAAI 2025 • paper • arXiv:2403.18435
11
citations
#2613

CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance

Yongchao Chen, Yilun Hao, Yueying Liu et al.

ICML 2025 • poster • arXiv:2502.04350
11
citations
#2614

h4rm3l: A Language for Composable Jailbreak Attack Synthesis

Moussa Koulako Bala Doumbouya, Ananjan Nandi, Gabriel Poesia et al.

ICLR 2025 • poster • arXiv:2408.04811
11
citations
#2615

Don’t Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models

Sohyun An, Ruochen Wang, Tianyi Zhou et al.

NEURIPS 2025 • poster
11
citations
#2616

CoRe: Benchmarking LLMs’ Code Reasoning Capabilities through Static Analysis Tasks

Danning Xie, Mingwei Zheng, Xuwei Liu et al.

NEURIPS 2025 • spotlight • arXiv:2507.05269
11
citations
#2617

Planning in the Dark: LLM-Symbolic Planning Pipeline Without Experts

Sukai Huang, Nir Lipovetzky, Trevor Cohn

AAAI 2025 • paper • arXiv:2409.15915
11
citations
#2618

Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion

Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.

CVPR 2025 • poster • arXiv:2504.00430
11
citations
#2619

FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning

Gaojian Wang, Feng Lin, Tong Wu et al.

CVPR 2025 • poster • arXiv:2412.12032
11
citations
#2620

Learning the RoPEs: Better 2D and 3D Position Encodings with STRING

Connor Schenck, Isaac Reid, Mithun Jacob et al.

ICML 2025 • spotlight • arXiv:2502.02562
11
citations
#2621

RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics

Jie Zhang, Cezara Petrui, Kristina Nikolić et al.

NEURIPS 2025 • poster • arXiv:2505.12575
11
citations
#2622

TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning

Ge Li, Dong Tian, Hongyi Zhou et al.

ICLR 2025 • oral • arXiv:2410.09536
11
citations
#2623

Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding

Xianqiang Gao, Pingrui Zhang, Delin Qu et al.

AAAI 2025 • paper • arXiv:2408.13024
11
citations
#2624

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation

Haoyu Guo, He Zhu, Sida Peng et al.

CVPR 2025 • poster • arXiv:2503.14483
11
citations
#2625

KnowPO: Knowledge-Aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models

Ruizhe Zhang, Yongxin Xu, Yuzhen Xiao et al.

AAAI 2025 • paper • arXiv:2408.03297
11
citations
#2626

LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models

Jian Liang, Wenke Huang, Guancheng Wan et al.

CVPR 2025 • poster • arXiv:2503.16843
11
citations
#2627

Neither Valid nor Reliable? Investigating the Use of LLMs as Judges

Khaoula Chehbouni, Mohammed Haddou, Jackie CK Cheung et al.

NEURIPS 2025 • poster • arXiv:2508.18076
11
citations
#2628

What's the Move? Hybrid Imitation Learning via Salient Points

Priya Sundaresan, Hengyuan Hu, Quan Vuong et al.

ICLR 2025 • poster • arXiv:2412.05426
11
citations
#2629

Lightweight Neural App Control

Filippos Christianos, Georgios Papoudakis, Thomas Coste et al.

ICLR 2025 • poster • arXiv:2410.17883
11
citations
#2630

VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

Hanyang Wang, Fangfu Liu, Jiawei Chi et al.

CVPR 2025 • highlight • arXiv:2504.01956
11
citations
#2631

FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks

Luca Della Libera, Francesco Paissan, Cem Subakan et al.

NEURIPS 2025 • poster • arXiv:2502.04465
11
citations
#2632

Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors

Fan Nie, Lan Feng, Haotian Ye et al.

COLM 2025 • paper • arXiv:2504.04785
11
citations
#2633

Long-Sequence Recommendation Models Need Decoupled Embeddings

Ningya Feng, Junwei Pan, Jialong Wu et al.

ICLR 2025 • poster • arXiv:2410.02604
11
citations
#2634

Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports

Yi Xu, Yun Fu

ICLR 2025 • oral • arXiv:2405.17680
11
citations
#2635

BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations

Weixi Feng, Chao Liu, Sifei Liu et al.

CVPR 2025 • poster • arXiv:2501.07647
11
citations
#2636

Understanding Emotional Body Expressions via Large Language Models

Haifeng Lu, Jiuyi Chen, Feng Liang et al.

AAAI 2025 • paper • arXiv:2412.12581
11
citations
#2637

VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning

Qingtao Liu, Yu Cui, Zhengnan Sun et al.

ICLR 2025 • poster
11
citations
#2638

MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition

Philippe Pasquier, Jeff Ens, Nathan Fradet et al.

AAAI 2025 • paper • arXiv:2501.17011
11
citations
#2639

AI-Researcher: Autonomous Scientific Innovation

Jiabin Tang, Lianghao Xia, Zhonghang Li et al.

NEURIPS 2025 • spotlight • arXiv:2505.18705
11
citations
#2640

Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding

Yiming Zhang, Zhuokai Zhao, Zhaorun Chen et al.

ICCV 2025 • poster • arXiv:2411.14401
11
citations
#2641

Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models

Pit Neitemeier, Björn Deiseroth, Constantin Eichenberg et al.

ICLR 2025 • poster • arXiv:2501.10322
11
citations
#2642

Lifting Motion to the 3D World via 2D Diffusion

Jiaman Li, Karen Liu, Jiajun Wu

CVPR 2025 • highlight • arXiv:2411.18808
11
citations
#2643

Revisiting Tampered Scene Text Detection in the Era of Generative AI

Chenfan Qu, Yiwu Zhong, Fengjun Guo et al.

AAAI 2025 • paper • arXiv:2407.21422
11
citations
#2644

Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective

Minh Le, Tien Ngoc Luu, An Nguyen The et al.

AAAI 2025 • paper • arXiv:2412.08285
11
citations
#2645

From Words to Worth: Newborn Article Impact Prediction with LLM

Penghai Zhao, Qinghua Xing, Kairan Dou et al.

AAAI 2025 • paper • arXiv:2408.03934
11
citations
#2646

SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters

Jianping Jiang, Weiye Xiao, Zhengyu Lin et al.

CVPR 2025 • poster • arXiv:2412.00174
11
citations
#2647

LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation

Chenxu Zhou, Lvchang Fu, Sida Peng et al.

CVPR 2025 • poster • arXiv:2412.15199
11
citations
#2648

DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution

Xingyuan Li, Zirui Wang, Yang Zou et al.

CVPR 2025 • poster • arXiv:2503.01187
11
citations
#2649

RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion

Xiaomeng Chu, Jiajun Deng, Guoliang You et al.

CVPR 2025 • poster • arXiv:2412.12725
11
citations
#2650

Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs

Yingji Zhong, Zhihao Li, Dave Zhenyu Chen et al.

CVPR 2025 • highlight • arXiv:2503.05082
11
citations
#2651

Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting

Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen

CVPR 2025 • poster • arXiv:2504.01957
11
citations
#2652

IgGM: A Generative Model for Functional Antibody and Nanobody Design

Rubo Wang, Fandi Wu, Xingyu Gao et al.

ICLR 2025 • poster
11
citations
#2653

Audio-Visual Instance Segmentation

Ruohao Guo, Xianghua Ying, Yaru Chen et al.

CVPR 2025 • poster • arXiv:2310.18709
11
citations
#2654

GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution

Fengxiang Wang, Mingshuo Chen, Yueying Li et al.

NEURIPS 2025 • spotlight • arXiv:2505.21375
11
citations
#2655

Integrated Augmented and Virtual Reality Technologies for Realistic Fire Drill Training

Hosan Kang, Jinseong Yang, Beom-Seok Ko et al.

ISMAR 2025 • paper
11
citations
#2656

DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response

Junjue Wang, Weihao Xuan, Heli Qi et al.

NEURIPS 2025 • oral • arXiv:2505.21089
11
citations
#2657

Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models

Can Demircan, Tankred Saanum, Akshay Jagadish et al.

ICLR 2025 • oral • arXiv:2410.01280
11
citations
#2658

Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models in Multi-turn Interactions

Hao Yang, Lizhen Qu, Ehsan Shareghi et al.

COLM 2025 • paper
11
citations
#2659

FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video

Yue Gao, Hong-Xing Yu, Bo Zhu et al.

CVPR 2025 • poster • arXiv:2503.04720
11
citations
#2660

Transformer-Squared: Self-adaptive LLMs

Qi Sun, Edoardo Cetin, Yujin Tang

ICLR 2025 • poster • arXiv:2501.06252
11
citations
#2661

On Linear Representations and Pretraining Data Frequency in Language Models

Jack Merullo, Noah Smith, Sarah Wiegreffe et al.

ICLR 2025 • poster • arXiv:2504.12459
11
citations
#2662

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning

Baoqi Pei, Yifei Huang, Jilan Xu et al.

ICLR 2025 • poster • arXiv:2503.00986
11
citations
#2663

Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery

Nicholas Clark, Hua Shen, Bill Howe et al.

COLM 2025 • paper • arXiv:2504.01205
11
citations
#2664

DG-Mamba: Robust and Efficient Dynamic Graph Structure Learning with Selective State Space Models

Haonan Yuan, Qingyun Sun, Zhaonan Wang et al.

AAAI 2025 • paper • arXiv:2412.08160
11
citations
#2665

Skill Expansion and Composition in Parameter Space

Tenglong Liu, Jianxiong Li, Yinan Zheng et al.

ICLR 2025 • poster • arXiv:2502.05932
11
citations
#2666

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Muzhi Zhu, Yuzhuo Tian, Hao Chen et al.

CVPR 2025 • poster • arXiv:2503.08625
11
citations
#2667

nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark

Yanfeng Zhou, Lingrui Li, Le Lu et al.

CVPR 2025 • poster
11
citations
#2668

OmniStyle: Filtering High Quality Style Transfer Data at Scale

Ye Wang, Ruiqi Liu, Jiang Lin et al.

CVPR 2025 • poster • arXiv:2505.14028
11
citations
#2669

Geometry of Lightning Self-Attention: Identifiability and Dimension

Nathan Henry, Giovanni Luca Marchetti, Kathlén Kohn

ICLR 2025 • poster • arXiv:2408.17221
11
citations
#2670

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper

Xinyue Zhu, Binghao Huang, Yunzhu Li

NEURIPS 2025 • poster • arXiv:2507.15062
11
citations
#2671

Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning

Jiange Yang, Haoyi Zhu, Yating Wang et al.

CVPR 2025 • poster • arXiv:2411.14519
11
citations
#2672

Selective Visual Prompting in Vision Mamba

Yifeng Yao, Zichen Liu, Zhenyu Cui et al.

AAAI 2025 • paper • arXiv:2412.08947
11
citations
#2673

Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection

Jiahao Xu, Zikai Zhang, Rui Hu

CVPR 2025 • highlight • arXiv:2503.07978
11
citations
#2674

Towards Optimal Multi-draft Speculative Decoding

Zhengmian Hu, Tong Zheng, Vignesh Viswanathan et al.

ICLR 2025 • poster • arXiv:2502.18779
11
citations
#2675

Identifying and Mitigating Position Bias of Multi-image Vision-Language Models

Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.

CVPR 2025 • poster • arXiv:2503.13792
11
citations
#2676

ViSpeak: Visual Instruction Feedback in Streaming Videos

Shenghao Fu, Qize Yang, Yuan-Ming Li et al.

ICCV 2025 • poster • arXiv:2503.12769
11
citations
#2677

NetMoE: Accelerating MoE Training through Dynamic Sample Placement

Xinyi Liu, Yujie Wang, Fangcheng Fu et al.

ICLR 2025 • poster
11
citations
#2678

GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting

Yangming Zhang, Wenqi Jia, Wei Niu et al.

CVPR 2025 • poster • arXiv:2411.06019
11
citations
#2679

Rectified Diffusion Guidance for Conditional Generation

Mengfei Xia, Nan Xue, Yujun Shen et al.

CVPR 2025 • poster • arXiv:2410.18737
11
citations
#2680

Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?

Antonia Wüst, Tim Woydt, Lukas Helff et al.

ICML 2025 • poster • arXiv:2410.19546
11
citations
#2681

Adversarial Machine Unlearning

Zonglin Di, Sixie Yu, Yevgeniy Vorobeychik et al.

ICLR 2025 • poster • arXiv:2406.07687
11
citations
#2682

LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion

Biao Zhang, Peter Wonka

ICLR 2025 • poster • arXiv:2410.01295
11
citations
#2683

Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts

Guorui Zheng, Xidong Wang, Juhao Liang et al.

ICLR 2025 • poster • arXiv:2410.10626
11
citations
#2684

OmniCount: Multi-label Object Counting with Semantic-Geometric Priors

Anindya Mondal, Sauradip Nag, Xiatian Zhu et al.

AAAI 2025 • paper • arXiv:2403.05435
11
citations
#2685

MPTSNet: Integrating Multiscale Periodic Local Patterns and Global Dependencies for Multivariate Time Series Classification

Yang Mu, Muhammad Shahzad, Xiao Xiang Zhu

AAAI 2025 • paper • arXiv:2503.05582
11
citations
#2686

From Commands to Prompts: LLM-based Semantic File System for AIOS

Zeru Shi, Kai Mei, Mingyu Jin et al.

ICLR 2025 • poster • arXiv:2410.11843
11
citations
#2687

Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages

Zui Chen, Tianqiao Liu, Tongqing et al.

ICLR 2025 • poster • arXiv:2501.14002
11
citations
#2688

R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing

Tianyu Fu, Yi Ge, Yichen You et al.

NEURIPS 2025 • poster • arXiv:2505.21600
11
citations
#2689

Hyperbolic Fine-Tuning for Large Language Models

Menglin Yang, Ram Samarth B B, Aosong Feng et al.

NEURIPS 2025 • spotlight • arXiv:2410.04010
11
citations
#2690

Proxy Denoising for Source-Free Domain Adaptation

Song Tang, Wenxin Su, Yan Gan et al.

ICLR 2025 • poster • arXiv:2406.01658
11
citations
#2691

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

Fating Hong, Zunnan Xu, Zixiang Zhou et al.

ICCV 2025 • poster • arXiv:2504.02542
11
citations
#2692

LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models

Ziqi Lu, Heng Yang, Danfei Xu et al.

ICLR 2025 • poster • arXiv:2412.07746
11
citations
#2693

LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content

Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh et al.

ICLR 2025 • poster • arXiv:2410.10783
11
citations
#2694

Deep Kernel Relative Test for Machine-generated Text Detection

Yiliao Song, Zhenqiao Yuan, Shuhai Zhang et al.

ICLR 2025 • poster
11
citations
#2695

SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding

Yangliu Hu, Zikai Song, Na Feng et al.

CVPR 2025 • poster • arXiv:2504.07745
11
citations
#2696

Latent Chain-of-Thought for Visual Reasoning

Guohao Sun, Hang Hua, Jian Wang et al.

NEURIPS 2025 • poster • arXiv:2510.23925
11
citations
#2697

Learning Few-Step Diffusion Models by Trajectory Distribution Matching

Yihong Luo, Tianyang Hu, Jiacheng Sun et al.

ICCV 2025 • poster • arXiv:2503.06674
11
citations
#2698

Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits

Zihan Zhang, Xiangyang Ji, Yuan Zhou

ICLR 2025 • poster • arXiv:2110.08057
11
citations
#2699

Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models

Jinho Jeong, Sangmin Han, Jinwoo Kim et al.

CVPR 2025 • poster • arXiv:2503.18446
11
citations
#2700

Preference Optimization on Pareto Sets: On a Theory of Multi-Objective Optimization

Abhishek Roy, Geelon So, Yian Ma

NEURIPS 2025 • poster
11
citations
#2701

Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition

Bozheng Li, Mushui Liu, Gaoang Wang et al.

AAAI 2025 • paper • arXiv:2408.12475
11
citations
#2702

Battling the Non-stationarity in Time Series Forecasting via Test-time Adaptation

HyunGi Kim, Siwon Kim, Jisoo Mok et al.

AAAI 2025 • paper • arXiv:2501.04970
11
citations
#2703

Reangle-A-Video: 4D Video Generation as Video-to-Video Translation

Hyeonho Jeong, Suhyeon Lee, Jong Ye

ICCV 2025 • poster • arXiv:2503.09151
11
citations
#2704

SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning

Zhewei Dai, Shilei Zeng, Haotian Liu et al.

ICCV 2025 • poster • arXiv:2410.14987
11
citations
#2705

Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning

Mushui Liu, Fangtai Wu, Bozheng Li et al.

AAAI 2025 • paper • arXiv:2408.12469
11
citations
#2706

Synthetic Video Enhances Physical Fidelity in Video Synthesis

Qi Zhao, Xingyu Ni, Ziyu Wang et al.

ICCV 2025 • poster • arXiv:2503.20822
11
citations
#2707

RelitLRM: Generative Relightable Radiance for Large Reconstruction Models

Tianyuan Zhang, Zhengfei Kuang, Haian Jin et al.

ICLR 2025 • poster • arXiv:2410.06231
11
citations
#2708

Efficient Rectification of Neuro-Symbolic Reasoning Inconsistencies by Abductive Reflection

Wen-Chao Hu, Wang-Zhou Dai, Yuan Jiang et al.

AAAI 2025 • paper • arXiv:2412.08457
11
citations
#2709

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

Yiran Qin, Li Kang, Xiufeng Song et al.

ICCV 2025 • poster • arXiv:2503.16408
11
citations
#2710

Jailbreaking as a Reward Misspecification Problem

Zhihui Xie, Jiahui Gao, Lei Li et al.

ICLR 2025 • poster • arXiv:2406.14393
11
citations
#2711

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers

Renshan Zhang, Rui Shao, Gongwei Chen et al.

ICCV 2025 • poster • arXiv:2501.16297
11
citations
#2712

Knowledge Localization: Mission Not Accomplished? Enter Query Localization!

Yuheng Chen, Pengfei Cao, Yubo Chen et al.

ICLR 2025 • poster • arXiv:2405.14117
11
citations
#2713

ProSec: Fortifying Code LLMs with Proactive Security Alignment

Xiangzhe Xu, Zian Su, Jinyao Guo et al.

ICML 2025 • poster • arXiv:2411.12882
11
citations
#2714

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding

Zhenxing Zhang, Yaxiong Wang, Lechao Cheng et al.

CVPR 2025 • poster • arXiv:2412.12718
11
citations
#2715

4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video

Qiang Hu, Zihan Zheng, Houqiang Zhong et al.

CVPR 2025 • poster • arXiv:2503.18421
11
citations
#2716

Bridging the Gap for Test-Time Multimodal Sentiment Analysis

Zirun Guo, Tao Jin, Wenlong Xu et al.

AAAI 2025 • paper • arXiv:2412.07121
11
citations
#2717

DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models

Ziyi Wu, Anil Kag, Ivan Skorokhodov et al.

NEURIPS 2025 • oral • arXiv:2506.03517
11
citations
#2718

RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving

Zhijian Huang, Chengjian Feng, Baihui Xiao et al.

ICCV 2025 • poster • arXiv:2412.07689
11
citations
#2719

Ultra-Sparse Memory Network

Zihao Huang, Qiyang Min, Hongzhi Huang et al.

ICLR 2025 • poster • arXiv:2411.12364
11
citations
#2720

ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting

Shaofei Cai, Zihao Wang, Kewei Lian et al.

CVPR 2025 • poster • arXiv:2410.17856
11
citations
#2721

The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

Zhenheng Tang, Xiang Liu, Qian Wang et al.

ICLR 2025 • poster • arXiv:2502.17535
11
citations
#2722

HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator

Fan Yang, Ru Zhen, Jianing Wang et al.

CVPR 2025 • poster • arXiv:2411.17261
11
citations
#2723

Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting

Nan Wang, Lixing Xiao, Yuantao Chen et al.

NEURIPS 2025 • poster • arXiv:2506.05280
11
citations
#2724

Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety

Zihan Guan, Mengxuan Hu, Ronghang Zhu et al.

ICML 2025 • spotlight • arXiv:2505.06843
11
citations
#2725

Node Identifiers: Compact, Discrete Representations for Efficient Graph Learning

Yuankai Luo, Hongkang Li, Qijiong Liu et al.

ICLR 2025 • poster • arXiv:2405.16435
11
citations
#2726

Hidden in the Noise: Two-Stage Robust Watermarking for Images

Kasra Arabi, Benjamin Feuer, R. Teal Witter et al.

ICLR 2025 • poster • arXiv:2412.04653
11
citations
#2727

Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding

Yue Fan, Xiaojian Ma, Rongpeng Su et al.

ICCV 2025 • highlight • arXiv:2501.00358
11
citations
#2728

TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

Jianchuan Chen, Jingchuan Hu, Gaige Wang et al.

CVPR 2025 • highlight • arXiv:2503.17032
11
citations
#2729

From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing

Jingxuan Wei, Cheng Tan, Qi Chen et al.

CVPR 2025 • highlight • arXiv:2411.11916
11
citations
#2730

Liger: Linearizing Large Language Models to Gated Recurrent Structures

Disen Lan, Weigao Sun, Jiaxi Hu et al.

ICML 2025 • poster • arXiv:2503.01496
11
citations
#2731

STD-PLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with PLM

Yiheng Huang, Xiaowei Mao, Shengnan Guo et al.

AAAI 2025 • paper • arXiv:2407.09096
11
citations
#2732

Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective

Bo Ni, Yu Wang, Lu Cheng et al.

AAAI 2025 • paper • arXiv:2410.08985
11
citations
#2733

Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models

Guobin Shen, Dongcheng Zhao, Yiting Dong et al.

ICLR 2025 • poster • arXiv:2410.02298
11
citations
#2734

Differentiable Optimization of Similarity Scores Between Models and Brains

Nathan Cloos, Moufan Li, Markus Siegel et al.

ICLR 2025 • poster • arXiv:2407.07059
11
citations
#2735

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

Juncan Deng, Shuaiting Li, Zeyu Wang et al.

AAAI 2025 • paper • arXiv:2408.17131
11
citations
#2736

Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs

Haowen Pan, Xiaozhi Wang, Yixin Cao et al.

ICLR 2025 • poster • arXiv:2503.01090
11
citations
#2737

ExpertAF: Expert Actionable Feedback from Video

Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos et al.

CVPR 2025 • poster • arXiv:2408.00672
11
citations
#2738

Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

Zheng Hu, Zhe Li, Ziyun Jiao et al.

AAAI 2025 • paper • arXiv:2412.13544
11
citations
#2739

HashAttention: Semantic Sparsity for Faster Inference

Aditya Desai, Shuo Yang, Alejandro Cuadron et al.

ICML 2025 • poster • arXiv:2412.14468
11
citations
#2740

Manifold Learning by Mixture Models of VAEs for Inverse Problems

Giovanni S. Alberti, Johannes Hertrich, Matteo Santacesaria et al.

ICLR 2025 • poster • arXiv:2303.15244
11
citations
#2741

VladVA: Discriminative Fine-tuning of LVLMs

Yassine Ouali, Adrian Bulat, Alexandros Xenos et al.

CVPR 2025 • poster • arXiv:2412.04378
11
citations
#2742

Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts

Yun Wang, Longguang Wang, Chenghao Zhang et al.

ICCV 2025 • highlight • arXiv:2507.04631
11
citations
#2743

Flash-VStream: Efficient Real-Time Understanding for Long Video Streams

Haoji Zhang, Yiqin Wang, Yansong Tang et al.

ICCV 2025 • poster • arXiv:2506.23825
11
citations
#2744

Multi-domain Distribution Learning for De Novo Drug Design

Arne Schneuing, Ilia Igashov, Adrian Dobbelstein et al.

ICLR 2025 • poster • arXiv:2508.17815
11
citations
#2745

Rethinking Invariance in In-context Learning

Lizhe Fang, Yifei Wang, Khashayar Gatmiry et al.

ICLR 2025 • poster • arXiv:2505.04994
11
citations
#2746

Deep Learning Alternatives Of The Kolmogorov Superposition Theorem

Leonardo Ferreira Guilhoto, Paris Perdikaris

ICLR 2025 • poster • arXiv:2410.01990
11
citations
#2747

Residual Stream Analysis with Multi-Layer SAEs

Tim Lawson, Lucy Farnik, Conor Houghton et al.

ICLR 2025 • poster • arXiv:2409.04185
11
citations
#2748

TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model

Meilong Xu, Saumya Gupta, Xiaoling Hu et al.

CVPR 2025 • poster • arXiv:2412.06011
11
citations
#2749

SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation

Duc-Hai Pham, Tung Do, Phong Nguyen et al.

CVPR 2025 • poster • arXiv:2411.18229
11
citations
#2750

Visual-Instructed Degradation Diffusion for All-in-One Image Restoration

Haina Qin, Wenyang Luo, Zewen Chen et al.

CVPR 2025 • poster • arXiv:2506.16960
11
citations
#2751

MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation

Huaize Liu, WenZhang Sun, Donglin Di et al.

CVPR 2025 • poster • arXiv:2501.01808
11
citations
#2752

Few for Many: Tchebycheff Set Scalarization for Many-Objective Optimization

Xi Lin, Yilu Liu, Xiaoyuan Zhang et al.

ICLR 2025 • poster • arXiv:2405.19650
11
citations
#2753

Revisiting In-context Learning Inference Circuit in Large Language Models

Hakaze Cho, Mariko Kato, Yoshihiro Sakai et al.

ICLR 2025 • poster • arXiv:2410.04468
11
citations
#2754

SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language

Zehan Wang, Sashuai Zhou, Shaoxuan He et al.

CVPR 2025 • poster
11
citations
#2755

Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?

Tianyuan Qu, Longxiang Tang, Bohao Peng et al.

ICCV 2025 • poster • arXiv:2503.12496
11
citations
#2756

Semantic and Sequential Alignment for Referring Video Object Segmentation

Feiyu Pan, Hao Fang, Fangkai Li et al.

CVPR 2025 • poster
11
citations
#2757

LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws

Prasanna Mayilvahanan, Thaddäus Wiedemer, Sayak Mallick et al.

ICML 2025 • poster • arXiv:2502.12120
11
citations
#2758

Context Steering: Controllable Personalization at Inference Time

Zhiyang He, Sashrika Pandey, Mariah Schrum et al.

ICLR 2025 • poster • arXiv:2405.01768
11
citations
#2759

Glad: A Streaming Scene Generator for Autonomous Driving

Bin Xie, Yingfei Liu, Tiancai Wang et al.

ICLR 2025 • oral • arXiv:2503.00045
11
citations
#2760

SpiritSight Agent: Advanced GUI Agent with One Look

Zhiyuan Huang, Ziming Cheng, Junting Pan et al.

CVPR 2025 • poster • arXiv:2503.03196
11
citations
#2761

DASK: Distribution Rehearsing via Adaptive Style Kernel Learning for Exemplar-Free Lifelong Person Re-Identification

Kunlun Xu, Chenghao Jiang, Peixi Xiong et al.

AAAI 2025 • paper • arXiv:2412.09224
11
citations
#2762

Distilling LLM Agent into Small Models with Retrieval and Code Tools

Minki Kang, Jongwon Jeong, Seanie Lee et al.

NEURIPS 2025 • spotlight • arXiv:2505.17612
11
citations
#2763

Large language models can learn and generalize steganographic chain-of-thought under process supervision

Robert McCarthy, Joey Skaf, Luis Ibanez-Lissen et al.

NEURIPS 2025 • poster • arXiv:2506.01926
11
citations
#2764

SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

Trong-Tung Nguyen, Quang Nguyen, Khoi Nguyen et al.

CVPR 2025 • poster • arXiv:2412.04301
11
citations
#2765

SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP

Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang et al.

ICLR 2025 • poster • arXiv:2408.10202
11
citations
#2766

Understanding Virtual Nodes: Oversquashing and Node Heterogeneity

Joshua Southern, Francesco Di Giovanni, Michael Bronstein et al.

ICLR 2025 • poster • arXiv:2405.13526
11
citations
#2767

CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

Feize Wu, Yun Pang, Junyi Zhang et al.

AAAI 2025 • paper • arXiv:2408.15914
11
citations
#2768

Effective Training Data Synthesis for Improving MLLM Chart Understanding

Yuwei Yang, Zeyu Zhang, Yunzhong Hou et al.

ICCV 2025 • poster • arXiv:2508.06492
11
citations
#2769

XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Alexander Nikulin, Ilya Zisman, Alexey Zemtsov et al.

ICLR 2025 • poster • arXiv:2406.08973
11
citations
#2770

LoLCATs: On Low-Rank Linearizing of Large Language Models

Michael Zhang, Simran Arora, Rahul Chalamala et al.

ICLR 2025 • poster • arXiv:2410.10254
11
citations
#2771

PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

Chen Wang, Chuhao Chen, Yiming Huang et al.

NEURIPS 2025 • oral • arXiv:2509.20358
11
citations
#2772

TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation

Mohan Xu, Kai Li, Guo Chen et al.

ICLR 2025 • oral • arXiv:2410.01469
11
citations
#2773

On Conformal Isometry of Grid Cells: Learning Distance-Preserving Position Embedding

Dehong Xu, Ruiqi Gao, Wenhao Zhang et al.

ICLR 2025 • poster • arXiv:2405.16865
11
citations
#2774

FAMNet: Frequency-aware Matching Network for Cross-domain Few-shot Medical Image Segmentation

Yuntian Bo, Yazhou Zhu, Lunbo Li et al.

AAAI 2025 • paper • arXiv:2412.09319
11
citations
#2775

Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry

Jannis Chemseddine, Christian Wald, Richard Duong et al.

ICLR 2025 • poster • arXiv:2410.03282
11
citations
#2776

Conformal Thresholded Intervals for Efficient Regression

Rui Luo, Zhixin Zhou

AAAI 2025 • paper • arXiv:2407.14495
11
citations
#2777

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Yiren Song, Cheng Liu, Mike Zheng Shou

NEURIPS 2025 • poster • arXiv:2505.18445
10
citations
#2778

Taylor Series-Inspired Local Structure Fitting Network for Few-shot Point Cloud Semantic Segmentation

Changshuo Wang, Shuting He, Xiang Fang et al.

AAAI 2025 • paper • arXiv:2504.02454
10
citations
#2779

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)

Tianyi Zhang, Mohsen Hariri, Shaochen (Henry) Zhong et al.

NEURIPS 2025 • poster • arXiv:2504.11651
10
citations
#2780

Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks

Peng Xie, Yequan Bie, Jianda Mao et al.

CVPR 2025 • poster • arXiv:2411.15720
10
citations
#2781

Measuring memorization in RLHF for code completion

Jamie Hayes, I Shumailov, Billy Porter et al.

ICLR 2025 • poster • arXiv:2406.11715
10
citations
#2782

MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output

Yanyuan Chen, Dexuan Xu, Yu Huang et al.

CVPR 2025 • poster • arXiv:2510.10011
10
citations
#2783

VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving

Haiming Zhang, Wending Zhou et al.

CVPR 2025 • poster • arXiv:2411.14716
10
citations
#2784

Towards Training-free Anomaly Detection with Vision and Language Foundation Models

Jinjin Zhang, Guodong Wang, Yizhou Jin et al.

CVPR 2025 • poster • arXiv:2503.18325
10
citations
#2785

X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing

Xinyan Chen, Jianfei Yang

ICLR 2025 • poster • arXiv:2410.10167
10
citations
#2786

Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution

Karam Park, Jae Woong Soh, Nam Ik Cho

AAAI 2025 • paper • arXiv:2501.15774
10
citations
#2787

UCF-Crime-DVS: A Novel Event-Based Dataset for Video Anomaly Detection with Spiking Neural Networks

Yuanbin Qian, Shuhan Ye, Chong Wang et al.

AAAI 2025 • paper • arXiv:2503.12905
10
citations
#2788

SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving

Su Sun, Cheng Zhao, Zhuoyang Sun et al.

CVPR 2025 • highlight • arXiv:2411.15482
10
citations
#2789

Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models

Yingqing Guo, Yukang Yang, Hui Yuan et al.

NEURIPS 2025 • poster • arXiv:2502.11420
10
citations
#2790

MagicColor: Multi-instance Sketch Colorization

Yinhan Zhang, Yue Ma, Bingyuan Wang et al.

ICCV 2025 • poster
10
citations
#2791

Flexible Frame Selection for Efficient Video Reasoning

Shyamal Buch, Arsha Nagrani, Anurag Arnab et al.

CVPR 2025 • poster
10
citations
#2792

Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program

Minghe Gao, Xuqi Liu, Zhongqi Yue et al.

ICCV 2025 • poster • arXiv:2504.06606
10
citations
#2793

S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation

Yichen Xie, Runsheng Xu, Tong He et al.

CVPR 2025 • poster
10
citations
#2794

Attention as a Hypernetwork

Simon Schug, Seijin Kobayashi, Yassir Akram et al.

ICLR 2025 • poster • arXiv:2406.05816
10
citations
#2795

DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval

Yating Liu, Zimo Liu, Xiangyuan Lan et al.

AAAI 2025 • paper • arXiv:2503.04144
10
citations
#2796

Boosting Latent Diffusion with Perceptual Objectives

Tariq Berrada, Pietro Astolfi, Melissa Hall et al.

ICLR 2025 • poster • arXiv:2411.04873
10
citations
#2797

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

Yumeng Li, William H Beluch, Margret Keuper et al.

ICLR 2025 • oral • arXiv:2403.13501
10
citations
#2798

EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space

Jianrong Zhang, Hehe Fan, Yi Yang

CVPR 2025 • highlight • arXiv:2412.14706
10
citations
#2799

DOTA: Distributional Test-time Adaptation of Vision-Language Models

Zongbo Han, Jialong Yang, Guangyu Wang et al.

NEURIPS 2025 • poster • arXiv:2409.19375
10
citations
#2800

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Jin Wang, Chenghui Lv, Xian Li et al.

CVPR 2025 • poster • arXiv:2503.15024
10
citations