Most Cited 2025 "state-object pairs" Papers

22,274 papers found • Page 40 of 112

#7801

Towards Realistic Semi-supervised Medical Image Classification

Wenxue Li, Lie Ju, Feilong Tang et al.

AAAI 2025 • paper
5
citations
#7802

ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization

Chen Bo Calvin Zhang, Zhang-Wei Hong, Aldo Pacchiano et al.

ICLR 2025 • arXiv:2410.13837
5
citations
#7803

Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP

Yayuan Li, Jintao Guo, Lei Qi et al.

AAAI 2025 • paper • arXiv:2412.11375
5
citations
#7804

RoDA: Robust Domain Alignment for Cross-Domain Retrieval Against Label Noise

Ziniu Yin, Yanglin Feng, Ming Yan et al.

AAAI 2025 • paper
5
citations
#7805

Self-Supervised Diffusion MRI Denoising via Iterative and Stable Refinement

Chenxu Wu, Qingpeng Kong, Zihang Jiang et al.

ICLR 2025 • oral • arXiv:2501.13514
5
citations
#7806

Decoupling Appearance Variations with 3D Consistent Features in Gaussian Splatting

Jiaqi Lin, Zhihao Li, Binxiao Huang et al.

AAAI 2025 • paper • arXiv:2501.10788
5
citations
#7807

Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language

Jeong Hun Yeo, Chae Won Kim, Hyunjun Kim et al.

AAAI 2025 • paper • arXiv:2409.00986
5
citations
#7808

Optimized Gradient Clipping for Noisy Label Learning

Xichen Ye, Yifan Wu, Weizhong Zhang et al.

AAAI 2025 • paper • arXiv:2412.08941
5
citations
#7809

DPLUT: Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors

Yunlong Lin, Zhenqi Fu, Kairun Wen et al.

AAAI 2025 • paper
5
citations
#7810

Lightweight Predictive 3D Gaussian Splats

Junli Cao, Vidit Goel, Chaoyang Wang et al.

ICLR 2025 • arXiv:2406.19434
5
citations
#7811

LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning

Zhuorui Ye, Stephanie Milani, Geoff Gordon et al.

ICLR 2025 • arXiv:2407.15786
5
citations
#7812

CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment

Yating Liu, Yujie Zhang, Ziyu Shan et al.

AAAI 2025 • paper • arXiv:2501.10071
5
citations
#7813

SCOPE: Sign Language Contextual Processing with Embedding from LLMs

Yuqi Liu, Wenqian Zhang, Sihan Ren et al.

AAAI 2025 • paper • arXiv:2409.01073
5
citations
#7814

Anchor Learning with Potential Cluster Constraints for Multi-view Clustering

Yawei Chen, Huibing Wang, Jinjia Peng et al.

AAAI 2025 • paper • arXiv:2412.16519
5
citations
#7815

Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection

Yuhang Ma, Wenting Xu, Chaoyi Zhao et al.

AAAI 2025 • paper • arXiv:2409.19624
5
citations
#7816

Minimalist Concept Erasure in Generative Models

Yang Zhang, Er Jin, Yanfei Dong et al.

ICML 2025 • arXiv:2507.13386
5
citations
#7817

Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization

Jiaxin Deng, Junbiao Pang, Baochang Zhang et al.

AAAI 2025 • paper • arXiv:2406.08001
5
citations
#7818

EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models

GuangHao Meng, Sunan He, Jinpeng Wang et al.

AAAI 2025 • paper • arXiv:2505.18594
5
citations
#7819

MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer

Yilin Wang, Chuan Guo, Yuxuan Mu et al.

ICLR 2025 • oral • arXiv:2504.08959
5
citations
#7820

CLIP-driven View-aware Prompt Learning for Unsupervised Vehicle Re-identification

Jiyang Xu, Qi Wang, Xin Xiong et al.

AAAI 2025 • paper
5
citations
#7821

Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM

Zirui Pan, Xin Wang, Yipeng Zhang et al.

AAAI 2025 • paper • arXiv:2504.12048
5
citations
#7822

Foreground-Covering Prototype Generation and Matching for SAM-Aided Few-Shot Segmentation

Suho Park, SuBeen Lee, Hyun Seok Seong et al.

AAAI 2025 • paper • arXiv:2501.00752
5
citations
#7823

Causal Information Prioritization for Efficient Reinforcement Learning

Hongye Cao, Fan Feng, Tianpei Yang et al.

ICLR 2025 • arXiv:2502.10097
5
citations
#7824

TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection

Qiang Qi, Xiao Wang

AAAI 2025 • paper • arXiv:2503.13903
5
citations
#7825

Accelerating Training with Neuron Interaction and Nowcasting Networks

Boris Knyazev, Abhinav Moudgil, Guillaume Lajoie et al.

ICLR 2025 • arXiv:2409.04434
5
citations
#7826

PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis

Yifan Xie, Tao Feng, Xin Zhang et al.

AAAI 2025 • paper • arXiv:2412.08504
5
citations
#7827

NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models

Zhengyi Ho, Siyuan Liang, Sen Zhang et al.

ICLR 2025 • arXiv:2410.08970
5
citations
#7828

TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation

Xingrui Wang, Xin Li, Yaosi Hu et al.

AAAI 2025 • paper • arXiv:2412.10275
5
citations
#7829

Topology-Aware 3D Gaussian Splatting: Leveraging Persistent Homology for Optimized Structural Integrity

Tianqi Shen, Shaohua Liu, Jiaqi Feng et al.

AAAI 2025 • paper • arXiv:2412.16619
5
citations
#7830

SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation

Pengfei Chen, Lingxi Xie, Xinyue Huo et al.

ICLR 2025 • arXiv:2407.16682
5
citations
#7831

A Comprehensive Framework for Analyzing the Convergence of Adam: Bridging the Gap with SGD

Ruinan Jin, Xiao Li, Yaoliang Yu et al.

ICML 2025 • arXiv:2410.04458
5
citations
#7832

Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks

Nikolaos Tsilivis, Gal Vardi, Julia Kempe

ICLR 2025 • arXiv:2410.22069
5
citations
#7833

Multi-Label Test-Time Adaptation with Bound Entropy Minimization

Xiangyu Wu, Feng Yu, Yang Yang et al.

ICLR 2025 • arXiv:2502.03777
5
citations
#7834

Hierarchical Refinement: Optimal Transport to Infinity and Beyond

Peter Halmos, Julian Gold, Xinhao Liu et al.

ICML 2025 • oral • arXiv:2503.03025
5
citations
#7835

Exploring Activation Patterns of Parameters in Language Models

Yudong Wang, Damai Dai, Zhe Yang et al.

AAAI 2025 • paper • arXiv:2405.17799
5
citations
#7836

Progressive Compression with Universally Quantized Diffusion Models

Yibo Yang, Justus Will, Stephan Mandt

ICLR 2025 • arXiv:2412.10935
5
citations
#7837

Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning

Chenglu Sun, Shuo Shen, Wenzhi Tao et al.

AAAI 2025 • paper • arXiv:2501.01085
5
citations
#7838

FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training

Filipp Zmushko, Aleksandr Beznosikov, Martin Takac et al.

ICML 2025 • arXiv:2411.07837
5
citations
#7839

Sample Efficient Demonstration Selection for In-Context Learning

Kiran Purohit, Venktesh V, Sourangshu Bhattacharya et al.

ICML 2025 • arXiv:2506.08607
5
citations
#7840

Model Immunization from a Condition Number Perspective

Amber Yijia Zheng, Cedar Site Bai, Brian Bullins et al.

ICML 2025 • oral • arXiv:2505.23760
5
citations
#7841

DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation

Jiwook Kim, Seonho Lee, Jaeyo Shin et al.

ICLR 2025 • arXiv:2407.11394
5
citations
#7842

Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization

Yeji Song, Jimyeong Kim, Wonhark Park et al.

AAAI 2025 • paper • arXiv:2403.14155
5
citations
#7843

OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model

Zhenhao Zhang, Ye Shi, Lingxiao Yang et al.

NeurIPS 2025 • oral • arXiv:2505.18947
5
citations
#7844

Score-based Pullback Riemannian Geometry: Extracting the Data Manifold Geometry using Anisotropic Flows

Willem Diepeveen, Georgios Batzolis, Zakhar Shumaylov et al.

ICML 2025 • arXiv:2410.01950
5
citations
#7845

GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning

Zhun Mou, Bin Xia, Zhengchao Huang et al.

ICML 2025 • arXiv:2503.02341
5
citations
#7846

SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering

Byeongjun Park, Hyojun Go, Hyelin Nam et al.

ICCV 2025 • arXiv:2503.12024
5
citations
#7847

VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models

Dahun Kim, AJ Piergiovanni, Ganesh Satish Mallya et al.

CVPR 2025 • arXiv:2504.03970
5
citations
#7848

EdgeDiff: Edge-aware Diffusion Network for Building Reconstruction from Point Clouds

Yujun Liu, Ruisheng Wang, Shangfeng Huang et al.

CVPR 2025
5
citations
#7849

Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning

Yan Wang, Da-Wei Zhou, Han-Jia Ye

ICCV 2025 • arXiv:2508.08165
5
citations
#7850

ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding

Muye Huang, Lingling Zhang, Jie Ma et al.

NeurIPS 2025 • arXiv:2505.19076
5
citations
#7851

Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection

Shuhai Zhang, ZiHao Lian, Jiahao Yang et al.

NeurIPS 2025 • oral • arXiv:2510.08073
5
citations
#7852

Fractal Calibration for Long-tailed Object Detection

Konstantinos Alexandridis, Ismail Elezi, Jiankang Deng et al.

CVPR 2025 • arXiv:2410.11774
5
citations
#7853

SEGS-SLAM: Structure-enhanced 3D Gaussian Splatting SLAM with Appearance Embedding

Tianci Wen, Zhiang Liu, Yongchun Fang

ICCV 2025 • arXiv:2501.05242
5
citations
#7854

PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model

Xiang Gao, Shuai Yang, Jiaying Liu

CVPR 2025 • arXiv:2503.06186
5
citations
#7855

LuxDiT: Lighting Estimation with Video Diffusion Transformer

Ruofan Liang, Kai He, Zan Gojcic et al.

NeurIPS 2025 • arXiv:2509.03680
5
citations
#7856

DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization

Wenchuan Wang, Mengqi Huang, Yijing Tu et al.

ICCV 2025 • arXiv:2505.02192
5
citations
#7857

LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

Nan Chen, Mengqi Huang, Yihao Meng et al.

ICCV 2025 • arXiv:2507.01945
5
citations
#7858

Fine-grained Spatiotemporal Grounding on Egocentric Videos

Shuo Liang, Yiwu Zhong, Zi-Yuan Hu et al.

ICCV 2025 • arXiv:2508.00518
5
citations
#7859

StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting

Shakiba Kheradmand, Delio Vicini, George Kopanas et al.

ICCV 2025 • arXiv:2503.24366
5
citations
#7860

Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation

Gianni Franchi, Nacim Belkhir, Dat Nguyen et al.

CVPR 2025 • arXiv:2412.03178
5
citations
#7861

Do different prompting methods yield a common task representation in language models?

Guy Davidson, Todd Gureckis, Brenden Lake et al.

NeurIPS 2025 • arXiv:2505.12075
5
citations
#7862

Treatment Effect Estimation for Optimal Decision-Making

Dennis Frauen, Valentyn Melnychuk, Jonas Schweisthal et al.

NeurIPS 2025 • arXiv:2505.13092
5
citations
#7863

Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation

Youwei Zheng, Yuxi Ren, Xin Xia et al.

ICCV 2025 • arXiv:2510.09094
5
citations
#7864

Boosting Adversarial Transferability through Augmentation in Hypothesis Space

Yu Guo, Weiquan Liu, Qingshan Xu et al.

CVPR 2025
5
citations
#7865

Triplets Better Than Pairs: Towards Stable and Effective Self-Play Fine-Tuning for LLMs

Yibo Wang, Hai-Long Sun, Guangda Huzhang et al.

NeurIPS 2025 • arXiv:2601.08198
5
citations
#7866

VideoVLA: Video Generators Can Be Generalizable Robot Manipulators

Yichao Shen, Fangyun Wei, Zhiying Du et al.

NeurIPS 2025 • arXiv:2512.06963
5
citations
#7867

Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis

Zhuokun Chen, Jugang Fan, Zhuowei Yu et al.

ICCV 2025 • arXiv:2507.20454
5
citations
#7868

UHD-processer: Unified UHD Image Restoration with Progressive Frequency Learning and Degradation-aware Prompts

Yidi Liu, Dong Li, Xueyang Fu et al.

CVPR 2025
5
citations
#7869

SignRep: Enhancing Self-Supervised Sign Representations

Ryan Wong, Necati Cihan Camgoz, Richard Bowden

ICCV 2025 • arXiv:2503.08529
5
citations
#7870

CaO2: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation

Haoxuan Wang, Zhenghao Zhao, Junyi Wu et al.

ICCV 2025
5
citations
#7871

InterGSEdit: Interactive 3D Gaussian Splatting Editing with 3D Geometry-Consistent Attention Prior

Minghao Wen, Shengjie Wu, Kangkan Wang et al.

ICCV 2025 • arXiv:2507.04961
5
citations
#7872

Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration

Junyuan Deng, Wei Yin, Xiaoyang Guo et al.

ICCV 2025 • arXiv:2411.17240
5
citations
#7873

MikuDance: Animating Character Art with Mixed Motion Dynamics

Jiaxu Zhang, Xianfang Zeng, Xin Chen et al.

ICCV 2025 • arXiv:2411.08656
5
citations
#7874

From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport

Quentin Bouniot, Ievgen Redko, Anton Mallasto et al.

CVPR 2025 • arXiv:2310.11439
5
citations
#7875

MetaBox-v2: A Unified Benchmark Platform for Meta-Black-Box Optimization

Zeyuan Ma, Yue-Jiao Gong, Hongshu Guo et al.

NeurIPS 2025 • arXiv:2505.17745
5
citations
#7876

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

Chenxin Tao, Shiqian Su, Xizhou Zhu et al.

CVPR 2025 • arXiv:2412.16158
5
citations
#7877

MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance

Hallee Wong, Jose Javier Gonzalez Ortiz, John Guttag et al.

ICCV 2025 • arXiv:2412.15058
5
citations
#7878

HeMoRa: Unsupervised Heuristic Consensus Sampling for Robust Point Cloud Registration

Shaocheng Yan, Yiming Wang, Kaiyan Zhao et al.

CVPR 2025
5
citations
#7879

Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection

Gensheng Pei, Tao Chen, Yujia Wang et al.

CVPR 2025 • arXiv:2503.17080
5
citations
#7880

MARBLE: Material Recomposition and Blending in CLIP-Space

Ta-Ying Cheng, Prafull Sharma, Mark Boss et al.

CVPR 2025 • arXiv:2506.05313
5
citations
#7881

Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking

Qiangqiang Wu, Yi Yu, Chenqi Kong et al.

ICCV 2025 • arXiv:2507.07483
5
citations
#7882

WeGen: A Unified Model for Interactive Multimodal Generation as We Chat

Zhipeng Huang, Shaobin Zhuang, Canmiao Fu et al.

CVPR 2025 • arXiv:2503.01115
5
citations
#7883

External Knowledge Injection for CLIP-Based Class-Incremental Learning

Da-Wei Zhou, Kai-Wen Li, Jingyi Ning et al.

ICCV 2025 • arXiv:2503.08510
5
citations
#7884

Improving Multimodal Learning via Imbalanced Learning

Shicai Wei, Chunbo Luo, Yang Luo

ICCV 2025 • arXiv:2507.10203
5
citations
#7885

DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model

Junjia Huang, Pengxiang Yan, Jinhang Cai et al.

ICCV 2025 • highlight
5
citations
#7886

HERO: Human Reaction Generation from Videos

Chengjun Yu, Wei Zhai, Yuhang Yang et al.

ICCV 2025 • arXiv:2503.08270
5
citations
#7887

Rethinking Tokenized Graph Transformers for Node Classification

Jinsong Chen, Chenyang Li, Gaichao Li et al.

NeurIPS 2025 • arXiv:2502.08101
5
citations
#7888

Multi-party Collaborative Attention Control for Image Customization

Han Yang, Chuanguang Yang, Qiuli Wang et al.

CVPR 2025 • arXiv:2505.01428
5
citations
#7889

FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation

Tianyun Zhong, Chao Liang, Jianwen Jiang et al.

CVPR 2025 • arXiv:2412.16915
5
citations
#7890

PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Model

Jinhua Zhang, Hualian Sheng, Sijia Cai et al.

ICCV 2025 • arXiv:2407.06109
5
citations
#7891

KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception

Yunpeng Qu, Kun Yuan, Qizhi Xie et al.

CVPR 2025 • arXiv:2503.10259
5
citations
#7892

Synergistic Prompting for Robust Visual Recognition with Missing Modalities

Zhihui Zhang, Luanyuan Dai, Qika Lin et al.

ICCV 2025 • arXiv:2507.07802
5
citations
#7893

Bisecle: Binding and Separation in Continual Learning for Video Language Understanding

Yue Tan, Xiaoqian Hu, Hao Xue et al.

NeurIPS 2025 • arXiv:2507.00469
5
citations
#7894

Adding Additional Control to One-Step Diffusion with Joint Distribution Matching

Yihong Luo, Tianyang Hu, Yifan Song et al.

ICCV 2025 • arXiv:2503.06652
5
citations
#7895

Spiking Meets Attention: Efficient Remote Sensing Image Super-Resolution with Attention Spiking Neural Networks

Yi Xiao, Qiangqiang Yuan, Kui Jiang et al.

NeurIPS 2025 • oral • arXiv:2503.04223
5
citations
#7896

Streaming VideoLLMs for Real-Time Procedural Video Understanding

Dibyadip Chatterjee, Edoardo Remelli, Yale Song et al.

ICCV 2025 • arXiv:2504.13915
5
citations
#7897

AutoData: A Multi-Agent System for Open Web Data Collection

Tianyi Ma, Yiyue Qian, Zheyuan Zhang et al.

NeurIPS 2025 • arXiv:2505.15859
5
citations
#7898

CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation

Kai Fang, Anqi Zhang, Guangyu Gao et al.

CVPR 2025 • arXiv:2504.04156
5
citations
#7899

Distribution-Aligned Decoding for Efficient LLM Task Adaptation

Senkang Hu, Xudong Han, Jinqi Jiang et al.

NeurIPS 2025 • arXiv:2509.15888
5
citations
#7900

DisTime: Distribution-based Time Representation for Video Large Language Models

Yingsen Zeng, Zepeng Huang, Yujie Zhong et al.

ICCV 2025 • arXiv:2505.24329
5
citations
#7901

GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts

Minwen Liao, Hao Dong, Xinyi Wang et al.

ICCV 2025 • arXiv:2503.07417
5
citations
#7902

Learning Visual Generative Priors without Text

Shuailei Ma, Kecheng Zheng, Ying Wei et al.

CVPR 2025 • arXiv:2412.07767
5
citations
#7903

DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models

Hongji Yang, Wencheng Han, Yucheng Zhou et al.

ICCV 2025 • arXiv:2502.14779
5
citations
#7904

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

Gengze Zhou, Yicong Hong, Zun Wang et al.

ICCV 2025 • arXiv:2412.05552
5
citations
#7905

SocialGesture: Delving into Multi-person Gesture Understanding

Xu Cao, Pranav Virupaksha, Wenqi Jia et al.

CVPR 2025 • arXiv:2504.02244
5
citations
#7906

Hardware-Rasterized Ray-Based Gaussian Splatting

Samuel Rota Bulò, Lorenzo Porzi, Nemanja Bartolovic et al.

CVPR 2025 • highlight • arXiv:2503.18682
5
citations
#7907

Versatile Transition Generation with Image-to-Video Diffusion

Zuhao Yang, Jiahui Zhang, Yingchen Yu et al.

ICCV 2025 • arXiv:2508.01698
5
citations
#7908

WaveMamba: Wavelet-Driven Mamba Fusion for RGB-Infrared Object Detection

Haodong Zhu, Wenhao Dong, Linlin Yang et al.

ICCV 2025 • arXiv:2507.18173
5
citations
#7909

X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation

Jian Ma, Qirong Peng, Xu Guo et al.

ICCV 2025 • arXiv:2503.06134
5
citations
#7910

MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing

Shuo Wang, Wanting Li, Yongcai Wang et al.

CVPR 2025 • arXiv:2412.20082
5
citations
#7911

Constrained Diffusers for Safe Planning and Control

Jichen Zhang, Liqun Zhao, Antonis Papachristodoulou et al.

NeurIPS 2025 • arXiv:2506.12544
5
citations
#7912

Efficient ANN-Guided Distillation: Aligning Rate-based Features of Spiking Neural Networks through Hybrid Block-wise Replacement

Shu Yang, Chengting Yu, Lei Liu et al.

CVPR 2025 • arXiv:2503.16572
5
citations
#7913

DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding

Yue Jiang, Jichu Li, Yang Liu et al.

NeurIPS 2025 • oral • arXiv:2505.18411
5
citations
#7914

DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer

Yecheng Wu, Han Cai, Junyu Chen et al.

ICCV 2025 • arXiv:2507.04947
5
citations
#7915

EchoONE: Segmenting Multiple Echocardiography Planes in One Model

Jiongtong Hu, Wei Zhuo, Jun Cheng et al.

CVPR 2025 • arXiv:2412.02993
5
citations
#7916

Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency

Feng Wang, Timing Yang, Yaodong Yu et al.

CVPR 2025 • arXiv:2410.07599
5
citations
#7917

Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents

Jun Chen, Dannong Xu, Junjie Fei et al.

CVPR 2025 • arXiv:2411.16740
5
citations
#7918

Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization

Xiyue Peng, Hengquan Guo, Jiawei Zhang et al.

NeurIPS 2025 • arXiv:2410.19933
5
citations
#7919

Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views

Chong Bao, Xiyu Zhang, Zehao Yu et al.

CVPR 2025 • arXiv:2503.24382
5
citations
#7920

TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions

Ilya A. Petrov, Riccardo Marin, Julian Chibane et al.

ICCV 2025 • arXiv:2412.06334
5
citations
#7921

OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad

Luyao Tang, Chaoqi Chen, Yuxuan Yuan et al.

CVPR 2025 • arXiv:2503.18695
5
citations
#7922

Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks

Debargha Ganguly, Vikash Singh, Sreehari Sankar et al.

NeurIPS 2025 • arXiv:2505.20047
5
citations
#7923

E2Former: An Efficient and Equivariant Transformer with Linear-Scaling Tensor Products

Yunyang Li, Lin Huang, Zhihao Ding et al.

NeurIPS 2025 • spotlight • arXiv:2501.19216
5
citations
#7924

Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective

Yang Zhang, Xinran Li, Jianing Ye et al.

NeurIPS 2025 • arXiv:2505.20922
5
citations
#7925

Learning single index models via harmonic decomposition

Nirmit Joshi, Hugo Koubbi, Theodor Misiakiewicz et al.

NeurIPS 2025 • arXiv:2506.09887
5
citations
#7926

FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies

Dongyue Lu, Lingdong Kong, Gim Hee Lee et al.

NeurIPS 2025 • oral • arXiv:2412.06708
5
citations
#7927

In-Context Learning Strategies Emerge Rationally

Daniel Wurgaft, Ekdeep S Lubana, Core Francisco Park et al.

NeurIPS 2025 • arXiv:2506.17859
5
citations
#7928

GUAVA: Generalizable Upper Body 3D Gaussian Avatar

Dongbin Zhang, Yunfei Liu, Lijian Lin et al.

ICCV 2025 • arXiv:2505.03351
5
citations
#7929

Finding Local Diffusion Schrödinger Bridge using Kolmogorov-Arnold Network

Xingyu Qiu, Mengying Yang, Xinghua Ma et al.

CVPR 2025 • arXiv:2502.19754
5
citations
#7930

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

Rui Hu, Yuxuan Zhang, Lianghui Zhu et al.

ICCV 2025 • arXiv:2503.10596
5
citations
#7931

GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR

Christophe Bolduc, Yannick Hold-Geoffroy, Jean-Francois Lalonde

ICCV 2025 • arXiv:2504.10809
5
citations
#7932

MotionPRO: Exploring the Role of Pressure in Human MoCap and Beyond

Shenghao Ren, Yi Lu, Jiayi Huang et al.

CVPR 2025 • highlight • arXiv:2504.05046
5
citations
#7933

Adaptive Non-Uniform Timestep Sampling for Accelerating Diffusion Model Training

Myunsoo Kim, Donghyeon Ki, Seong-Woong Shim et al.

CVPR 2025 • arXiv:2411.09998
5
citations
#7934

Object-aware Sound Source Localization via Audio-Visual Scene Understanding

Sung Jin Um, Dongjin Kim, Sangmin Lee et al.

CVPR 2025 • arXiv:2506.18557
5
citations
#7935

MoEdit: On Learning Quantity Perception for Multi-object Image Editing

Yanfeng Li, Ka-Hou Chan, Yue Sun et al.

CVPR 2025 • arXiv:2503.10112
5
citations
#7936

On the Consistency of Video Large Language Models in Temporal Comprehension

Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang et al.

CVPR 2025 • arXiv:2411.12951
5
citations
#7937

GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains

Chun Wang, Xiaojun Ye, Xiaoran Pan et al.

NeurIPS 2025 • arXiv:2505.18700
5
citations
#7938

Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis

Woojung Han, Yeonkyung Lee, Chanyoung Kim et al.

CVPR 2025 • arXiv:2503.22168
5
citations
#7939

Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms

Baran Hashemi, Kurt Pasque, Chris Teska et al.

NeurIPS 2025 • arXiv:2505.17190
5
citations
#7940

Learning to Integrate Diffusion ODEs by Averaging the Derivatives

Wenze Liu, Xiangyu Yue

NeurIPS 2025 • arXiv:2505.14502
5
citations
#7941

Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features

Liying Yang, Chen Liu, Zhenwei Zhu et al.

ICCV 2025 • highlight • arXiv:2502.08377
5
citations
#7942

SVDC: Consistent Direct Time-of-Flight Video Depth Completion with Frequency Selective Fusion

Xuan Zhu, Jijun Xiang, Xianqi Wang et al.

CVPR 2025 • arXiv:2503.01257
5
citations
#7943

Native-Resolution Image Synthesis

ZiDong Wang, Lei Bai, Xiangyu Yue et al.

NeurIPS 2025 • arXiv:2506.03131
5
citations
#7944

Alligat0R: Pre-Training through Covisibility Segmentation for Relative Camera Pose Regression

Thibaut Loiseau, Guillaume Bourmaud, Vincent Lepetit

NeurIPS 2025 • spotlight
5
citations
#7945

Neural Thermodynamics: Entropic Forces in Deep and Universal Representation Learning

Liu Ziyin, Yizhou Xu, Isaac Chuang

NeurIPS 2025 • arXiv:2505.12387
5
citations
#7946

ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation

Zirun Guo, Tao Jin

CVPR 2025 • arXiv:2503.10358
5
citations
#7947

Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?

Yuru Jia, Valerio Marsocci, Ziyang Gong et al.

ICCV 2025 • arXiv:2503.07890
5
citations
#7948

Language Models Can Predict Their Own Behavior

Dhananjay Ashok, Jonathan May

NeurIPS 2025 • arXiv:2502.13329
5
citations
#7949

Return of ChebNet: Understanding and Improving an Overlooked GNN on Long Range Tasks

Ali Hariri, Alvaro Arroyo, Alessio Gravina et al.

NeurIPS 2025 • spotlight • arXiv:2506.07624
5
citations
#7950

FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency

Yifei Su, Ning Liu, Dong Chen et al.

NeurIPS 2025 • oral • arXiv:2506.08822
5
citations
#7951

Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems

Alejandro Castañeda Garcia, Jan Warchocki, Jan van Gemert et al.

CVPR 2025 • arXiv:2410.01376
5
citations
#7952

Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation

Xiaoqi Li, Lingyun Xu, Mingxu Zhang et al.

CVPR 2025 • arXiv:2505.02166
5
citations
#7953

SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image

Dimitrije Antić, Georgios Paschalidis, Shashank Tripathi et al.

ICCV 2025 • arXiv:2409.16178
5
citations
#7954

AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward

Haonan Han, Xiangzuo Wu, Huan Liao et al.

CVPR 2025 • arXiv:2411.18654
5
citations
#7955

EgoM2P: Egocentric Multimodal Multitask Pretraining

Gen Li, Yutong Chen, Yiqian Wu et al.

ICCV 2025 • arXiv:2506.07886
5
citations
#7956

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality

Sijie Li, Chen Chen, Jungong Han

ICCV 2025 • arXiv:2507.19264
5
citations
#7957

AniMo: Species-Aware Model for Text-Driven Animal Motion Generation

Xuan Wang, Kai Ruan, Xing Zhang et al.

CVPR 2025
5
citations
#7958

Backward Conformal Prediction

Etienne Gauthier, Francis Bach, Michael Jordan

NeurIPS 2025 • arXiv:2505.13732
5
citations
#7959

Φ-GAN: Physics-Inspired GAN for Generating SAR Images Under Limited Data

Xidan Zhang, Yihan Zhuang, Qian Guo et al.

ICCV 2025
5
citations
#7960

FlowDAS: A Stochastic Interpolant-based Framework for Data Assimilation

Siyi Chen, Yixuan Jia, Qing Qu et al.

NeurIPS 2025 • arXiv:2501.16642
5
citations
#7961

Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning

Chao-Chung Wu, Zhi Rui Tam, Chieh-Yen Lin et al.

NeurIPS 2025 • arXiv:2501.14315
5
citations
#7962

Anomize: Better Open Vocabulary Video Anomaly Detection

Fei Li, Wenxuan Liu, Jingjing Chen et al.

CVPR 2025 • arXiv:2503.18094
5
citations
#7963

NeurOp-Diff: Continuous Remote Sensing Image Super-Resolution via Neural Operator Diffusion

Zihao Xu, Yuzhi Tang, Bowen Xu et al.

ICCV 2025
5
citations
#7964

Rethinking Neural Combinatorial Optimization for Vehicle Routing Problems with Different Constraint Tightness Degrees

Fu Luo, Yaoxin Wu, Zhi Zheng et al.

NeurIPS 2025 • arXiv:2505.24627
5
citations
#7965

Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction

Yifei Wang, Weimin Bai, Colin Zhang et al.

NeurIPS 2025 • arXiv:2505.20755
5
citations
#7966

NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes

Han-Hung Lee, Qinghong Han, Angel Chang

ICCV 2025 • arXiv:2503.16375
5
citations
#7967

OpenSDI: Spotting Diffusion-Generated Images in the Open World

Yabin Wang, Zhiwu Huang, Xiaopeng Hong

CVPR 2025 • arXiv:2503.19653
5
citations
#7968

Block-Biased Mamba for Long-Range Sequence Processing

Annan Yu, N. Benjamin Erichson

NeurIPS 2025 • arXiv:2505.09022
5
citations
#7969

Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels

Olaf Dünkel, Thomas Wimmer, Christian Theobalt et al.

ICCV 2025 • arXiv:2506.05312
5
citations
#7970

System Prompt Optimization with Meta-Learning

Yumin Choi, Jinheon Baek, Sung Ju Hwang

NeurIPS 2025 • arXiv:2505.09666
5
citations
#7971

Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration

Darshan Thaker, Abhishek Goyal, Rene Vidal

ICCV 2025 • arXiv:2411.15295
5
citations
#7972

Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data

Qi Chen, Xinze Zhou, Chen Liu et al.

ICCV 2025 • arXiv:2510.14831
5
citations
#7973

Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding

Hanyin Wang, Zhenbang Wu, Gururaj Kolar et al.

NeurIPS 2025 • spotlight • arXiv:2505.21908
5
citations
#7974

Privacy Reasoning in Ambiguous Contexts

Ren Yi, Octavian Suciu, Adrian Gascon et al.

NeurIPS 2025 • arXiv:2506.12241
5
citations
#7975

Online Learning of Neural Networks

Amit Daniely, Idan Mehalel, Elchanan Mossel

NeurIPS 2025 • arXiv:2505.09167
5
citations
#7976

ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction

Yuejiao Su, Yi Wang, Qiongyang Hu et al.

CVPR 2025 • arXiv:2504.01472
5
citations
#7977

4Deform: Neural Surface Deformation for Robust Shape Interpolation

Lu Sang, Zehranaz Canfes, Dongliang Cao et al.

CVPR 2025 • arXiv:2502.20208
5
citations
#7978

Complementary Advantages: Exploiting Cross-Field Frequency Correlation for NIR-Assisted Image Denoising

Yuchen Wang, Hongyuan Wang, Lizhi Wang et al.

CVPR 2025 • arXiv:2412.16645
5
citations
#7979

OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging

Yijie Tang, Jiazhao Zhang, Yuqing Lan et al.

CVPR 2025 • arXiv:2503.01309
5
citations
#7980

iSegMan: Interactive Segment-and-Manipulate 3D Gaussians

Yian Zhao, Wanshi Xu, Ruochong Zheng et al.

CVPR 2025 • arXiv:2505.11934
5
citations
#7981

Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent

Tong Yang, Yu Huang, Yingbin Liang et al.

NeurIPS 2025 • arXiv:2508.08222
5
citations
#7982

Latent-Reframe: Enabling Camera Control for Video Diffusion Models without Training

Zhenghong Zhou, Jie An, Jiebo Luo

ICCV 2025 • arXiv:2412.06029
5
citations
#7983

Improving Generalization of Neural Combinatorial Optimization for Vehicle Routing Problems via Test-Time Projection Learning

Yuanyao Chen, Rongsheng Chen, Fu Luo et al.

NeurIPS 2025 • arXiv:2506.02392
5
citations
#7984

Heterogeneous Skeleton-Based Action Representation Learning

Xiaoyan Ma, Jidong Kuang, Hongsong Wang et al.

CVPR 2025 • arXiv:2506.03481
5
citations
#7985

InterDyn: Controllable Interactive Dynamics with Video Diffusion Models

Rick Akkerman, Haiwen Feng, Michael J. Black et al.

CVPR 2025 • arXiv:2412.11785
5
citations
#7986

Fine-grained List-wise Alignment for Generative Medication Recommendation

Chenxiao Fan, Chongming Gao, Wentao Shi et al.

NeurIPS 2025 • spotlight • arXiv:2505.20218
5
citations
#7987

Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields

Shijie Zhou, Hui Ren, Yijia Weng et al.

CVPR 2025 • arXiv:2503.20776
5
citations
#7988

DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering

Yihao Wang, Marcus Klasson, Matias Turkulainen et al.

CVPR 2025 • arXiv:2411.19756
5
citations
#7989

4D Visual Pre-training for Robot Learning

Chengkai Hou, Yanjie Ze, Yankai Fu et al.

ICCV 2025 • arXiv:2508.17230
5
citations
#7990

AnyI2V: Animating Any Conditional Image with Motion Control

Ziye Li, Xincheng Shuai, Hao Luo et al.

ICCV 2025 • arXiv:2507.02857
5
citations
#7991

High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model

Mingtao Guo, Guanyu Xing, Yanli Liu

CVPR 2025 • arXiv:2502.19894
5
citations
#7992

InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation

Jinlai Liu, Jian Han, Bin Yan et al.

NeurIPS 2025 • oral
5
citations
#7993

Learning Heterogeneous Tissues with Mixture of Experts for Gigapixel Whole Slide Images

Junxian Wu, Minheng Chen, Xinyi Ke et al.

CVPR 2025
5
citations
#7994

Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation

Xin Yan, Yuxuan Cai, Qiuyue Wang et al.

CVPR 2025 • arXiv:2412.01316
5
citations
#7995

Brain-like Variational Inference

Hadi Vafaii, Dekel Galor, Jacob Yates

NeurIPS 2025 • arXiv:2410.19315
5
citations
#7996

VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving

Ruifei Zhang, Wei Zhang, Xiao Tan et al.

ICCV 2025 • arXiv:2511.06256
5
citations
#7997

A Quality-Guided Mixture of Score-Fusion Experts Framework for Human Recognition

Jie Zhu, Yiyang Su, Minchul Kim et al.

ICCV 2025 • arXiv:2508.00053
5
citations
#7998

Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement

Yuchen Ren, Zhengyu Zhao, Chenhao Lin et al.

CVPR 2025 • arXiv:2503.15404
5
citations
#7999

Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules

Binghui Li, Fengling Chen, Zixun Huang et al.

NeurIPS 2025 • spotlight • arXiv:2509.19189
5
citations
#8000

Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction

Teng Hu, Jiangning Zhang, Ran Yi et al.

CVPR 2025 • arXiv:2501.00880
5
citations