Most Cited 2025 "linear rewards" Papers

22,274 papers found • Page 34 of 112

#6601

Constant Bit-size Transformers Are Turing Complete

Qian Li, Yuyi Wang

NEURIPS 2025oralarXiv:2506.12027
6
citations
#6602

Towards foundational LiDAR world models with efficient latent flow matching

Tianran Liu, Shengwen Zhao, Nicholas Rhinehart

NEURIPS 2025arXiv:2506.23434
6
citations
#6603

Towards Doctor-Like Reasoning: Medical RAG Fusing Knowledge with Patient Analogy through Textual Gradients

Yuxing Lu, Gecheng Fu, Wei Wu et al.

NEURIPS 2025
6
citations
#6604

RadarSplat: Radar Gaussian Splatting for High-Fidelity Data Synthesis and 3D Reconstruction of Autonomous Driving Scenes

Pou-Chun Kung, Skanda Harisha, Ram Vasudevan et al.

ICCV 2025arXiv:2506.01379
6
citations
#6605

Logits DeConfusion with CLIP for Few-Shot Learning

Shuo Li, Fang Liu, Zehua Hao et al.

CVPR 2025arXiv:2504.12104
6
citations
#6606

ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback

Litao Guo, Xinli Xu, Luozhou Wang et al.

NEURIPS 2025arXiv:2505.17908
6
citations
#6607

Doubly Robust Alignment for Large Language Models

Erhan Xu, Kai Ye, Hongyi Zhou et al.

NEURIPS 2025arXiv:2506.01183
6
citations
#6608

UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting

Ziyi Wang, Yanran Zhang, Jie Zhou et al.

CVPR 2025arXiv:2506.09952
6
citations
#6609

Encoder-Decoder Diffusion Language Models for Efficient Training and Inference

Marianne Arriola, Yair Schiff, Hao Phung et al.

NEURIPS 2025arXiv:2510.22852
6
citations
#6610

EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data

Ryan Punamiya, Dhruv Patel, Patcharapong Aphiwetsa et al.

NEURIPS 2025arXiv:2509.19626
6
citations
#6611

Golden Cudgel Network for Real-Time Semantic Segmentation

Guoyu Yang, Yuan Wang, Daming Shi et al.

CVPR 2025arXiv:2503.03325
6
citations
#6612

PhysDrive: A Multimodal Remote Physiological Measurement Dataset for In-vehicle Driver Monitoring

Wang, Xiao Yang, Qingyong Hu et al.

NEURIPS 2025arXiv:2507.19172
6
citations
#6613

From Elements to Design: A Layered Approach for Automatic Graphic Design Composition

Jiawei Lin, Shizhao Sun, Danqing Huang et al.

CVPR 2025arXiv:2412.19712
6
citations
#6614

Relation3D : Enhancing Relation Modeling for Point Cloud Instance Segmentation

Edward LOO, Jiacheng Deng

CVPR 2025arXiv:2506.17891
6
citations
#6615

CAT: Content-Adaptive Image Tokenization

Junhong Shen, Kushal Tirumala, Michihiro Yasunaga et al.

NEURIPS 2025arXiv:2501.03120
6
citations
#6616

ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS

Weijie Wang, Donny Y. Chen, Zeyu Zhang et al.

NEURIPS 2025arXiv:2505.23734
6
citations
#6617

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

Bingquan Dai, Luo Li, Qihong Tang et al.

NEURIPS 2025arXiv:2508.14879
6
citations
#6618

CoMatcher: Multi-View Collaborative Feature Matching

Jintao Zhang, Zimin Xia, Mingyue Dong et al.

CVPR 2025arXiv:2504.01872
6
citations
#6619

GoRA: Gradient-driven Adaptive Low Rank Adaptation

haonan he, Peng Ye, Yuchen Ren et al.

NEURIPS 2025arXiv:2502.12171
6
citations
#6620

SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents

Wanxin Tian, Shijie Zhang, Kevin Zhang et al.

NEURIPS 2025arXiv:2506.21669
6
citations
#6621

VALLR: Visual ASR Language Model for Lip Reading

Marshall Thomas, Edward Fish, Richard Bowden

ICCV 2025arXiv:2503.21408
6
citations
#6622

RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance

Yuheng Jiang, Zhehao Shen, Chengcheng Guo et al.

CVPR 2025arXiv:2503.12242
6
citations
#6623

Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers

Kazuki Irie, Morris Yau, Samuel J Gershman

NEURIPS 2025arXiv:2506.00744
6
citations
#6624

IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement

Zhihao Shi, Dong Huo, Yuhongze Zhou et al.

CVPR 2025arXiv:2503.04501
6
citations
#6625

7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting

Zhongpai Gao, Benjamin Planche, Meng Zheng et al.

ICCV 2025arXiv:2503.07946
6
citations
#6626

The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation

Ho Kei Cheng, Alex Schwing

ICCV 2025arXiv:2503.10636
6
citations
#6627

CDI: Copyrighted Data Identification in Diffusion Models

Jan Dubiński, Antoni Kowalczuk, Franziska Boenisch et al.

CVPR 2025arXiv:2411.12858
6
citations
#6628

AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation

Qingqiu Li, Zihang Cui, Seongsu Bae et al.

NEURIPS 2025arXiv:2505.02830
6
citations
#6629

Scaling Speculative Decoding with Lookahead Reasoning

Yichao Fu, Rui Ge, Zelei Shao et al.

NEURIPS 2025arXiv:2506.19830
6
citations
#6630

DMWM: Dual-Mind World Model with Long-Term Imagination

Lingyi Wang, Rashed Shelim, Walid Saad et al.

NEURIPS 2025spotlightarXiv:2502.07591
6
citations
#6631

Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning

Xingjian Ran, Yixuan Li, Linning Xu et al.

NEURIPS 2025arXiv:2506.05341
6
citations
#6632

AD-GS: Object-Aware B-Spline Gaussian Splatting for Self-Supervised Autonomous Driving

Jiawei Xu, Kai Deng, Zexin Fan et al.

ICCV 2025arXiv:2507.12137
6
citations
#6633

TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval

Jialin Chen, Ziyu Zhao, Gaukhar Nurbek et al.

NEURIPS 2025oralarXiv:2506.09114
6
citations
#6634

LayerAnimate: Layer-level Control for Animation

Yuxue Yang, Lue Fan, Zuzeng Lin et al.

ICCV 2025arXiv:2501.08295
6
citations
#6635

Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps

Chong Cheng, Sicheng Yu, Zijian Wang et al.

ICCV 2025arXiv:2507.03737
6
citations
#6636

GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation

Ruihai Wu, Ziyu Zhu, Yuran Wang et al.

CVPR 2025arXiv:2503.09243
6
citations
#6637

Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification

S P Sharan, Minkyu Choi, Sahil Shah et al.

CVPR 2025arXiv:2411.16718
6
citations
#6638

FAIR Universe HiggsML Uncertainty Dataset and Competition

Wahid Bhimji, Ragansu Chakkappai, Po-Wen Chang et al.

NEURIPS 2025arXiv:2410.02867
6
citations
#6639

Where, What, Why: Towards Explainable Driver Attention Prediction

Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao et al.

ICCV 2025highlightarXiv:2506.23088
6
citations
#6640

ROSE: Remove Objects with Side Effects in Videos

Chenxuan Miao, Yutong Feng, Jianshu Zeng et al.

NEURIPS 2025arXiv:2508.18633
6
citations
#6641

Scaling Physical Reasoning with the PHYSICS Dataset

Shenghe Zheng, Qianjia Cheng, Junchi Yao et al.

NEURIPS 2025arXiv:2506.00022
6
citations
#6642

Generative Graph Pattern Machine

Zehong Wang, Zheyuan Zhang, Tianyi Ma et al.

NEURIPS 2025arXiv:2505.16130
6
citations
#6643

Distillation Robustifies Unlearning

Bruce W, Lee, Addie Foote, Alex Infanger et al.

NEURIPS 2025spotlightarXiv:2506.06278
6
citations
#6644

Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion

Jona Ballé, Luca Versari, Emilien Dupont et al.

CVPR 2025highlightarXiv:2412.00505
6
citations
#6645

Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts

Qizhou Chen, Chengyu Wang, Dakan Wang et al.

CVPR 2025arXiv:2411.15432
6
citations
#6646

LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions

Hadi Askari, Shivanshu Gupta, Fei Wang et al.

NEURIPS 2025arXiv:2505.23811
6
citations
#6647

CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving

Rui Song, Chenwei Liang, Yan Xia et al.

ICCV 2025arXiv:2503.06744
6
citations
#6648

TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation

Ruineng Li, Daitao Xing, Huiming Sun et al.

CVPR 2025arXiv:2504.08181
6
citations
#6649

Augmented Deep Contexts for Spatially Embedded Video Coding

Yifan Bian, Chuanbo Tang, Li Li et al.

CVPR 2025highlightarXiv:2505.05309
6
citations
#6650

LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition

Jinghan You, Shanglin Li, Yuanrui Sun et al.

ICCV 2025highlightarXiv:2501.13420
6
citations
#6651

Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection

Marc-Antoine Lavoie, Anas Mahmoud, Steven L. Waslander

CVPR 2025arXiv:2503.23220
6
citations
#6652

CODA: Repurposing Continuous VAEs for Discrete Tokenization

Zeyu Liu, Zanlin Ni, Yeguo Hua et al.

ICCV 2025arXiv:2503.17760
6
citations
#6653

Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative Search

Haoran Sun, Yankai Jiang, Wenjie Lou et al.

NEURIPS 2025arXiv:2506.16962
6
citations
#6654

4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians

Hidenobu Matsuki, Gwangbin Bae, Andrew J. Davison

CVPR 2025arXiv:2505.22859
6
citations
#6655

Probing Equivariance and Symmetry Breaking in Convolutional Networks

Sharvaree Vadgama, Mohammad Islam, Domas Buracas et al.

NEURIPS 2025arXiv:2501.01999
6
citations
#6656

SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning

Yiting Wang, Wanghao Ye, Ping Guo et al.

NEURIPS 2025arXiv:2504.10369
6
citations
#6657

DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning

Fucai Ke, Vijay Kumar b g, Xingjian Leng et al.

ICCV 2025arXiv:2503.19263
6
citations
#6658

Rotation-Equivariant Self-Supervised Method in Image Denoising

Hanze Liu, Jiahong Fu, Qi Xie et al.

CVPR 2025arXiv:2505.19618
6
citations
#6659

Uncertain Multimodal Intention and Emotion Understanding in the Wild

Qu Yang, QingHongYa Shi, Tongxin Wang et al.

CVPR 2025
6
citations
#6660

FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

Zichen Tang, Haihong E, Jiacheng Liu et al.

ICCV 2025arXiv:2508.04625
6
citations
#6661

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Zhiqi Ge, Juncheng Li, Xinglei Pang et al.

ICCV 2025arXiv:2412.10342
6
citations
#6662

TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling

Yuancheng Wang, Dekun Chen, Xueyao Zhang et al.

NEURIPS 2025arXiv:2508.16790
6
citations
#6663

DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning

Leander Diaz-Bone, Marco Bagatella, Jonas Hübotter et al.

NEURIPS 2025arXiv:2505.19850
6
citations
#6664

DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy

Ming Dai, Wenxuan Cheng, Jiang-Jiang Liu et al.

ICCV 2025arXiv:2507.01738
6
citations
#6665

Memories of Forgotten Concepts

Matan Rusanovsky, Shimon Malnick, Amir Jevnisek et al.

CVPR 2025highlightarXiv:2412.00782
6
citations
#6666

COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Jinqi Xiao, Shen Sang, Tiancheng Zhi et al.

CVPR 2025arXiv:2412.00071
6
citations
#6667

UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models

Yuning Han, Bingyin Zhao, Rui Chu et al.

CVPR 2025highlightarXiv:2412.11441
6
citations
#6668

ReWind: Understanding Long Videos with Instructed Learnable Memory

Anxhelo Diko, Tinghuai Wang, Wassim Swaileh et al.

CVPR 2025arXiv:2411.15556
6
citations
#6669

Differentially Private Fine-Tuning of Diffusion Models

Yu-Lin Tsai, Yizhe Li, Zekai Chen et al.

ICCV 2025arXiv:2406.01355
6
citations
#6670

Breaking the Frozen Subspace: Importance Sampling for Low-Rank Optimization in LLM Pretraining

Haochen Zhang, Junze Yin, Guanchu Wang et al.

NEURIPS 2025arXiv:2502.05790
6
citations
#6671

Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models

Zhen Zeng, Leijiang Gu, Xun Yang et al.

ICCV 2025arXiv:2411.12790
6
citations
#6672

Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method

Han Wang, Shengyang Li, Jian Yang et al.

ICCV 2025arXiv:2506.22027
6
citations
#6673

Angular Steering: Behavior Control via Rotation in Activation Space

Minh Hieu Vu, Tan Nguyen

NEURIPS 2025oralarXiv:2510.26243
6
citations
#6674

Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization

Xu Zheng, Yuanhuiyi Lyu, Lutao Jiang et al.

ICCV 2025arXiv:2505.06635
6
citations
#6675

EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance

Yang Yue, Yulin Wang, Haojun Jiang et al.

CVPR 2025arXiv:2504.13065
6
citations
#6676

Differentiable Generalized Sliced Wasserstein Plans

Laetitia Chapel, Romain Tavenard, Samuel Vaiter

NEURIPS 2025arXiv:2505.22049
6
citations
#6677

Future-Aware End-to-End Driving: Bidirectional Modeling of Trajectory Planning and Scene Evolution

Bozhou Zhang, Nan Song, jingyu li et al.

NEURIPS 2025oralarXiv:2510.11092
6
citations
#6678

BF-STVSR: B-Splines and Fourier---Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution

Eunjin Kim, HYEONJIN KIM, Kyong Hwan Jin et al.

CVPR 2025arXiv:2501.11043
6
citations
#6679

Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment

ying ba, Tianyu Zhang, Yalong Bai et al.

ICCV 2025arXiv:2507.19002
6
citations
#6680

Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation

David Heineman, Valentin Hofmann, Ian Magnusson et al.

NEURIPS 2025spotlightarXiv:2508.13144
6
citations
#6681

DyMU: Dynamic Merging and Virtual Unmerging for Efficient Variable-Length VLMs

Zhenhailong Wang, Senthil Purushwalkam, Caiming Xiong et al.

NEURIPS 2025
6
citations
#6682

DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models

Yongqi Huang, Peng Ye, Chenyu Huang et al.

CVPR 2025arXiv:2503.01359
6
citations
#6683

What Matters in Data for DPO?

Yu Pan, Zhongze Cai, Huaiyang Zhong et al.

NEURIPS 2025arXiv:2508.18312
6
citations
#6684

Learning from Streaming Video with Orthogonal Gradients

Tengda Han, Dilara Gokay, Joseph Heyward et al.

CVPR 2025arXiv:2504.01961
6
citations
#6685

Time-Aware Auto White Balance in Mobile Photography

Mahmoud Afifi, Luxi Zhao, Abhijith Punnappurath et al.

ICCV 2025arXiv:2504.05623
6
citations
#6686

Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning

Yihong Tang, Kehai Chen, Muyun Yang et al.

NEURIPS 2025arXiv:2506.01748
6
citations
#6687

Towards RAW Object Detection in Diverse Conditions

Zhong-Yu Li, Xin Jin, Bo-Yuan Sun et al.

CVPR 2025highlightarXiv:2411.15678
6
citations
#6688

Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks

Daniel Kunin, Giovanni Luca Marchetti, Feng Chen et al.

NEURIPS 2025arXiv:2506.06489
6
citations
#6689

OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions

Yuanhao Cai, HE Zhang, Xi Chen et al.

NEURIPS 2025oralarXiv:2506.23361
6
citations
#6690

B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens

Zhuqiang Lu, Zhenfei Yin, Mengwei He et al.

ICCV 2025arXiv:2412.09919
6
citations
#6691

RIGNO: A Graph-based Framework For Robust And Accurate Operator Learning For PDEs On Arbitrary Domains

Sepehr Mousavi, Shizheng Wen, Levi Lingsch et al.

NEURIPS 2025oralarXiv:2501.19205
6
citations
#6692

From Easy to Hard: Progressive Active Learning Framework for Infrared Small Target Detection with Single Point Supervision

Chuang Yu, Jinmiao Zhao, Yunpeng Liu et al.

ICCV 2025arXiv:2412.11154
6
citations
#6693

Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting

ChengAo Shen, Wenchao Yu, Ziming Zhao et al.

NEURIPS 2025arXiv:2505.24003
6
citations
#6694

CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model

Xiaoding Yuan, Shitao Tang, Kejie Li et al.

CVPR 2025arXiv:2407.07174
6
citations
#6695

Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime

Amit Attia, Matan Schliserman, Uri Sherman et al.

NEURIPS 2025arXiv:2507.11274
6
citations
#6696

Zeroth-Order Fine-Tuning of LLMs in Random Subspaces

Ziming Yu, Pan Zhou, Sike Wang et al.

ICCV 2025arXiv:2410.08989
6
citations
#6697

MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs

Jiawei Mao, Yuhan Wang, Yucheng Tang et al.

ICCV 2025arXiv:2504.06897
6
citations
#6698

``Principal Components" Enable A New Language of Images

Xin Wen, Bingchen Zhao, Ismail Elezi et al.

ICCV 2025
6
citations
#6699

Distributionally Robust Learning for Multi-source Unsupervised Domain Adaptation

Zhenyu Wang, Peter Bühlmann, Zijian Guo

NEURIPS 2025arXiv:2309.02211
6
citations
#6700

PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning

Yan Zhang, Yao Feng, Alpár Cseke et al.

ICCV 2025arXiv:2503.17544
6
citations
#6701

Emulating Self-attention with Convolution for Efficient Image Super-Resolution

Dongheon Lee, Seokju Yun, Youngmin Ro

ICCV 2025highlightarXiv:2503.06671
6
citations
#6702

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Xianhang Li, Yanqing Liu, Haoqin Tu et al.

ICCV 2025arXiv:2505.04601
6
citations
#6703

Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception

ruotian peng, Haiying He, Yake Wei et al.

CVPR 2025arXiv:2504.06666
6
citations
#6704

EVOS: Efficient Implicit Neural Training via EVOlutionary Selector

Weixiang Zhang, Shuzhao Xie, Chengwei Ren et al.

CVPR 2025arXiv:2412.10153
6
citations
#6705

ReNeg: Learning Negative Embedding with Reward Guidance

Xiaomin Li, yixuan liu, Takashi Isobe et al.

CVPR 2025highlightarXiv:2412.19637
6
citations
#6706

SVIP: Semantically Contextualized Visual Patches for Zero-Shot Learning

Zhi Chen, Zecheng Zhao, Jingcai Guo et al.

ICCV 2025arXiv:2503.10252
6
citations
#6707

Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy

Bogdan Kulynych, Juan Gomez, Georgios Kaissis et al.

NEURIPS 2025arXiv:2507.06969
6
citations
#6708

Learning Interpretable Queries for Explainable Image Classification with Information Pursuit

Stefan Kolek, Aditya Chattopadhyay, Kwan Ho Ryan Chan et al.

ICCV 2025arXiv:2312.11548
6
citations
#6709

Motion Modes: What Could Happen Next?

Karran Pandey, Yannick Hold-Geoffroy, Matheus Gadelha et al.

CVPR 2025arXiv:2412.00148
6
citations
#6710

Large Language Models Think Too Fast To Explore Effectively

Lan Pan, Hanbo Xie, Robert Wilson

NEURIPS 2025arXiv:2501.18009
6
citations
#6711

Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution

Huan Zheng, Wencheng Han, Jianbing Shen

CVPR 2025arXiv:2411.03239
6
citations
#6712

TurboReg: TurboClique for Robust and Efficient Point Cloud Registration

Shaocheng Yan, Pengcheng Shi, Zhenjun Zhao et al.

ICCV 2025arXiv:2507.01439
6
citations
#6713

Robust Machine Unlearning for Quantized Neural Networks via Adaptive Gradient Reweighting with Similar Labels

Yujia Tong, Yuze Wang, Jingling Yuan et al.

ICCV 2025arXiv:2503.13917
6
citations
#6714

ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving

Yuhang Lu, Jiadong Tu, Yuexin Ma et al.

ICCV 2025arXiv:2507.12499
6
citations
#6715

Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration

Haipeng Fang, Sheng Tang, Juan Cao et al.

CVPR 2025arXiv:2505.11707
6
citations
#6716

SpatialSplat: Efficient Semantic 3D from Sparse Unposed Images

Yu Sheng, Jiajun Deng, Xinran Zhang et al.

ICCV 2025arXiv:2505.23044
6
citations
#6717

Truth over Tricks: Measuring and Mitigating Shortcut Learning in Misinformation Detection

Herun Wan, Jiaying Wu, Minnan Luo et al.

NEURIPS 2025arXiv:2506.02350
6
citations
#6718

mmCooper: A Multi-agent Multi-stage Communication-efficient and Collaboration-robust Cooperative Perception Framework

Bingyi Liu, Jian Teng, Hongfei Xue et al.

ICCV 2025arXiv:2501.12263
6
citations
#6719

Distilled Prompt Learning for Incomplete Multimodal Survival Prediction

Yingxue Xu, Fengtao ZHOU, Chenyu Zhao et al.

CVPR 2025arXiv:2503.01653
6
citations
#6720

Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework

Jian-Jian Jiang, Xiao-Ming Wu, Yi-Xiang He et al.

ICCV 2025arXiv:2503.09186
6
citations
#6721

DynamicFace: High-Quality and Consistent Face Swapping for Image and Video using Composable 3D Facial Priors

Runqi Wang, Yang Chen, Sijie Xu et al.

ICCV 2025arXiv:2501.08553
6
citations
#6722

TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation

Yinda Chen, Haoyuan Shi, Xiaoyu Liu et al.

ICCV 2025arXiv:2405.16847
6
citations
#6723

Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis

Zixuan Wang, DUO PENG, Feng Chen et al.

CVPR 2025arXiv:2504.01515
6
citations
#6724

LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models

Yu Cheng, Fajie Yuan

ICCV 2025arXiv:2503.14325
6
citations
#6725

EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?

Yuqian Yuan, Ronghao Dang, long li et al.

NEURIPS 2025oralarXiv:2506.05287
6
citations
#6726

Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers

Zhengliang Shi, Lingyong Yan, Dawei Yin et al.

NEURIPS 2025arXiv:2505.20128
6
citations
#6727

Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping

Weili Zeng, Ziyuan Huang, Kaixiang Ji et al.

ICCV 2025arXiv:2503.21817
6
citations
#6728

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models

Kim Sung-Bin, Jeongsoo Choi, Puyuan Peng et al.

ICCV 2025arXiv:2504.02386
6
citations
#6729

Language-Guided Audio-Visual Learning for Long-Term Sports Assessment

Huangbiao Xu, Xiao Ke, Huanqi Wu et al.

CVPR 2025
6
citations
#6730

RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting

Qiyu Dai, Xingyu Ni, Qianfan Shen et al.

CVPR 2025arXiv:2503.21442
6
citations
#6731

Denoising Functional Maps: Diffusion Models for Shape Correspondence

Aleksei Zhuravlev, Zorah Lähner, Vladislav Golyanik

CVPR 2025arXiv:2503.01845
6
citations
#6732

Textured 3D Regenerative Morphing with 3D Diffusion Prior

Songlin Yang, Yushi LAN, Honghua Chen et al.

ICCV 2025arXiv:2502.14316
6
citations
#6733

Snakes and Ladders: Two Steps Up for VideoMamba

Hui Lu, Albert Ali Salah, Ronald Poppe

ICCV 2025arXiv:2406.19006
6
citations
#6734

TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation

Wenhao Wang, Yi Yang

ICCV 2025arXiv:2411.04709
6
citations
#6735

ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models

Zifu Wan, Ce Zhang, Silong Yong et al.

ICCV 2025arXiv:2507.00898
6
citations
#6736

Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning

yuzhuo dai, Jiaqi Jin, Zhibin Dong et al.

CVPR 2025arXiv:2505.11182
6
citations
#6737

Dissecting Generalized Category Discovery: Multiplex Consensus under Self-Deconstruction

Luyao Tang, Kunze Huang, Yuxuan Yuan et al.

ICCV 2025highlightarXiv:2508.10731
6
citations
#6738

Dual-Process Image Generation

Grace Luo, Jonathan Granskog, Aleksander Holynski et al.

ICCV 2025arXiv:2506.01955
6
citations
#6739

CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization

Jan Ackermann, Jonas Kulhanek, Shengqu Cai et al.

ICCV 2025arXiv:2506.21117
6
citations
#6740

TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation

Abduljalil Radman, Jorma Laaksonen

CVPR 2025
6
citations
#6741

Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction

Yunheng Li, Yuxuan Li, Quan-Sheng Zeng et al.

ICCV 2025arXiv:2412.06244
6
citations
#6742

Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling

Yanchen Luo, ZHIYUAN LIU, Yi Zhao et al.

NEURIPS 2025arXiv:2503.15567
6
citations
#6743

Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis

Arpita Chowdhury, Dipanjyoti Paul, Zheda Mai et al.

CVPR 2025arXiv:2501.09333
6
citations
#6744

SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting

Mengjiao Ma, Qi Ma, Yue Li et al.

NEURIPS 2025arXiv:2506.08710
6
citations
#6745

DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation

Runze Zhang, Guoguang Du, Xiaochuan Li et al.

ICCV 2025highlightarXiv:2503.06053
6
citations
#6746

AgentBreeder: Mitigating the AI Safety Risks of Multi-Agent Scaffolds via Self-Improvement

J Rosser, Jakob Foerster

NEURIPS 2025spotlightarXiv:2502.00757
6
citations
#6747

GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scene

Xiao Chen, Tai Wang, Quanyi Li et al.

ICCV 2025arXiv:2505.20294
6
citations
#6748

Multi-turn Consistent Image Editing

Zijun Zhou, Yingying Deng, Xiangyu He et al.

ICCV 2025arXiv:2505.04320
6
citations
#6749

SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization

Zhentao Tan, Ben Xue, Jian Jia et al.

ICCV 2025arXiv:2412.10443
6
citations
#6750

Scaling Offline RL via Efficient and Expressive Shortcut Models

Nicolas Espinosa-Dice, Yiyi Zhang, Yiding Chen et al.

NEURIPS 2025arXiv:2505.22866
6
citations
#6751

One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception

Yuchen Xia, Quan Yuan, Guiyang Luo et al.

CVPR 2025arXiv:2411.16799
6
citations
#6752

Parameter Efficient Fine-tuning via Explained Variance Adaptation

Fabian Paischer, Lukas Hauzenberger, Thomas Schmied et al.

NEURIPS 2025arXiv:2410.07170
6
citations
#6753

Dense-SfM: Structure from Motion with Dense Consistent Matching

JongMin Lee, Sungjoo Yoo

CVPR 2025arXiv:2501.14277
6
citations
#6754

PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask

Jeongho Kim, Hoiyeong Jin, Sunghyun Park et al.

ICCV 2025arXiv:2412.16978
6
citations
#6755

Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation

Kaining Ying, Henghui Ding, Guangquan Jie et al.

ICCV 2025arXiv:2507.22886
6
citations
#6756

On Denoising Walking Videos for Gait Recognition

Dongyang Jin, Chao Fan, Jingzhe Ma et al.

CVPR 2025arXiv:2505.18582
6
citations
#6757

LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation

Jiahao Wang, Ning Kang, Lewei Yao et al.

ICCV 2025arXiv:2501.12976
6
citations
#6758

Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models

Quan Zhang, Jinwei Fang, Rui Yuan et al.

CVPR 2025arXiv:2411.08466
6
citations
#6759

Open-Vocabulary Octree-Graph for 3D Scene Understanding

Zhigang Wang, Yifei Su, Chenhui Li et al.

ICCV 2025arXiv:2411.16253
6
citations
#6760

Behavior Injection: Preparing Language Models for Reinforcement Learning

Zhepeng Cen, Yihang Yao, William Han et al.

NEURIPS 2025arXiv:2505.18917
6
citations
#6761

SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction

Zhengyuan Li, Kai Cheng, Anindita Ghosh et al.

CVPR 2025arXiv:2503.18211
6
citations
#6762

FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts

Heming Zou, Yunliang Zang, Wutong Xu et al.

NEURIPS 2025arXiv:2510.08396
6
citations
#6763

Sparfels: Fast Reconstruction from Sparse Unposed Imagery

Shubhendu Jena, Amine Ouasfi, Mae Younes et al.

ICCV 2025highlightarXiv:2505.02178
6
citations
#6764

CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation

Xiangyang Luo, Ye Zhu, Yunfei Liu et al.

ICCV 2025arXiv:2507.02691
6
citations
#6765

OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models

Huanpeng Chu, Wei Wu, Guanyu Feng et al.

ICCV 2025arXiv:2508.16212
6
citations
#6766

$\Psi$-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models

Taehoon Yoon, Yunhong Min, Kyeongmin Yeo et al.

NEURIPS 2025spotlightarXiv:2506.01320
6
citations
#6767

Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation

Ao Ma, Jiasong Feng, Ke Cao et al.

ICCV 2025arXiv:2508.08949
6
citations
#6768

Augmented Mass-Spring Model for Real-Time Dense Hair Simulation

Jorge Herrera, Yi Zhou, Xin Sun et al.

ICCV 2025arXiv:2412.17144
6
citations
#6769

SpectralAR: Spectral Autoregressive Visual Generation

Yuanhui Huang, Weiliang Chen, Wenzhao Zheng et al.

ICCV 2025arXiv:2506.10962
6
citations
#6770

VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis

Zhifeng Wang, Renjiao Yi, Xin Wen et al.

CVPR 2025arXiv:2503.12758
6
citations
#6771

Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models

Shuyang Hao, Bryan Hooi, Jun Liu et al.

CVPR 2025arXiv:2411.18000
6
citations
#6772

ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation

Qizhen Lan, Qing Tian

ICCV 2025arXiv:2503.06307
6
citations
#6773

Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection

Ruiyang Zhang, Hu Zhang, Zhedong Zheng

ICCV 2025arXiv:2408.00619
6
citations
#6774

Beyond Scalars: Concept-Based Alignment Analysis in Vision Transformers

Johanna Vielhaben, Dilyara Bareeva, Jim Berend et al.

NEURIPS 2025arXiv:2412.06639
6
citations
#6775

Feature Coding in the Era of Large Models: Dataset, Test Conditions, and Benchmark

Changsheng Gao, Yifan Ma, Qiaoxi Chen et al.

ICCV 2025arXiv:2412.04307
6
citations
#6776

Edit360: 2D Image Edits to 3D Assets from Any Angle

Junchao Huang, Xinting Hu, Shaoshuai Shi et al.

ICCV 2025highlightarXiv:2506.10507
6
citations
#6777

Towards Realistic Example-based Modeling via 3D Gaussian Stitching

Xinyu Gao, Ziyi Yang, Bingchen Gong et al.

CVPR 2025arXiv:2408.15708
6
citations
#6778

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Guanning Zeng, Xiang Zhang, Zirui Wang et al.

ICCV 2025arXiv:2508.00728
6
citations
#6779

Exploring Simple Open-Vocabulary Semantic Segmentation

Zihang Lai

CVPR 2025arXiv:2401.12217
6
citations
#6780

Thought Communication in Multiagent Collaboration

Yujia Zheng, Zhuokai Zhao, Zijian Li et al.

NEURIPS 2025spotlightarXiv:2510.20733
6
citations
#6781

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Yudong Jin, Sida Peng, Xuan Wang et al.

ICCV 2025arXiv:2507.13344
6
citations
#6782

WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images

Yansong Guo, Jie Hu, Yansong Qu et al.

ICCV 2025arXiv:2503.08407
6
citations
#6783

AdsQA: Towards Advertisement Video Understanding

Xinwei Long, Kai Tian, Peng Xu et al.

ICCV 2025arXiv:2509.08621
6
citations
#6784

Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images

Jie Mei, Chenyu Lin, Yu Qiu et al.

CVPR 2025arXiv:2503.17261
6
citations
#6785

Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

Xuran Ma, Yexin Liu, Yaofu LIU et al.

ICCV 2025arXiv:2504.03140
6
citations
#6786

KLASS: KL-Guided Fast Inference in Masked Diffusion Models

Seo Hyun Kim, Sunwoo Hong, Hojung Jung et al.

NEURIPS 2025spotlightarXiv:2511.05664
6
citations
#6787

Towards In-the-wild 3D Plane Reconstruction from a Single Image

Jiachen Liu, Rui Yu, Sili Chen et al.

CVPR 2025highlightarXiv:2506.02493
6
citations
#6788

Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation

Yueru Jia, Jiaming Liu, Sixiang Chen et al.

CVPR 2025
6
citations
#6789

GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting

Xiaobao Wei, Peng Chen, Guangyu Li et al.

ICCV 2025highlightarXiv:2411.12981
6
citations
#6790

DreamFuse: Adaptive Image Fusion with Diffusion Transformer

Junjia Huang, Pengxiang Yan, Jiyang Liu et al.

ICCV 2025arXiv:2504.08291
6
citations
#6791

GIViC: Generative Implicit Video Compression

Ge Gao, Siyue Teng, Tianhao Peng et al.

ICCV 2025arXiv:2503.19604
6
citations
#6792

SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL

Yue Gong, Chuan Lei, Xiao Qin et al.

NEURIPS 2025arXiv:2506.04494
6
citations
#6793

RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation

Yuhan Li, Xianfeng Tan, Wenxiang Shang et al.

ICCV 2025highlightarXiv:2411.19528
6
citations
#6794

EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds

Lu Chen, Yizhou Wang, SHIXIANG TANG et al.

ICCV 2025arXiv:2502.05857
6
citations
#6795

VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment

Darshana Saravanan, Varun Gupta, Darshan Singh S et al.

CVPR 2025arXiv:2406.10889
6
citations
#6796

Hierarchical Cross-modal Prompt Learning for Vision-Language Models

Hao Zheng, Shunzhi Yang, Zhuoxin He et al.

ICCV 2025arXiv:2507.14976
6
citations
#6797

IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation

Zijie Lin, Yang Zhang, Xiaoyan Zhao et al.

NEURIPS 2025arXiv:2506.13229
6
citations
#6798

Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following

Vivek Myers, Bill Zheng, Anca Dragan et al.

NEURIPS 2025oralarXiv:2502.05454
6
citations
#6799

Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model

Jian Zhu, He Wang, Yang Xu et al.

CVPR 2025arXiv:2505.11800
6
citations
#6800

DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving

Chen Shi, Shaoshuai Shi, Kehua Sheng et al.

ICCV 2025arXiv:2505.19239
6
citations