Most Cited 2025 "subword tokenization" Papers

CVPR 2025 arXiv:2509.00649

#8802

MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation

Aviral Chharia, Wenbo Gou, Haoye Dong

NEURIPS 2025 arXiv:2506.04667

#8803

FlashMoE: Fast Distributed MoE in a Single Kernel

Osayamen Aimuyo, Byungsoo Oh, Rachee Singh

#8804

STAFF: Speculative Coreset Selection for Task-Specific Fine-tuning

Xiaoyu Zhang, Juan Zhai, Shiqing Ma et al.

ICLR 2025

ICLR 2025 arXiv:2503.00733

#8805

UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation

Alexander Liu, Sang-gil Lee, Chao-Han Huck Yang et al.

NEURIPS 2025 spotlight arXiv:2510.19779

#8806

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

Yuezhou Hu, Jiaxin Guo, Xinyu Feng et al.

NEURIPS 2025 oral arXiv:2505.20640

#8807

IndustryEQA: Pushing the Frontiers of Embodied Question Answering in Industrial Scenarios

Yifan Li, Yuhang Chen, Anh Dao et al.

ICCV 2025 arXiv:2503.09733

#8808

I2V3D: Controllable Image-to-video Generation with 3D Guidance

Zhiyuan Zhang, Dongdong Chen, Jing Liao

CVPR 2025 arXiv:2502.19955

#8809

RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges

Thibaut Loiseau, Guillaume Bourmaud

ICCV 2025 highlight arXiv:2502.07001

#8810

From Image to Video: An Empirical Study of Diffusion Representations

Pedro Vélez, Luisa Polania Cabrera, Yi Yang et al.

NEURIPS 2025 arXiv:2505.15952

#8811

VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance

Mohammad Reza Taesiri, Abhijay Ghildyal, Saman Zadtootaghaj et al.

ICLR 2025 arXiv:2503.14338

#8812

Higher-Order Graphon Neural Networks: Approximation and Cut Distance

Daniel Herbst, Stefanie Jegelka

ICLR 2025 arXiv:2412.04318

#8813

The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation

Fredrik Carlsson, Fangyu Liu, Daniel Ward et al.

NEURIPS 2025 oral arXiv:2506.13558

#8814

X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability

Yu Yang, Alan Liang, Jianbiao Mei et al.

NEURIPS 2025 arXiv:2505.21060

#8815

Styl3R: Instant 3D Stylized Reconstruction for Arbitrary Scenes and Styles

Peng Wang, Xiang Liu, Peidong Liu

CVPR 2025 arXiv:2504.00527

#8816

SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning

Fida Mohammad Thoker, Letian Jiang, Chen Zhao et al.

CVPR 2025 arXiv:2411.15255

#8817

OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction

Gehui Li, Bin Chen, Chen Zhao et al.

NEURIPS 2025 arXiv:2412.09585

#8818

Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation

Jitesh Jain, Zhengyuan Yang, Humphrey Shi et al.

NEURIPS 2025 arXiv:2505.23052

#8819

RAGRouter: Learning to Route Queries to Multiple Retrieval-Augmented Language Models

Jiarui Zhang, Xiangyu Liu, Yong Hu et al.

ICCV 2025 arXiv:2508.01126

#8820

UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation

Chaitanya Patel, Hiroki Nakamura, Yuta Kyuragi et al.

CVPR 2025 arXiv:2506.04453

#8821

Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning

Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury et al.

NEURIPS 2025 arXiv:2505.12697

#8822

Towards A Generalist Code Embedding Model Based On Massive Data Synthesis

Chaofan Li, Jianlyu Chen, Yingxia Shao et al.

NEURIPS 2025 spotlight arXiv:2501.11447

#8823

Decomposing Interventional Causality into Synergistic, Redundant, and Unique Components

Abel Jansma

ICLR 2025 arXiv:2412.09945

#8824

Going Beyond Feature Similarity: Effective Dataset Distillation based on Class-aware Conditional Mutual Information

Xinhao Zhong, Bin Chen, Hao Fang et al.

NEURIPS 2025 arXiv:2503.16872

#8825

Lie Detector: Unified Backdoor Detection via Cross-Examination Framework

Xuan Wang, Siyuan Liang, Dongping Liao et al.

ICCV 2025 arXiv:2508.14461

#8826

Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering

Shanlin Sun, Yifan Wang, Hanwen Zhang et al.

ICLR 2025 arXiv:2503.09046

#8827

Discovering Influential Neuron Path in Vision Transformers

Yifan Wang, Yifei Liu, Yingdong Shi et al.

#8828

SnowMaster: Comprehensive Real-world Image Desnowing via MLLM with Multi-Model Feedback Optimization

Jianyu Lai, Sixiang Chen, Yunlong Lin et al.

NEURIPS 2025 arXiv:2504.06560

#8829

NeedleInATable: Exploring Long-Context Capability of Large Language Models towards Long-Structured Tables

Lanrui Wang, Mingyu Zheng, Hongyin Tang et al.

NEURIPS 2025 arXiv:2507.07995

#8830

Single-pass Adaptive Image Tokenization for Minimum Program Search

Shivam Duggal, Sanghyun Byun, Bill Freeman et al.

CVPR 2025 highlight arXiv:2412.13183

#8831

Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures

Guoxing Sun, Rishabh Dabral, Heming Zhu et al.

ICLR 2025 arXiv:2412.06619

#8832

Copyright-Protected Language Generation via Adaptive Model Fusion

Javier Abad, Konstantin Donhauser, Francesco Pinto et al.

ICCV 2025 arXiv:2409.05381

#8833

Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models

Xudong Li, Zihao Huang, Yan Zhang et al.

CVPR 2025 arXiv:2412.00952

#8834

ESCAPE: Equivariant Shape Completion via Anchor Point Encoding

Burak Bekci, Nassir Navab, Federico Tombari et al.

NEURIPS 2025 oral arXiv:2506.16371

#8835

AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios

Yunhao Hou, Bochao Zou, Min Zhang et al.

NEURIPS 2025 arXiv:2405.07098

#8836

Interpretable Global Minima of Deep ReLU Neural Networks on Sequentially Separable Data

Thomas Chen, Patricia Muñoz Ewald

NEURIPS 2025 arXiv:2410.07961

#8837

QCircuitBench: A Large-Scale Dataset for Benchmarking Quantum Algorithm Design

Rui Yang, Ziruo Wang, Yuntian Gu et al.

NEURIPS 2025 arXiv:2510.21189

#8838

Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency

Yukun Jiang, Mingjie Li, Michael Backes et al.

CVPR 2025 arXiv:2504.21749

#8839

Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space

Leonhard Sommer, Olaf Dünkel, Christian Theobalt et al.

CVPR 2025 arXiv:2412.04282

#8840

Learnable Infinite Taylor Gaussian for Dynamic View Rendering

Bingbing Hu, Yanyan Li, Rui Xie et al.

NEURIPS 2025 arXiv:2509.24739

#8841

Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation

Tien Nguyen, Dac Nguyen, Duc Nguyen The Minh et al.

ICCV 2025 arXiv:2508.21060

#8842

Multi-View 3D Point Tracking

Frano Rajič, Haofei Xu, Marko Mihajlovic et al.

ICCV 2025 arXiv:2507.00790

#8843

LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling

Li Huaqiu, Yong Wang, Tongwen Huang et al.

CVPR 2025 arXiv:2503.02009

#8844

Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization

Jamie Wynn, Zawar Qureshi, Jakub Powierza et al.

ICCV 2025 arXiv:2407.17399

#8845

Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising

Sébastien Herbreteau, Michael Unser

NEURIPS 2025 arXiv:2505.14177

#8846

From stability of Langevin diffusion to convergence of proximal MCMC for non-log-concave sampling

Marien Renaud, Valentin De Bortoli, Arthur Leclaire et al.

CVPR 2025 arXiv:2502.19691

#8847

Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach

Chen-Chen Zong, Sheng-Jun Huang

NEURIPS 2025 oral arXiv:2506.06218

#8848

STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving

Christian Fruhwirth-Reisinger, Dušan Malić, Wei Lin et al.

ICCV 2025 arXiv:2504.20041

#8849

Learning Streaming Video Representation via Multitask Training

Yibin Yan, Jilan Xu, Shangzhe Di et al.

ICLR 2025 arXiv:2505.00031

#8850

Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving

Jin Zhang, Flood Sung, Zhilin Yang et al.

CVPR 2025 arXiv:2504.04834

#8851

Learning Affine Correspondences by Integrating Geometric Constraints

Pengju Sun, Banglei Guan, Zhenbao Yu et al.

NEURIPS 2025 arXiv:2506.07570

#8852

OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization

Yixuan Yang, Zhen Luo, Tongsheng Ding et al.

CVPR 2025 arXiv:2412.04317

#8853

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Bo Tong, Bokai Lai, Yiyi Zhou et al.

CVPR 2025 arXiv:2411.19824

#8854

SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens

Chi Su, Xiaoxuan Ma, Jiajun Su et al.

ICLR 2025 arXiv:2406.11520

#8855

Operator Deep Smoothing for Implied Volatility

Ruben Wiedemann, Antoine (Jack) Jacquier, Lukas Gonon

#8856

Solving Partial Differential Equations via Radon Neural Operator

Wenbin Lu, Yihan Chen, Junnan Xu et al.

NEURIPS 2025 arXiv:2505.19406

#8857

Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model

Tianle Li, Jihai Zhang, Yongming Rao et al.

CVPR 2025 arXiv:2411.11361

#8858

Scalable Autoregressive Monocular Depth Estimation

Jinhong Wang, Jintai Chen, Jian Liu et al.

NEURIPS 2025 oral arXiv:2410.10101

#8859

Learning Linear Attention in Polynomial Time

Morris Yau, Ekin Akyürek, Jiayuan Mao et al.

ICLR 2025 arXiv:2502.19718

#8860

Learning Mask Invariant Mutual Information for Masked Image Modeling

Tao Huang, Yanxiang Ma, Shan You et al.

CVPR 2025 highlight arXiv:2503.13016

#8861

Efficient Motion-Aware Video MLLM

Zijia Zhao, Yuqi Huo, Tongtian Yue et al.

ICCV 2025 arXiv:2503.16177

#8862

OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering

Shiyong Liu, Xiao Tang, Zhihao Li et al.

NEURIPS 2025 arXiv:2502.19335

#8863

Gatekeeper: Improving Model Cascades Through Confidence Tuning

Stephan Rabanser, Nathalie Rauschmayr, Achin Kulshrestha et al.

CVPR 2025 highlight arXiv:2503.19718

#8864

QuCOOP: A Versatile Framework for Solving Composite and Binary-Parametrised Problems on Quantum Annealers

Natacha Kuete Meli, Vladislav Golyanik, Marcel Seelbach Benkner et al.

NEURIPS 2025 arXiv:2507.04451

#8865

CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step

Zheyuan Liu, Munan Ning, Qihui Zhang et al.

CVPR 2025 arXiv:2504.09623

#8866

Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding

Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju et al.

CVPR 2025 arXiv:2504.01019

#8867

MixerMDM: Learnable Composition of Human Motion Diffusion Models

Pablo Ruiz-Ponce, German Barquero, Cristina Palmero et al.

#8868

HyPoGen: Optimization-Biased Hypernetworks for Generalizable Policy Generation

Hanxiang Ren, Li Sun, Xulong Wang et al.

ICLR 2025

ICCV 2025 arXiv:2503.07152

#8869

Controllable 3D Outdoor Scene Generation via Scene Graphs

Yuheng Liu, Xinke Li, Yuning Zhang et al.

NEURIPS 2025 arXiv:2505.13432

#8870

Synthetic-powered predictive inference

Meshi Bashari, Roy Maor Lotan, Yonghoon Lee et al.

ICCV 2025 arXiv:2503.15283

#8871

TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning In Text-to-Image Models

Teng-Fang Hsiao, Bo-Kai Ruan, Yi-Lun Wu et al.

CVPR 2025 arXiv:2504.09621

#8872

Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images

Jiuchen Chen, Xinyu Yan, Qizhi Xu et al.

ICCV 2025 arXiv:2408.13697

#8873

ForgeLens: Data-Efficient Forgery Focus for Generalizable Forgery Image Detection

Yingjian Chen, Lei Zhang, Yakun Niu

NEURIPS 2025 arXiv:2505.20426

#8874

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

Yunlong Tang, Pinxin Liu, Mingqian Feng et al.

NEURIPS 2025 arXiv:2505.21866

#8875

CSI-Bench: A Large-Scale In-the-Wild Dataset for Multi-task WiFi Sensing

Guozhen Zhu, Yuqian Hu, Weihang Gao et al.

NEURIPS 2025 arXiv:2505.17771

#8876

TopoPoint: Enhance Topology Reasoning via Endpoint Detection in Autonomous Driving

Yanping Fu, Xinyuan Liu, Tianyu Li et al.

#8877

PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors

Kangan Qian, Jinyu Miao, Xinyu Jiao et al.

ICCV 2025

ICCV 2025 arXiv:2503.10905

#8878

Learning to Inference Adaptively for Multimodal Large Language Models

Zhuoyan Xu, Khoi Nguyen, Preeti Mukherjee et al.

ICCV 2025 arXiv:2503.15877

#8879

Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation

Tiange Xiang, Kai Li, Chengjiang Long et al.

NEURIPS 2025 arXiv:2509.24693

#8880

Brain Harmony: A Multimodal Foundation Model Unifying Morphology and Function into 1D Tokens

Zijian Dong, Ruilin Li, Joanna Chong et al.

NEURIPS 2025 arXiv:2506.02009

#8881

STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds

Yinfang Chen, Jiaqi Pan, Jackson Clark et al.

#8882

Extreme Risk Mitigation in Reinforcement Learning using Extreme Value Theory

Jan Drgona, Mahantesh Halappanavar, Frank Liu et al.

ICLR 2025

NEURIPS 2025 oral arXiv:2509.07447

#8883

In the Eye of MLLM: Benchmarking Egocentric Video Intent Understanding with Gaze-Guided Prompting

Taiying Peng, Jiacheng Hua, Miao Liu et al.

ICLR 2025 arXiv:2406.11608

#8884

Visually Consistent Hierarchical Image Classification

Seulki Park, Youren Zhang, Stella Yu et al.

CVPR 2025 arXiv:2411.18807

#8885

Reconstructing Animals and the Wild

Peter Kulits, Michael J. Black, Silvia Zuffi

ICLR 2025 arXiv:2408.07191

#8886

Joint Graph Rewiring and Feature Denoising via Spectral Resonance

Jonas Linkerhägner, Cheng Shi, Ivan Dokmanić

ICCV 2025 arXiv:2503.11780

#8887

Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning

Tianyi Zhao, Boyang Liu, Yanglei Gao et al.

NEURIPS 2025 arXiv:2509.16500

#8888

RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation

Tianyi Yan, Wencheng Han, Xia Zhou et al.

CVPR 2025 arXiv:2504.09086

#8889

RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection

Yunfei Long, Abhinav Kumar, Xiaoming Liu et al.

NEURIPS 2025 arXiv:2506.01084

#8890

zip2zip: Inference-Time Adaptive Tokenization via Online Compression

Saibo Geng, Nathan Ranchin, Yunzhen Yao et al.

CVPR 2025 arXiv:2506.11493

#8891

Preserving Clusters in Prompt Learning for Unsupervised Domain Adaptation

Long Tung Vuong, Hoang Phan, Vy Vo et al.

NEURIPS 2025 spotlight arXiv:2505.21501

#8892

Vision Transformers with Self-Distilled Registers

Zipeng Yan, Yinjie Chen, Chong Zhou et al.

NEURIPS 2025 arXiv:2510.12114

#8893

Self-Supervised Selective-Guided Diffusion Model for Old-Photo Face Restoration

Wenjie Li, Xiangyi Wang, Heng Guo et al.

CVPR 2025 arXiv:2502.17435

#8894

GCC: Generative Color Constancy via Diffusing a Color Checker

Chen-Wei Chang, Cheng-De Fan, Chia-Che Chang et al.

NEURIPS 2025 arXiv:2508.07434

#8895

Let's Revise Step-by-Step: A Unified Local Search Framework for Code Generation with LLMs

Zhiyi Lyu, Jianguo Huang, Yanchen Deng et al.

#8896

One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models

Senmao Li, Lei Wang, Kai Wang et al.

NEURIPS 2025 arXiv:2505.20268

#8897

Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits

Fan Chen, Zeyu Jia, Alexander Rakhlin et al.

ICLR 2025 arXiv:2503.17821

#8898

OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination

Tobias Gessler, Tin Dizdarevic, Ani Calinescu et al.

NEURIPS 2025 arXiv:2405.16246

#8899

Conformal Prediction for Ensembles: Improving Efficiency via Score-Based Aggregation

Yash Patel, Eduardo Ochoa Rivera, Ambuj Tewari

CVPR 2025 arXiv:2504.13820

#8900

CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning

Yang Yue, Yulin Wang, Chenxin Tao et al.

CVPR 2025 arXiv:2601.07377

#8901

Learning Dynamic Collaborative Network for Semi-supervised 3D Vessel Segmentation

Jiao Xu, Xin Chen, Lihe Zhang

ICLR 2025 arXiv:2410.22625

#8902

Non-Equilibrium Dynamics of Hybrid Continuous-Discrete Ground-State Sampling

Timothee Leleu, Sam Reifenstein

ICCV 2025 arXiv:2503.06339

#8903

Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning

Gaurav Patel, Qiang Qiu

CVPR 2025 arXiv:2502.19834

#8904

Knowledge Bridger: Towards Training-Free Missing Modality Completion

Guanzhou Ke, Shengfeng He, Xiao-Li Wang et al.

#8905

Multi-modal Medical Diagnosis via Large-small Model Collaboration

Wanyi Chen, Zihua Zhao, Jiangchao Yao et al.

NEURIPS 2025 oral arXiv:2506.17093

#8906

Identifiability of Deep Polynomial Neural Networks

Konstantin Usevich, Ricardo Borsoi, Clara Dérand et al.

NEURIPS 2025 arXiv:2506.02935

#8907

MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver

Yuepeng Zheng, Fu Luo, Zhenkun Wang et al.

ICCV 2025 arXiv:2412.01250

#8908

Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues

Francesco Taioli, Edoardo Zorzi, Gianni Franchi et al.

ICCV 2025 arXiv:2508.09062

#8909

VertexRegen: Mesh Generation with Continuous Level of Detail

Xiang Zhang, Yawar Siddiqui, Armen Avetisyan et al.

ICCV 2025 arXiv:2503.04151

#8910

Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation

Jie Xu, Na Zhao, Gang Niu et al.

ICLR 2025 arXiv:2502.09935

#8911

Precise Parameter Localization for Textual Generation in Diffusion Models

Łukasz Staniszewski, Bartosz Cywiński, Franziska Boenisch et al.

NEURIPS 2025 arXiv:2506.02672

#8912

EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving

Shihan Dou, Ming Zhang, Chenhao Huang et al.

NEURIPS 2025 arXiv:2509.12178

#8913

All that structure matches does not glitter

Maya Martirossyan, Thomas Egg, Philipp Höllmer et al.

NEURIPS 2025 arXiv:2508.12815

#8914

Learning to Steer: Input-dependent Steering for Multimodal LLMs

Jayneel Parekh, Pegah Khayatan, Mustafa Shukor et al.

ICLR 2025 arXiv:2511.04769

#8915

ReGen: Generative Robot Simulation via Inverse Design

Peter (Phat) Nguyen, Johnson (Tsun-Hsuan) Wang, Zhang-Wei Hong et al.

CVPR 2025 arXiv:2408.15127

#8916

T-FAKE: Synthesizing Thermal Images for Facial Landmarking

Philipp Flotho, Moritz Piening, Anna Kukleva et al.

CVPR 2025 arXiv:2504.05457

#8917

Taxonomy-Aware Evaluation of Vision-Language Models

Vésteinn Snæbjarnarson, Kevin Du, Niklas Stoehr et al.

NEURIPS 2025 arXiv:2506.02961

#8918

FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models

Yan Gao, Massimo R. Scamarcia, Javier Fernandez-Marques et al.

NEURIPS 2025 arXiv:2509.26096

#8919

EVODiff: Entropy-aware Variance Optimized Diffusion Inference

Shigui Li, Wei Chen, Delu Zeng

NEURIPS 2025 arXiv:2502.08021

#8920

Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol

Pai Liu, Lingfeng Zhao, Shivangi Agarwal et al.

ICCV 2025 arXiv:2508.04682

#8921

TurboTrain: Towards Efficient and Balanced Multi-Task Learning for Multi-Agent Perception and Prediction

Zewei Zhou, Zhihao Zhao, Tianhui Cai et al.

ICCV 2025 arXiv:2508.03284

#8922

ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

Shaofeng Yin, Ting Lei, Yang Liu

NEURIPS 2025 spotlight arXiv:2505.18966

#8923

Protein Design with Dynamic Protein Vocabulary

Nuowei Liu, Jiahao Kuang, Yanting Liu et al.

CVPR 2025 highlight arXiv:2411.12593

#8924

AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction

Yuanbin Man, Ying Huang, Chengming Zhang et al.

ICCV 2025 arXiv:2501.01425

#8925

Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation

Xincheng Shuai, Henghui Ding, Zhenyuan Qin et al.

NEURIPS 2025 arXiv:2506.11097

#8926

C-SEO Bench: Does Conversational SEO Work?

Haritz Puerto, Martin Gubri, Tommaso Green et al.

CVPR 2025 arXiv:2504.04756

#8927

Continuous Locomotive Crowd Behavior Generation

Inhwan Bae, Junoh Lee, Hae-Gon Jeon

CVPR 2025 arXiv:2412.11519

#8928

LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model

Xi Wang, Hongzhen Li, Heng Fang et al.

ICCV 2025 arXiv:2407.03010

#8929

CAVIS: Context-Aware Video Instance Segmentation

Seunghun Lee, Jiwan Seo, Kiljoon Han et al.

ICCV 2025 arXiv:2412.19142

#8930

CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting

Siyu Jiao, Haoye Dong, Yuyang Yin et al.

NEURIPS 2025 arXiv:2511.01755

#8931

3EED: Ground Everything Everywhere in 3D

Rong Li, Yuhao Dong, Tianshuai Hu et al.

NEURIPS 2025 arXiv:2502.01618

#8932

Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods

Isha Puri, Shivchander Sudalairaj, Guangxuan Xu et al.

NEURIPS 2025 arXiv:2509.17664

#8933

SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models

Pingyi Chen, Yujing Lou, Shen Cao et al.

CVPR 2025 highlight arXiv:2412.06191

#8934

Event Fields: Capturing Light Fields at High Speed, Resolution, and Dynamic Range

Ziyuan Qu, Zihao Zou, Vivek Boominathan et al.

CVPR 2025 arXiv:2412.11365

#8935

BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions

Wonyong Seo, Jihyong Oh, Munchurl Kim

NEURIPS 2025 arXiv:2506.07848

#8936

PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement

Teng Hu, Zhentao Yu, Zhengguang Zhou et al.

CVPR 2025 arXiv:2503.01167

#8937

Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data

Haoxin Li, Boyang Li

NEURIPS 2025 arXiv:2506.12110

#8938

EconGym: A Scalable AI Testbed with Diverse Economic Tasks

Qirui Mi, Qipeng Yang, Zijun Fan et al.

CVPR 2025 arXiv:2503.18817

#8939

Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations

Jeonghyeon Kim, Sangheum Hwang

#8940

HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver

Cong Wei, Haoxian Tan, Yujie Zhong et al.

ICCV 2025 highlight arXiv:2412.06293

#8941

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness

Qifan Yu, Zhebei Shen, Zhongqi Yue et al.

ICLR 2025 arXiv:2502.04730

#8942

PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders

Tianyu Xie, David Harry Tyensoung Richman, Jiansi Gao et al.

ICCV 2025 arXiv:2411.13949

#8943

SMoLoRA: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning

Ziqi Wang, Chang Che, Qi Wang et al.

ICLR 2025 arXiv:2410.04234

#8944

Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks

Zi Wang, Divyam Anshumaan, Ashish Hooda et al.

ICCV 2025 arXiv:2411.19921

#8945

SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation

Wenjia Wang, Liang Pan, Zhiyang Dou et al.

NEURIPS 2025 arXiv:2508.06044

#8946

NEP: Autoregressive Image Editing via Next Editing Token Prediction

Huimin Wu, Xiaojian (Shawn) Ma, Haozhe Zhao et al.

NEURIPS 2025 arXiv:2507.02861

#8947

LiteReality: Graphic-Ready 3D Scene Reconstruction from RGB-D Scans

Zhening Huang, Xiaoyang Wu, Fangcheng Zhong et al.

NEURIPS 2025 arXiv:2506.00362

#8948

FSNet: Feasibility-Seeking Neural Network for Constrained Optimization with Guarantees

Hoang Nguyen, Priya Donti

NEURIPS 2025 arXiv:2503.16578

#8949

SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors

Chen Yang, Hui Wang, Shiyao Wang et al.

NEURIPS 2025 arXiv:2506.11784

#8950

GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers

Guang Liang, Xinyao Liu, Jianxin Wu

ICLR 2025 arXiv:2410.13726

#8951

DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation

Hanbo Cheng, Limin Lin, Chenyu Liu et al.

CVPR 2025 arXiv:2504.15786

#8952

Satellite to GroundScape - Large-scale Consistent Ground View Generation from Satellite Views

Ningli Xu, Rongjun Qin

ICCV 2025 arXiv:2504.00502

#8953

ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers

Qianhao Yuan, Qingyu Zhang, Yanjiang Liu et al.

NEURIPS 2025 arXiv:2505.18384

#8954

Dynamic Risk Assessments for Offensive Cybersecurity Agents

Boyi Wei, Benedikt Stroebl, Jiacen Xu et al.

CVPR 2025 arXiv:2504.16023

#8955

PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning

Song Wang, Xiaolu Liu, Lingdong Kong et al.

ICLR 2025 arXiv:2505.04599

#8956

Complexity Lower Bounds of Adaptive Gradient Algorithms for Non-convex Stochastic Optimization under Relaxed Smoothness

Michael Crawshaw, Mingrui Liu

CVPR 2025 arXiv:2503.10740

#8957

Subnet-Aware Dynamic Supernet Training for Neural Architecture Search

Jeimin Jeon, Youngmin Oh, Junghyup Lee et al.

ICLR 2025 arXiv:2410.06232

#8958

Range, not Independence, Drives Modularity in Biologically Inspired Representations

Will Dorrell, Kyle Hsu, Luke Hollingsworth et al.

#8959

G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning

Xiaojun Guo, Ang Li, Yifei Wang et al.

NEURIPS 2025 arXiv:2503.01707

#8960

Metropolis Adjusted Microcanonical Hamiltonian Monte Carlo

Jakob Robnik, Reuben Cohn-Gordon, Uros Seljak

ICCV 2025 arXiv:2508.00701

#8961

D3: Training-Free AI-Generated Video Detection Using Second-Order Features

Chende Zheng, Ruiqi Suo, Chenhao Lin et al.

CVPR 2025 highlight arXiv:2503.19904

#8962

Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better

Zihang Lai, Andrea Vedaldi

CVPR 2025 arXiv:2503.19776

#8963

Resilient Sensor Fusion Under Adverse Sensor Failures via Multi-Modal Expert Fusion

Konyul Park, Yecheol Kim, Daehun Kim et al.

CVPR 2025 arXiv:2503.23241

#8964

Geometry in Style: 3D Stylization via Surface Normal Deformation

Nam Anh Dinh, Itai Lang, Hyunwoo Kim et al.

NEURIPS 2025 spotlight arXiv:2505.18659

#8965

Adaptive Prediction-Powered AutoEval with Reliability and Efficiency Guarantees

Sangwoo Park, Matteo Zecchin, Osvaldo Simeone

NEURIPS 2025 arXiv:2506.04721

#8966

Sparta Alignment: Collectively Aligning Multiple Language Models through Combat

Yuru Jiang, Wenxuan Ding, Shangbin Feng et al.

ICCV 2025 arXiv:2506.21547

#8967

SAM4D: Segment Anything in Camera and LiDAR Streams

Jianyun Xu, Song Wang, Ziqian Ni et al.

ICCV 2025 arXiv:2507.18060

#8968

BokehDiff: Neural Lens Blur with One-Step Diffusion

Chengxuan Zhu, Qingnan Fan, Qi Zhang et al.

NEURIPS 2025 spotlight arXiv:2508.02546

#8969

What are you sinking? A geometric approach on attention sink

Valeria Ruscio, Umberto Nanni, Fabrizio Silvestri

#8970

Understanding Contrastive Learning via Gaussian Mixture Models

Parikshit Bansal, Ali Kavis, Sujay Sanghavi

CVPR 2025 arXiv:2503.02593

#8971

CMMLoc: Advancing Text-to-PointCloud Localization with Cauchy-Mixture-Model Based Framework

Yanlong Xu, Haoxuan Qu, Jun Liu et al.

NEURIPS 2025 arXiv:2506.07239

#8972

VeriLoC: Line-of-Code Level Prediction of Hardware Design Quality from Verilog Code

Raghu Vamshi Hemadri, Jitendra Bhandari, Andre Nakkab et al.

CVPR 2025 arXiv:2503.21150

#8973

The Devil is in Low-Level Features for Cross-Domain Few-Shot Segmentation

Yuhan Liu, Yixiong Zou, Yuhua Li et al.

NEURIPS 2025 arXiv:2501.19122

#8974

FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling

Hong Huang, Jinhai Yang, Yuan Chen et al.

CVPR 2025 arXiv:2505.06218

#8975

Let Humanoids Hike! Integrative Skill Development on Complex Trails

Kwan-Yee Lin, Stella X. Yu

NEURIPS 2025 arXiv:2502.17821

#8976

CAML: Collaborative Auxiliary Modality Learning for Multi-Agent Systems

Rui Liu, Yu Shen, Peng Gao et al.

CVPR 2025 arXiv:2411.17249

#8977

Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors

Zhengfei Kuang, Tianyuan Zhang, Kai Zhang et al.

ICCV 2025 arXiv:2508.00366

#8978

SparseRecon: Neural Implicit Surface Reconstruction from Sparse Views with Feature and Depth Consistencies

Liang Han, Xu Zhang, Haichuan Song et al.

CVPR 2025 highlight arXiv:2502.19630

#8979

Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras

Hoonhee Cho, Jae-Young Kang, Youngho Kim et al.

CVPR 2025 highlight arXiv:2411.17313

#8980

Event Ellipsometer: Event-based Mueller-Matrix Video Imaging

Ryota Maeda, Yunseong Moon, Seung-Hwan Baek

CVPR 2025 arXiv:2503.07481

#8981

Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References

Yitang Li, Mingxian Lin, Zhuo Lin et al.

ICCV 2025 arXiv:2411.16789

#8982

Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation

Jungeun Kim, Hyeongwoo Jeon, Jongseong Bae et al.

NEURIPS 2025 arXiv:2501.18877

#8983

Mitigating Sexual Content Generation via Embedding Distortion in Text-conditioned Diffusion Models

Jaesin Ahn, Heechul Jung

ICCV 2025 arXiv:2508.19649

#8984

IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising

Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali et al.

ICCV 2025 arXiv:2412.14974

#8985

Arti-PG: A Toolbox for Procedurally Synthesizing Large-Scale and Diverse Articulated Objects with Rich Annotations

Jianhua Sun, Yuxuan Li, Jiude Wei et al.

NEURIPS 2025 arXiv:2502.12128

#8986

LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities

Florian Sestak, Artur Toshev, Andreas Fürst et al.

NEURIPS 2025 arXiv:2506.11849

#8987

Regression-adjusted Monte Carlo Estimators for Shapley Values and Probabilistic Values

R. Teal Witter, Yurong Liu, Christopher Musco

CVPR 2025 arXiv:2503.17984

#8988

Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting

Maochen Yang, Zekun Li, Jian Zhang et al.

NEURIPS 2025 arXiv:2512.04550

#8989

AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees

Yangning Li, Shaoshen Chen, Yinghui Li et al.

CVPR 2025 arXiv:2410.11374

#8990

Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing

Yoonjeon Kim, Soohyun Ryu, Yeonsung Jung et al.

CVPR 2025 arXiv:2503.18987

#8991

Balanced Direction from Multifarious Choices: Arithmetic Meta-Learning for Domain Generalization

Xiran Wang, Jian Zhang, Lei Qi et al.

ICLR 2025 arXiv:2501.13273

#8992

Enhancing Robust Fairness via Confusional Spectral Regularization

Gaojie Jin, Sihao Wu, Jiaxu Liu et al.

NEURIPS 2025 arXiv:2510.15455

#8993

CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs

Gucongcong Fan, Chaoyue Niu, Chengfei Lyu et al.

NEURIPS 2025 oral arXiv:2505.22854

#8994

CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting

Kornel Howil, Joanna Waczynska, Piotr Borycki et al.

ICCV 2025 arXiv:2506.07371

#8995

ARGUS: Hallucination and Omission Evaluation in Video-LLMs

Ruchit Rawal, Reza Shirkavand, Heng Huang et al.

ICCV 2025 arXiv:2510.16641

#8996

MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models

Young-Jun Lee, Byung-Kwan Lee, Jianshu Zhang et al.

#8997

LidarGait++: Learning Local Features and Size Awareness from LiDAR Point Clouds for 3D Gait Recognition

Chuanfu Shen, Rui Wang, Lixin Duan et al.

#8998

DualEqui: A Dual-Space Hierarchical Equivariant Network for Large Biomolecules

Junjie Xu, Jiahao Zhang, Mangal Prakash et al.

NEURIPS 2025 arXiv:2402.06674

#8999

Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning

Marlon Tobaben, Hibiki Ito, Joonas Jälkö et al.

NEURIPS 2025 arXiv:2510.23607

#9000

Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations

Yujia Zhang, Xiaoyang Wu, Yixing Lao et al.