Most Cited 2025 "grasping motion generation" Papers

22,274 papers found • Page 47 of 112

#9201

SnowMaster: Comprehensive Real-world Image Desnowing via MLLM with Multi-Model Feedback Optimization

Jianyu LAI, Sixiang Chen, yunlong lin et al.

CVPR 2025

#9202

Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization

Jamie Wynn, Zawar Qureshi, Jakub Powierza et al.

CVPR 2025 • arXiv:2503.02009

#9203

Efficient Motion-Aware Video MLLM

Zijia Zhao, Yuqi Huo, Tongtian Yue et al.

CVPR 2025 • highlight • arXiv:2503.13016

#9204

RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection

Yunfei Long, Abhinav Kumar, Xiaoming Liu et al.

CVPR 2025 • arXiv:2504.09086

#9205

CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning

Yang Yue, Yulin Wang, Chenxin Tao et al.

CVPR 2025 • arXiv:2504.13820

#9206

Knowledge Bridger: Towards Training-Free Missing Modality Completion

Guanzhou Ke, Shengfeng He, Xiao-Li Wang et al.

CVPR 2025 • arXiv:2502.19834

#9207

Multi-modal Medical Diagnosis via Large-small Model Collaboration

Wanyi Chen, Zihua Zhao, Jiangchao Yao et al.

CVPR 2025

#9208

Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data

Haoxin Li, Boyang Li

CVPR 2025 • arXiv:2503.01167

#9209

HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver

Cong Wei, Haoxian Tan, Yujie Zhong et al.

CVPR 2025

#9210

Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras

Hoonhee Cho, Jae-Young Kang, Youngho Kim et al.

CVPR 2025 • highlight • arXiv:2502.19630

#9211

Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction

Huiwon Jang, Sihyun Yu, Jinwoo Shin et al.

CVPR 2025 • arXiv:2411.14762

#9212

Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition

Yang Chen, Jingcai Guo, Song Guo et al.

CVPR 2025 • arXiv:2411.11288

#9213

Noise-Resistant Video Anomaly Detection via RGB Error-Guided Multiscale Predictive Coding and Dynamic Memory

Han Hu, Wenli Du, Peng Liao et al.

CVPR 2025

#9214

FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts

Tongyuan Bai, Wangyuanfan Bai, Dong Chen et al.

CVPR 2025 • arXiv:2506.02781

#9215

Dual-Agent Optimization framework for Cross-Domain Few-Shot Segmentation

Zhaoyang Li, Yuan Wang, Wangkai Li et al.

CVPR 2025

#9216

Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting

Jingyi Xu, Xieyuanli Chen, Junyi Ma et al.

CVPR 2025 • arXiv:2411.14169

#9217

IDEA-Bench: How Far are Generative Models from Professional Designing?

Chen Liang, Lianghua Huang, Jingwu Fang et al.

CVPR 2025 • arXiv:2412.11767

#9218

Enhancing Dataset Distillation via Non-Critical Region Refinement

Minh-Tuan Tran, Trung Le, Xuan-May Le et al.

CVPR 2025 • arXiv:2503.18267

#9219

AMR-Transformer: Enabling Efficient Long-range Interaction for Complex Neural Fluid Simulation

Zeyi Xu, Jinfan Liu, Kuangxu Chen et al.

CVPR 2025 • arXiv:2503.10257

#9220

Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics

Chen Liu, Liying Yang, Peike Li et al.

CVPR 2025 • arXiv:2503.12840

#9221

Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception

Yuanchen Wu, Lu Zhang, Hang Yao et al.

CVPR 2025 • arXiv:2504.20468

#9222

ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices

Hao Yu, Tangyu Jiang, Shuning Jia et al.

CVPR 2025 • arXiv:2506.03737

#9223

HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion

Yifang Xu, BenXiang Zhai, Yunzhuo Sun et al.

CVPR 2025 • arXiv:2512.14542

#9224

ZeroVO: Visual Odometry with Minimal Assumptions

Lei Lai, Zekai Yin, Eshed Ohn-Bar

CVPR 2025 • arXiv:2506.08005

#9225

GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning

Guangyan Chen, Te Cui, Meiling Wang et al.

CVPR 2025

#9226

Continual SFT Matches Multimodal RLHF with Negative Supervision

Ke Zhu, Yu Wang, Yanpeng Sun et al.

CVPR 2025 • arXiv:2411.14797

#9227

Exploring Contextual Attribute Density in Referring Expression Counting

Zhicheng Wang, Zhiyu Pan, Zhan Peng et al.

CVPR 2025 • arXiv:2503.12460

#9228

UCOD-DPL: Unsupervised Camouflaged Object Detection via Dynamic Pseudo-label Learning

Weiqi Yan, Lvhai Chen, Huaijia Kou et al.

CVPR 2025 • highlight • arXiv:2506.07087

#9229

Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning

Sherry X. Chen, Misha Sra, Pradeep Sen

CVPR 2025 • arXiv:2503.18406

#9230

Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval

Boseung Jeong, Jicheol Park, Sungyeon Kim et al.

CVPR 2025 • arXiv:2504.02397

#9231

UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming

Hao Lin, Ke Wu, Jie Li et al.

CVPR 2025 • arXiv:2307.16375

#9232

SSHNet: Unsupervised Cross-modal Homography Estimation via Problem Reformulation and Split Optimization

Junchen Yu, Siyuan Cao, Runmin Zhang et al.

CVPR 2025 • highlight • arXiv:2409.17993

#9233

BiLoRA: Almost-Orthogonal Parameter Spaces for Continual Learning

Hao Zhu, Yifei Zhang, Junhao Dong et al.

CVPR 2025

#9234

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language

Yicheng Chen, Xiangtai Li, Yining Li et al.

CVPR 2025 • arXiv:2406.20085

#9235

Enhanced then Progressive Fusion with View Graph for Multi-View Clustering

Zhibin Dong, Meng Liu, Siwei Wang et al.

CVPR 2025

#9236

Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects

Shalini Maiti, Lourdes Agapito, Filippos Kokkinos

CVPR 2025 • arXiv:2504.08125

#9237

Probing the Mid-level Vision Capabilities of Self-Supervised Learning

Xuweiyi Chen, Markus Marks, Zezhou Cheng

CVPR 2025 • arXiv:2411.17474

#9238

On the Out-Of-Distribution Generalization of Large Multimodal Models

Xingxuan Zhang, Jiansheng Li, Wenjing Chu et al.

CVPR 2025

#9239

Channel Consistency Prior and Self-Reconstruction Strategy Based Unsupervised Image Deraining

Guanglu Dong, Tianheng Zheng, Yuanzhouhan Cao et al.

CVPR 2025 • arXiv:2503.18703

#9240

ReRAW: RGB-to-RAW Image Reconstruction via Stratified Sampling for Efficient Object Detection on the Edge

Radu Berdan, Beril Besbinar, Christoph Reinders et al.

CVPR 2025 • arXiv:2503.03782

#9241

NSD-Imagery: A Benchmark Dataset for Extending fMRI Vision Decoding Methods to Mental Imagery

Reese Kneeland, Paul Scotti, Ghislain St-Yves et al.

CVPR 2025 • highlight • arXiv:2506.06898

#9242

Diffusion Model is Effectively Its Own Teacher

Xinyin Ma, Runpeng Yu, Songhua Liu et al.

CVPR 2025

#9243

Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing

Yanjun Li, Zhaoyang Li, Honghui Chen et al.

CVPR 2025 • arXiv:2503.00548

#9244

Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation

Xiaoying Xing, Avinab Saha, Junfeng He et al.

CVPR 2025 • highlight • arXiv:2501.06481

#9245

DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image

Hyeongjin Nam, Donghwan Kim, Jeongtaek Oh et al.

CVPR 2025 • arXiv:2503.19373

#9246

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles

Rui Zhao, Weijia Mao, Mike Zheng Shou

CVPR 2025 • arXiv:2503.03651

#9247

Multi-Label Prototype Visual Spatial Search for Weakly Supervised Semantic Segmentation

Songsong Duan, Xi Yang, Nannan Wang

CVPR 2025 • highlight

#9248

USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting

Kang Chen, Jiyuan Zhang, Zecheng Hao et al.

CVPR 2025 • highlight • arXiv:2411.10504

#9249

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges

Yuxuan Wang, Yiqi Song, Cihang Xie et al.

ICCV 2025 • arXiv:2409.01071

#9250

Generative Zoo

Tomasz Niewiadomski, Anastasios Yiannakidis, Hanz Cuevas Velasquez et al.

ICCV 2025 • arXiv:2412.08101

#9251

Dynamic Multimodal Prototype Learning in Vision-Language Models

Xingyu Zhu, Shuo Wang, Beier Zhu et al.

ICCV 2025 • arXiv:2507.03657

#9252

DuoLoRA: Cycle-consistent and Rank-disentangled Content-Style Personalization

Aniket Roy, Shubhankar Borse, Shreya Kadambi et al.

ICCV 2025 • arXiv:2504.13206

#9253

Multi-View 3D Point Tracking

Frano Rajič, Haofei Xu, Marko Mihajlovic et al.

ICCV 2025 • arXiv:2508.21060

#9254

Learning to Inference Adaptively for Multimodal Large Language Models

Zhuoyan Xu, Khoi Nguyen, Preeti Mukherjee et al.

ICCV 2025 • arXiv:2503.10905

#9255

Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning

Gaurav Patel, Qiang Qiu

ICCV 2025 • arXiv:2503.06339

#9256

Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation

Jie Xu, Na Zhao, Gang Niu et al.

ICCV 2025 • arXiv:2503.04151

#9257

TurboTrain: Towards Efficient and Balanced Multi-Task Learning for Multi-Agent Perception and Prediction

Zewei Zhou, Zhihao Zhao, Tianhui Cai et al.

ICCV 2025 • arXiv:2508.04682

#9258

ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

Shaofeng Yin, Ting Lei, Yang Liu

ICCV 2025 • arXiv:2508.03284

#9259

CAVIS: Context-Aware Video Instance Segmentation

Seunghun Lee, Jiwan Seo, Kiljoon Han et al.

ICCV 2025 • arXiv:2407.03010

#9260

CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting

Siyu Jiao, Haoye Dong, Yuyang Yin et al.

ICCV 2025 • arXiv:2412.19142

#9261

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness

Qifan Yu, Zhebei Shen, Zhongqi Yue et al.

ICCV 2025 • highlight • arXiv:2412.06293

#9262

SMoLoRA: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning

Ziqi Wang, Chang Che, Qi Wang et al.

ICCV 2025 • arXiv:2411.13949

#9263

ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers

Qianhao Yuan, Qingyu Zhang, yanjiang liu et al.

ICCV 2025 • arXiv:2504.00502

#9264

MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models

Young-Jun Lee, Byung-Kwan Lee, Jianshu Zhang et al.

ICCV 2025 • arXiv:2510.16641

#9265

DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding

Wenwen Yu, Zhibo Yang, Yuliang Liu et al.

ICCV 2025 • arXiv:2508.08589

#9266

VGGSounder: Audio-Visual Evaluations for Foundation Models

Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu et al.

ICCV 2025 • arXiv:2508.08237

#9267

Towards Higher Effective Rank in Parameter-Efficient Fine-tuning using Khatri-Rao Product

Paul Albert, Frederic Zhang, Hemanth Saratchandran et al.

ICCV 2025 • arXiv:2508.00230

#9268

SceneMI: Motion In-betweening for Modeling Human-Scene Interaction

Inwoo Hwang, Bing Zhou, Young Min Kim et al.

ICCV 2025 • highlight • arXiv:2503.16289

#9269

Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving

Hao Zhou, Zhanning Gao, Zhili Chen et al.

ICCV 2025 • arXiv:2411.13076

#9270

Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning

Tianyi Zhao, Boyang Liu, Yanglei Gao et al.

ICCV 2025 • arXiv:2503.11780

#9271

Arti-PG: A Toolbox for Procedurally Synthesizing Large-Scale and Diverse Articulated Objects with Rich Annotations

Jianhua Sun, Yuxuan Li, Jiude Wei et al.

ICCV 2025 • arXiv:2412.14974

#9272

X-Capture: An Open-Source Portable Device for Multi-Sensory Learning

Samuel Clarke, Suzannah Wistreich, Yanjie Ze et al.

ICCV 2025 • arXiv:2504.02318

#9273

Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos

Changwoon Choi, Jeongjun Kim, Geonho Cha et al.

ICCV 2025 • arXiv:2412.19089

#9274

IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation

Wenxuan Guo, Xiuwei Xu, Hang Yin et al.

ICCV 2025 • arXiv:2508.00823

#9275

3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection

Yung-Hsu Yang, Luigi Piccinelli, Mattia Segu et al.

ICCV 2025 • arXiv:2507.23567

#9276

VOVTrack: Exploring the Potentiality in Raw Videos for Open-Vocabulary Multi-Object Tracking

Zekun Qian, Ruize Han, Junhui Hou et al.

ICCV 2025

#9277

Not all Views are Created Equal: Analyzing Viewpoint Instabilities in Vision Foundation Models

Mateusz Michalkiewicz, Xinyue Bai, Mahsa Baktashmotlagh et al.

ICCV 2025 • arXiv:2412.19920

#9278

CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image

Arindam Dutta, Meng Zheng, Zhongpai Gao et al.

ICCV 2025 • highlight • arXiv:2503.15671

#9279

Self-supervised Learning of Hybrid Part-aware 3D Representations of 2D Gaussians and Superquadrics

Zhirui Gao, Renjiao Yi, Yuhang Huang et al.

ICCV 2025 • arXiv:2408.10789

#9280

UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation

Chaitanya Patel, Hiroki Nakamura, Yuta Kyuragi et al.

ICCV 2025 • arXiv:2508.01126

#9281

Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering

shanlin sun, Yifan Wang, Hanwen Zhang et al.

ICCV 2025 • arXiv:2508.14461

#9282

Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models

Xudong Li, Zihao Huang, Yan Zhang et al.

ICCV 2025 • arXiv:2409.05381

#9283

Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising

Sébastien Herbreteau, Michael Unser

ICCV 2025 • arXiv:2407.17399

#9284

EAMamba: Efficient All-Around Vision State Space Model for Image Restoration

Yu-Cheng Lin, Yu-Syuan Xu, Hao-Wei Chen et al.

ICCV 2025 • arXiv:2506.22246

#9285

IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising

Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali et al.

ICCV 2025 • arXiv:2508.19649

#9286

Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation

Xincheng Shuai, Henghui Ding, Zhenyuan Qin et al.

ICCV 2025 • arXiv:2501.01425

#9287

VertexRegen: Mesh Generation with Continuous Level of Detail

Xiang Zhang, Yawar Siddiqui, Armen Avetisyan et al.

ICCV 2025 • arXiv:2508.09062

#9288

I2V3D: Controllable Image-to-video Generation with 3D Guidance

Zhiyuan Zhang, Dongdong Chen, Jing Liao

ICCV 2025 • arXiv:2503.09733

#9289

Controllable Weather Synthesis and Removal with Video Diffusion Models

Chih-Hao Lin, Zian Wang, Ruofan Liang et al.

ICCV 2025 • arXiv:2505.00704

#9290

Sequential Gaussian Avatars with Hierarchical Motion Context

Wangze Xu, Yifan Zhan, Zhihang Zhong et al.

ICCV 2025 • arXiv:2411.16768

#9291

iManip: Skill-Incremental Learning for Robotic Manipulation

Zexin Zheng, Jia-Feng Cai, Xiao-Ming Wu et al.

ICCV 2025 • arXiv:2503.07087

#9292

Morph: A Motion-free Physics Optimization Framework for Human Motion Generation

Zhuo Li, Mingshuang Luo, RuiBing Hou et al.

ICCV 2025 • arXiv:2411.14951

#9293

CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models

Gaoyang Zhang, Bingtao Fu, Qingnan Fan et al.

ICCV 2025 • arXiv:2412.13195

#9294

WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation

Zhongyu Yang, Jun Chen, Dannong Xu et al.

ICCV 2025 • arXiv:2503.19065

#9295

CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems

Aniket Rege, Zinnia Nie, Unmesh Raskar et al.

ICCV 2025 • arXiv:2506.08071

#9296

Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation

Tiange Xiang, Kai Li, Chengjiang Long et al.

ICCV 2025 • arXiv:2503.15877

#9297

From Image to Video: An Empirical Study of Diffusion Representations

Pedro Vélez, Luisa Polania Cabrera, Yi Yang et al.

ICCV 2025 • highlight • arXiv:2502.07001

#9298

Balanced Image Stylization with Style Matching Score

Yuxin Jiang, Liming Jiang, Shuai Yang et al.

ICCV 2025 • arXiv:2503.07601

#9299

LayerD: Decomposing Raster Graphic Designs into Layers

Tomoyuki Suzuki, Kang-Jun Liu, Naoto Inoue et al.

ICCV 2025 • arXiv:2509.25134

#9300

Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing

Taihang Hu, Linxuan Li, Kai Wang et al.

ICCV 2025 • arXiv:2504.10434

#9301

UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis

Yuanrui Wang, Cong Han, Yafei Li et al.

ICCV 2025 • arXiv:2507.00992

#9302

CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching

Zizhuo Li, Yifan Lu, Linfeng Tang et al.

ICCV 2025 • highlight • arXiv:2503.23925

#9303

Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues

Francesco Taioli, Edoardo Zorzi, Gianni Franchi et al.

ICCV 2025 • arXiv:2412.01250

#9304

BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation

Ruotong Wang, Mingli Zhu, Jiarong Ou et al.

ICCV 2025 • arXiv:2504.16907

#9305

Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization

Li, Yang Xiao, Jie Ji et al.

ICCV 2025 • arXiv:2504.09039

#9306

GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology

Saarthak Kapse, Pushpak Pati, Srikar Yellapragada et al.

ICCV 2025 • highlight • arXiv:2504.01009

#9307

Test-time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates

Kecheng Chen, Xinyu Luo, Tiexin Qin et al.

ICCV 2025 • highlight • arXiv:2504.02008

#9308

ARGUS: Hallucination and Omission Evaluation in Video-LLMs

Ruchit Rawal, Reza Shirkavand, Heng Huang et al.

ICCV 2025 • arXiv:2506.07371

#9309

OuroMamba: A Data-Free Quantization Framework for Vision Mamba

Akshat Ramachandran, Mingyu Lee, Huan Xu et al.

ICCV 2025 • arXiv:2503.10959

#9310

CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation

Leon Sick, Dominik Engel, Sebastian Hartwig et al.

ICCV 2025 • arXiv:2411.16319

#9311

Beyond [cls]: Exploring the True Potential of Masked Image Modeling Representations

Marcin Przewięźlikowski, Randall Balestriero, Wojciech Jasiński et al.

ICCV 2025 • arXiv:2412.03215

#9312

Stable Diffusion Models are Secretly Good at Visual In-Context Learning

Trevine Oorloff, Vishwanath Sindagi, Wele Gedara Chaminda Bandara et al.

ICCV 2025 • arXiv:2508.09949

#9313

Auto-Vocabulary Semantic Segmentation

Osman Ülger, Maksymilian Kulicki, Yuki Asano et al.

ICCV 2025 • arXiv:2312.04539

#9314

PRM: Photometric Stereo based Large Reconstruction Model

Wenhang Ge, Jiantao Lin, Guibao SHEN et al.

ICCV 2025 • highlight • arXiv:2412.07371

#9315

Neural Shell Texture Splatting: More Details and Fewer Primitives

Xin Zhang, Anpei Chen, Jincheng Xiong et al.

ICCV 2025 • arXiv:2507.20200

#9316

JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers

Kwon Byung-Ki, Qi Dai, Lee Hyoseok et al.

ICCV 2025 • arXiv:2505.00482

#9317

DistillDrive: End-to-End Multi-Mode Autonomous Driving Distillation by Isomorphic Hetero-Source Planning Model

Rui Yu, Xianghang Zhang, Runkai Zhao et al.

ICCV 2025 • arXiv:2508.05402

#9318

GS-ID: Illumination Decomposition on Gaussian Splatting via Adaptive Light Aggregation and Diffusion-Guided Material Priors

Kang DU, Zhihao Liang, Yulin Shen et al.

ICCV 2025 • arXiv:2408.08524

#9319

ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models

Shadi Hamdan, Chonghao Sima, Zetong Yang et al.

ICCV 2025 • arXiv:2506.07725

#9320

Occupancy Learning with Spatiotemporal Memory

Ziyang Leng, Jiawei Yang, Wenlong Yi et al.

ICCV 2025 • arXiv:2508.04705

#9321

Inverse 3D Microscopy Rendering for Cell Shape Inference with Active Mesh

Sacha Ichbiah, Anshuman Sinha, Fabrice Delbary et al.

ICCV 2025 • highlight • arXiv:2303.10440

#9322

PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors

Kangan Qian, Jinyu Miao, Xinyu Jiao et al.

ICCV 2025

#9323

BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment

Tongfan Guan, Jiaxin Guo, Chen Wang et al.

ICCV 2025 • highlight • arXiv:2508.04611

#9324

LightSwitch: Multi-view Relighting with Material-guided Diffusion

Yehonathan Litman, Fernando De la Torre, Shubham Tulsiani

ICCV 2025 • arXiv:2508.06494

#9325

QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization

Yueh-Cheng Liu, Lukas Höllein, Matthias Nießner et al.

ICCV 2025 • arXiv:2505.05591

#9326

SP2T: Sparse Proxy Attention for Dual-stream Point Transformer

Jiaxu Wan, Hong Zhang, Ziqi He et al.

ICCV 2025

#9327

Controllable 3D Outdoor Scene Generation via Scene Graphs

Yuheng Liu, Xinke Li, Yuning Zhang et al.

ICCV 2025 • arXiv:2503.07152

#9328

SViM3D: Stable Video Material Diffusion for Single Image 3D Generation

Andreas Engelhardt, Mark Boss, Vikram Voleti et al.

ICCV 2025 • arXiv:2510.08271

#9329

SparseRecon: Neural Implicit Surface Reconstruction from Sparse Views with Feature and Depth Consistencies

Liang Han, Xu Zhang, Haichuan Song et al.

ICCV 2025 • arXiv:2508.00366

#9330

SAM4D: Segment Anything in Camera and LiDAR Streams

Jianyun Xu, Song Wang, Ziqian Ni et al.

ICCV 2025 • arXiv:2506.21547

#9331

Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description

Anna-Maria Halacheva, Yang Miao, Jan-Nico Zaech et al.

ICCV 2025 • arXiv:2412.01398

#9332

What You Have is What You Track: Adaptive and Robust Multimodal Tracking

Yuedong Tan, Jiawei Shao, Eduard Zamfir et al.

ICCV 2025 • arXiv:2507.05899

#9333

Learning Streaming Video Representation via Multitask Training

Yibin Yan, Jilan Xu, Shangzhe Di et al.

ICCV 2025 • arXiv:2504.20041

#9334

UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation

Emmanuelle Bourigault, Amir Jamaludin, Abdullah Hamdi

ICCV 2025 • arXiv:2504.06908

#9335

TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning In Text-to-Image Models

Teng-Fang Hsiao, Bo-Kai Ruan, Yi-Lun Wu et al.

ICCV 2025 • arXiv:2503.15283

#9336

Video Motion Graphs

Haiyang Liu, Zhan Xu, Fating Hong et al.

ICCV 2025 • highlight • arXiv:2503.20218

#9337

LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling

Li Huaqiu, Yong Wang, Tongwen Huang et al.

ICCV 2025 • arXiv:2507.00790

#9338

Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis

Peng Zheng, Junke Wang, Yi Chang et al.

ICCV 2025 • arXiv:2507.01756

#9339

ForgeLens: Data-Efficient Forgery Focus for Generalizable Forgery Image Detection

Yingjian Chen, Lei Zhang, Yakun Niu

ICCV 2025 • arXiv:2408.13697

#9340

FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing

Bizhu Wu, Jinheng Xie, Meidan Ding et al.

ICCV 2025 • arXiv:2507.19850

#9341

Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation

Jungeun Kim, Hyeongwoo Jeon, Jongseong Bae et al.

ICCV 2025 • arXiv:2411.16789

#9342

ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery

Yanzhe Lyu, Kai Cheng, Kang Xin et al.

ICCV 2025 • arXiv:2412.07494

#9343

PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs

Teng Zhou, Xiaoyu Zhang, Yongchuan Tang

ICCV 2025 • highlight • arXiv:2411.15867

#9344

MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos

Hongyi Zhou, Xiaogang Wang, Yulan Guo et al.

ICCV 2025 • arXiv:2505.11868

#9345

TrustMark: Robust Watermarking and Watermark Removal for Arbitrary Resolution Images

Tu Bui, Shruti Agarwal, John Collomosse

ICCV 2025

#9346

TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction

Dadong Jiang, Zhi Hou, Zhihui Ke et al.

ICCV 2025 • arXiv:2411.11941

#9347

Region-based Cluster Discrimination for Visual Representation Learning

Yin Xie, Kaicheng Yang, Xiang An et al.

ICCV 2025 • highlight • arXiv:2507.20025

#9348

Acknowledging Focus Ambiguity in Visual Questions

Chongyan Chen, Yu-Yun Tseng, Zhuoheng Li et al.

ICCV 2025 • arXiv:2501.02201

#9349

OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering

Shiyong Liu, Xiao Tang, Zhihao Li et al.

ICCV 2025 • arXiv:2503.16177

#9350

D3: Training-Free AI-Generated Video Detection Using Second-Order Features

Chende Zheng, Ruiqi suo, Chenhao Lin et al.

ICCV 2025 • arXiv:2508.00701

#9351

MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion

Zihan Wang, Jeff Tan, Tarasha Khurana et al.

ICCV 2025 • arXiv:2507.23782

#9352

Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting

Xingyu Miao, Haoran Duan, Quanhao Qian et al.

ICCV 2025 • highlight • arXiv:2507.18678

#9353

Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning

Jun Li, Jinpeng Wang, Chaolei Tan et al.

ICCV 2025 • arXiv:2507.17402

#9354

Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information

Junbo Zhao, Ting Zhang, Jiayu Sun et al.

ICCV 2025 • arXiv:2503.05543

#9355

GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning

Kelin Yu, Sheng Zhang, Harshit Soora et al.

ICCV 2025 • arXiv:2508.11049

#9356

ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition

Ronggang Huang, Haoxin Yang, Yan Cai et al.

ICCV 2025 • arXiv:2507.11261

#9357

Boosting Multimodal Learning via Disentangled Gradient Learning

Shicai Wei, Chunbo Luo, Yang Luo

ICCV 2025 • arXiv:2507.10213

#9358

HORT: Monocular Hand-held Objects Reconstruction with Transformers

Zerui Chen, Rolandos Alexandros Potamias, Shizhe Chen et al.

ICCV 2025 • arXiv:2503.21313

#9359

HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis

Timo Teufel, xilong zhou, Umar Iqbal et al.

ICCV 2025 • arXiv:2508.09137

#9360

GAP: Gaussianize Any Point Clouds with Text Guidance

Weiqi Zhang, Junsheng Zhou, Haotian Geng et al.

ICCV 2025 • arXiv:2508.05631

#9361

SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation

Wenjia Wang, Liang Pan, Zhiyang Dou et al.

ICCV 2025 • arXiv:2411.19921

#9362

BokehDiff: Neural Lens Blur with One-Step Diffusion

Chengxuan Zhu, Qingnan Fan, Qi Zhang et al.

ICCV 2025 • arXiv:2507.18060

#9363

Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection

Yupeng Hu, Changxing Ding, Chang Sun et al.

ICCV 2025 • arXiv:2507.06510

#9364

A Token-level Text Image Foundation Model for Document Understanding

Tongkun Guan, Zining Wang, Pei Fu et al.

ICCV 2025 • arXiv:2503.02304

#9365

Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models

Sangwon Baik, Hyeonwoo Kim, Hanbyul Joo

ICCV 2025 • arXiv:2503.19914

#9366

MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration

Zhehui Wu, Yong Chen, Naoto Yokoya et al.

ICCV 2025 • arXiv:2503.09131

#9367

Learning to Generalize without Bias for Open-Vocabulary Action Recognition

Yating Yu, Congqi Cao, Yifan Zhang et al.

ICCV 2025 • highlight • arXiv:2502.20158

#9368

INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

Chenwei Lin, Hanjia Lyu, Xian Xu et al.

ICCV 2025 • arXiv:2406.09105

#9369

StelLA: Subspace Learning in Low-rank Adaptation using Stiefel Manifold

Zhizhong Li, Sina Sajadmanesh, Jingtao Li et al.

NEURIPS 2025 • spotlight • arXiv:2510.01938

#9370

Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation

Mehrdad Noori, David OSOWIECHI, Gustavo Vargas Hakim et al.

NEURIPS 2025 • arXiv:2505.21844

#9371

The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation

Patrick Kahardipraja, Reduan Achtibat, Thomas Wiegand et al.

NEURIPS 2025 • arXiv:2505.15807

#9372

Compressed and Smooth Latent Space for Text Diffusion Modeling

Viacheslav Meshchaninov, Egor Chimbulatov, Alexander Shabalin et al.

NEURIPS 2025 • arXiv:2506.21170

#9373

STRCMP: Integrating Graph Structural Priors with Language Models for Combinatorial Optimization

Xijun Li, Jiexiang Yang, Jinghao Wang et al.

NEURIPS 2025

#9374

Light-Weight Diffusion Multiplier and Uncertainty Quantification for Fourier Neural Operators

Albert Matveev, Sanmitra Ghosh, Aamal Hussain et al.

NEURIPS 2025 • spotlight • arXiv:2508.00643

#9375

Statistical inference for Linear Stochastic Approximation with Markovian Noise

Sergey Samsonov, Marina Sheshukova, Eric Moulines et al.

NEURIPS 2025 • arXiv:2505.19102

#9376

GRAVER: Generative Graph Vocabularies for Robust Graph Foundation Models Fine-tuning

Haonan Yuan, Qingyun Sun, Junhua Shi et al.

NEURIPS 2025 • arXiv:2511.05592

#9377

Incentivizing LLMs to Self-Verify Their Answers

Fuxiang Zhang, Jiacheng Xu, Chaojie Wang et al.

NEURIPS 2025 • arXiv:2506.01369

#9378

Stop the Nonconsensual Use of Nude Images in Research

Princessa Cintaqia, Arshia Arya, Elissa Redmiles et al.

NEURIPS 2025 • oral • arXiv:2510.22423

#9379

Lost in Transmission: When and Why LLMs Fail to Reason Globally

Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan et al.

NEURIPS 2025 • spotlight • arXiv:2505.08140

#9380

A Simple Linear Patch Revives Layer-Pruned Large Language Models

Xinrui Chen, Haoli Bai, Tao Yuan et al.

NEURIPS 2025 • arXiv:2505.24680

#9381

DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding

Yunhai Hu, Tianhua Xia, Zining Liu et al.

NEURIPS 2025 • arXiv:2505.19201

#9382

Position: Towards Bidirectional Human-AI Alignment

Hua Shen, Tiffany Knearem, Reshmi Ghosh et al.

NEURIPS 2025 • oral • arXiv:2406.09264

#9383

InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts

Weipeng Zhong, Peizhou Cao, Yichen Jin et al.

NEURIPS 2025 • arXiv:2509.10813

#9384

EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition

Christoph Schuhmann, Robert Kaczmarczyk, Gollam Rabby et al.

NEURIPS 2025 • arXiv:2505.20033

#9385

VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance

Mohammad Reza Taesiri, Abhijay Ghildyal, Saman Zadtootaghaj et al.

NEURIPS 2025 • arXiv:2505.15952

#9386

Towards A Generalist Code Embedding Model Based On Massive Data Synthesis

Chaofan Li, Jianlyu Chen, Yingxia Shao et al.

NEURIPS 2025 • arXiv:2505.12697

#9387

AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios

Yunhao Hou, Bochao Zou, Min Zhang et al.

NEURIPS 2025 • oral • arXiv:2506.16371

#9388

QCircuitBench: A Large-Scale Dataset for Benchmarking Quantum Algorithm Design

Rui Yang, Ziruo Wang, Yuntian Gu et al.

NEURIPS 2025 • arXiv:2410.07961

#9389

Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation

Tien Nguyen, Dac Nguyen, Duc Nguyen The Minh et al.

NEURIPS 2025 • arXiv:2509.24739

#9390

STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving

Christian Fruhwirth-Reisinger, Dušan Malić, Wei Lin et al.

NEURIPS 2025 • oral • arXiv:2506.06218

#9391

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

Yunlong Tang, Pinxin Liu, Mingqian Feng et al.

NEURIPS 2025 • arXiv:2505.20426

#9392

CSI-Bench: A Large-Scale In-the-Wild Dataset for Multi-task WiFi Sensing

Guozhen Zhu, Yuqian Hu, Weihang Gao et al.

NEURIPS 2025 • arXiv:2505.21866

#9393

In the Eye of MLLM: Benchmarking Egocentric Video Intent Understanding with Gaze-Guided Prompting

Taiying Peng, Jiacheng Hua, Miao Liu et al.

NEURIPS 2025 • oral • arXiv:2509.07447

#9394

Identifiability of Deep Polynomial Neural Networks

Konstantin Usevich, Ricardo Borsoi, Clara Dérand et al.

NEURIPS 2025 • oral • arXiv:2506.17093

#9395

All that structure matches does not glitter

Maya Martirossyan, Thomas Egg, Philipp Höllmer et al.

NEURIPS 2025 • arXiv:2509.12178

#9396

FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models

Yan Gao, Massimo R. Scamarcia, Javier Fernandez-Marques et al.

NEURIPS 2025 • arXiv:2506.02961

#9397

C-SEO Bench: Does Conversational SEO Work?

Haritz Puerto, Martin Gubri, Tommaso Green et al.

NEURIPS 2025 • arXiv:2506.11097

#9398

3EED: Ground Everything Everywhere in 3D

Rong Li, Yuhao Dong, Tianshuai Hu et al.

NEURIPS 2025 • arXiv:2511.01755

#9399

EconGym: A Scalable AI Testbed with Diverse Economic Tasks

Qirui Mi, Qipeng Yang, Zijun Fan et al.

NEURIPS 2025 • arXiv:2506.12110

#9400

Dynamic Risk Assessments for Offensive Cybersecurity Agents

Boyi Wei, Benedikt Stroebl, Jiacen Xu et al.

NEURIPS 2025 • arXiv:2505.18384
