Most Cited CVPR "robotics control" Papers

5,589 papers found • Page 4 of 28

#601

Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation

feilong tang, Zhongxing Xu, Zhaojun QU et al.

CVPR 2024posterarXiv:2403.07630
29
citations
#602

DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding

Geng Li, Jinglin Xu, Yunzhen Zhao et al.

CVPR 2025highlightarXiv:2504.14920
28
citations
#603

Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning

Hanxun Yu, Wentong Li, Song Wang et al.

CVPR 2025highlightarXiv:2503.00513
28
citations
#604

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Zhongwei Ren, Yunchao Wei, Xun Guo et al.

CVPR 2025posterarXiv:2501.09781
28
citations
#605

Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

Hongjie Wang, Difan Liu, Yan Kang et al.

CVPR 2024posterarXiv:2405.05252
28
citations
#606

Single Domain Generalization for Crowd Counting

Zhuoxuan Peng, S.-H. Gary Chan

CVPR 2024posterarXiv:2403.09124
28
citations
#607

Dataset Distillation with Neural Characteristic Function: A Minmax Perspective

Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.

CVPR 2025highlightarXiv:2502.20653
28
citations
#608

Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder

Jinseok Kim, Tae-Kyun Kim

CVPR 2024posterarXiv:2403.10255
28
citations
#609

Unifying Correspondence Pose and NeRF for Generalized Pose-Free Novel View Synthesis

Sunghwan Hong, Jaewoo Jung, Heeseong Shin et al.

CVPR 2024highlight
28
citations
#610

Retrieval-Augmented Embodied Agents

Yichen Zhu, Zhicai Ou, Xiaofeng Mou et al.

CVPR 2024posterarXiv:2404.11699
28
citations
#611

Contextrast: Contextual Contrastive Learning for Semantic Segmentation

Changki Sung, Wanhee Kim, Jungho An et al.

CVPR 2024posterarXiv:2404.10633
28
citations
#612

Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts

Jiayi Chen, Benteng Ma, Hengfei Cui et al.

CVPR 2024posterarXiv:2312.02567
28
citations
#613

Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

Lihe Ding, Shaocong Dong, Zhanpeng Huang et al.

CVPR 2024posterarXiv:2312.04963
28
citations
#614

I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions

Chengfeng Zhao, Juze Zhang, Jiashen Du et al.

CVPR 2024posterarXiv:2312.08869
28
citations
#615

Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

Zanlin Ni, Yulin Wang, Renping Zhou et al.

CVPR 2024posterarXiv:2406.05478
28
citations
#616

Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection

Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker et al.

CVPR 2024posterarXiv:2404.01819
28
citations
#617

It's All About Your Sketch: Democratising Sketch Control in Diffusion Models

Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri et al.

CVPR 2024posterarXiv:2403.07234
28
citations
#618

Open-Vocabulary Semantic Segmentation with Image Embedding Balancing

Xiangheng Shan, Dongyue Wu, Guilin Zhu et al.

CVPR 2024posterarXiv:2406.09829
28
citations
#619

GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding

Hao Li, Dingwen Zhang, Yalun Dai et al.

CVPR 2024highlightarXiv:2311.11863
28
citations
#620

A Simple Baseline for Efficient Hand Mesh Reconstruction

zhishan zhou, shihao zhou, Zhi Lv et al.

CVPR 2024posterarXiv:2403.01813
28
citations
#621

DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion

Jinyuan Liu, Bowei Zhang, Qingyun Mei et al.

CVPR 2025posterarXiv:2503.17673
28
citations
#622

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

Xu He, Qiaochu Huang, Zhensong Zhang et al.

CVPR 2024posterarXiv:2404.01862
28
citations
#623

LaneCPP: Continuous 3D Lane Detection using Physical Priors

Maximilian Pittner, Joel Janai, Alexandru Paul Condurache

CVPR 2024posterarXiv:2406.08381
28
citations
#624

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

Sheng Zhou, Junbin Xiao, Qingyun Li et al.

CVPR 2025posterarXiv:2502.07411
28
citations
#625

Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models

Zhejun Zhang, Peter Karkus, Maximilian Igl et al.

CVPR 2025posterarXiv:2412.05334
28
citations
#626

DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly

Gianluca Scarpellini, Stefano Fiorini, Francesco Giuliari et al.

CVPR 2024posterarXiv:2402.19302
28
citations
#627

Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection

Wei Luo, Yunkang Cao, Haiming Yao et al.

CVPR 2025posterarXiv:2503.02424
28
citations
#628

VideoCon: Robust Video-Language Alignment via Contrast Captions

Hritik Bansal, Yonatan Bitton, Idan Szpektor et al.

CVPR 2024posterarXiv:2311.10111
28
citations
#629

3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation

Dale Decatur, Itai Lang, Kfir Aberman et al.

CVPR 2024posterarXiv:2311.09571
27
citations
#630

Light3R-SfM: Towards Feed-forward Structure-from-Motion

Sven Elflein, Qunjie Zhou, Laura Leal-Taixe

CVPR 2025highlightarXiv:2501.14914
27
citations
#631

Blind Image Quality Assessment Based on Geometric Order Learning

Nyeong-Ho Shin, Seon-Ho Lee, Chang-Su Kim

CVPR 2024poster
27
citations
#632

Estimating Body and Hand Motion in an Ego‑sensed World

Brent Yi, Vickie Ye, Maya Zheng et al.

CVPR 2025highlightarXiv:2410.03665
27
citations
#633

Multi-modal Learning for Geospatial Vegetation Forecasting

Vitus Benson, Claire Robin, Christian Requena-Mesa et al.

CVPR 2024posterarXiv:2303.16198
27
citations
#634

UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory

Haiwen Diao, Bo Wan, Ying Zhang et al.

CVPR 2024posterarXiv:2308.14316
27
citations
#635

Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering

Cheng Sun, Jaesung Choe, Charles Loop et al.

CVPR 2025posterarXiv:2412.04459
27
citations
#636

FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring

Geunhyuk Youk, Jihyong Oh, Munchurl Kim

CVPR 2024posterarXiv:2401.03707
27
citations
#637

Audio-Visual Segmentation via Unlabeled Frame Exploitation

Jinxiang Liu, Yikun Liu, Ferenas et al.

CVPR 2024posterarXiv:2403.11074
27
citations
#638

ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning

Beomyoung Kim, Joonsang Yu, Sung Ju Hwang

CVPR 2024posterarXiv:2403.20126
27
citations
#639

Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles

Vanessa Sklyarova, Egor Zakharov, Otmar Hilliges et al.

CVPR 2024posterarXiv:2312.11666
27
citations
#640

HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding

Trong-Thuan Nguyen, Pha Nguyen, Khoa Luu

CVPR 2024posterarXiv:2312.03050
27
citations
#641

No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

Xiangyang Zhu, Renrui Zhang, Bowei He et al.

CVPR 2024highlightarXiv:2404.04050
27
citations
#642

AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion

Mingzhen Sun, Weining Wang, Li et al.

CVPR 2025posterarXiv:2503.07418
27
citations
#643

Distilling Multi-modal Large Language Models for Autonomous Driving

Deepti Hegde, Rajeev Yasarla, Hong Cai et al.

CVPR 2025posterarXiv:2501.09757
27
citations
#644

Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding

Le Zhang, Rabiul Awal, Aishwarya Agrawal

CVPR 2024posterarXiv:2306.08832
27
citations
#645

Dispel Darkness for Better Fusion: A Controllable Visual Enhancer based on Cross-modal Conditional Adversarial Learning

HAO ZHANG, Linfeng Tang, Xinyu Xiang et al.

CVPR 2024poster
27
citations
#646

Distribution-aware Knowledge Prototyping for Non-exemplar Lifelong Person Re-identification

Kunlun Xu, Xu Zou, Yuxin Peng et al.

CVPR 2024poster
27
citations
#647

NARUTO: Neural Active Reconstruction from Uncertain Target Observations

Ziyue Feng, Huangying Zhan, Zheng Chen et al.

CVPR 2024posterarXiv:2402.18771
27
citations
#648

Sparse Global Matching for Video Frame Interpolation with Large Motion

Chunxu Liu, Guozhen Zhang, Rui Zhao et al.

CVPR 2024posterarXiv:2404.06913
27
citations
#649

The Devil is in the Fine-Grained Details: Evaluating Open-Vocabulary Object Detectors for Fine-Grained Understanding

Lorenzo Bianchi, Fabio Carrara, Nicola Messina et al.

CVPR 2024highlightarXiv:2311.17518
26
citations
#650

Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations

Rui Zhao, Ruiqin Xiong, Jing Zhao et al.

CVPR 2024poster
26
citations
#651

Erasing Undesirable Influence in Diffusion Models

Jing Wu, Trung Le, Munawar Hayat et al.

CVPR 2025posterarXiv:2401.05779
26
citations
#652

CPR: Retrieval Augmented Generation for Copyright Protection

Aditya Golatkar, Alessandro Achille, Luca Zancato et al.

CVPR 2024posterarXiv:2403.18920
26
citations
#653

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

Shuming Liu, Chen Zhao, Tianqi Xu et al.

CVPR 2025posterarXiv:2503.21483
26
citations
#654

ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining

Ruoxi Shi, Xinyue Wei, Cheng Wang et al.

CVPR 2024posterarXiv:2312.09249
26
citations
#655

Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

Xinshun Wang, Zhongbin Fang, Xia Li et al.

CVPR 2024posterarXiv:2312.03703
26
citations
#656

TEA: Test-time Energy Adaptation

Yige Yuan, Bingbing Xu, Liang Hou et al.

CVPR 2024posterarXiv:2311.14402
26
citations
#657

Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

Jinjin Zhang, qiuyu Huang, Junjie Liu et al.

CVPR 2025posterarXiv:2503.18352
26
citations
#658

DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis

Yuming Gu, Hongyi Xu, You Xie et al.

CVPR 2024highlightarXiv:2312.13016
26
citations
#659

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

CVPR 2025posterarXiv:2407.15811
26
citations
#660

OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.

CVPR 2025posterarXiv:2412.01169
26
citations
#661

MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior

Honghua Chen, Chen Change Loy, Xingang Pan

CVPR 2024posterarXiv:2405.02859
26
citations
#662

SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

Ying Chen, Guoan Wang, Yuanfeng Ji et al.

CVPR 2025posterarXiv:2410.11761
26
citations
#663

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

Shengqu Cai, Duygu Ceylan, Matheus Gadelha et al.

CVPR 2024posterarXiv:2312.01409
26
citations
#664

Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion

Zuoyue Li, Zhenqiang Li, Zhaopeng Cui et al.

CVPR 2024highlightarXiv:2401.10786
26
citations
#665

PhysGen3D: Crafting a Miniature Interactive World from a Single Image

Boyuan Chen, Hanxiao Jiang, Shaowei Liu et al.

CVPR 2025posterarXiv:2503.20746
26
citations
#666

M&M VTO: Multi-Garment Virtual Try-On and Editing

Luyang Zhu, Yingwei Li, Nan Liu et al.

CVPR 2024highlightarXiv:2406.04542
26
citations
#667

FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes

Lue Fan, Hao ZHANG, Qitai Wang et al.

CVPR 2025posterarXiv:2412.03566
25
citations
#668

Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation

Qiyuan Dai, Sibei Yang

CVPR 2024posterarXiv:2404.11998
25
citations
#669

SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation

Junyan Ye, Qiyan Luo, Jinhua Yu et al.

CVPR 2024highlightarXiv:2404.02638
25
citations
#670

Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation

Jonas Herzog

CVPR 2024posterarXiv:2402.17614
25
citations
#671

Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond

Guanyao Wu, Haoyu Liu, Hongming Fu et al.

CVPR 2025posterarXiv:2503.01210
25
citations
#672

VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents

Ryota Tanaka, Taichi Iki, Taku Hasegawa et al.

CVPR 2025posterarXiv:2504.09795
25
citations
#673

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

Lijun Li, Zhelun Shi, Xuhao Hu et al.

CVPR 2025posterarXiv:2501.12612
25
citations
#674

Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking

Wei Cao, Chang Luo, Biao Zhang et al.

CVPR 2024posterarXiv:2401.06614
25
citations
#675

MoDE: CLIP Data Experts via Clustering

Jiawei Ma, Po-Yao Huang, Saining Xie et al.

CVPR 2024posterarXiv:2404.16030
25
citations
#676

Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift

Siyuan Liang, Jiawei Liang, Tianyu Pang et al.

CVPR 2025posterarXiv:2406.18844
25
citations
#677

Adversarial Diffusion Compression for Real-World Image Super-Resolution

Bin Chen, Gehui Li, Rongyuan Wu et al.

CVPR 2025posterarXiv:2411.13383
25
citations
#678

Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion

Linzhan Mou, Jun-Kun Chen, Yu-Xiong Wang

CVPR 2024posterarXiv:2406.09402
25
citations
#679

Doubly Abductive Counterfactual Inference for Text-based Image Editing

Xue Song, Jiequan Cui, Hanwang Zhang et al.

CVPR 2024posterarXiv:2403.02981
25
citations
#680

EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality

Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim

CVPR 2025posterarXiv:2411.15241
25
citations
#681

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

Yun Liu, Chengwen Zhang, Ruofan Xing et al.

CVPR 2025posterarXiv:2406.19353
25
citations
#682

Federated Generalized Category Discovery

Nan Pu, Wenjing Li, Xinyuan Ji et al.

CVPR 2024posterarXiv:2305.14107
25
citations
#683

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

Hanlin Wang, Hao Ouyang, Qiuyu Wang et al.

CVPR 2025highlightarXiv:2412.15214
25
citations
#684

FineVQ: Fine-Grained User Generated Content Video Quality Assessment

Huiyu Duan, Qiang Hu, Wang Jiarui et al.

CVPR 2025highlightarXiv:2412.19238
25
citations
#685

AutoPresent: Designing Structured Visuals from Scratch

Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou et al.

CVPR 2025posterarXiv:2501.00912
25
citations
#686

WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights

Youngdong Jang, Dong In Lee, MinHyuk Jang et al.

CVPR 2024posterarXiv:2405.02066
25
citations
#687

Synthesize Step-by-Step: Tools Templates and LLMs as Data Generators for Reasoning-Based Chart VQA

Zhuowan Li, Bhavan Jasani, Peng Tang et al.

CVPR 2024posterarXiv:2403.16385
25
citations
#688

LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences

Hongyan Zhi, Peihao Chen, Junyan Li et al.

CVPR 2025posterarXiv:2412.01292
25
citations
#689

GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

Chengyao Wang, Li Jiang, Xiaoyang Wu et al.

CVPR 2024posterarXiv:2403.09639
25
citations
#690

Learning Correlation Structures for Vision Transformers

Manjin Kim, Paul Hongsuck Seo, Cordelia Schmid et al.

CVPR 2024posterarXiv:2404.03924
25
citations
#691

Frequency Dynamic Convolution for Dense Image Prediction

Linwei Chen, Lin Gu, Liang Li et al.

CVPR 2025posterarXiv:2503.18783
25
citations
#692

Multi-Object Tracking in the Dark

Xinzhe Wang, Kang Ma, Qiankun Liu et al.

CVPR 2024posterarXiv:2405.06600
25
citations
#693

Interleaved-Modal Chain-of-Thought

Jun Gao, Yongqi Li, Ziqiang Cao et al.

CVPR 2025posterarXiv:2411.19488
25
citations
#694

CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

Xinhao Liu, Jintong Li, Yicheng Jiang et al.

CVPR 2025posterarXiv:2411.17820
25
citations
#695

Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities

AJ Piergiovanni, Isaac Noble, Dahun Kim et al.

CVPR 2024posterarXiv:2311.05698
25
citations
#696

Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline

Junlong Cheng, Bin Fu, Jin Ye et al.

CVPR 2025posterarXiv:2411.12814
25
citations
#697

MagicQuill: An Intelligent Interactive Image Editing System

Zichen Liu, Yue Yu, Hao Ouyang et al.

CVPR 2025posterarXiv:2411.09703
25
citations
#698

AffordDP: Generalizable Diffusion Policy with Transferable Affordance

Shijie Wu, Yihang Zhu, Yunao Huang et al.

CVPR 2025posterarXiv:2412.03142
25
citations
#699

Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts

Fei Ni, Jianye Hao, Shiguang Wu et al.

CVPR 2024poster
25
citations
#700

Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation

Ziyang Xie, Zhizheng Liu, Zhenghao Peng et al.

CVPR 2025posterarXiv:2501.06693
25
citations
#701

Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation

Yuan Wang, Rui Sun, Naisong Luo et al.

CVPR 2024posterarXiv:2404.00262
25
citations
#702

DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting

Hyunwoo Park, Gun Ryu, Wonjun Kim

CVPR 2025posterarXiv:2504.00773
25
citations
#703

Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation

Jihyun Kim, Changjae Oh, Hoseok Do et al.

CVPR 2024posterarXiv:2405.04356
24
citations
#704

Tyche: Stochastic In-Context Learning for Medical Image Segmentation

Marianne Rakic, Hallee Wong, Jose Javier Gonzalez Ortiz et al.

CVPR 2024highlightarXiv:2401.13650
24
citations
#705

360+x: A Panoptic Multi-modal Scene Understanding Dataset

Hao Chen, Yuqi Hou, Chenyuan Qu et al.

CVPR 2024posterarXiv:2404.00989
24
citations
#706

Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment

Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir et al.

CVPR 2024posterarXiv:2405.14497
24
citations
#707

AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM

Wang Jiarui, Huiyu Duan, Guangtao Zhai et al.

CVPR 2025posterarXiv:2411.17221
24
citations
#708

Contrastive Learning for DeepFake Classification and Localization via Multi-Label Ranking

Cheng-Yao Hong, Yen-Chi Hsu, Tyng-Luh Liu

CVPR 2024poster
24
citations
#709

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

Xiao Guo, Xiufeng Song, Yue Zhang et al.

CVPR 2025posterarXiv:2503.20188
24
citations
#710

Training-Free Pretrained Model Merging

Zhengqi Xu, Ke Yuan, Huiqiong Wang et al.

CVPR 2024posterarXiv:2403.01753
24
citations
#711

AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning

Yuwei Tang, ZhenYi Lin, Qilong Wang et al.

CVPR 2024posterarXiv:2404.08958
24
citations
#712

Your ViT is Secretly an Image Segmentation Model

Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.

CVPR 2025highlightarXiv:2503.19108
24
citations
#713

Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes

Hmrishav Bandyopadhyay, Subhadeep Koley, Ayan Das et al.

CVPR 2024posterarXiv:2312.04043
24
citations
#714

Calibrated Multi-Preference Optimization for Aligning Diffusion Models

Kyungmin Lee, Xiaohang Li, Qifei Wang et al.

CVPR 2025posterarXiv:2502.02588
24
citations
#715

HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation

Ce Zhang, Simon Stepputtis, Joseph Campbell et al.

CVPR 2024posterarXiv:2403.12033
24
citations
#716

XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?

Fengxiang Wang, hongzhen wang, Zonghao Guo et al.

CVPR 2025highlightarXiv:2503.23771
24
citations
#717

Model Poisoning Attacks to Federated Learning via Multi-Round Consistency

Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong

CVPR 2025posterarXiv:2404.15611
24
citations
#718

Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting

Zijie Chen, Lichao Zhang, Fangsheng Weng et al.

CVPR 2024posterarXiv:2310.08129
24
citations
#719

Diffusion Time-step Curriculum for One Image to 3D Generation

YI Xuanyu, Zike Wu, Qingshan Xu et al.

CVPR 2024posterarXiv:2404.04562
24
citations
#720

Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment

Ziyu Shan, Yujie Zhang, Qi Yang et al.

CVPR 2024posterarXiv:2403.10066
24
citations
#721

SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures

Hui Liu, Chen Jia, Fan Shi et al.

CVPR 2025posterarXiv:2503.01113
24
citations
#722

VkD: Improving Knowledge Distillation using Orthogonal Projections

Roy Miles, Ismail Elezi, Jiankang Deng

CVPR 2024poster
24
citations
#723

AnimateAnything: Consistent and Controllable Animation for Video Generation

guojun lei, Chi Wang, Rong Zhang et al.

CVPR 2025posterarXiv:2411.10836
24
citations
#724

EgoGen: An Egocentric Synthetic Data Generator

Gen Li, Kaifeng Zhao, Siwei Zhang et al.

CVPR 2024posterarXiv:2401.08739
24
citations
#725

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Kai Wang, Mingjia Shi, YuKun Zhou et al.

CVPR 2025posterarXiv:2405.17403
24
citations
#726

Supervised Anomaly Detection for Complex Industrial Images

Aimira Baitieva, David Hurych, Victor Besnier et al.

CVPR 2024posterarXiv:2405.04953
24
citations
#727

Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation

Xianghui Xie, Bharat Lal Bhatnagar, Jan Lenssen et al.

CVPR 2024highlightarXiv:2312.07063
24
citations
#728

ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model

Shunlin Lu, Jingbo Wang, Zeyu Lu et al.

CVPR 2025posterarXiv:2412.14559
24
citations
#729

MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models

Yasiru Ranasinghe, Deepti Hegde, Vishal M. Patel

CVPR 2024poster
24
citations
#730

AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning

Duojun Huang, Xinyu Xiong, Jie Ma et al.

CVPR 2024posterarXiv:2406.00480
24
citations
#731

Would Deep Generative Models Amplify Bias in Future Models?

Tianwei Chen, Yusuke Hirota, Mayu Otani et al.

CVPR 2024posterarXiv:2404.03242
23
citations
#732

Deep Equilibrium Diffusion Restoration with Parallel Sampling

Jiezhang Cao, Yue Shi, Kai Zhang et al.

CVPR 2024posterarXiv:2311.11600
23
citations
#733

Garment Recovery with Shape and Deformation Priors

Ren Li, Corentin Dumery, Benoît Guillard et al.

CVPR 2024posterarXiv:2311.10356
23
citations
#734

Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

Gihyun Kwon, Simon Jenni, Ding Li et al.

CVPR 2024posterarXiv:2404.03913
23
citations
#735

CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation

Jiahao Li, Weijian Ma, Xueyang Li et al.

CVPR 2025posterarXiv:2505.04481
23
citations
#736

MANUS: Markerless Grasp Capture using Articulated 3D Gaussians

Chandradeep Pokhariya, Ishaan Shah, Angela Xing et al.

CVPR 2024posterarXiv:2312.02137
23
citations
#737

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

Siteng Huang, Biao Gong, Yutong Feng et al.

CVPR 2024posterarXiv:2311.15841
23
citations
#738

SANeRF-HQ: Segment Anything for NeRF in High Quality

Yichen Liu, Benran Hu, Chi-Keung Tang et al.

CVPR 2024posterarXiv:2312.01531
23
citations
#739

Efficient Visual State Space Model for Image Deblurring

Lingshun Kong, Jiangxin Dong, Jinhui Tang et al.

CVPR 2025posterarXiv:2405.14343
23
citations
#740

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

CVPR 2025posterarXiv:2412.03324
23
citations
#741

Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces

Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang et al.

CVPR 2025highlightarXiv:2503.19199
23
citations
#742

6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation

Li Xu, Haoxuan Qu, Yujun Cai et al.

CVPR 2024posterarXiv:2401.00029
23
citations
#743

Towards Robust 3D Object Detection with LiDAR and 4D Radar Fusion in Various Weather Conditions

Yujeong Chae, Hyeonseong Kim, Kuk-Jin Yoon

CVPR 2024poster
23
citations
#744

Face2Diffusion for Fast and Editable Face Personalization

Kaede Shiohara, Toshihiko Yamasaki

CVPR 2024posterarXiv:2403.05094
23
citations
#745

VAREN: Very Accurate and Realistic Equine Network

Silvia Zuffi, Ylva Mellbin, Ci Li et al.

CVPR 2024poster
23
citations
#746

PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis

Zhengyao Lv, Yuxiang Wei, Wangmeng Zuo et al.

CVPR 2024highlightarXiv:2403.01852
23
citations
#747

SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.

CVPR 2025posterarXiv:2411.17190
23
citations
#748

Bayesian Diffusion Models for 3D Shape Reconstruction

Haiyang Xu, Yu lei, Zeyuan Chen et al.

CVPR 2024posterarXiv:2403.06973
23
citations
#749

Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh

Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang et al.

CVPR 2025posterarXiv:2405.17811
23
citations
#750

CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution

Xin Liu, Jie Liu, Jie Tang et al.

CVPR 2025posterarXiv:2503.06896
23
citations
#751

EditAR: Unified Conditional Generation with Autoregressive Models

Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang

CVPR 2025posterarXiv:2501.04699
23
citations
#752

POSTA: A Go-to Framework for Customized Artistic Poster Generation

Haoyu Chen, Xiaojie Xu, Wenbo Li et al.

CVPR 2025posterarXiv:2503.14908
23
citations
#753

PanoContext-Former: Panoramic Total Scene Understanding with a Transformer

Yuan Dong, Chuan Fang, Liefeng Bo et al.

CVPR 2024posterarXiv:2305.12497
23
citations
#754

Learning Equi-angular Representations for Online Continual Learning

Minhyuk Seo, Hyunseo Koh, Wonje Jeung et al.

CVPR 2024posterarXiv:2404.01628
23
citations
#755

Language-Guided Image Tokenization for Generation

Kaiwen Zha, Lijun Yu, Alireza Fathi et al.

CVPR 2025posterarXiv:2412.05796
23
citations
#756

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons

Andrew Szot, Bogdan Mazoure, Omar Attia et al.

CVPR 2025posterarXiv:2412.08442
23
citations
#757

Test-Time Adaptation for Depth Completion

Hyoungseob Park, Anjali W Gupta, Alex Wong

CVPR 2024posterarXiv:2402.03312
23
citations
#758

Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation

Sangyun Shin, Kaichen Zhou, Madhu Vankadari et al.

CVPR 2024posterarXiv:2312.11269
23
citations
#759

Unknown Prompt the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization

Mainak Singha, Ankit Jha, Shirsha Bose et al.

CVPR 2024posterarXiv:2404.00710
23
citations
#760

OSV: One Step is Enough for High-Quality Image to Video Generation

Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.

CVPR 2025posterarXiv:2409.11367
22
citations
#761

Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection

Le Yang, Ziwei Zheng, Boxu Chen et al.

CVPR 2025posterarXiv:2412.13817
22
citations
#762

Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning

Yixiong Zou, Yicong Liu, Yiman Hu et al.

CVPR 2024posterarXiv:2403.00567
22
citations
#763

Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers

Zhibo Yang, Sounak Mondal, Seoyoung Ahn et al.

CVPR 2024posterarXiv:2303.09383
22
citations
#764

StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation

Sidi Wu, Yizi Chen, Loic Landrieu et al.

CVPR 2024posterarXiv:2403.20142
22
citations
#765

COCONut: Modernizing COCO Segmentation

Xueqing Deng, Qihang Yu, Peng Wang et al.

CVPR 2024posterarXiv:2404.08639
22
citations
#766

GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection

Xiaotian Li, Baojie Fan, Jiandong Tian et al.

CVPR 2024posterarXiv:2411.00340
22
citations
#767

FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance Head-pose and Facial Expression Features

Andre Rochow, Max Schwarz, Sven Behnke

CVPR 2024posterarXiv:2404.09736
22
citations
#768

Domain Prompt Learning with Quaternion Networks

Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.

CVPR 2024highlightarXiv:2312.08878
22
citations
#769

Rethinking Multi-view Representation Learning via Distilled Disentangling

Guanzhou Ke, Bo Wang, Xiao-Li Wang et al.

CVPR 2024posterarXiv:2403.10897
22
citations
#770

Category-Level Multi-Part Multi-Joint 3D Shape Assembly

Yichen Li, Kaichun Mo, Yueqi Duan et al.

CVPR 2024posterarXiv:2303.06163
22
citations
#771

PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios

Jingbo Wang, Zhengyi Luo, Ye Yuan et al.

CVPR 2024posterarXiv:2404.19722
22
citations
#772

Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

Yudi Shi, Shangzhe Di, Qirui Chen et al.

CVPR 2025posterarXiv:2412.01694
22
citations
#773

Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning

Bozhou Zhang, Nan Song, Xin Jin et al.

CVPR 2025posterarXiv:2503.14182
22
citations
#774

BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

Xuanpu Zhang, Dan Song, pengxin zhan et al.

CVPR 2025posterarXiv:2408.06047
22
citations
#775

LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment

yiming ren, xiao han, Chengfeng Zhao et al.

CVPR 2024highlightarXiv:2402.17171
22
citations
#776

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation

Razvan Pasca, Alexey Gavryushin, Muhammad Hamza et al.

CVPR 2024posterarXiv:2301.09209
22
citations
#777

Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection

Zhen Qu, Xian Tao, Xinyi Gong et al.

CVPR 2025posterarXiv:2503.10080
22
citations
#778

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

Yunxiang Fu, Meng Lou, Yizhou Yu

CVPR 2025posterarXiv:2412.11890
22
citations
#779

Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

feilong tang, Chengzhi Liu, Zhongxing Xu et al.

CVPR 2025posterarXiv:2505.16652
22
citations
#780

Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension

Quan Liu, Hongzi Zhu, Zhenxi Wang et al.

CVPR 2024posterarXiv:2403.03532
22
citations
#781

CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology

Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.

CVPR 2025posterarXiv:2412.12077
22
citations
#782

Mind the Time: Temporally-Controlled Multi-Event Video Generation

Ziyi Wu, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2025posterarXiv:2412.05263
22
citations
#783

GEARS: Local Geometry-aware Hand-object Interaction Synthesis

Keyang Zhou, Bharat Lal Bhatnagar, Jan Lenssen et al.

CVPR 2024posterarXiv:2404.01758
22
citations
#784

Semantic-aware SAM for Point-Prompted Instance Segmentation

Zhaoyang Wei, Pengfei Chen, Xuehui Yu et al.

CVPR 2024highlightarXiv:2312.15895
22
citations
#785

Spatio-Temporal Turbulence Mitigation: A Translational Perspective

Xingguang Zhang, Nicholas M Chimitt, Yiheng Chi et al.

CVPR 2024posterarXiv:2401.04244
22
citations
#786

CleanDIFT: Diffusion Features without Noise

Nick Stracke, Stefan Andreas Baumann, Kolja Bauer et al.

CVPR 2025posterarXiv:2412.03439
22
citations
#787

Material Anything: Generating Materials for Any 3D Object via Diffusion

Xin Huang, Tengfei Wang, Ziwei Liu et al.

CVPR 2025highlightarXiv:2411.15138
22
citations
#788

UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection

Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.

CVPR 2025posterarXiv:2412.03342
22
citations
#789

UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement

yaofeng xie, Lingwei Kong, Kai Chen et al.

CVPR 2024posterarXiv:2404.14542
21
citations
#790

PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization

Zining Chen, Weiqiu Wang, Zhicheng Zhao et al.

CVPR 2024posterarXiv:2404.09011
21
citations
#791

Matrix3D: Large Photogrammetry Model All-in-One

Yuanxun Lu, Jingyang Zhang, Tian Fang et al.

CVPR 2025highlightarXiv:2502.07685
21
citations
#792

F-LMM: Grounding Frozen Large Multimodal Models

Size Wu, Sheng Jin, Wenwei Zhang et al.

CVPR 2025posterarXiv:2406.05821
21
citations
#793

Rethinking Few-shot 3D Point Cloud Semantic Segmentation

Zhaochong An, Guolei Sun, Yun Liu et al.

CVPR 2024posterarXiv:2403.00592
21
citations
#794

Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering

Zaid Khan, Yun Fu

CVPR 2024posterarXiv:2404.10193
21
citations
#795

NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors

Yannan He, Garvita Tiwari, Tolga Birdal et al.

CVPR 2024highlightarXiv:2403.03122
21
citations
#796

Clustering Propagation for Universal Medical Image Segmentation

Yuhang Ding, Liulei Li, Wenguan Wang et al.

CVPR 2024posterarXiv:2403.16646
21
citations
#797

Self-Supervised Multi-Object Tracking with Path Consistency

Zijia Lu, Bing Shuai, Yanbei Chen et al.

CVPR 2024highlightarXiv:2404.05136
21
citations
#798

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

Desai Xie, Jiahao Li, Hao Tan et al.

CVPR 2024posterarXiv:2312.13980
21
citations
#799

3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation

Songchun Zhang, Yibo Zhang, Quan Zheng et al.

CVPR 2024posterarXiv:2403.09439
21
citations
#800

TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

Haomiao Ni, Bernhard Egger, Suhas Lohit et al.

CVPR 2024posterarXiv:2404.16306
21
citations