Most Cited CVPR "message-passing gnns" Papers
5,589 papers found • Page 10 of 28
Conference
Text-Enhanced Data-free Approach for Federated Class-Incremental Learning
Minh-Tuan Tran, Trung Le, Xuan-May Le et al.
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset
Xiao Wang, Fuling Wang, Yuehang Li et al.
DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving
Zhenhua Xu, Yan Bai, Yujia Zhang et al.
Hash3D: Training-free Acceleration for 3D Generation
Xingyi Yang, Songhua Liu, Xinchao Wang
Overload: Latency Attacks on Object Detection for Edge Devices
Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung et al.
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
Hao Yin, Guangzong Si, Zilei Wang
A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning
Xiaoyang Xu, Mengda Yang, Wenzhe Yi et al.
Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability
Yan Huang, Zhang Zhang, Qiang Wu et al.
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
Nicolas Dufour, Vicky Kalogeiton, David Picard et al.
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Tony C. W. MOK, Zi Li, Yunhao Bai et al.
Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model
Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.
Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
Mingyu Lee, Jongwon Choi
3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling
Chaokang Jiang, Guangming Wang, Jiuming Liu et al.
WaveFace: Authentic Face Restoration with Efficient Frequency Recovery
Yunqi Miao, Jiankang Deng, Jungong Han
T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory
Daehee Park, Jaeseok Jeong, Sung-Hoon Yoon et al.
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
Hang Hua, Qing Liu, Lingzhi Zhang et al.
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury, Sayan Nag, Joseph K J et al.
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
Jing Bi, Lianggong Bruce Wen, Zhang Liu et al.
FreePoint: Unsupervised Point Cloud Instance Segmentation
Zhikai Zhang, Jian Ding, Li Jiang et al.
Text-Guided 3D Face Synthesis - From Generation to Editing
Yunjie Wu, Yapeng Meng, Zhipeng Hu et al.
DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video
Huiqiang Sun, Xingyi Li, Liao Shen et al.
Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring
Huicong Zhang, Haozhe Xie, Hongxun Yao
LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising
Yuxing Duan
Data Valuation and Detections in Federated Learning
Wenqian Li, Shuran Fu, Fengrui Zhang et al.
Decompositional Neural Scene Reconstruction with Generative Diffusion Prior
Junfeng Ni, Yu Liu, Ruijie Lu et al.
Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset
Zhao Dong, Ka chen, Zhaoyang Lv et al.
AssistGUI: Task-Oriented PC Graphical User Interface Automation
Difei Gao, Lei Ji, Zechen Bai et al.
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Lingmin Ran, Xiaodong Cun, Jia-Wei Liu et al.
Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data
Xinting Liao, Weiming Liu, Chaochao Chen et al.
Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation
Xiaohan Lei, Min Wang, Wengang Zhou et al.
Revisiting Adversarial Training Under Long-Tailed Distributions
Xinli Yue, Ningping Mou, Qian Wang et al.
RTracker: Recoverable Tracking via PN Tree Structured Memory
Yuqing Huang, Xin Li, Zikun Zhou et al.
Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation
Xiyi Chen, Marko Mihajlovic, Shaofei Wang et al.
A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization
Hongwei Ren, Jiadong Zhu, Yue Zhou et al.
IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
Yuhao Wang, Yongfeng Lv, Pingping Zhang et al.
Understanding Video Transformers via Universal Concept Discovery
Matthew Kowal, Achal Dave, Rares Andrei Ambrus et al.
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
Junha Hyung, Kinam Kim, Susung Hong et al.
Choose What You Need: Disentangled Representation Learning for Scene Text Recognition Removal and Editing
Boqiang Zhang, Hongtao Xie, Zuan Gao et al.
MagicArticulate: Make Your 3D Models Articulation-Ready
Chaoyue Song, Jianfeng Zhang, Xiu Li et al.
Beyond Average: Individualized Visual Scanpath Prediction
Xianyu Chen, Ming Jiang, Qi Zhao
SFOD: Spiking Fusion Object Detector
Yimeng Fan, Wei Zhang, Changsong Liu et al.
Ref-GS: Directional Factorization for 2D Gaussian Splatting
Youjia Zhang, Anpei Chen, Yumin Wan et al.
What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation
Yihua Cheng, Yaning Zhu, Zongji Wang et al.
INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
Yongming Zhu, Longhao Zhang, Zhengkun Rong et al.
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
Jiangyong Huang, Baoxiong Jia, Yan Wang et al.
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jue Zhang, Xiaoting Qin et al.
Spiking Transformer with Spatial-Temporal Attention
Donghyun Lee, Yuhang Li, Youngeun Kim et al.
GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities
Rao Fu, Dingxi Zhang, Alex Jiang et al.
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
Yiheng Li, RuiBing Hou, Hong Chang et al.
PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection
Jianan Ye, Weiguang Zhao, Xi Yang et al.
HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images
Xihe Yang, Xingyu Chen, Daiheng Gao et al.
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
Zaid Khan, Vijay Kumar BG, Samuel Schulter et al.
MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation
Sumanth Udupa, Prajwal Gurunath, Aniruddh Sikdar et al.
Shadow Generation for Composite Image Using Diffusion Model
Qingyang Liu, Junqi You, Jian-Ting Wang et al.
StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
Mingkun Lei, Xue Song, Beier Zhu et al.
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
Weijie Zhou, Manli Tao, Chaoyang Zhao et al.
HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation
Yongliang Lin, Yongzhi Su, Praveen Nathan et al.
FedMef: Towards Memory-efficient Federated Dynamic Pruning
Hong Huang, Weiming Zhuang, Chen Chen et al.
Cloud-Device Collaborative Learning for Multimodal Large Language Models
Guanqun Wang, Jiaming Liu, Chenxuan Li et al.
Cubify Anything: Scaling Indoor 3D Object Detection
Justin Lazarow, David Griffiths, Gefen Kohavi et al.
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
Yichi Zhang, Yinpeng Dong, Siyuan Zhang et al.
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation
Boyuan Wang, Xiaofeng Wang, Chaojun Ni et al.
DaReNeRF: Direction-aware Representation for Dynamic Scenes
Ange Lou, Benjamin Planche, Zhongpai Gao et al.
Any6D: Model-free 6D Pose Estimation of Novel Object
Taeyeop Lee, Bowen Wen, Minjun Kang et al.
TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing
Stefan Lionar, Jiabin Liang, Gim Hee Lee
LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model
Dongkai Wang, shiyu xuan, Shiliang Zhang
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh et al.
Towards 3D Vision with Low-Cost Single-Photon Cameras
Fangzhou Mu, Carter Sifferman, Sacha Jungerman et al.
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
Junyan Wang, Zhenhong Sun, Stewart Tan et al.
FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio
Chao Xu, Yang Liu, Jiazheng Xing et al.
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion
Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.
CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement
Qiang Zhu, Jinhua Hao, Yukang Ding et al.
PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor
Vidit Goel, Elia Peruzzo, Yifan Jiang et al.
Robust Synthetic-to-Real Transfer for Stereo Matching
Jiawei Zhang, Jiahe Li, Lei Huang et al.
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training
Runze He, Shaofei Huang, Xuecheng Nie et al.
DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation
Xiaoliang Ju, Zhaoyang Huang, Yijin Li et al.
FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution
Junyang Chen, Jinshan Pan, Jiangxin Dong
Video-Bench: Human-Aligned Video Generation Benchmark
Hui Han, Siyuan Li, Jiaqi Chen et al.
The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion
Changan Chen, Juze Zhang, Shrinidhi Kowshika Lakshmikanth et al.
Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks
Haijin Zeng, Xiangming Wang, Yongyong Chen et al.
Interpreting Object-level Foundation Models via Visual Precision Search
Ruoyu Chen, Siyuan Liang, Jingzhi Li et al.
Rethinking the Evaluation Protocol of Domain Generalization
Han Yu, Xingxuan Zhang, Renzhe Xu et al.
Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset
Xiao Wang, Yu Jin, Wentao Wu et al.
Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters
Yuan Wang, Ouxiang Li, Tingting Mu et al.
FAR: Flexible Accurate and Robust 6DoF Relative Camera Pose Estimation
Chris Rockwell, Nilesh Kulkarni, Linyi Jin et al.
DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets
Harsh Rangwani, Pradipto Mondal, Mayank Mishra et al.
Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
Yiming Li, Zhiheng Li, Nuo Chen et al.
Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods
Chenfan Qu, Yiwu Zhong, Chongyu Liu et al.
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
Lei Fan, Dongdong Fan, Zhiguang Hu et al.
Iterative Predictor-Critic Code Decoding for Real-World Image Dehazing
Jiayi Fu, Siyu Liu, Zikun Liu et al.
HRAvatar: High-Quality and Relightable Gaussian Head Avatar
Dongbin Zhang, Yunfei Liu, Lijian Lin et al.
PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection
Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang et al.
The Scene Language: Representing Scenes with Programs, Words, and Embeddings
Yunzhi Zhang, Zizhang Li, Matt Zhou et al.
Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations
Shengeng Tang, Jiayi He, Lechao Cheng et al.
Align Before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen, Dapeng Chen, Ruijin Liu et al.
Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters
Zhiyang Guo, Jinxu Xiang, Kai Ma et al.
ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D Image
Marco Pesavento, Yuanlu Xu, Nikolaos Sarafianos et al.
Efficiently Assemble Normalization Layers and Regularization for Federated Domain Generalization
Khiem Le, Tuan Long Ho, Cuong Do et al.
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Mutimodal Models
Xingrui Wang, Wufei Ma, Tiezheng Zhang et al.
Stratified Avatar Generation from Sparse Observations
Han Feng, Wenchao Ma, Quankai Gao et al.
Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset
Yiqun Mei, Mingming He, Li Ma et al.
De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts
Yuzheng Wang, Dingkang Yang, Zhaoyu Chen et al.
Neural Visibility Field for Uncertainty-Driven Active Mapping
Shangjie Xue, Jesse Dill, Pranay Mathur et al.
CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data
Wei Fang, Yuxing Tang, Heng Guo et al.
Retraining-Free Model Quantization via One-Shot Weight-Coupling Learning
Chen Tang, Yuan Meng, Jiacheng Jiang et al.
You Only Need Less Attention at Each Stage in Vision Transformers
Shuoxi Zhang, Hanpeng Liu, Stephen Lin et al.
GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding
Zi-Ting Chou, Sheng-Yu Huang, I-Jieh Liu et al.
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera
Yuliang Guo, Sparsh Garg, S. Mahdi H. Miangoleh et al.
Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang et al.
Low-Resource Vision Challenges for Foundation Models
Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek
CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation
Xi Liu, Ying Guo, Cheng Zhen et al.
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
Haochen Han, Qinghua Zheng, Guang Dai et al.
HDRFlow: Real-Time HDR Video Reconstruction with Large Motions
Gangwei Xu, Yujin Wang, Jinwei Gu et al.
Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
Haoxuanye Ji, Pengpeng Liang, Erkang Cheng
De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei, Chenxi Liu, Siyuan Qiao et al.
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
Haifeng Huang, Xinyi Chen, Yilun Chen et al.
Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss
Jaeha Kim, Junghun Oh, Kyoung Mu Lee
NeISF: Neural Incident Stokes Field for Geometry and Material Estimation
Chenhao Li, Taishi Ono, Takeshi Uemori et al.
Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
Qilong Zhangli, Jindong Jiang, Di Liu et al.
MambaIC: State Space Models for High-Performance Learned Image Compression
Fanhu Zeng, Hao Tang, Yihua Shao et al.
PrEditor3D: Fast and Precise 3D Shape Editing
Ziya Erkoc, Can Gümeli, Chaoyang Wang et al.
Geometry Transfer for Stylizing Radiance Fields
Hyunyoung Jung, Seonghyeon Nam, Nikolaos Sarafianos et al.
EnvGS: Modeling View-Dependent Appearance with Environment Gaussian
Tao Xie, Xi Chen, Zhen Xu et al.
HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models
Mengcheng Li, Hongwen Zhang, Yuxiang Zhang et al.
From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior
Jaeho Moon, Juan Luis Gonzalez Bello, Byeongjun Kwon et al.
Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models
Kota Sueyoshi, Takashi Matsubara
C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction
Yiqun Lin, Jiewen Yang, hualiang wang et al.
Lane2Seq: Towards Unified Lane Detection via Sequence Generation
Kunyang Zhou
A Simple Data Augmentation for Feature Distribution Skewed Federated Learning
Yunlu Yan, Huazhu Fu, Yuexiang Li et al.
UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior
I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.
Condition-Aware Neural Network for Controlled Image Generation
Han Cai, Muyang Li, Qinsheng Zhang et al.
A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals
Jiangnan Tang, Jingya Wang, Kaiyang Ji et al.
Differentiable Information Bottleneck for Deterministic Multi-view Clustering
Xiaoqiang Yan, Zhixiang Jin, Fengshou Han et al.
Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation
Fahimeh Hosseini Noohdani, Parsa Hosseini, Aryan Yazdan Parast et al.
SfmCAD: Unsupervised CAD Reconstruction by Learning Sketch-based Feature Modeling Operations
Pu Li, Jianwei Guo, HUIBIN LI et al.
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
Xiang Li, Jinglu Wang, Xiaohao Xu et al.
Omni-ID: Holistic Identity Representation Designed for Generative Tasks
Guocheng Qian, Kuan-Chieh Wang, Or Patashnik et al.
Adaptive VIO: Deep Visual-Inertial Odometry with Online Continual Learning
Youqi Pan, Wugen Zhou, Yingdian Cao et al.
MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors
He Zhang, Shenghao Ren, Haolei Yuan et al.
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
Yan Li, Yifei Xing, Xiangyuan Lan et al.
Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation
Zhuoman Liu, Weicai Ye, Yan Luximon et al.
Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment
Alireza Ganjdanesh, Shangqian Gao, Heng Huang
DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data
Qihao Liu, Yi Zhang, Song Bai et al.
Systematic Comparison of Semi-supervised and Self-supervised Learning for Medical Image Classification
Zhe Huang, Ruijie Jiang, Shuchin Aeron et al.
VideoDirector: Precise Video Editing via Text-to-Video Models
Yukun Wang, Longguang Wang, Zhiyuan Ma et al.
Object Pose Estimation via the Aggregation of Diffusion Features
Tianfu Wang, Guosheng Hu, Hongguang Wang
Patient-Level Anatomy Meets Scanning-Level Physics: Personalized Federated Low-Dose CT Denoising Empowered by Large Language Model
Ziyuan Yang, Yingyu Chen, Zhiwen Wang et al.
JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration
yunlong lin, Zixu Lin, Haoyu Chen et al.
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Jiuhai Chen, Jianwei Yang, Haiping Wu et al.
NeRF Director: Revisiting View Selection in Neural Volume Rendering
Wenhui Xiao, Rodrigo Santa Cruz, David Ahmedt-Aristizabal et al.
SURE: SUrvey REcipes for building reliable and robust deep networks
Yuting Li, Yingyi Chen, Xuanlong Yu et al.
Adapters Strike Back
Jan-Martin Steitz, Stefan Roth
KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling
Yu Wang, Xin Li, Shengzhao Wen et al.
MLLM-as-a-Judge for Image Safety without Human Labeling
Zhenting Wang, Shuming Hu, Shiyu Zhao et al.
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping
Wenbo Wang, Fangyun Wei, Lei Zhou et al.
Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
Junyi Wu, Bin Duan, Weitai Kang et al.
FrugalNeRF: Fast Convergence for Extreme Few-shot Novel View Synthesis without Learned Priors
Chin-Yang Lin, Chung-Ho Wu, Changhan Yeh et al.
MaGGIe: Masked Guided Gradual Human Instance Matting
Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava et al.
MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction
Wenyuan Zhang, Yixiao Yang, Han Huang et al.
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces
Sumit Chaturvedi, Mengwei Ren, Yannick Hold-Geoffroy et al.
Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction
Shiyu Zhao, Zhenting Wang, Felix Juefei-Xu et al.
PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling
Xiaoyun Zheng, Liwei Liao, Xufeng Li et al.
Multiple View Geometry Transformers for 3D Human Pose Estimation
Ziwei Liao, jialiang zhu, Chunyu Wang et al.
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong, Jiadong Pan, Liang Li et al.
Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors
Weilong Yan, Ming Li, Li Haipeng et al.
RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark
Xin Zhang, Xue Yang, Yuxuan Li et al.
Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI
Chong Wang, Lanqing Guo, Yufei Wang et al.
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
Chen Duan, Pei Fu, Shan Guo et al.
Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment
Johannes Schusterbauer, Ming Gui, Frank Fundel et al.
Real-Time Simulated Avatar from Head-Mounted Sensors
Zhengyi Luo, Jinkun Cao, Rawal Khirodkar et al.
Quantization without Tears
Minghao Fu, Hao Yu, Jie Shao et al.
Gaussian Shadow Casting for Neural Characters
Luis Bolanos, Shih-Yang Su, Helge Rhodin
Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes
Gaurav Shrivastava, Abhinav Shrivastava
ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object
Zhe Shan, Yang Liu, Lei Zhou et al.
DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling
Xin Xie, Dong Gong
Global-Local Tree Search in VLMs for 3D Indoor Scene Generation
Wei Deng, Mengshi Qi, Huadong Ma
Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching
Lennart Bastian, Yizheng Xie, Nassir Navab et al.
Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model
Leheng Zhang, Weiyi You, Kexuan Shi et al.
Interactive3D: Create What You Want by Interactive 3D Generation
Shaocong Dong, Lihe Ding, Zhanpeng Huang et al.
Diversified and Personalized Multi-rater Medical Image Segmentation
Yicheng Wu, Xiangde Luo, Zhe Xu et al.
Reversible Decoupling Network for Single Image Reflection Removal
Hao Zhao, Mingjia Li, Qiming Hu et al.
Commonsense Prototype for Outdoor Unsupervised 3D Object Detection
Hai Wu, Shijia Zhao, Xun Huang et al.
LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network
Hao Yang, Liyuan Pan, Yan Yang et al.
Programmable Motion Generation for Open-Set Motion Control Tasks
Hanchao Liu, Xiaohang Zhan, Shaoli Huang et al.
Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts
Feng Liang, Haoyu Ma, Zecheng He et al.
Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes
Zhiyuan Yu, Zheng Qin, lintao zheng et al.
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
Haotian Wang, Yuzhe Weng, Yueyan Li et al.
Iterated Learning Improves Compositionality in Large Vision-Language Models
Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi et al.
CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI
Siyuan Cheng, Lingjuan Lyu, Zhenting Wang et al.
What How and When Should Object Detectors Update in Continually Changing Test Domains?
Jayeon Yoo, Dongkwan Lee, Inseop Chung et al.
Revisiting MAE Pre-training for 3D Medical Image Segmentation
Tassilo Wald, Constantin Ulrich, Stanislav Lukyanenko et al.
Mimir: Improving Video Diffusion Models for Precise Text Understanding
Shuai Tan, Biao Gong, Yutong Feng et al.
Plug-and-Play Diffusion Distillation
Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte et al.
Frozen Feature Augmentation for Few-Shot Image Classification
Andreas Bär, Neil Houlsby, Mostafa Dehghani et al.
Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?
Aritra Dutta, Srijan Das, Jacob Nielsen et al.
MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors
Fanqi Pu, Yifan Wang, Jiru Deng et al.
Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models
Gianni Franchi, Olivier Laurent, Maxence Leguéry et al.
Efficient Meshflow and Optical Flow Estimation from Event Cameras
Xinglong Luo, Ao Luo, Zhengning Wang et al.
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
Yixuan Zhu, Ao Li, Yansong Tang et al.
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
Tariq Berrada, Jakob Verbeek, camille couprie et al.