Most Cited CVPR "message passing networks" Papers
5,589 papers found • Page 12 of 28
Conference
iSegMan: Interactive Segment-and-Manipulate 3D Gaussians
Yian Zhao, Wanshi Xu, Ruochong Zheng et al.
FlashSloth : Lightning Multimodal Large Language Models via Embedded Visual Compression
Bo Tong, Bokai Lai, Yiyi Zhou et al.
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
Leqi Shen, Guoqiang Gong, Tianxiang Hao et al.
Neural Hierarchical Decomposition for Single Image Plant Modeling
Zhihao Liu, Zhanglin Cheng, Naoto Yokoya
Order-One Rolling Shutter Cameras
Marvin Anas Hahn, Kathlén Kohn, Orlando Marigliano et al.
SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Shaoan Xie, Lingjing Kong, Yujia Zheng et al.
Geometry in Style: 3D Stylization via Surface Normal Deformation
Nam Anh Dinh, Itai Lang, Hyunwoo Kim et al.
ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On
Ji Woo Hong, Tri Ton, Trung X. Pham et al.
3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surfaces
Linyi Jin, Nilesh Kulkarni, David Fouhey
Scalable Autoregressive Monocular Depth Estimation
Jinhong Wang, Jintai Chen, Jian liu et al.
Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding
Alessandro Achille, Greg Ver Steeg, Tian Yu Liu et al.
Reasoning to Attend: Try to Understand How <SEG> Token Works
Rui Qian, Xin Yin, Dejing Dou
Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair
Jeonghoon Park, Chaeyeon Chung, Jaegul Choo
Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions
Namitha Padmanabhan, Matthew A Gwilliam, Pulkit Kumar et al.
LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model
Xi Wang, Hongzhen Li, Heng Fang et al.
Dual-view X-ray Detection: Can AI Detect Prohibited Items from Dual-view X-ray Images like Humans?
Renshuai Tao, Haoyu Wang, Yuzhe Guo et al.
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
Shengqiong Wu, Hao Fei, Jingkang Yang et al.
Pose Adapted Shape Learning for Large-Pose Face Reenactment
Gee-Sern Hsu, Jie-Ying Zhang, Yu-Hsiang Huang et al.
SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder
Dihan Zheng, Yihang Zou, Xiaowen Zhang et al.
SPIDeRS: Structured Polarization for Invisible Depth and Reflectance Sensing
Tomoki Ichikawa, Shohei Nobuhara, Ko Nishino
Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting
Maochen Yang, Zekun Li, Jian Zhang et al.
Learning to Remove Wrinkled Transparent Film with Polarized Prior
Jiaqi Tang, RUIZHENG WU, Xiaogang Xu et al.
GCC: Generative Color Constancy via Diffusing a Color Checker
Chen-Wei Chang, Cheng-De Fan, Chia-Che Chang et al.
POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning
Jiayi Guan, Li Shen, Ao Zhou et al.
Let Humanoids Hike! Integrative Skill Development on Complex Trails
Kwan-Yee Lin, Stella X. Yu
Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space
Leonhard Sommer, Olaf Dünkel, Christian Theobalt et al.
Building Optimal Neural Architectures using Interpretable Knowledge
Keith Mills, Fred Han, Mohammad Salameh et al.
Secret Lies in Color: Enhancing AI-Generated Images Detection with Color Distribution Analysis
Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.
WISE: A Framework for Gigapixel Whole-Slide-Image Lossless Compression
Yu Mao, Jun Wang, Nan Guan et al.
Locally Adaptive Neural 3D Morphable Models
Michail Tarasiou, Rolandos Alexandros Potamias, Eimear O' Sullivan et al.
IDEA-Bench: How Far are Generative Models from Professional Designing?
Chen Liang, Lianghua Huang, Jingwu Fang et al.
AMR-Transformer: Enabling Efficient Long-range Interaction for Complex Neural Fluid Simulation
Zeyi Xu, Jinfan Liu, Kuangxu Chen et al.
Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features
Yuanbo Xiangli, Ruojin Cai, Hanyu Chen et al.
Multi-Label Prototype Visual Spatial Search for Weakly Supervised Semantic Segmentation
Songsong Duan, Xi Yang, Nannan Wang
GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds
Shengjun Zhang, Xin Fei, Yueqi Duan
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
Rui Zhao, Weijia Mao, Mike Zheng Shou
Towards All-in-One Medical Image Re-Identification
Yuan Tian, Kaiyuan Ji, Rongzhao Zhang et al.
Channel Consistency Prior and Self-Reconstruction Strategy Based Unsupervised Image Deraining
Guanglu Dong, Tianheng Zheng, Yuanzhouhan Cao et al.
Noise-Resistant Video Anomaly Detection via RGB Error-Guided Multiscale Predictive Coding and Dynamic Memory
Han Hu, Wenli Du, Peng Liao et al.
dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis
Luyuan Xie, Tianyu Luan, Wenyuan Cai et al.
ReRAW: RGB-to-RAW Image Reconstruction via Stratified Sampling for Efficient Object Detection on the Edge
Radu Berdan, Beril Besbinar, Christoph Reinders et al.
FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts
Tongyuan Bai, Wangyuanfan Bai, Dong Chen et al.
Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding
Sai Wang, Yutian Lin, Yu Wu
DEAL: Data-Efficient Adversarial Learning for High-Quality Infrared Imaging
Zhu Liu, Zijun Wang, Jinyuan Liu et al.
Fine-grained Prototypical Voting with Heterogeneous Mixup for Semi-supervised 2D-3D Cross-modal Retrieval
Fan Zhang, Xian-Sheng Hua, Chong Chen et al.
Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance
Sanchayan Santra, Vishal Chudasama, Pankaj Wasnik et al.
GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting
Zixuan Chen, Guangcong Wang, Jiahao Zhu et al.
VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors
Juil Koo, Paul Guerrero, Chun-Hao P. Huang et al.
Fixed Point Diffusion Models
Luke Melas-Kyriazi, Xingjian Bai
Articulated Kinematics Distillation from Video Diffusion Models
Xuan Li, Qianli Ma, Tsung-Yi Lin et al.
AeSPa : Attention-guided Self-supervised Parallel Imaging for MRI Reconstruction
Jinho Joo, Hyeseong Kim, Hyeyeon Won et al.
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
Songsong Yu, Yuxin Chen, Zhongang Qi et al.
Recognition-Synergistic Scene Text Editing
Zhengyao Fang, Pengyuan Lyu, Jingjing Wu et al.
A Lightweight UDF Learning Framework for 3D Reconstruction Based on Local Shape Functions
Jiangbei Hu, Yanggeng Li, Fei Hou et al.
NSD-Imagery: A Benchmark Dataset for Extending fMRI Vision Decoding Methods to Mental Imagery
Reese Kneeland, Paul Scotti, Ghislain St-Yves et al.
Cheb-GR: Rethinking K-nearest Neighbor Search in Re-ranking for Person Re-identification
Jinxi Yang, He Li, Bo Du et al.
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
Jiazhi Guan, Kaisiyuan Wang, Zhiliang Xu et al.
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song, Xiaoye Qu, Jiawei Zhou et al.
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen, Bingchen Zhao, Yilun Chen et al.
CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models
Yiqi Zhu, Ziyue Wang, Can Zhang et al.
Uncertainty Weighted Gradients for Model Calibration
Jinxu Lin, Linwei Tao, Minjing Dong et al.
RivuletMLP: An MLP-based Architecture for Efficient Compressed Video Quality Enhancement
Gang He, Weiran Wang, Guancheng Quan et al.
Positive2Negative: Breaking the Information-Lossy Barrier in Self-Supervised Single Image Denoising
Tong Li, Lizhi Wang, Zhiyuan Xu et al.
BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs
Zhantao Yang, Ruili Feng, Keyu Yan et al.
PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds
Barza Nisar, Steven L. Waslander
Spherical Manifold Guided Diffusion Model for Panoramic Image Generation
Xiancheng Sun, Mai Xu, Shengxi Li et al.
BiLoRA: Almost-Orthogonal Parameter Spaces for Continual Learning
Hao Zhu, Yifei Zhang, Junhao Dong et al.
SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity
Yijie Xu, Bolun Zheng, Wei Zhu et al.
Enhancing Diversity for Data-free Quantization
Kai Zhao, zhihao zhuang, Miao Zhang et al.
Identity-Clothing Similarity Modeling for Unsupervised Clothing Change Person Re-Identification
Zhiqi Pang, Junjie Wang, Lingling Zhao et al.
ASHiTA: Automatic Scene-grounded HIerarchical Task Analysis
Yun Chang, Leonor Fermoselle, Duy Ta et al.
Your Scale Factors are My Weapon: Targeted Bit-Flip Attacks on Vision Transformers via Scale Factor Manipulation
Jialai Wang, Yuxiao Wu, Weiye Xu et al.
Multi-focal Conditioned Latent Diffusion for Person Image Synthesis
Jiaqi Liu, Jichao Zhang, Paolo Rota et al.
Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing
Yoonjeon Kim, Soohyun Ryu, Yeonsung Jung et al.
Resilient Sensor Fusion Under Adverse Sensor Failures via Multi-Modal Expert Fusion
Konyul Park, Yecheol Kim, Daehun Kim et al.
Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture
Xuanchen Li, Jianyu Wang, Yuhao Cheng et al.
Sensitivity-Aware Efficient Fine-Tuning via Compact Dynamic-Rank Adaptation
Tianran Chen, Jiarui Chen, Baoquan Zhang et al.
Less Attention is More: Prompt Transformer for Generalized Category Discovery
Wei Zhang, Baopeng Zhang, Zhu Teng et al.
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos
Zijia Lu, ASM Iftekhar, Gaurav Mittal et al.
EvDiG: Event-guided Direct and Global Components Separation
xinyu zhou, Peiqi Duan, Boyu Li et al.
EDCFlow: Exploring Temporally Dense Difference Maps for Event-based Optical Flow Estimation
Daikun Liu, Lei Cheng, Teng Wang et al.
Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects
Shalini Maiti, Lourdes Agapito, Filippos Kokkinos
Fractal Calibration for Long-tailed Object Detection
Konstantinos Alexandridis, Ismail Elezi, Jiankang Deng et al.
Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction
Huiwon Jang, Sihyun Yu, Jinwoo Shin et al.
Enhanced then Progressive Fusion with View Graph for Multi-View Clustering
Zhibin Dong, Meng Liu, Siwei Wang et al.
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
Chen Liu, Liying Yang, Peike Li et al.
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Hongjun Wang, Wonmin Byeon, Jiarui Xu et al.
Leveraging SD Map to Augment HD Map-based Trajectory Prediction
Zhiwei Dong, Ran Ding, Wei Li et al.
ICP: Immediate Compensation Pruning for Mid-to-high Sparsity
Xin Luo, Fu Xueming, Zihang Jiang et al.
STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification
Siyi Du, Xinzhe Luo, Declan ORegan et al.
Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection
Feng Yan, Xiaoheng Jiang, Yang Lu et al.
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
Boseung Jeong, Jicheol Park, Sungyeon Kim et al.
End-to-End Implicit Neural Representations for Classification
Alexander Gielisse, Jan van Gemert
Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation
Xinhao Zhong, Hao Fang, Bin Chen et al.
CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation
Jungsoo Lee, Debasmit Das, Munawar Hayat et al.
FedSPA: Generalizable Federated Graph Learning under Homophily Heterogeneity
Zihan Tan, Guancheng Wan, Wenke Huang et al.
Distilling Spatially-Heterogeneous Distortion Perception for Blind Image Quality Assessment
Xudong Li, Wenjie Nie, Yan Zhang et al.
Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration
Junyuan Deng, Xinyi Wu, Yongxing Yang et al.
OODD: Test-time Out-of-Distribution Detection with Dynamic Dictionary
Yifeng Yang, Lin Zhu, Zewen Sun et al.
PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution
Shian Du, Menghan Xia, Chang Liu et al.
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices
Hao Yu, Tangyu Jiang, Shuning Jia et al.
GroomLight: Hybrid Inverse Rendering for Relightable Human Hair Appearance Modeling
Yang Zheng, Menglei Chai, Delio Vicini et al.
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
Yongshuo Zong, Qin ZHANG, DONGSHENG An et al.
Reasoning in Visual Navigation of End-to-end Trained Agents: A Dynamical Systems Approach
Steeven JANNY, Hervé Poirier, Leonid Antsfeld et al.
NADER: Neural Architecture Design via Multi-Agent Collaboration
Zekang Yang, Wang ZENG, Sheng Jin et al.
GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning
Guangyan Chen, Te Cui, Meiling Wang et al.
Are Images Indistinguishable to Humans Also Indistinguishable to Classifiers?
Zebin You, Xinyu Zhang, Hanzhong Guo et al.
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
Gianni Franchi, Nacim Belkhir, Dat NGUYEN et al.
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker, Letian Jiang, Chen Zhao et al.
FedCALM: Conflict-aware Layer-wise Mitigation for Selective Aggregation in Deeper Personalized Federated Learning
Hao Zheng, Zhigang Hu, Boyu Wang et al.
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu, Lu Zhang, Hang Yao et al.
UNICL-SAM: Uncertainty-Driven In-Context Segmentation with Part Prototype Discovery
Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.
What Makes a Good Dataset for Knowledge Distillation?
Logan Frank, Jim Davis
Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning
Jiuyang Dong, Junjun Jiang, Kui Jiang et al.
SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models
Jaerin Lee, Daniel Jung, Kanggeon Lee et al.
Image Referenced Sketch Colorization Based on Animation Creation Workflow
Dingkun Yan, Xinrui Wang, Zhuoru Li et al.
HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting
Xinpeng Liu, Zeyi Huang, Fumio Okura et al.
Rethinking Decoder Design: Improving Biomarker Segmentation Using Depth-to-Space Restoration and Residual Linear Attention
Saad Wazir, Daeyoung Kim
FADE: Frequency-Aware Diffusion Model Factorization for Video Editing
Yixuan Zhu, Haolin Wang, Shilin Ma et al.
Annotation Ambiguity Aware Semi-Supervised Medical Image Segmentation
Suruchi Kumari, Pravendra Singh
BWFormer: Building Wireframe Reconstruction from Airborne LiDAR Point Cloud with Transformer
Yuzhou Liu, Lingjie Zhu, Hanqiao Ye et al.
SFDM: Robust Decomposition of Geometry and Reflectance for Realistic Face Rendering from Sparse-view Images
Daisheng Jin, Jiangbei Hu, Baixin Xu et al.
Gyro-based Neural Single Image Deblurring
Heemin Yang, Jaesung Rim, Seungyong Lee et al.
FaceCom: Towards High-fidelity 3D Facial Shape Completion via Optimization and Inpainting Guidance
Yinglong Li, Hongyu Wu, Wang et al.
UniPhy: Learning a Unified Constitutive Model for Inverse Physics Simulation
Himangi Mittal, Peiye Zhuang, Hsin-Ying Lee et al.
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
Yuzhong Zhao, Feng Liu, Yue Liu et al.
Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer
Ziyi Liu, Yangcen Liu
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin, Yunsheng Li, Dongdong Chen et al.
Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models
Namhyuk Ahn, KiYoon Yoo, Wonhyuk Ahn et al.
Masking meets Supervision: A Strong Learning Alliance
Byeongho Heo, Taekyung Kim, Sangdoo Yun et al.
Zero-shot RGB-D Point Cloud Registration with Pre-trained Large Vision Model
Haobo Jiang, Jin Xie, Jian Yang et al.
Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision
Jinneyong Kim, Seung-Hwan Baek
Ges3ViG : Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding
Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju et al.
Compositional Caching for Training-free Open-vocabulary Attribute Detection
Marco Garosi, Alessandro Conti, Gaowen Liu et al.
Enhanced Visual-Semantic Interaction with Tailored Prompts for Pedestrian Attribute Recognition
Junyi Wu, Yan Huang, Min Gao et al.
Decoder Gradient Shield: Provable and High-Fidelity Prevention of Gradient-Based Box-Free Watermark Removal
Haonan An, Guang Hua, Zhengru Fang et al.
UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation
Yichong Lu, Yichi Cai, Shangzhan Zhang et al.
BrepGiff: Lightweight Generation of Complex B-rep with 3D GAT Diffusion
Hao Guo, Xiaoshui Huang, Hao jiacheng et al.
VODiff: Controlling Object Visibility Order in Text-to-Image Generation
Dong Liang, Jinyuan Jia, Yuhao Liu et al.
MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation
Aviral Chharia, Wenbo Gou, Haoye Dong
Improve Representation for Imbalanced Regression through Geometric Constraints
Zijian Dong, Yilei Wu, Chongyao Chen et al.
DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion
Wei Wu, Xi Guo, Weixuan TANG et al.
PI-HMR: Towards Robust In-bed Temporal Human Shape Reconstruction with Contact Pressure Sensing
Ziyu Wu, Yufan Xiong, Mengting Niu et al.
Previously on ... From Recaps to Story Summarization
Aditya Kumar Singh, Dhruv Srivastava, Makarand Tapaswi
Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression
Jinchang Xu, Shaokang Wang, Jintao Chen et al.
Zero-Shot Blind-spot Image Denoising via Implicit Neural Sampling
Yuhui Quan, Tianxiang Zheng, Zhiyuan Ma et al.
Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories
Yan Zhang, Sergey Prokudin, Marko Mihajlovic et al.
Efficient Video Super-Resolution for Real-time Rendering with Decoupled G-buffer Guidance
Mingjun Zheng, Long Sun, Jiangxin Dong et al.
Spectral State Space Model for Rotation-Invariant Visual Representation Learning
Sahar Dastani, Ali Bahri, Moslem Yazdanpanah et al.
Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized?
Jianyang Xie, Yitian Zhao, Yanda Meng et al.
Balanced Direction from Multifarious Choices: Arithmetic Meta-Learning for Domain Generalization
Xiran Wang, Jian Zhang, Lei Qi et al.
Shading Meets Motion: Self-supervised Indoor 3D Reconstruction Via Simultaneous Shape-from-Shading and Structure-from-Motion
Guoyu Lu
FiRe: Fixed-points of Restoration Priors for Solving Inverse Problems
Matthieu Terris, Ulugbek Kamilov, Thomas Moreau
Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning
Maosen Zhao, Pengtao Chen, Chong Yu et al.
ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
Yizhe Tang, Zhimin Sun, Yuzhen Du et al.
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
Zihang Lai, Andrea Vedaldi
Distilled Datamodel with Reverse Gradient Matching
Jingwen Ye, Ruonan Yu, Songhua Liu et al.
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
Zhenglin Zhou, Fan Ma, Hehe Fan et al.
TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations
Bo Sun, Thibault Groueix, Chen Song et al.
OW-OVD: Unified Open World and Open Vocabulary Object Detection
Xing Xi, Yangyang Huang, Ronghua Luo et al.
DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Post-Capture Refocusing, Defocus Rendering and Blur Removal
Yujie Wang, Praneeth Chakravarthula, Baoquan Chen
DFM: Differentiable Feature Matching for Anomaly Detection
Wu Sheng, Yimi Wang, Xudong Liu et al.
LOD-GS: Achieving Levels of Detail using Scalable Gaussian Soup
Jianxiong Shen, Yue Qian, Xiaohang Zhan
Face Forgery Video Detection via Temporal Forgery Cue Unraveling
Zonghui Guo, YingJie Liu, Jie Zhang et al.
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah et al.
Differentiable Display Photometric Stereo
Seokjun Choi, Seungwoo Yoon, Giljoo Nam et al.
Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
Boming Miao, Chunxiao Li, Xiaoxiao Wang et al.
Universal Scene Graph Generation
Shengqiong Wu, Hao Fei, Tat-seng Chua
HalLoc: Token-level Localization of Hallucinations for Vision Language Models
Eunkyu Park, Minyeong Kim, Gunhee Kim
I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models
Dongnan Gui, Xun Guo, Wengang Zhou et al.
Finding Local Diffusion Schrödinger Bridge using Kolmogorov-Arnold Network
Xingyu Qiu, Mengying Yang, Xinghua Ma et al.
Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation
Rohith Peddi, Saurabh ., Ayush Abhay Shrivastava et al.
A Flag Decomposition for Hierarchical Datasets
Nathan Mankovich, Ignacio Santamaria, Gustau Camps-Valls et al.
3D Dental Model Segmentation with Geometrical Boundary Preserving
Shufan Xi, Zexian Liu, Junlin Chang et al.
BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting
Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang et al.
From Laboratory to Real World: A New Benchmark Towards Privacy-Preserved Visible-Infrared Person Re-Identification
Yan Jiang, Hao Yu, Xu Cheng et al.
4D-Fly: Fast 4D Reconstruction from a Single Monocular Video
Diankun Wu, Fangfu Liu, Yi-Hsin Hung et al.
Scaling Down Text Encoders of Text-to-Image Diffusion Models
Lifu Wang, Daqing Liu, Xinchen Liu et al.
GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector
Zechuan Li, Hongshan Yu, Yihao Ding et al.
SUM Parts: Benchmarking Part-Level Semantic Segmentation of Urban Meshes
Weixiao Gao, Liangliang Nan, Hugo Ledoux
Simpler Diffusion: 1.5 FID on ImageNet512 with Pixel-space Diffusion
Emiel Hoogeboom, Thomas Mensink, Jonathan Heek et al.
KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation
Ruida Zhang, Chenyangguang Zhang, Yan Di et al.
GG-SSMs: Graph-Generating State Space Models
Nikola Zubic, Davide Scaramuzza
Pick-or-Mix: Dynamic Channel Sampling for ConvNets
Ashish Kumar, Daneul Kim, Jaesik Park et al.
ForestLPR: LiDAR Place Recognition in Forests Attentioning Multiple BEV Density Images
Yanqing Shen, Turcan Tuna, Marco Hutter et al.
FFR: Frequency Feature Rectification for Weakly Supervised Semantic Segmentation
Ziqian Yang, Xinqiao Zhao, Xiaolei Wang et al.
Action Detail Matters: Refining Video Recognition with Local Action Queries
Mengmeng Wang, Zeyi Huang, Xiangjie Kong et al.
SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost
Haiyang Mei, Pengyu Zhang, Mike Zheng Shou
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs
Yi Fang, Bowen Jin, Jiacheng Shen et al.
RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression
Uri Gadot, Shie Mannor, Assaf Shocher et al.
LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds
Zihui Zhang, Weisheng Dai, Hongtao Wen et al.
Charm: The Missing Piece in ViT Fine-Tuning for Image Aesthetic Assessment
Fatemeh Behrad, Tinne Tuytelaars, Johan Wagemans
ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points
Qirui Huang, Runze Zhang, Kangjun Liu et al.
LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions
Faridoun Mehri, Mahdieh Baghshah, Mohammad Taher Pilehvar
StickMotion: Generating 3D Human Motions by Drawing a Stickman
Tao Wang, Zhihua Wu, Qiaozhi He et al.
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
yilong wang, Zilin Gao, Qilong Wang et al.
Reproducible Vision-Language Models Meet Concepts Out of Pre-Training
Ziliang Chen, Xin Huang, Xiaoxuan Fan et al.
Conditional Balance: Improving Multi-Conditioning Trade-Offs in Image Generation
Nadav Z. Cohen, Oron Nir, Ariel Shamir
Optimizing for the Shortest Path in Denoising Diffusion Model
Ping Chen, Xingpeng Zhang, Zhaoxiang Liu et al.
GPAvatar: High-fidelity Head Avatars by Learning Efficient Gaussian Projections
Weiqi Feng, Dong Han, Zekang Zhou et al.