Most Cited ICCV "transformer model" Papers
2,701 papers found • Page 7 of 14
UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale
Yuhao Wang, Wei Xi
Supercharging Floorplan Localization with Semantic Rays
Yuval Grader, Hadar Averbuch-Elor
Consistency Trajectory Matching for One-Step Generative Super-Resolution
Weiyi You, Mingyang Zhang, Leheng Zhang et al.
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan, Hanqing Liu, Yao Huang et al.
CABLD: Contrast-Agnostic Brain Landmark Detection with Consistency-Based Regularization
Soorena Salari, Arash Harirpoush, Hassan Rivaz et al.
GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects
Yidi Shao, Mu Huang, Chen Change Loy et al.
Learnable Feature Patches and Vectors for Boosting Low-light Image Enhancement without External Knowledge
Xiaogang Xu, Jiafei Wu, Qingsen Yan et al.
MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction under Various Light Conditions
Qingyuan Zhou, Yuehu Gong, Weidong Yang et al.
CVPT: Cross Visual Prompt Tuning
Lingyun Huang, Jianxu Mao, Junfei Yi et al.
Addressing Text Embedding Leakage in Diffusion-based Image Editing
Sunung Mun, Jinhwan Nam, Sunghyun Cho et al.
Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing
Chengxu Liu, Lu Qi, Jinshan Pan et al.
Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models
Vahid Balazadeh, Mohammadmehdi Ataei, Hyunmin Cheong et al.
Leveraging Local Patch Alignment to Seam-cutting for Large Parallax Image Stitching
Tianli Liao, Chenyang Zhao, Lei Li et al.
Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models
Xiao Liang, Di Wang, Zhicheng Jiao et al.
Efficient Multi-Person Motion Prediction by Lightweight Spatial and Temporal Interactions
Yuanhong Zheng, Ruixuan Yu, Jian Sun
CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation
Jianyu Wu, Yizhou Wang, Xiangyu Yue et al.
Bridging 3D Anomaly Localization and Repair via High-Quality Continuous Geometric Representation
Bozhong Zheng, Jinye Gan, Xiaohao Xu et al.
Simultaneous Motion And Noise Estimation with Event Cameras
Shintaro Shiba, Yoshimitsu Aoki, Guillermo Gallego
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Junho Kim, Hyungjin Chung, Byung-Hoon Kim
Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation
Hongyu Wen, Yiming Zuo, Venkat Subramanian et al.
Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation
Wenyao Zhang, Hongsi Liu, Bohan Li et al.
Guiding Diffusion-Based Articulated Object Generation by Partial Point Cloud Alignment and Physical Plausibility Constraints
Jens U. Kreber, Joerg Stueckler
Multi-Modal Few-Shot Temporal Action Segmentation
Zijia Lu, Ehsan Elhamifar
Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions
Liang Xu, Chengqun Yang, Zili Lin et al.
Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent
En Ci, Shanyan Guan, Yanhao Ge et al.
Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection
Yehao Lu, Minghe Weng, Zekang Xiao et al.
DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image
Jijun Xiang, Xuan Zhu, Xianqi Wang et al.
Online Generic Event Boundary Detection
Hyung Rok Jung, Daneul Kim, Seunggyun Lim et al.
ETA: Energy-based Test-time Adaptation for Depth Completion
Younjoon Chung, Hyoungseob Park, Patrick Rim et al.
SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting
Zihui Gao, Jia-Wang Bian, Guosheng Lin et al.
HiGarment: Cross-modal Harmony Based Diffusion Model for Flat Sketch to Realistic Garment Image
Junyi Guo, Jingxuan Zhang, Fangyu Wu et al.
Enhancing Transformers Through Conditioned Embedded Tokens
Hemanth Saratchandran, Simon Lucey
LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders
Ilan Naiman, Emanuel Ben-Baruch, Oron Anschel et al.
CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation
Jiannan Ge, Lingxi Xie, Hongtao Xie et al.
Generative Modeling of Shape-Dependent Self-Contact Human Poses
Takehiko Ohkawa, Jihyun Lee, Shunsuke Saito et al.
PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions
Mahesh Bhosale, Abdul Wasi, Yuanhao Zhai et al.
Towards Open-World Generation of Stereo Images and Unsupervised Matching
Feng Qiao, Zhexiao Xiong, Eric Xing et al.
CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images
Jungho Lee, DongHyeong Kim, Dogyoon Lee et al.
Joint Asymmetric Loss for Learning with Noisy Labels
Jialiang Wang, Xianming Liu, Xiong Zhou et al.
JailbreakDiffBench: A Comprehensive Benchmark for Jailbreaking Diffusion Models
Xiaolong Jin, Zixuan Weng, Hanxi Guo et al.
SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers
Bhavna Gopal, Huanrui Yang, Mark Horton et al.
SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark
Alex Costanzino, Pierluigi Zama Ramirez, Luigi Lella et al.
SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions
Jessica Bader, Leander Girrbach, Stephan Alaniz et al.
Refer to Any Segmentation Mask Group With Vision-Language Prompts
Shengcao Cao, Zijun Wei, Jason Kuen et al.
Noise2Score3D: Tweedie's Approach for Unsupervised Point Cloud Denoising
Xiangbin Wei, Yuanfeng Wang, Ao Xu et al.
Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process
Yuanze Li, Shihao Yuan, Haolin Wang et al.
Multimodal Prompt Alignment for Facial Expression Recognition
Fuyan Ma, Yiran He, Bin Sun et al.
RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction
Johannes Künzel, Anna Hilsmann, Peter Eisert
InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes
Zesong Yang, Bangbang Yang, Wenqi Dong et al.
VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning
Jinglei Zhang, Yuanfan Guo, Rolandos Alexandros Potamias et al.
Everything is a Video: Unifying Modalities through Next-Frame Prediction
G Thomas Hudson, Dean Slack, Thomas Winterbottom et al.
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization
Xavier Thomas, Deepti Ghadiyaram
LiON-LoRA: Rethinking LoRA Fusion to Unify Controllable Spatial and Temporal Generation for Video Diffusion
Yisu Zhang, Chenjie Cao, Chaohui Yu et al.
CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling
Trong-Thang Pham, Akash Awasthi, Saba Khan et al.
EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba
Quang Nguyen, Nhat Le, Baoru Huang et al.
Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection
Ji Du, Xin Wang, Fangwei Hao et al.
Advancing Visual Large Language Model for Multi-granular Versatile Perception
Wentao Xiang, Haoxian Tan, Cong Wei et al.
Kaputt: A Large-Scale Dataset for Visual Defect Detection
Sebastian Höfer, Dorian Henning, Artemij Amiranashvili et al.
Free-running vs Synchronous: Single-Photon Lidar for High-flux 3D Imaging
Ruangrawee Kitichotkul, Shashwath Bharadwaj, Joshua Rapp et al.
ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction
Han Yu, Kehan Li, Dongbai Li et al.
DNF-Intrinsic: Deterministic Noise-Free Diffusion for Indoor Inverse Rendering
Rongjia Zheng, Qing Zhang, Chengjiang Long et al.
Stylized-Face: A Million-level Stylized Face Dataset for Face Recognition
Zhengyuan Peng, Jianqing Xu, Yuge Huang et al.
Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography
Jianing Zhang, Jiayi Zhu, Feiyu Ji et al.
Alleviating Textual Reliance in Medical Language-guided Segmentation via Prototype-driven Semantic Approximation
Shuchang Ye, Usman Naseem, Mingyuan Meng et al.
Generate, Transduct, Adapt: Iterative Transduction with VLMs
Oindrila Saha, Logan Lawrence, Grant Van Horn et al.
Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion
Yidi Liu, Dong Li, Yuxin Ma et al.
LookOut: Real-World Humanoid Egocentric Navigation
Boxiao Pan, Adam Harley, Francis Engelmann et al.
PseudoMapTrainer: Learning Online Mapping without HD Maps
Christian Löwens, Thorben Funke, Jingchao Xie et al.
DIP: Unsupervised Dense In-Context Post-training of Visual Representations
Sophia Sirko-Galouchenko, Spyros Gidaris, Antonin Vobecky et al.
MAVias: Mitigate any Visual Bias
Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos et al.
Evading Data Provenance in Deep Neural Networks
Hongyu Zhu, Sichu Liang, Wenwen Wang et al.
MoSiC: Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning
Mohammadreza Salehi, Shashanka Venkataramanan, Ioana Simion et al.
MMOne: Representing Multiple Modalities in One Scene
Zhifeng Gu, Bing Wang
DictAS: A Framework for Class-Generalizable Few-Shot Anomaly Segmentation via Dictionary Lookup
Zhen Qu, Xian Tao, Xinyi Gong et al.
SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting
Shengjie Lin, Jiading Fang, Muhammad Zubair Irshad et al.
Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem, Piotr Teterwak, Kate Saenko et al.
Fast Globally Optimal and Geometrically Consistent 3D Shape Matching
Paul Roetzer, Florian Bernard
DALIP: Distribution Alignment-based Language-Image Pre-Training for Domain-Specific Data
Junjie Wu, Jiangtao Xie, Zhaolin Zhang et al.
MixA-Q: Revisiting Activation Sparsity for Vision Transformers from a Mixed-Precision Quantization Perspective
Weitian Wang, Shubham Rai, Cecilia De la Parra et al.
Consensus-Driven Active Model Selection
Justin Kay, Grant Van Horn, Subhransu Maji et al.
SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation
Hao Ban, Gokul Ram Subramani, Kaiyi Ji
CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective
Zongheng Tang, Yi Liu, Yifan Sun et al.
CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation
Xiao Lin, Yun Peng, Liuyi Wang et al.
OpenAnimals: Revisiting Person Re-Identification for Animals Towards Better Generalization
Saihui Hou, Panjian Huang, Zengbin Wang et al.
LaRender: Training-Free Occlusion Control in Image Generation via Latent Rendering
Xiaohang Zhan, Dingming Liu
Generalized Few-Shot Point Cloud Segmentation via LLM-Assisted Hyper-Relation Matching
Zhaoyang Li, Yuan Wang, Guoxin Xiong et al.
SketchSplat: 3D Edge Reconstruction via Differentiable Multi-view Sketch Splatting
Haiyang Ying, Matthias Zwicker
SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion
Zhengkang Xiang, Zizhao Li, Amir Khodabandeh et al.
PRISM: Reducing Spurious Implicit Biases in Vision-Language Models with LLM-Guided Embedding Projection
Mahdiyar Molahasani, Azadeh Motamedi, Michael Greenspan et al.
Sim-DETR: Unlock DETR for Temporal Sentence Grounding
Jiajin Tang, Zhengxuan Wei, Yuchen Zhu et al.
ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis
Onkar Susladkar, Gayatri Deshmukh, Yalcin Tur et al.
Selective Contrastive Learning for Weakly Supervised Affordance Grounding
WonJun Moon, Hyun Seok Seong, Jae-Pil Heo
CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting
Lei Tian, Xiaomin Li, Liqian Ma et al.
Generative Adversarial Diffusion
U-Chae Jun, Jaeeun Ko, Jiwoo Kang
Monocular Facial Appearance Capture in the Wild
Yingyan Xu, Kate Gadola, Prashanth Chandran et al.
Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model
Shiming Chen, Bowen Duan, Salman Khan et al.
AMD: Adaptive Momentum and Decoupled Contrastive Learning Framework for Robust Long-Tail Trajectory Prediction
Bin Rao, Haicheng Liao, Yanchen Guan et al.
Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention
Jiawei Gu, Ziyue Qiao, Zechao Li
Deep Incomplete Multi-view Clustering with Distribution Dual-Consistency Recovery Guidance
Jiaqi Jin, Siwei Wang, Zhibin Dong et al.
ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning
Zhengzhuo Xu, Sinan Du, Yiyan Qi et al.
Learning Visual Hierarchies in Hyperbolic Space for Image Retrieval
Ziwei Wang, Sameera Ramasinghe, Chenchen Xu et al.
Improving Noise Efficiency in Privacy-preserving Dataset Distillation
Runkai Zheng, Vishnu Dasu, Yinong Wang et al.
D2ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition
Wenjie Pei, Qizhong Tan, Guangming Lu et al.
DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing
Yang JingYi, Xun Lin, Zitong Yu et al.
Quantifying and Narrowing the Unknown: Interactive Text-to-Video Retrieval via Uncertainty Minimization
Bingqing Zhang, Zhuo Cao, Heming Du et al.
Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment
Kejia Zhang, Juanjuan Weng, Zhiming Luo et al.
Federated Domain Generalization with Domain-specific Soft Prompts Generation
Jianhan Wu, Xiaoyang Qu, Zhangcheng Huang et al.
Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-language Pre-training
Weiwei Cao, Jianpeng Zhang, Zhongyi Shui et al.
Timestep-Aware Diffusion Model for Extreme Image Rescaling
Ce Wang, Zhenyu Hu, Wanjie Sun et al.
Adversarial Attention Perturbations for Large Object Detection Transformers
Zachary Yahn, Selim Tekin, Fatih Ilhan et al.
Global Regulation and Excitation via Attention Tuning for Stereo Matching
Jiahao Li, Xinhong Chen, Zhengmin Jiang et al.
Continual Multiple Instance Learning with Enhanced Localization for Histopathological Whole Slide Image Analysis
Byung Hyun Lee, Wongi Jeong, Woojae Han et al.
MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics
Bowei Guo, Shengkun Tang, Cong Zeng et al.
Cross-Architecture Distillation Made Simple with Redundancy Suppression
Weijia Zhang, Yuehao Liu, Wu Ran et al.
Text2VDM: Text to Vector Displacement Maps for Expressive and Interactive 3D Sculpting
Hengyu Meng, Duotun Wang, Zhijing Shao et al.
Diffusion Image Prior
Hamadi Chihaoui, Paolo Favaro
MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers
Yang Tian, Zheng Lu, Mingqi Gao et al.
BézierGS: Dynamic Urban Scene Reconstruction with Bézier Curve Gaussian Splatting
Zipei Ma, Junzhe Jiang, Yurui Chen et al.
Backdoor Attacks on Neural Networks via One-Bit Flip
Xiang Li, Lannan Luo, Qiang Zeng
GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion
Li-Heng Chen, Zi-Xin Zou, Chang Liu et al.
Differentiable Room Acoustic Rendering with Multi-View Vision Priors
Derong Jin, Ruohan Gao
FICGen: Frequency-Inspired Contextual Disentanglement for Layout-driven Degraded Image Generation
Wenzhuang Wang, Yifan Zhao, Mingcan Ma et al.
Trade-offs in Image Generation: How Do Different Dimensions Interact?
Sicheng Zhang, Binzhu Xie, Zhonghao Yan et al.
HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes
Mai Su, Zhongtao Wang, Huishan Au et al.
Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration
Katie Luo, Minh-Quan Dao, Zhenzhen Liu et al.
LOTA: Bit-Planes Guided AI-Generated Image Detection
Renxi Cheng, Hongsong Wang, Yang Zhang et al.
PAN-Crafter: Learning Modality-Consistent Alignment for PAN-Sharpening
Jeonghyeok Do, Sungpyo Kim, Geunhyuk Youk et al.
GLEAM: Enhanced Transferable Adversarial Attacks for Vision-Language Pre-training Models via Global-Local Transformations
Yunqi Liu, Xiaohui Cui, Ouyang Xue
Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces
Aniruddha Mahapatra, Long Mai, David Bourgin et al.
Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
Hyungjin Kim, Seokho Ahn, Young-Duk Seo
Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
Yong Liu, Song-Li Wu, Sule Bai et al.
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
Jiahe Zhao, Ruibing Hou, Zejie Tian et al.
Holistic Tokenizer for Autoregressive Image Generation
Anlin Zheng, Haochen Wang, Yucheng Zhao et al.
PLA: Prompt Learning Attack against Text-to-Image Generative Models
Xinqi Lyu, Yihao Liu, Yanjie Li et al.
HouseTour: A Virtual Real Estate A(I)gent
Ata Çelen, Iro Armeni, Daniel Barath et al.
Retinex-MEF: Retinex-based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion
Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.
ViLU: Learning Vision-Language Uncertainties for Failure Prediction
Marc Lafon, Yannis Karmim, Julio Silva-Rodríguez et al.
Subjective Camera 1.0: Bridging Human Cognition and Visual Reconstruction through Sequence-Aware Sketch-Guided Diffusion
Haoyang Chen, Dongfang Sun, Caoyuan Ma et al.
Adapting In-Domain Few-Shot Segmentation to New Domains without Source Domain Retraining
Qi Fan, Kaiqi Liu, Nian Liu et al.
FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models
Mainak Singha, Subhankar Roy, Sarthak Mehrotra et al.
Demeter: A Parametric Model of Crop Plant Morphology from the Real World
Tianhang Cheng, Albert Zhai, Evan Chen et al.
DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models
Revant Teotia, Candace Ross, Karen Ullrich et al.
MoGA: 3D Generative Avatar Prior for Monocular Gaussian Avatar Reconstruction
Zijian Dong, Longteng Duan, Jie Song et al.
Improving Rectified Flow with Boundary Conditions
Xixi Hu, Runlong Liao, Bo Liu et al.
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Tao Han, Wanghan Xu, Junchao Gong et al.
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
Lorenzo Baraldi, Davide Bucciarelli, Federico Betti et al.
Stable Score Distillation
Haiming Zhu, Yangyang Xu, Chenshu Xu et al.
Aligning Constraint Generation with Design Intent in Parametric CAD
Evan Casey, Tianyu Zhang, Shu Ishida et al.
Supercharged One-step Text-to-Image Diffusion Models with Negative Prompts
Viet Nguyen, Anh Nguyen, Trung Dao et al.
MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation
Yanchen Liu, Yanan Sun, Zhening Xing et al.
Denoising Token Prediction in Masked Autoregressive Models
Ting Yao, Yehao Li, Yingwei Pan et al.
Balanced Sharpness-Aware Minimization for Imbalanced Regression
Yahao Liu, Qin Wang, Lixin Duan et al.
Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views
Xiangdong Zhang, Shaofeng Zhang, Junchi Yan
DOGR: Towards Versatile Visual Document Grounding and Referring
Yinan Zhou, Yuxin Chen, Haokun Lin et al.
SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting
Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos et al.
ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition
Sanjoy Kundu, Shanmukha Vellamcheti, Sathyanarayanan Aakur
MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion
Fei Peng, Junqiang Wu, Yan Li et al.
StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance
Jaeseok Jeong, Junho Kim, Youngjung Uh et al.
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
Zeyi Sun, Tong Wu, Pan Zhang et al.
RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
Bimsara Pathiraja, Maitreya Patel, Shivam Singh et al.
SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning
Xin Hu, Ke Qin, Guiduo Duan et al.
CAFA: a Controllable Automatic Foley Artist
Roi Benita, Michael Finkelson, Tavi Halperin et al.
Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion
Enyu Liu, En Yu, Sijia Chen et al.
LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression
Wenjie Huang, Qi Yang, Shuting Xia et al.
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
Omkar Thawakar, Dmitry Demidov, Ritesh Thawkar et al.
Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation
Sebastian Schmidt, Julius Koerner, Dominik Fuchsgruber et al.
Trust but Verify: Programmatic VLM Evaluation in the Wild
Viraj Prabhu, Senthil Purushwalkam, An Yan et al.
SDMatte: Grafting Diffusion Models for Interactive Matting
Longfei Huang, Yu Liang, Hao Zhang et al.
Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models
In Cho, Youngbeom Yoo, Subin Jeon et al.
Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs
Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin et al.
GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding
Zijun Lin, Shuting He, Cheston Tan et al.
Understanding Co-speech Gestures in-the-wild
Sindhu Hegde, K R Prajwal, Taein Kwon et al.
AnyPortal: Zero-Shot Consistent Video Background Replacement
Wenshuo Gao, Xicheng Lan, Shuai Yang
MatchDiffusion: Training-free Generation of Match-Cuts
Alejandro Pardo, Fabio Pizzati, Tong Zhang et al.
IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features
Anand Kumar, Jiteng Mu, Nuno Vasconcelos
FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift
Yong Zhang, Feng Liang, Guanghu Yuan et al.
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation
Shiqi Huang, Shuting He, Huaiyuan Qin et al.
PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior
Seunggwan Lee, Hwanhee Jung, ByoungSoo Koh et al.
LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables
Xunpeng Yi, Yibing Zhang, Xinyu Xiang et al.
An Inversion-based Measure of Memorization for Diffusion Models
Zhe Ma, Qingming Li, Xuhong Zhang et al.
Learning to See in the Extremely Dark
Hai Jiang, Binhao Guan, Zhen Liu et al.
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao, Li, Shreyank Gowda et al.
Aligning Moments in Time using Video Queries
Yogesh Kumar, Uday Agarwal, Manish Gupta et al.
ConsNoTrainLoRA: Data-driven Weight Initialization of Low-rank Adapters using Constraints
Debasmit Das, Hyoungwoo Park, Munawar Hayat et al.
Correspondence-Free Fast and Robust Spherical Point Pattern Registration
Anik Sarker, Alan Asbeck
Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning
Junjie Shan, Ziqi Zhao, Jialin Lu et al.
TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes
Yan Xia, Yunxiang Lu, Rui Song et al.
ContraGS: Codebook-Condensed and Trainable Gaussian Splatting for Fast, Memory-Efficient Reconstruction
Sankeerth Durvasula, Sharanshangar Muhunthan, Zain Moustafa et al.
Fewer Denoising Steps or Cheaper Per-Step Inference: Towards Compute-Optimal Diffusion Model Deployment
Zhenbang Du, Yonggan Fu, Lifu Wang et al.
Diff2I2P: Differentiable Image-to-Point Cloud Registration with Diffusion Prior
Juncheng Mu, Chengwei Ren, Weixiang Zhang et al.
Removing Cost Volumes from Optical Flow Estimators
Simon Kiefhaber, Stefan Roth, Simone Schaub-Meyer
CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection
Zhixin Cheng, Jiacheng Deng, Xinjun Li et al.
D-Attn: Decomposed Attention for Large Vision-and-Language Model
Chia-Wen Kuo, Sijie Zhu, Fan Chen et al.
ImHead: A Large-scale Implicit Morphable Model for Localized Head Modeling
Rolandos Alexandros Potamias, Stathis Galanakis, Jiankang Deng et al.
VRM: Knowledge Distillation via Virtual Relation Matching
Weijia Zhang, Fei Xie, Weidong Cai et al.
Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images
Jinsol Song, Jiamu Wang, Anh Nguyen et al.
Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning
Zhengxuan Wei, Jiajin Tang, Sibei Yang
Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning
Zedong Wang, Siyuan Li, Dan Xu
Activation Subspaces for Out-of-Distribution Detection
Barış Zöngür, Robin Hesse, Stefan Roth
IRGPT: Understanding Real-world Infrared Image with Bi-cross-modal Curriculum on Large-scale Benchmark
Zhe Cao, Jin Zhang, Ruiheng Zhang