Most Cited CVPR "watermark removal attack" Papers
5,589 papers found
TransPixeler: Advancing Text-to-Video Generation with Transparency
Luozhou Wang, Yijun Li, ZhiFei Chen et al.
Authentic Hand Avatar from a Phone Scan via Universal Hand Model
Gyeongsik Moon, Weipeng Xu, Rohan Joshi et al.
ProMotion: Prototypes As Motion Learners
Yawen Lu, Dongfang Liu, Qifan Wang et al.
EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering
Toshiya Yura, Ashkan Mirzaei, Igor Gilitschenski
DepthCues: Evaluating Monocular Depth Perception in Large Vision Models
Duolikun Danier, Mehmet Aygun, Changjian Li et al.
DiG-IN: Diffusion Guidance for Investigating Networks - Uncovering Classifier Differences, Neuron Visualisations, and Visual Counterfactual Explanations
Maximilian Augustin, Yannic Neuhaus, Matthias Hein
PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting
Cheng Zhang, Haofei Xu, Qianyi Wu et al.
InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing
Jinlu Zhang, Yixin Chen, Zan Wang et al.
Zero-Shot Styled Text Image Generation, but Make It Autoregressive
Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli et al.
2S-UDF: A Novel Two-stage UDF Learning Method for Robust Non-watertight Model Reconstruction from Multi-view Images
Junkai Deng, Fei Hou, Xuhui Chen et al.
What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen, Nina Shvetsova, Andrew Rouditchenko et al.
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
Event-based Structure-from-Orbit
Ethan Elms, Yasir Latif, Tae Ha Park et al.
OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees
Hakyeong Kim, Andreas Meuleman, Hyeonjoong Jang et al.
SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering
Tao Hu, Fangzhou Hong, Ziwei Liu
UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model
Shuai Yuan, Lei Luo, Zhuo Hui et al.
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners
Chun Feng, Joy Hsu, Weiyu Liu et al.
Probability Density Geodesics in Image Diffusion Latent Space
Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang et al.
DiffFNO: Diffusion Fourier Neural Operator
Xiaoyi Liu, Hao Tang
Personalized Residuals for Concept-Driven Text-to-Image Generation
Cusuh Ham, Matthew Fisher, James Hays et al.
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
Benlin Liu, Yuhao Dong, Yiqin Wang et al.
High-Quality Facial Geometry and Appearance Capture at Home
Yuxuan Han, Junfeng Lyu, Feng Xu
BANF: Band-Limited Neural Fields for Levels of Detail Reconstruction
Ahan Shabanov, Shrisudhan Govindarajan, Cody Reading et al.
Active Object Detection with Knowledge Aggregation and Distillation from Large Models
Dejie Yang, Yang Liu
Purified and Unified Steganographic Network
GuoBiao Li, Sheng Li, Zicong Luo et al.
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions
Runhao Zeng, Xiaoyong Chen, Jiaming Liang et al.
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
Eric Slyman, Stefan Lee, Scott Cohen et al.
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation
Trong-Thuan Nguyen, Pha Nguyen, Jackson Cothren et al.
SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation
Jihuai Zhao, Junbao Zhuo, Jiansheng Chen et al.
Time-Efficient Light-Field Acquisition Using Coded Aperture and Events
Shuji Habuchi, Keita Takahashi, Chihiro Tsutake et al.
Bilateral Event Mining and Complementary for Event Stream Super-Resolution
Zhilin Huang, Quanmin Liang, Yijie Yu et al.
Towards Autonomous Micromobility through Scalable Urban Simulation
Wayne Wu, Honglin He, Chaoyuan Zhang et al.
A General Adaptive Dual-level Weighting Mechanism for Remote Sensing Pansharpening
Jie Huang, Haorui Chen, Jiaxuan Ren et al.
DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
Haonan Lin
Seurat: From Moving Points to Depth
Seokju Cho, Gabriel Huang, Seungryong Kim et al.
Hyperbolic Category Discovery
Yuanpei Liu, Zhenqi He, Kai Han
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians
Chongjian Ge, Chenfeng Xu, Yuanfeng Ji et al.
MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction
Gangjian Zhang, Nanjie Yao, Shunsi Zhang et al.
Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events
Aditya Chinchure, Sahithya Ravi, Raymond Ng et al.
Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment
Hiromu Taketsugu, Takeru Oba, Takahiro Maeda et al.
Semantics-aware Motion Retargeting with Vision-Language Models
Haodong Zhang, ZhiKe Chen, Haocheng Xu et al.
PICO: Reconstructing 3D People In Contact with Objects
Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi et al.
Hierarchical Correlation Clustering and Tree Preserving Embedding
Morteza Haghir Chehreghani, Mostafa Haghir Chehreghani
Memory-Scalable and Simplified Functional Map Learning
Robin Magnet, Maks Ovsjanikov
T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting
Yifei Qian, Zhongliang Guo, Bowen Deng et al.
MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision
Chenyangguang Zhang, Guanlong Jiao, Yan Di et al.
ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation
Ling-An Zeng, Guohong Huang, Yi-Lin Wei et al.
Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection
Farzad Beizaee, Gregory A. Lodygensky, Christian Desrosiers et al.
Generative Multiview Relighting for 3D Reconstruction under Extreme Illumination Variation
Hadi Alzayer, Philipp Henzler, Jonathan T. Barron et al.
Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models
David Stotko, Nils Wandel, Reinhard Klein
Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person Views
Ziwei Zhao, Yuchen Wang, Chuhua Wang
ACE: Anti-Editing Concept Erasure in Text-to-Image Models
Zihao Wang, Yuxiang Wei, Fan Li et al.
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Zhaoyi Liu, Huan Zhang
ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping
Youxin Pang, Ruizhi Shao, Jiajun Zhang et al.
TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation
Yabiao Wang, Shuo Wang, Jiangning Zhang et al.
Relation Rectification in Diffusion Model
Yinwei Wu, Xingyi Yang, Xinchao Wang
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification
Yang Qin, Chao Chen, Zhihang Fu et al.
Towards Effective and Sparse Adversarial Attack on Spiking Neural Networks via Breaking Invisible Surrogate Gradients
Li Lun, Kunyu Feng, Qinglong Ni et al.
Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes
Ziqian Bai, Feitong Tan, Sean Fanello et al.
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
Ruijie Lu, Yixin Chen, Junfeng Ni et al.
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo, Min-Hung Chen, De-An Huang et al.
An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models
Wentao Qu, Jing Wang, Yongshun Gong et al.
Unified Uncertainty-Aware Diffusion for Multi-Agent Trajectory Modeling
Guillem Capellera, Antonio Rubio, Luis Ferraz et al.
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun Reddy, Alexander Martin, Eugene Yang et al.
Navigating Beyond Dropout: An Intriguing Solution towards Generalizable Image Super Resolution
Hongjun Wang, Jiyuan Chen, Yinqiang Zheng et al.
DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution
Zhengxue Wang, Zhiqiang Yan, Jinshan Pan et al.
BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting
Yiren Lu, Yunlai Zhou, Disheng Liu et al.
Resolution Limit of Single-Photon LiDAR
Stanley H. Chan, Hashan K Weerasooriya, Weijian Zhang et al.
h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform
Toan Nguyen, Kien Do, Duc Kieu et al.
MammAlps: A Multi-view Video Behavior Monitoring Dataset of Wild Mammals in the Swiss Alps
Valentin Gabeff, Haozhe Qi, Brendan Flaherty et al.
UnCommon Objects in 3D
Xingchen Liu, Piyush Tayal, Jianyuan Wang et al.
GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency
Dongyue Lu, Lingdong Kong, Tianxin Huang et al.
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
Yusheng Dai, Hang Chen, Jun Du et al.
Meta-Point Learning and Refining for Category-Agnostic Pose Estimation
Junjie Chen, Jiebin Yan, Yuming Fang et al.
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes
Yuji Wang, Haoran Xu, Yong Liu et al.
Motion Diversification Networks
Hee Jae Kim, Eshed Ohn-Bar
DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling
Miguel Fainstein, Viviana Siless, Emmanuel Iarussi
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
Zining Wang, Tongkun Guan, Pei Fu et al.
GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields
Fangyin Wei, Hanlin Chen, Gim Hee Lee
GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration
Sudarshan Rajagopalan, Nithin Gopalakrishnan Nair, Jay Paranjape et al.
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
Haoran Hao, Jiaming Han, Changsheng Li et al.
C3Net: Compound Conditioned ControlNet for Multimodal Content Generation
Juntao Zhang, Yuehuai Liu, Yu-Wing Tai et al.
PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding
Hongjia Zhai, Hai Li, Zhenzhe Li et al.
Quaffure: Real-Time Quasi-Static Neural Hair Simulation
Tuur Stuyck, Gene Wei-Chin Lin, Egor Larionov et al.
DyCON: Dynamic Uncertainty-aware Consistency and Contrastive Learning for Semi-supervised Medical Image Segmentation
Maregu Assefa, Muzammal Naseer, Iyyakutti Iyappan Ganapathi et al.
Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views
Jiang Wu, Rui Li, Yu Zhu et al.
TexTile: A Differentiable Metric for Texture Tileability
Carlos Rodriguez-Pardo, Dan Casas, Elena Garces et al.
Joint Out-of-Distribution Filtering and Data Discovery Active Learning
Sebastian Schmidt, Leonard Schenk, Leo Schwinn et al.
AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution
Cheeun Hong, Kyoung Mu Lee
Generative Zero-Shot Composed Image Retrieval
Lan Wang, Wei Ao, Vishnu Naresh Boddeti et al.
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan et al.
CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning
Jiangpeng He, Zhihao Duan, Fengqing Zhu
ProbPose: A Probabilistic Approach to 2D Human Pose Estimation
Miroslav Purkrábek, Jiri Matas
Total Selfie: Generating Full-Body Selfies
Bowei Chen, Brian Curless, Ira Kemelmacher-Shlizerman et al.
AMO Sampler: Enhancing Text Rendering with Overshooting
Xixi Hu, Keyang Xu, Bo Liu et al.
The Art of Deception: Color Visual Illusions and Diffusion Models
Alexandra Gomez-Villa, Kai Wang, C. Alejandro Parraga et al.
Generating Multimodal Driving Scenes via Next-Scene Prediction
Yanhao Wu, Haoyang Zhang, Tianwei Lin et al.
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding
Thanh-Dat Truong, Utsav Prabhu, Bhiksha Raj et al.
Instruction-based Image Manipulation by Watching How Things Move
Mingdeng Cao, Xuaner Zhang, Yinqiang Zheng et al.
Contextual AD Narration with Interleaved Multimodal Sequence
Hanlin Wang, Zhan Tong, Kecheng Zheng et al.
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
HsiaoYuan Hsu, Yuxin Peng
FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation
Sen Wang, Le Wang, Sanping Zhou et al.
Enhancing Creative Generation on Stable Diffusion-based Models
Jiyeon Han, Dahee Kwon, Gayoung Lee et al.
Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment
Ziteng Cui, Xuangeng Chu, Tatsuya Harada
Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models
Reza Shirkavand, Peiran Yu, Shangqian Gao et al.
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
Haoqiang Kang, Enna Sachdeva, Piyush Gupta et al.
SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field
Lizhe Liu, Bohua Wang, Hongwei Xie et al.
Task-Aware Encoder Control for Deep Video Compression
Xingtong Ge, Jixiang Luo, Xinjie Zhang et al.
Restoration by Generation with Constrained Priors
Zheng Ding, Xuaner Zhang, Zhuowen Tu et al.
Interactive Medical Image Analysis with Concept-based Similarity Reasoning
Ta Duc Huy, Sen Kim Tran, Phan Nguyen et al.
BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics
Wenqian Zhang, Molin Huang, Yuxuan Zhou et al.
Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning
Xialei Liu, Jiang-Tian Zhai, Andrew Bagdanov et al.
DiC: Rethinking Conv3x3 Designs in Diffusion Models
Yuchuan Tian, Jing Han, Chengcheng Wang et al.
Cross-spectral Gated-RGB Stereo Depth Estimation
Samuel Brucker, Stefanie Walz, Mario Bijelic et al.
BHViT: Binarized Hybrid Vision Transformer
Tian Gao, Yu Zhang, Zhiyuan Zhang et al.
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina, Massimiliano Mancini, Elia Cunegatti et al.
AFL: A Single-Round Analytic Approach for Federated Learning with Pre-trained Models
Run He, Kai Tong, Di Fang et al.
3D-GSW: 3D Gaussian Splatting for Robust Watermarking
Youngdong Jang, Hyunje Park, Feng Yang et al.
Mask²DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
Tianhao Qi, Jianlong Yuan, Wanquan Feng et al.
ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models
Yassir Bendou, Amine Ouasfi, Vincent Gripon et al.
What Sketch Explainability Really Means for Downstream Tasks?
Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Ayan Kumar Bhunia et al.
The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generation
Yanis Benidir, Nicolas Gonthier, Clement Mallet
Diffusion-FOF: Single-View Clothed Human Reconstruction via Diffusion-Based Fourier Occupancy Field
Yuanzhen Li, Fei Luo, Chunxia Xiao
Optical-Flow Guided Prompt Optimization for Coherent Video Generation
Hyelin Nam, Jaemin Kim, Dohun Lee et al.
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living
Dominick Reilly, Rajatsubhra Chakraborty, Arkaprava Sinha et al.
Improving Out-of-Distribution Generalization in Graphs via Hierarchical Semantic Environments
Yinhua Piao, Sangseon Lee, Yijingxiu Lu et al.
DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
Qitao Zhao, Amy Lin, Jeff Tan et al.
Multi-modal Knowledge Distillation-based Human Trajectory Forecasting
Jaewoo Jeong, Seohee Lee, Daehee Park et al.
Learning from Synthetic Human Group Activities
Che-Jui Chang, Danrui Li, Deep Patel et al.
Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics
Shibo Zhao, Sifan Zhou, Raphael Blanchard et al.
Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation
Bolin Lai, Felix Juefei-Xu, Miao Liu et al.
Towards Generalizable Scene Change Detection
Jae-Woo Kim, Ue-Hwan Kim
Boost Your Human Image Generation Model via Direct Preference Optimization
Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee
Noise Modeling in One Hour: Minimizing Preparation Efforts for Self-supervised Low-Light RAW Image Denoising
Feiran Li, Haiyang Jiang, Daisuke Iso
Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory
Wenliang Zhong, Haoyu Tang, Qinghai Zheng et al.
ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer
Jiayi Gao, Zijin Yin, Changcheng Hua et al.
MedBN: Robust Test-Time Adaptation against Malicious Test Samples
Hyejin Park, Jeongyeon Hwang, Sunung Mun et al.
Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Yucong Meng, Kexue Fu et al.
WaveMo: Learning Wavefront Modulations to See Through Scattering
Mingyang Xie, Haiyun Guo, Brandon Y. Feng et al.
Learning Discriminative Dynamics with Label Corruption for Noisy Label Detection
Suyeon Kim, Dongha Lee, SeongKu Kang et al.
CacheQuant: Comprehensively Accelerated Diffusion Models
Xuewen Liu, Zhikai Li, Qingyi Gu
Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects
Weimin Qiu, Jieke Wang, Meng Tang
An N-Point Linear Solver for Line and Motion Estimation with Event Cameras
Ling Gao, Daniel Gehrig, Hang Su et al.
BrainWash: A Poisoning Attack to Forget in Continual Learning
Ali Abbasi, Parsa Nooralinejad, Hamed Pirsiavash et al.
Cross-Dimension Affinity Distillation for 3D EM Neuron Segmentation
Xiaoyu Liu, Miaomiao Cai, Yinda Chen et al.
Activity-Biometrics: Person Identification from Daily Activities
Shehreen Azad, Yogesh S. Rawat
Domain-Specific Block Selection and Paired-View Pseudo-Labeling for Online Test-Time Adaptation
Yeonguk Yu, Sungho Shin, Seunghyeok Back et al.
Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning
Isma Hadji, Mehdi Noroozi, Victor Escorcia et al.
AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios
Ziming Huang, Xurui Li, Haotian Liu et al.
PERSE: Personalized 3D Generative Avatars from A Single Portrait
Hyunsoo Cha, Inhee Lee, Hanbyul Joo
From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
Zekun Qian, Ruize Han, Wei Feng et al.
Joint2Human: High-Quality 3D Human Generation via Compact Spherical Embedding of 3D Joints
Muxin Zhang, Qiao Feng, Zhuo Su et al.
Automatic Controllable Colorization via Imagination
Xiaoyan Cong, Yue Wu, Qifeng Chen et al.
HumMUSS: Human Motion Understanding using State Space Models
Arnab Mondal, Stefano Alletto, Denis Tome
FFF: Fixing Flawed Foundations in Contrastive Pre-Training Results in Very Strong Vision-Language Models
Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
Xinghui Li, Qichao Sun, Pengze Zhang et al.
Multirate Neural Image Compression with Adaptive Lattice Vector Quantization
Hao Xu, Xiaolin Wu, Xi Zhang
DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation
Chun-Hung Wu, Shih-Hong Chen, Chih Yao Hu et al.
Diversity-aware Channel Pruning for StyleGAN Compression
Jiwoo Chung, Sangeek Hyun, Sang-Heon Shim et al.
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
Jimyeong Kim, Jungwon Park, Wonjong Rhee
Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization
You Shen, Zhipeng Zhang, Xinyang Li et al.
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
Xiaoqin Wang, Xusen Ma, Xianxu Hou et al.
EventFly: Event Camera Perception from Ground to the Sky
Lingdong Kong, Dongyue Lu, Xiang Xu et al.
Deciphering ‘What’ and ‘Where’ Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations
Xiao Zhang, David Yunis, Michael Maire
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
Hanhui Wang, Yihua Zhang, Ruizheng Bai et al.
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Bo Zou, Chao Yang, Yu Qiao et al.
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Jinnan Chen, Lingting Zhu, Zeyu Hu et al.
NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal Compensation
Ziyi Chen, Xiaolong Wu, Yu Zhang
Spectral Informed Mamba for Robust Point Cloud Processing
Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori et al.
Focusing on Tracks for Online Multi-Object Tracking
Kyujin Shim, Kangwook Ko, YuJin Yang et al.
Linear Attention Modeling for Learned Image Compression
Donghui Feng, Zhengxue Cheng, Shen Wang et al.
MEGA: Masked Generative Autoencoder for Human Mesh Recovery
Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda et al.
ZoomLDM: Latent Diffusion Model for Multi-scale Image Generation
Srikar Yellapragada, Alexandros Graikos, Kostas Triaridis et al.
ARM: Appearance Reconstruction Model for Relightable 3D Generation
Xiang Feng, Chang Yu, Zoubin Bi et al.
Learning to Control Camera Exposure via Reinforcement Learning
Kyunghyun Lee, Ukcheol Shin, Byeong-Uk Lee
DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers
Li Ren, Chen Chen, Liqiang Wang et al.
Relational Matching for Weakly Semi-Supervised Oriented Object Detection
Wenhao Wu, Hau San Wong, Si Wu et al.
In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing
Yiran Xu, Zhixin Shu, Cameron Smith et al.
POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality
Joey Wilson, Marcelino M. de Almeida, Sachit Mahajan et al.
FedUV: Uniformity and Variance for Heterogeneous Federated Learning
Ha Min Son, Moon-Hyun Kim, Tai-Myoung Chung et al.
Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration
Zilong Huang, Jun He, Junyan Ye et al.
DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry
Jing Li, Yihang Fu, Falai Chen
General Point Model Pretraining with Autoencoding and Autoregressive
Zhe Li, Zhangyang Gao, Cheng Tan et al.
Monocular and Generalizable Gaussian Talking Head Animation
Shengjie Gong, Haojie Li, Jiapeng Tang et al.
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition
Yifei Zhang, Chang Liu, Jin Wei et al.
Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text
Junshu Tang, Yanhong Zeng, Ke Fan et al.
Effective SAM Combination for Open-Vocabulary Semantic Segmentation
Minhyeok Lee, Suhwan Cho, Jungho Lee et al.
Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation
Aishik Konwer, Zhijian Yang, Erhan Bas et al.
Deep Imbalanced Regression via Hierarchical Classification Adjustment
Haipeng Xiong, Angela Yao
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li, Cristiano Saltori, Fabio Poiesi et al.
DnLUT: Ultra-Efficient Color Image Denoising via Channel-Aware Lookup Tables
Sidi Yang, Binxiao Huang, Yulun Zhang et al.
Towards Generalizing to Unseen Domains with Few Labels
Chamuditha Jayanga Galappaththige, Sanoojan Baliah, Malitha Gunawardhana et al.
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
Yangyu Huang, Tianyi Gao, Haoran Xu et al.
Cross-modal Causal Relation Alignment for Video Question Grounding
Weixing Chen, Yang Liu, Binglin Chen et al.
Masked Spatial Propagation Network for Sparsity-Adaptive Depth Refinement
Jinyoung Jun, Jae-Han Lee, Chang-Su Kim
MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models
Yifan Liu, Keyu Fan, Weihao Yu et al.
VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
Xiang Li, Qianli Shen, Kenji Kawaguchi
Adversarially Robust Few-shot Learning via Parameter Co-distillation of Similarity and Class Concept Learners
Junhao Dong, Piotr Koniusz, Junxi Chen et al.
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
Ning Gao, Yilun Chen, Shuai Yang et al.
StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation
Yining Shi, Kun Jiang, Ke Wang et al.