Most Cited CVPR "cross-modal representation" Papers
5,589 papers found • Page 22 of 28
Conference
RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing
Zhipeng Huang, Wangbo Yu, Xinhua Cheng et al.
Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning
Maosen Zhao, Pengtao Chen, Chong Yu et al.
Empowering Large Language Models with 3D Situation Awareness
Zhihao Yuan, Yibo Peng, Jinke Ren et al.
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin Qinghong Lin, Mike Zheng Shou
UNICL-SAM: Uncertainty-Driven In-Context Segmentation with Part Prototype Discovery
Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.
Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning
Qianli Ma, Xuefei Ning, Dongrui Liu et al.
Making Old Film Great Again: Degradation-aware State Space Model for Old Film Restoration
Yudong Mao, Hao Luo, Zhiwei Zhong et al.
Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning
Jiuyang Dong, Junjun Jiang, Kui Jiang et al.
All Rivers Run to the Sea: Private Learning with Asymmetric Flows
Yue Niu, Ramy E. Ali, Saurav Prakash et al.
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah et al.
TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion
Haoyue Liu, Jinghan Xu, Yi Chang et al.
FilmComposer: LLM-Driven Music Production for Silent Film Clips
Zhifeng Xie, Qile He, Youjia Zhu et al.
Image Referenced Sketch Colorization Based on Animation Creation Workflow
Dingkun Yan, Xinrui Wang, Zhuoru Li et al.
Effortless Active Labeling for Long-Term Test-Time Adaptation
Guowei Wang, Changxing Ding
HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting
Xinpeng Liu, Zeyi Huang, Fumio Okura et al.
Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision
Jinneyong Kim, Seung-Hwan Baek
Cheb-GR: Rethinking K-nearest Neighbor Search in Re-ranking for Person Re-identification
Jinxi Yang, He Li, Bo Du et al.
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
yilong wang, Zilin Gao, Qilong Wang et al.
Parameterized Blur Kernel Prior Learning for Local Motion Deblurring
Zhenxuan Fang, Fangfang Wu, Tao Huang et al.
Enhanced Visual-Semantic Interaction with Tailored Prompts for Pedestrian Attribute Recognition
Junyi Wu, Yan Huang, Min Gao et al.
GPS as a Control Signal for Image Generation
Chao Feng, Ziyang Chen, Aleksander Holynski et al.
I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models
Dongnan Gui, Xun Guo, Wengang Zhou et al.
SceneCrafter: Controllable Multi-View Driving Scene Editing
Zehao Zhu, Yuliang Zou, Chiyu “Max” Jiang et al.
DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting
Liao Shen, Tianqi Liu, Huiqiang Sun et al.
Z-Magic: Zero-shot Multiple Attributes Guided Image Creator
Yingying Deng, Xiangyu He, Fan Tang et al.
EDCFlow: Exploring Temporally Dense Difference Maps for Event-based Optical Flow Estimation
Daikun Liu, Lei Cheng, Teng Wang et al.
UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation
Yichong Lu, Yichi Cai, Shangzhan Zhang et al.
Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision
Tomoya Yoshida, Shuhei Kurita, Taichi Nishimura et al.
BLADE: Single-view Body Mesh Estimation through Accurate Depth Estimation
Shengze Wang, Jiefeng Li, Tianye Li et al.
Integral Fast Fourier Color Constancy
Wenjun Wei, Yanlin Qian, Huaian Chen et al.
Coherent 3D Portrait Video Reconstruction via Triplane Fusion
Shengze Wang, Xueting Li, Chao Liu et al.
PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation
Uyoung Jeong, Jonathan Freer, Seungryul Baek et al.
SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding
Mingfei Chen, Israel D. Gebru, Ishwarya Ananthabhotla et al.
GenVDM: Generating Vector Displacement Maps From a Single Image
Yuezhi Yang, Qimin Chen, Vladimir G. Kim et al.
DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders
Sizai Hou, Songze Li, Duanyi Yao
Rethinking Correspondence-based Category-Level Object Pose Estimation
Huan Ren, Wenfei Yang, Shifeng Zhang et al.
CoE: Chain-of-Explanation via Automatic Visual Concept Circuit Description and Polysemanticity Quantification
wenlong yu, Qilong Wang, Chuang Liu et al.
Track Any Anomalous Object:A Granular Video Anomaly Detection Pipeline
Yuzhi Huang, Chenxin Li, Haitao Zhang et al.
LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians
Jiamin WU, Kenkun Liu, Han Gao et al.
MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video
Hengyi Wang, Jingwen Wang, Lourdes Agapito
Generative Map Priors for Collaborative BEV Semantic Segmentation
Jiahui Fu, Yue Gong, Luting Wang et al.
FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation
Kefan Chen, Chaerin Min, Linguang Zhang et al.
HardMo: A Large-Scale Hardcase Dataset for Motion Capture
Jiaqi Liao, Chuanchen Luo, Yinuo Du et al.
Decision SpikeFormer: Spike-Driven Transformer for Decision Making
Wei Huang, Qinying Gu, Nanyang Ye
AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data
Zengqun Zhao, Ziquan Liu, Yu Cao et al.
Generative Modeling of Class Probability for Multi-Modal Representation Learning
JungKyoo Shin, Bumsoo Kim, Eunwoo Kim
Forensic Self-Descriptions Are All You Need for Zero-Shot Detection, Open-Set Source Attribution, and Clustering of AI-generated Images
Tai Nguyen, Aref Azizpour, Matthew Stamm
Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation
Tanner Schmidt, Richard Newcombe
Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation
Seokil Ham, Hee-Seon Kim, Sangmin Woo et al.
Category-Agnostic Neural Object Rigging
Guangzhao He, Chen Geng, Shangzhe Wu et al.
RefPose: Leveraging Reference Geometric Correspondences for Accurate 6D Pose Estimation of Unseen Objects
Jaeguk Kim, Jaewoo Park, Keuntek Lee et al.
From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models
German Barquero, Nadine Bertsch, Manojkumar Marramreddy et al.
FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering
Jingqiu Zhou, Lue Fan, Linjiang Huang et al.
MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning
Xu Han, Yuan Tang, Jinfeng Xu et al.
Compass Control: Multi Object Orientation Control for Text-to-Image Generation
Rishubh Parihar, Vaibhav Agrawal, Sachidanand VS et al.
From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers
Swaminathan Gurumurthy, Karnik Ram, Bingqing Chen et al.
ProHOC: Probabilistic Hierarchical Out-of-Distribution Classification via Multi-Depth Networks
Erik Wallin, Fredrik Kahl, Lars Hammarstrand
Zero-Shot Head Swapping in Real-World Scenarios
Sohyun Jeong, Taewoong Kang, Hyojin Jang et al.
EAP-GS: Efficient Augmentation of Pointcloud for 3D Gaussian Splatting in Few-shot Scene Reconstruction
Dongrui Dai, Yuxiang Xing
DeepCompress-ViT: Rethinking Model Compression to Enhance Efficiency of Vision Transformers at the Edge
Sabbir Ahmed, Abdullah Al Arafat, Deniz Najafi et al.
RepAn: Enhanced Annealing through Re-parameterization
Xiang Fei, Xiawu Zheng, Yan Wang et al.
DH-Set: Improving Vision-Language Alignment with Diverse and Hybrid Set-Embeddings Learning
Kun Zhang, Jingyu Li, Zhe Li et al.
Pre-training Vision Models with Mandelbulb Variations
Benjamin N. Chiche, Yuto Horikawa, Ryo Fujita
Controllable Human Image Generation with Personalized Multi-Garments
Yisol Choi, Sangkyung Kwak, Sihyun Yu et al.
CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation
Yuxing Long, Jiyao Zhang, Mingjie Pan et al.
PIDSR: Complementary Polarized Image Demosaicing and Super-Resolution
Shuangfan Zhou, Chu Zhou, Youwei Lyu et al.
Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels
Tianming Liang, Chaolei Tan, Beihao Xia et al.
ViUniT: Visual Unit Tests for More Robust Visual Programming
Artemis Panagopoulou, Honglu Zhou, silvio savarese et al.
Diff-Palm: Realistic Palmprint Generation with Polynomial Creases and Intra-Class Variation Controllable Diffusion Models
Jianlong Jin, Chenglong Zhao, Ruixin Zhang et al.
Exploring Semantic Feature Discrimination for Perceptual Image Super-Resolution and Opinion-Unaware No-Reference Image Quality Assessment
Guanglu Dong, Xiangyu Liao, Mingyang Li et al.
Dense Match Summarization for Faster Two-view Estimation
Jonathan Astermark, Anders Heyden, Viktor Larsson
From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation
Javier Tirado-Garín, Javier Civera
Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning
Lei-Lei Ma, Shuo Xu, Ming-Kun Xie et al.
LOCORE: Image Re-ranking with Long-Context Sequence Modeling
Zilin Xiao, Pavel Suma, Ayush Sachdeva et al.
Sample- and Parameter-Efficient Auto-Regressive Image Models
Elad Amrani, Leonid Karlinsky, Alex M. Bronstein
Learning on Model Weights using Tree Experts
Eliahu Horwitz, Bar Cavia, Jonathan Kahana et al.
Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks
Cheng Lei, Ao Li, Hu Yao et al.
Reasoning Mamba: Hypergraph-Guided Region Relation Calculating for Weakly Supervised Affordance Grounding
Yuxuan Wang, Aming Wu, Muli Yang et al.
Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI
Won Jun Kim, Hyungjin Chung, Jaemin Kim et al.
Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data
Zhiyuan Ma, Xinyue Liang, Rongyuan Wu et al.
Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval
Rohan Sarkar, Avinash Kak
Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis
Awais Nizamani, Hamid Laga, Guanjin Wang et al.
Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation
Andrea Maracani, Savas Ozkan, Sijun Cho et al.
Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene
Tai-Yu Daniel Pan, Sooyoung Jeon, Mengdi Fan et al.
Boosting the Dual-Stream Architecture in Ultra-High Resolution Segmentation with Resolution-Biased Uncertainty Estimation
Rong Qin, Xingyu Liu, Jinglei Shi et al.
SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model
Zhengang Li, Yan Kang, Yuchen Liu et al.
Just Dance with pi! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection
Snehashis Majhi, Giacomo D'Amicantonio, Antitza Dantcheva et al.
Boltzmann Attention Sampling for Image Analysis with Small Objects
Theodore Zhao, Sid Kiblawi, Mu Wei et al.
GBlobs: Explicit Local Structure via Gaussian Blobs for Improved Cross-Domain LiDAR-based 3D Object Detection
Dušan Malić, Christian Fruhwirth-Reisinger, Samuel Schulter et al.
COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation
Fanding Huang, Jingyan Jiang, Qinting Jiang et al.
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
Divya Velayudhan, Abdelfatah Ahmed, Mohamad Alansari et al.
Few-shot Personalized Scanpath Prediction
Ruoyu Xue, Jingyi Xu, Sounak Mondal et al.
When Domain Generalization meets Generalized Category Discovery: An Adaptive Task-Arithmetic Driven Approach
Vaibhav Rathore, Shubhranil B, Saikat Dutta et al.
ToNNO: Tomographic Reconstruction of a Neural Network’s Output for Weakly Supervised Segmentation of 3D Medical Images
Marius Schmidt-Mengin, Alexis Benichoux, Shibeshih Belachew et al.
Residual Learning in Diffusion Models
Junyu Zhang, Daochang Liu, Eunbyung Park et al.
STDD: Spatio-Temporal Dual Diffusion for Video Generation
Shuaizhen Yao, Xiaoya Zhang, Xin Liu et al.
Hyperdimensional Uncertainty Quantification for Multimodal Uncertainty Fusion in Autonomous Vehicles Perception
Luke Chen, Junyao Wang, Trier Mortlock et al.
Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis
Hua Yu, Weiming Liu, Gui Xu et al.
UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units
Huakun Liu, Hiroki Ota, Xin Wei et al.
Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment
Mayug Maniparambil, Raiymbek Akshulakov, YASSER ABDELAZIZ DAHOU DJILALI et al.
Color Alignment in Diffusion
Ka Chun SHUM, Binh-Son Hua, Thanh Nguyen et al.
Enhancing Adversarial Transferability with Checkpoints of a Single Model’s Training
Shixin Li, Chaoxiang He, Xiaojing Ma et al.
Beyond Background Shift: Rethinking Instance Replay in Continual Semantic Segmentation
Hongmei Yin, Tingliang Feng, Fan Lyu et al.
Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation
Zhuoran ZHAO, Linlin Yang, Pengzhan Sun et al.
EntitySAM: Segment Everything in Video
Mingqiao Ye, Seoung Wug Oh, Lei Ke et al.
Identifying and Mitigating Spurious Correlation in Multi-Task Learning
Junyi Chai, Shenyu Lu, Xiaoqian Wang
GBC-Splat: Generalizable Gaussian-Based Clothed Human Digitalization under Sparse RGB Cameras
Hanzhang Tu, Zhanfeng Liao, Boyao Zhou et al.
Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries
Wei Xu, Charlie Wagner, Junjie Luo et al.
Learning Visual Composition through Improved Semantic Guidance
Austin Stone, Hagen Soltau, Robert Geirhos et al.
Overcoming Shortcut Problem in VLM for Robust Out-of-Distribution Detection
Zhuo Xu, Xiang Xiang, Yifan Liang
One-Step Event-Driven High-Speed Autofocus
Yuhan Bao, Shaohua Gao, Wenyong Li et al.
RDD: Robust Feature Detector and Descriptor using Deformable Transformer
Gonglin Chen, Tianwen Fu, Haiwei Chen et al.
Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding
Tianyu Chen, Xingcheng Fu, Yisen Gao et al.
Test-Time Fine-Tuning of Image Compression Models for Multi-Task Adaptability
Unki Park, Seongmoon Jeong, Jang Youngchan et al.
CustAny: Customizing Anything from A Single Example
Lingjie Kong, Kai WU, Chengming Xu et al.
DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer
Ho-Joong Kim, Yearang Lee, Jung-Ho Hong et al.
Distinguish Then Exploit: Source-free Open Set Domain Adaptation via Weight Barcode Estimation and Sparse Label Assignment
Weiming Liu, Jun Dan, Fan Wang et al.
Deep Fair Multi-View Clustering with Attention KAN
HaiMing Xu, Qianqian Wang, Boyue Wang et al.
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
Edson Araujo, Andrew Rouditchenko, Yuan Gong et al.
SerialGen: Personalized Image Generation by First Standardization Then Personalization
Cong Xie, Han Zou, Ruiqi Yu et al.
Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces
Souhail Hadgi, Luca Moschella, Andrea Santilli et al.
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
Eitan Shaar, Ariel Shaulov, Gal Chechik et al.
Three Cars Approaching within 100m! Enhancing Distant Geometry by Tri-Axis Voxel Scanning for Camera-based Semantic Scene Completion
Jongseong Bae, Junwoo Ha, Ha Young Kim
EZSR: Event-based Zero-Shot Recognition
Yan Yang, Liyuan Pan, Dongxu Li et al.
CoCoGaussian: Leveraging Circle of Confusion for Gaussian Splatting from Defocused Images
Jungho Lee, Suhwan Cho, Taeoh Kim et al.
MetaShadow: Object-Centered Shadow Detection, Removal, and Synthesis
Tianyu Wang, Jianming Zhang, Haitian Zheng et al.
Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling
Haopeng Sun, Yingwei Zhang, Lumin Xu et al.
Understanding Multi-layered Transmission Matrices
Marina Alterman, Anat Levin
Insightful Instance Features for 3D Instance Segmentation
Wonseok Roh, Hwanhee Jung, Giljoo Nam et al.
Augmenting Perceptual Super-Resolution via Image Quality Predictors
Fengjia Zhang, Samrudhdhi Rangrej, Tristan T Aumentado-Armstrong et al.
COBRA: COmBinatorial Retrieval Augmentation for Few-Shot Adaptation
Arnav Mohanty Das, Gantavya Bhatt, Lilly Kumari et al.
Perceptual Video Compression with Neural Wrapping
Muhammad Umar Karim Khan, Aaron Chadha, Mohammad Ashraful Anam et al.
TexGarment: Consistent Garment UV Texture Generation via Efficient 3D Structure-Guided Diffusion Transformer
Jialun Liu, Jinbo Wu, Xiaobo Gao et al.
Multi-Modal Aerial-Ground Cross-View Place Recognition with Neural ODEs
Sijie Wang, Rui She, Qiyu Kang et al.
Person De-reidentification: A Variation-guided Identity Shift Modeling
Yi-Xing Peng, Yu-Ming Tang, Kun-Yu Lin et al.
Hyperbolic Uncertainty-Aware Few-Shot Incremental Point Cloud Segmentation
Tanuj Sur, Samrat Mukherjee, Kaizer Rahaman et al.
Maintaining Consistent Inter-Class Topology in Continual Test-Time Adaptation
Chenggong Ni, Fan Lyu, Jiayao Tan et al.
Visual Consensus Prompting for Co-Salient Object Detection
Jie Wang, Nana Yu, Zihao Zhang et al.
Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
Guotao liang, Baoquan Zhang, Zhiyuan Wen et al.
Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation
Qiao Yu, Xianzhi Li, Yuan Tang et al.
Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation
Hoang Chuong Nguyen, Tianyu Wang, Jose M. Alvarez et al.
MDP: Multidimensional Vision Model Pruning with Latency Constraint
Xinglong Sun, Barath Lakshmanan, Maying Shen et al.
SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model
Yucheng Mao, Boyang Wang, Nilesh Kulkarni et al.
Rethinking Lanes and Points in Complex Scenarios for Monocular 3D Lane Detection
Yifan Chang, Junjie Huang, Xiaofeng Wang et al.
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Inho Kim, YOUNGKIL SONG, Jicheol Park et al.
Optimal Transport-Guided Source-Free Adaptation for Face Anti-Spoofing
Zhuowei Li, Tianchen Zhao, Xiang Xu et al.
BOE-ViT: Boosting Orientation Estimation with Equivariance in Self-Supervised 3D Subtomogram Alignment
Runmin Jiang, Jackson Daggett, Shriya Pingulkar et al.
RaSS: Improving Denoising Diffusion Samplers with Reinforced Active Sampling Scheduler
Xin Ding, Lei Yu, Xin Li et al.
Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations
Jungin Park, Jiyoung Lee, Kwanghoon Sohn
Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability
Lei Wang, Senmao Li, Fei Yang et al.
Prompt3D: Random Prompt Assisted Weakly-Supervised 3D Object Detection
Xiaohong Zhang, Huisheng Ye, Jingwen Li et al.
Beyond Human Perception: Understanding Multi-Object World from Monocular View
Keyu Guo, Yongle Huang, Shijie Sun et al.
Hierarchical Flow Diffusion for Efficient Frame Interpolation
Yang Hai, Guo Wang, Tan Su et al.
GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing
Tong Wang, Ting Liu, Xiaochao Qu et al.
Towards Human-Understandable Multi-Dimensional Concept Discovery
Arne Grobrügge, Niklas Kühl, Gerhard Satzger et al.
Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual
Chong Wang, Lanqing Guo, Zixuan Fu et al.
Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning
Kunyu Wang, Xueyang Fu, Xin Lu et al.
EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting
Suzhen Wang, Weijie Chen, Wei Zhang et al.
HORP: Human-Object Relation Priors Guided HOI Detection
Pei Geng, Jian Yang, Shanshan Zhang
PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies
Mojtaba Nafez, Amirhossein Koochakian, Arad Maleki et al.
Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues
Sihong Huang, Jiaxin Wu, Xiaoyong Wei et al.
SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis
Bangbang Zhou, Zuan Gao, Zixiao Wang et al.
Infighting in the Dark: Multi-Label Backdoor Attack in Federated Learning
Ye Li, Yanchao Zhao, chengcheng zhu et al.
Multi-Modal Synergistic Implicit Image Enhancement for Efficient Optical Flow Estimation
Weichen Dai, wu hexing, xiaoyang weng et al.
Handling Spatial-Temporal Data Heterogeneity for Federated Continual Learning via Tail Anchor
Hao Yu, Xin Yang, Le Zhang et al.
Early-Bird Diffusion: Investigating and Leveraging Timestep-Aware Early-Bird Tickets in Diffusion Models for Efficient Training
Lexington Whalen, Zhenbang Du, Haoran You et al.
Towards Million-Scale Adversarial Robustness Evaluation With Stronger Individual Attacks
Yong Xie, Weijie Zheng, Hanxun Huang et al.
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models
Jinhui Yi, Syed Talal Wasim, Yanan Luo et al.
V2V3D: View-to-View Denoised 3D Reconstruction for Light Field Microscopy
Jiayin Zhao, Zhenqi Fu, Tao Yu et al.
MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention
Yuhan Wang, Fangzhou Hong, Shuai Yang et al.
DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels
Erjian Guo, Zhen Zhao, Zicheng Wang et al.
Remote Photoplethysmography in Real-World and Extreme Lighting Scenarios
Hang Shao, lei luo, Jianjun Qian et al.
EntityErasure: Erasing Entity Cleanly via Amodal Entity Segmentation and Completion
Yixing Zhu, Qing Zhang, Yitong Wang et al.
Volumetric Surfaces: Representing Fuzzy Geometries with Layered Meshes
Stefano Esposito, Anpei Chen, Christian Reiser et al.
A Unified Image-Dense Annotation Generation Model for Underwater Scenes
Hongkai Lin, Dingkang Liang, Zhenghao Qi et al.
TSP-Mamba: The Travelling Salesman Problem Meets Mamba for Image Super-resolution and Beyond
Kun Zhou, Xinyu Lin, Jiangbo Lu
Adapting Dense Matching for Homography Estimation with Grid-based Acceleration
Kaining Zhang, Yuxin Deng, Jiayi Ma et al.
Auto-Encoded Supervision for Perceptual Image Super-Resolution
MinKyu Lee, Sangeek Hyun, Woojin Jun et al.
Multi-Group Proportional Representations for Text-to-Image Models
Sangwon Jung, Alex Oesterling, Claudio Mayrink Verdun et al.
FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding
Rong Gao, Xin Liu, Zhuozhao Hu et al.
Encapsulated Composition of Text-to-Image and Text-to-Video Models for High-Quality Video Synthesis
Tongtong Su, Chengyu Wang, Bingyan Liu et al.
Coherence As Texture – Passive Textureless 3D Reconstruction by Self-interference
Wei-Yu Chen, Aswin C. Sankaranarayanan, Anat Levin et al.
Improving Visual and Downstream Performance of Low-Light Enhancer with Vision Foundation Models Collaboration
yuxuan Gu, Huaian Chen, Yi Jin et al.
Diffusion-based Event Generation for High-Quality Image Deblurring
Xinan Xie, Qing Zhang, Wei-Shi Zheng
Observation-Guided Diffusion Probabilistic Models
Junoh Kang, Jinyoung Choi, Sungik Choi et al.
H2ST: Hierarchical Two-Sample Tests for Continual Out-of-Distribution Detection
Yuhang Liu, Wenjie Zhao, Yunhui Guo
LLM-driven Multimodal and Multi-Identity Listening Head Generation
Peiwen Lai, Weizhi Zhong, Yipeng Qin et al.
Consistency-aware Self-Training for Iterative-based Stereo Matching
Jingyi Zhou, Peng Ye, Haoyu Zhang et al.
UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation
Yinqiao Wang, Hao Xu, Pheng-Ann Heng et al.
Incomplete Multi-modal Brain Tumor Segmentation via Learnable Sorting State Space Model
Zheyu Zhang, Yayuan Lu, Feipeng Ma et al.
NTR-Gaussian: Nighttime Dynamic Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics
Kun Yang, Yuxiang Liu, Zeyu Cui et al.
Deep Change Monitoring: A Hyperbolic Representative Learning Framework and a Dataset for Long-term Fine-grained Tree Change Detection
Yante Li, Hanwen Qi, Haoyu Chen et al.
Where's the Liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content
Haoyue Bai, Yiyou Sun, Wei Cheng et al.
Ferret: An Efficient Online Continual Learning Framework under Varying Memory Constraints
Yuhao Zhou, Yuxin Tian, Jindi Lv et al.
Enhancing Dance-to-Music Generation via Negative Conditioning Latent Diffusion Model
Changchang Sun, Gaowen Liu, Charles Fleming et al.
Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition
ZHANG LINTONG, Kang Yin, Seong-Whan Lee
Devil is in the Detail: Towards Injecting Fine Details of Image Prompt in Image Generation via Conflict-free Guidance and Stratified Attention
Kyungmin Jo, Jooyeol Yun, Jaegul Choo
Progressive Correspondence Regenerator for Robust 3D Registration
Guiyu Zhao, Sheng Ao, Ye Zhang et al.
CLOAF: CoLlisiOn-Aware Human Flow
Andrey Davydov, Martin Engilberge, Mathieu Salzmann et al.
v-CLR: View-Consistent Learning for Open-World Instance Segmentation
Chang-Bin Zhang, Jinhong Ni, Yujie Zhong et al.