Most Cited CVPR "cached activations" Papers
5,589 papers found • Page 17 of 28
Conference
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
Yutao Hu, Tianbin, Quanfeng Lu et al.
S²MVTC: a Simple yet Efficient Scalable Multi-View Tensor Clustering
Zhen Long, Qiyuan Wang, Yazhou Ren et al.
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan, Jinke Ren, Chun-Mei Feng et al.
Towards HDR and HFR Video from Rolling-Mixed-Bit Spikings
Yakun Chang, Yeliduosi Xiaokaiti, Yujia Liu et al.
Discriminative Probing and Tuning for Text-to-Image Generation
Leigang Qu, Wenjie Wang, Yongqi Li et al.
Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata
Dongsu Zhang, Francis Williams, Žan Gojčič et al.
Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods
Mingqi Jiang, Saeed Khorram, Li Fuxin
Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability
Yan Huang, Zhang Zhang, Qiang Wu et al.
Weakly Misalignment-free Adaptive Feature Alignment for UAVs-based Multimodal Object Detection
Chen Chen, Jiahao Qi, Xingyue Liu et al.
Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization
Deng Li, Aming Wu, Yaowei Wang et al.
MVCPS-NeuS: Multi-view Constrained Photometric Stereo for Neural Surface Reconstruction
Hiroaki Santo, Fumio Okura, Yasuyuki Matsushita
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
Xiaozheng Zheng, Chao Wen, Zhuo Su et al.
MultiPhys: Multi-Person Physics-aware 3D Motion Estimation
Nicolás Ugrinovic, Boxiao Pan, Georgios Pavlakos et al.
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Hao Shao, Yuxuan Hu, Letian Wang et al.
CAGE: Controllable Articulation GEneration
Jiayi Liu, Hou In Ivan Tam, Ali Mahdavi Amiri et al.
GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Shoukang Hu, Tao Hu, Ziwei Liu
SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation
Jiehong Lin, lihua liu, Dekun Lu et al.
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li, Mingxu Zhang, Yiran Geng et al.
Mask4Align: Aligned Entity Prompting with Color Masks for Multi-Entity Localization Problems
Haoquan Zhang, Ronggang Huang, Yi Xie et al.
Global and Local Prompts Cooperation via Optimal Transport for Federated Learning
Hongxia Li, Wei Huang, Jingya Wang et al.
Dense Optical Tracking: Connecting the Dots
Guillaume Le Moing, Jean Ponce, Cordelia Schmid
Multi-agent Collaborative Perception via Motion-aware Robust Communication Network
Shixin Hong, Yu LIU, Zhi Li et al.
Rapid Motor Adaptation for Robotic Manipulator Arms
Yichao Liang, Kevin Ellis, João F. Henriques
LiSA: LiDAR Localization with Semantic Awareness
Bochun Yang, Zijun Li, Wen Li et al.
Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
Ke Fan, Zechen Bai, Tianjun Xiao et al.
C3: High-Performance and Low-Complexity Neural Compression from a Single Image or Video
Hyunjik Kim, Matthias Bauer, Lucas Theis et al.
AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing
Fan Yang, Tianyi Chen, XIAOSHENG HE et al.
iToF-flow-based High Frame Rate Depth Imaging
Yu Meng, Zhou Xue, Xu Chang et al.
Rethinking Human Motion Prediction with Symplectic Integral
Haipeng Chen, Kedi L yu, Zhenguang Liu et al.
DiVAS: Video and Audio Synchronization with Dynamic Frame Rates
Clara Maria Fernandez Labrador, Mertcan Akcay, Eitan Abecassis et al.
Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos
Chen Liu, Peike Li, Qingtao Yu et al.
Inter-X: Towards Versatile Human-Human Interaction Analysis
Liang Xu, Xintao Lv, Yichao Yan et al.
PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation
Ardian Umam, Cheng-Kun Yang, Min-Hung Chen et al.
Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection
Jiawen Zhu, Choubo Ding, Yu Tian et al.
IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM
Minghao Yin, Shangzhe Wu, Kai Han
Boosting Image Quality Assessment through Efficient Transformer Adaptation with Local Feature Enhancement
Kangmin Xu, Liang Liao, Jing Xiao et al.
Towards Realistic Scene Generation with LiDAR Diffusion Models
Haoxi Ran, Vitor Guizilini, Yue Wang
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
Rui Zhu, Yingwei Pan, Yehao Li et al.
KPConvX: Modernizing Kernel Point Convolution with Kernel Attention
Hugues Thomas, Yao-Hung Hubert Tsai, Timothy Barfoot et al.
Latency Correction for Event-guided Deblurring and Frame Interpolation
Yixin Yang, Jinxiu Liang, Bohan Yu et al.
NTO3D: Neural Target Object 3D Reconstruction with Segment Anything
Xiaobao Wei, Renrui Zhang, Jiarui Wu et al.
Adversarial Distillation Based on Slack Matching and Attribution Region Alignment
Shenglin Yin, Zhen Xiao, Mingxuan Song et al.
Multi-View Attentive Contextualization for Multi-View 3D Object Detection
Xianpeng Liu, Ce Zheng, Ming Qian et al.
AHIVE: Anatomy-aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval
Sixing Yan, William K. Cheung, Ivor Tsang et al.
SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation
Yanzhe Liu, Rong Chen, Yushi Li et al.
Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations
Kewei Wang, Yizheng Wu, Jun Cen et al.
MMA: Multi-Modal Adapter for Vision-Language Models
Lingxiao Yang, Ru-Yuan Zhang, Yanchen Wang et al.
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment
Zheren Fu, Lei Zhang, Hou Xia et al.
A Category Agnostic Model for Visual Rearrangment
Yuyi Liu, Xinhang Song, Weijie Li et al.
Towards Progressive Multi-Frequency Representation for Image Warping
Jun Xiao, Zihang Lyu, Cong Zhang et al.
VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction
Jiaqi Lin, Zhihao Li, Xiao Tang et al.
OTE: Exploring Accurate Scene Text Recognition Using One Token
Jianjun Xu, Yuxin Wang, Hongtao Xie et al.
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.
DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses
Chen Zhao, Tong Zhang, Zheng Dang et al.
TULIP: Multi-camera 3D Precision Assessment of Parkinson’s Disease
Kyungdo Kim, Sihan Lyu, Sneha Mantri et al.
MFP: Making Full Use of Probability Maps for Interactive Image Segmentation
Chaewon Lee, Seon-Ho Lee, Chang-Su Kim
When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation
Xiaoming Li, Xinyu Hou, Chen Change Loy
Low-power Continuous Remote Behavioral Localization with Event Cameras
Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez et al.
Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
Taeho Kang, Youngki Lee
AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One
Mike Ranzinger, Greg Heinrich, Jan Kautz et al.
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
song yiran, Qianyu Zhou, Xiangtai Li et al.
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space
Gabriele Berton, Alex Stoken, Barbara Caputo et al.
PairDETR : Joint Detection and Association of Human Bodies and Faces
Ammar Ali, Georgii Gaikov, Denis Rybalchenko et al.
Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model
Tian Liang, Jing Huang, Ming Kong et al.
DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis
Jiapeng Tang, Yinyu Nie, Lev Markhasin et al.
YOLO-World: Real-Time Open-Vocabulary Object Detection
Tianheng Cheng, Lin Song, Yixiao Ge et al.
Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning
Hao Jiang, Bingfeng Zhou, Yadong Mu
ShapeMatcher: Self-Supervised Joint Shape Canonicalization Segmentation Retrieval and Deformation
Yan Di, Chenyangguang Zhang, Chaowei Wang et al.
Error Detection in Egocentric Procedural Task Videos
Shih-Po Lee, Zijia Lu, Zekun Zhang et al.
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
Ganggui Ding, Canyu Zhao, Wen Wang et al.
Generative Unlearning for Any Identity
Juwon Seo, Sung-Hoon Lee, Tae-Young Lee et al.
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange
Yanhao Wu, Tong Zhang, Wei Ke et al.
Efficient Privacy-Preserving Visual Localization Using 3D Ray Clouds
Heejoon Moon, Chunghwan Lee, Je Hyeong Hong
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Tony C. W. MOK, Zi Li, Yunhao Bai et al.
PoseIRM: Enhance 3D Human Pose Estimation on Unseen Camera Settings via Invariant Risk Minimization
Yanlu Cai, Weizhong Zhang, Yuan Wu et al.
On the Estimation of Image-matching Uncertainty in Visual Place Recognition
Mubariz Zaffar, Liangliang Nan, Julian F. P. Kooij
LoS: Local Structure-Guided Stereo Matching
Kunhong Li, Longguang Wang, Ye Zhang et al.
RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation
Oded Bialer, Yuval Haitman
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
Syed Talal Wasim, Muzammal Naseer, Salman Khan et al.
Plug-and-Play Diffusion Distillation
Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte et al.
Differentiable Micro-Mesh Construction
Yishun Dou, Zhong Zheng, Qiaoqiao Jin et al.
ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning
Yuxiang Zhang, Hongwen Zhang, Liangxiao Hu et al.
Unsupervised 3D Structure Inference from Category-Specific Image Collections
Weikang Wang, Dongliang Cao, Florian Bernard
Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering
Jiawei Yao, Qi Qian, Juhua Hu
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
Yukang Cao, Yan-Pei Cao, Kai Han et al.
Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal
Yijun Yang, Hongtao Wu, Angelica I. Aviles-Rivero et al.
Sharingan: A Transformer Architecture for Multi-Person Gaze Following
Samy Tafasca, Anshul Gupta, Jean-marc Odobez
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
Bohao Peng, Xiaoyang Wu, Li Jiang et al.
Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
Sibo Wang, Jie Zhang, Zheng Yuan et al.
MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation
Zhicheng Zhang, Pancheng Zhao, Eunil Park et al.
CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection
Jiayi Zhu, Qing Guo, Felix Juefei Xu et al.
CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training
Yuxin Guo, Siyang Sun, Shuailei Ma et al.
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
Yicheng Xiao, Zhuoyan Luo, Yong Liu et al.
Point Transformer V3: Simpler Faster Stronger
Xiaoyang Wu, Li Jiang, Peng-Shuai Wang et al.
Improving Distant 3D Object Detection Using 2D Box Supervision
Zetong Yang, Zhiding Yu, Christopher Choy et al.
ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models
Meng-Li Shih, Wei-Chiu Ma, Lorenzo Boyice et al.
Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation
Hoang Chuong Nguyen, Tianyu Wang, Jose M. Alvarez et al.
Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences
Seungwook Kim, Kejie Li, Xueqing Deng et al.
EVS-assisted Joint Deblurring Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling
Rui Jiang, Fangwen Tu, Yixuan Long et al.
Open-World Semantic Segmentation Including Class Similarity
Matteo Sodano, Federico Magistri, Lucas Nunes et al.
READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning
Takeru Oba, Matthew Walter, Norimichi Ukita
MeshPose: Unifying DensePose and 3D Body Mesh Reconstruction
Eric-Tuan Le, Antonios Kakolyris, Petros Koutras et al.
Bayesian Differentiable Physics for Cloth Digitalization
Deshan Gong, Ningtao Mao, He Wang
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action
Jiasen Lu, Christopher Clark, Sangho Lee et al.
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
Zichen Miao, Jiang Wang, Ze Wang et al.
Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving
JINLONG LI, Baolu Li, Zhengzhong Tu et al.
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
Nikita Starodubcev, Dmitry Baranchuk, Artem Fedorov et al.
Unbiased Estimator for Distorted Conics in Camera Calibration
Chaehyeon Song, Jaeho Shin, Myung-Hwan Jeon et al.
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Chaolei Tan, Jianhuang Lai, Wei-Shi Zheng et al.
Learning Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification
Zhenyu Cui, Jiahuan Zhou, Xun Wang et al.
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
Mehmet Saygin Seyfioglu, Wisdom Ikezogwo, Fatemeh Ghezloo et al.
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Lingmin Ran, Xiaodong Cun, Jia-Wei Liu et al.
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
Pablo Marcos-Manchón, Roberto Alcover-Couso, Juan SanMiguel et al.
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
Ruichen Ma, Guanchao Qiao, Yian Liu et al.
3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation
Zidu Wang, Xiangyu Zhu, Tianshuo Zhang et al.
Spike-guided Motion Deblurring with Unknown Modal Spatiotemporal Alignment
Jiyuan Zhang, Shiyan Chen, Yajing Zheng et al.
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Yuqi Yang, Peng-Tao Jiang, Qibin Hou et al.
A Bayesian Approach to OOD Robustness in Image Classification
Prakhar Kaushik, Adam Kortylewski, Alan L. Yuille
Instance-aware Contrastive Learning for Occluded Human Mesh Reconstruction
Mi-Gyeong Gwon, Gi-Mun Um, Won-Sik Cheong et al.
DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
Jaehyeok Shim, Kyungdon Joo
UniMODE: Unified Monocular 3D Object Detection
Zhuoling Li, Xiaogang Xu, Ser-Nam Lim et al.
Investigating Compositional Challenges in Vision-Language Models for Visual Grounding
Yunan Zeng, Yan Huang, Jinjin Zhang et al.
SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation
Chen Sichen, Yingyi Zhang, Siming Huang et al.
Multi-Attribute Interactions Matter for 3D Visual Grounding
Can Xu, Yuehui Han, Rui Xu et al.
UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization
Shuaibo Li, Wei Ma, Jianwei Guo et al.
Towards Better Vision-Inspired Vision-Language Models
Yun-Hao Cao, Kaixiang Ji, Ziyuan Huang et al.
Noisy-Correspondence Learning for Text-to-Image Person Re-identification
Yang Qin, Yingke Chen, Dezhong Peng et al.
DYSON: Dynamic Feature Space Self-Organization for Online Task-Free Class Incremental Learning
Yuhang He, YingJie Chen, Yuhan Jin et al.
Harnessing Large Language Models for Training-free Video Anomaly Detection
Luca Zanella, Willi Menapace, Massimiliano Mancini et al.
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Yuchao Gu, Xintao Wang, Yixiao Ge et al.
Weakly Supervised Video Individual Counting
Xinyan Liu, Guorong Li, Yuankai Qi et al.
Learning Inclusion Matching for Animation Paint Bucket Colorization
Yuekun Dai, Shangchen Zhou, Blake Li et al.
Preserving Fairness Generalization in Deepfake Detection
Li Lin, Xinan He, Yan Ju et al.
RepViT: Revisiting Mobile CNN From ViT Perspective
Ao Wang, Hui Chen, Zijia Lin et al.
Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models
Xingqian Xu, Jiayi Guo, Zhangyang Wang et al.
Insights from the Use of Previously Unseen Neural Architecture Search Datasets
Rob Geada, David Towers, Matthew Forshaw et al.
Referring Expression Counting
Siyang Dai, Jun Liu, Ngai-Man Cheung
R-Cyclic Diffuser: Reductive and Cyclic Latent Diffusion for 3D Clothed Human Digitalization
Kennard Chan, Fayao Liu, Guosheng Lin et al.
ControlRoom3D: Room Generation using Semantic Proxy Rooms
Jonas Schult, Sam Tsai, Lukas Höllein et al.
VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
Zihua Liu, Hiroki Sakuma, Masatoshi Okutomi
Class Incremental Learning with Multi-Teacher Distillation
Haitao Wen, Lili Pan, Yu Dai et al.
Beyond Seen Primitive Concepts and Attribute-Object Compositional Learning
Nirat Saini, Khoi Pham, Abhinav Shrivastava
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan, Kaifeng Chen, Dilip Krishnan et al.
Learning Group Activity Features Through Person Attribute Prediction
Chihiro Nakatani, Hiroaki Kawashima, Norimichi Ukita
FreeU: Free Lunch in Diffusion U-Net
Chenyang Si, Ziqi Huang, Yuming Jiang et al.
AnyScene: Customized Image Synthesis with Composited Foreground
Ruidong Chen, Lanjun Wang, Weizhi Nie et al.
TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression
Ho-Joong Kim, Jung-Ho Hong, Heejo Kong et al.
NICE: Neurogenesis Inspired Contextual Encoding for Replay-free Class Incremental Learning
Mustafa B Gurbuz, Jean Moorman, Constantine Dovrolis
FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders
Soumen Basu, Mayuna Gupta, Chetan Madan et al.
Noisy One-point Homographies are Surprisingly Good
Yaqing Ding, Jonathan Astermark, Magnus Oskarsson et al.
SUGAR: Pre-training 3D Visual Representations for Robotics
Shizhe Chen, Ricardo Garcia Pinel, Ivan Laptev et al.
Self-Calibrating Vicinal Risk Minimisation for Model Calibration
Jiawei Liu, Changkun Ye, Ruikai Cui et al.
Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning
Leonardo Iurada, Marco Ciccone, Tatiana Tommasi
Learning Continuous 3D Words for Text-to-Image Generation
Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix et al.
Device-Wise Federated Network Pruning
Shangqian Gao, Junyi Li, Zeyu Zhang et al.
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit et al.
Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos
Yuhan Shen, Ehsan Elhamifar
MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision
Chenyangguang Zhang, Guanlong Jiao, Yan Di et al.
Towards Generalizing to Unseen Domains with Few Labels
Chamuditha Jayanga Galappaththige, Sanoojan Baliah, Malitha Gunawardhana et al.
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu, Xia Hu, Yaqing Wang et al.
Low-Latency Neural Stereo Streaming
Qiqi Hou, Farzad Farhadzadeh, Amir Said et al.
Your Transferability Barrier is Fragile: Free-Lunch for Transferring the Non-Transferable Learning
Ziming Hong, Li Shen, Tongliang Liu
ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe
Yifan Bai, Zeyang Zhao, Yihong Gong et al.
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes
Myeongseob Ko, Feiyang Kang, Weiyan Shi et al.
One-Class Face Anti-spoofing via Spoof Cue Map-Guided Feature Learning
Pei-Kai Huang, Cheng-Hsuan Chiang, Tzu-Hsien Chen et al.
Structured Gradient-based Interpretations via Norm-Regularized Adversarial Training
Shizhan Gong, Qi Dou, Farzan Farnia
Revisiting Global Translation Estimation with Feature Tracks
Peilin Tao, Hainan Cui, Mengqi Rong et al.
MuseChat: A Conversational Music Recommendation System for Videos
Zhikang Dong, Bin Chen, Xiulong Liu et al.
Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
Xun Lin, Shuai Wang, RIZHAO CAI et al.
LAMP: Learn A Motion Pattern for Few-Shot Video Generation
Rui-Qi Wu, Liangyu Chen, Tong Yang et al.
PixelLM: Pixel Reasoning with Large Multimodal Model
Zhongwei Ren, Zhicheng Huang, Yunchao Wei et al.
Neural Fields as Distributions: Signal Processing Beyond Euclidean Space
Daniel Rebain, Soroosh Yazdani, Kwang Moo Yi et al.
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
Xingqun Qi, Jiahao Pan, Peng Li et al.
Global and Hierarchical Geometry Consistency Priors for Few-shot NeRFs in Indoor Scenes
Xiaotian Sun, Qingshan Xu, Xinjie Yang et al.
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation
Zihan Wang, Xiangyang Li, Jiahao Yang et al.
The STVchrono Dataset: Towards Continuous Change Recognition in Time
Yanjun Sun, Yue Qiu, Mariia Khan et al.
Unleashing Channel Potential: Space-Frequency Selection Convolution for SAR Object Detection
Ke Li, Di Wang, Zhangyuan Hu et al.
Mind Marginal Non-Crack Regions: Clustering-Inspired Representation Learning for Crack Segmentation
zhuangzhuang chen, Zhuonan Lai, Jie Chen et al.
Pixel-Aligned Language Model
Jiarui Xu, Xingyi Zhou, Shen Yan et al.
2S-UDF: A Novel Two-stage UDF Learning Method for Robust Non-watertight Model Reconstruction from Multi-view Images
Junkai Deng, Fei Hou, Xuhui Chen et al.
ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
Lukas Höllein, Aljaž Božič, Norman Müller et al.
CAMEL: CAusal Motion Enhancement Tailored for Lifting Text-driven Video Editing
Guiwei Zhang, Tianyu Zhang, Guanglin Niu et al.
DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation
Yuanchen Wu, Xichen Ye, KequanYang et al.
Bootstrapping SparseFormers from Vision Foundation Models
Ziteng Gao, Zhan Tong, Kevin Qinghong Lin et al.
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images
Zixiong Huang, Qi Chen, Libo Sun et al.
Domain Separation Graph Neural Networks for Saliency Object Ranking
Zijian Wu, Jun Lu, Jing Han et al.
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
Changhoon Kim, Kyle Min, Maitreya Patel et al.
Language-aware Visual Semantic Distillation for Video Question Answering
Bo Zou, Chao Yang, Yu Qiao et al.
Label-Efficient Group Robustness via Out-of-Distribution Concept Curation
Yiwei Yang, Anthony Liu, Robert Wolfe et al.
CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras
Sachin Shah, Matthew Chan, Haoming Cai et al.
Dynamic Cues-Assisted Transformer for Robust Point Cloud Registration
Hong Chen, Pei Yan, sihe xiang et al.
Text-conditional Attribute Alignment across Latent Spaces for 3D Controllable Face Image Synthesis
FeiFan Xu, Rui Li, Si Wu et al.
DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion
Tom Van Wouwe, Seunghwan Lee, Antoine Falisse et al.
CurveCloudNet: Processing Point Clouds with 1D Structure
Colton Stearns, Alex Fu, Jiateng Liu et al.
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
Junhao Zheng, Chenhao Lin, Jiahao Sun et al.
Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models
Bin Fu, Fanghua Yu, Anran Liu et al.
MAGICK: A Large-scale Captioned Dataset from Matting Generated Images using Chroma Keying
Ryan Burgert, Brian Price, Jason Kuen et al.
Scaling Up Video Summarization Pretraining with Large Language Models
Dawit Argaw Argaw, Seunghyun Yoon, Fabian Caba Heilbron et al.
Video Recognition in Portrait Mode
Mingfei Han, Linjie Yang, Xiaojie Jin et al.
Online Task-Free Continual Generative and Discriminative Learning via Dynamic Cluster Memory
飞 叶, Adrian Bors