Most Cited CVPR "feature partitioning" Papers
5,589 papers found • Page 19 of 28
Conference
PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting
Cheng Zhang, Haofei Xu, Qianyi Wu et al.
Sheared Backpropagation for Fine-tuning Foundation Models
Zhiyuan Yu, Li Shen, Liang Ding et al.
On the Content Bias in Fréchet Video Distance
Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar et al.
GS-2DGS: Geometrically Supervised 2DGS for Reflective Object Reconstruction
Jinguang Tong, Xuesong li, Fahira Afzal Maken et al.
Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?
Aritra Dutta, Srijan Das, Jacob Nielsen et al.
HotSpot: Signed Distance Function Optimization with an Asymptotically Sufficient Condition
Zimo Wang, Cheng Wang, Taiki Yoshino et al.
VINECS: Video-based Neural Character Skinning
Zhouyingcheng Liao, Vladislav Golyanik, Marc Habermann et al.
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding
chenkai zhang, Yiming Lei, Zeming Liu et al.
Plug and Play Active Learning for Object Detection
Chenhongyi Yang, Lichao Huang, Elliot Crowley
Plug-and-Play Diffusion Distillation
Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte et al.
CLIB-FIQA: Face Image Quality Assessment with Confidence Calibration
Fu-Zhao Ou, Chongyi Li, Shiqi Wang et al.
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada, Kanta Kaneda, Daichi Saito et al.
XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold
Guangyu Wang, Jinzhi Zhang, Fan Wang et al.
Differentiable Micro-Mesh Construction
Yishun Dou, Zhong Zheng, Qiaoqiao Jin et al.
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
Mengqi Zhang, Yang Fu, Zheng Ding et al.
CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement
Qiang Zhu, Jinhua Hao, Yukang Ding et al.
ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning
Yuxiang Zhang, Hongwen Zhang, Liangxiao Hu et al.
Learning from Synthetic Human Group Activities
Che-Jui Chang, Danrui Li, Deep Patel et al.
Can’t Make an Omelette Without Breaking Some Eggs: Plausible Action Anticipation Using Large Video-Language Models
Himangi Mittal, Nakul Agarwal, Shao-Yuan Lo et al.
Unsupervised 3D Structure Inference from Category-Specific Image Collections
Weikang Wang, Dongliang Cao, Florian Bernard
Video2Game: Real-time Interactive Realistic and Browser-Compatible Environment from a Single Video
Hongchi Xia, Chih-Hao Lin, Wei-Chiu Ma et al.
Identifying Important Group of Pixels using Interactions
Kosuke Sumiyasu, Kazuhiko Kawamoto, Hiroshi Kera
Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection
Jiangyi Wang, Na Zhao
Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering
Jiawei Yao, Qi Qian, Juhua Hu
Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation
Hanyang Chi, Jian Pang, Bingfeng Zhang et al.
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
Yukang Cao, Yan-Pei Cao, Kai Han et al.
Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal
Yijun Yang, Hongtao Wu, Angelica I. Aviles-Rivero et al.
Are Conventional SNNs Really Efficient? A Perspective from Network Quantization
Guobin Shen, Dongcheng Zhao, Tenglong Li et al.
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation
Zeyuan Yang, LIU JIAGENG, Peihao Chen et al.
Sharingan: A Transformer Architecture for Multi-Person Gaze Following
Samy Tafasca, Anshul Gupta, Jean-marc Odobez
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
Bohao Peng, Xiaoyang Wu, Li Jiang et al.
Dynamic Support Information Mining for Category-Agnostic Pose Estimation
Pengfei Ren, Yuanyuan Gao, Haifeng Sun et al.
Text-Driven Fashion Image Editing with Compositional Concept Learning and Counterfactual Abduction
Shanshan Huang, Haoxuan Li, Chunyuan Zheng et al.
Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
Sibo Wang, Jie Zhang, Zheng Yuan et al.
MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation
Zhicheng Zhang, Pancheng Zhao, Eunil Park et al.
Number it: Temporal Grounding Videos like Flipping Manga
Yongliang Wu, Xinting Hu, Yuyang Sun et al.
Autoregressive Sequential Pretraining for Visual Tracking
Shiyi Liang, Yifan Bai, Yihong Gong et al.
A Selective Re-learning Mechanism for Hyperspectral Fusion Imaging
Yuanye Liu, jinyang liu, Renwei Dian et al.
CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging
Zhiwei Ling, Yachen Chang, Hailiang Zhao et al.
CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection
Jiayi Zhu, Qing Guo, Felix Juefei Xu et al.
Neural Clustering based Visual Representation Learning
Guikun Chen, Xia Li, Yi Yang et al.
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
Jeong-gi Kwak, Erqun Dong, Yuhe Jin et al.
CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training
Yuxin Guo, Siyang Sun, Shuailei Ma et al.
Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning
yuzhuo dai, Jiaqi Jin, Zhibin Dong et al.
CapHuman: Capture Your Moments in Parallel Universes
Chao Liang, Fan Ma, Linchao Zhu et al.
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
Yicheng Xiao, Zhuoyan Luo, Yong Liu et al.
ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images
Nicolas Bourriez, Ihab Bendidi, Cohen Ethan et al.
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
Hang Li, Chengzhi Shen, Philip H.S. Torr et al.
VS: Reconstructing Clothed 3D Human from Single Image via Vertex Shift
Leyuan Liu, Yuhan Li, Yunqi Gao et al.
Minority-Focused Text-to-Image Generation via Prompt Optimization
Soobin Um, Jong Chul Ye
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
Yuxuan Wang, Yueqian Wang, Bo Chen et al.
Towards Automatic Power Battery Detection: New Challenge Benchmark Dataset and Baseline
Xiaoqi Zhao, Youwei Pang, Zhenyu Chen et al.
Point Transformer V3: Simpler Faster Stronger
Xiaoyang Wu, Li Jiang, Peng-Shuai Wang et al.
Improving Distant 3D Object Detection Using 2D Box Supervision
Zetong Yang, Zhiding Yu, Christopher Choy et al.
Infrared Small Target Detection with Scale and Location Sensitivity
Qiankun Liu, Rui Liu, Bolun Zheng et al.
Mamba-Reg: Vision Mamba Also Needs Registers
Feng Wang, Jiahao Wang, Sucheng Ren et al.
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin et al.
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha, Woo-Young Kang, Jonghwan Mun et al.
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
Hao Yin, Guangzong Si, Zilei Wang
Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models
Hao Cheng, Erjia Xiao, Jiayan Yang et al.
Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation
Hoang Chuong Nguyen, Tianyu Wang, Jose M. Alvarez et al.
Distribution Prototype Diffusion Learning for Open-set Supervised Anomaly Detection
Fuyun Wang, Tong Zhang, Yuanzhi Wang et al.
SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers
Jonathan F. Carter, Joao Jorge, Oliver Gibson et al.
Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences
Seungwook Kim, Kejie Li, Xueqing Deng et al.
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
Haochen Han, Qinghua Zheng, Guang Dai et al.
Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition
ZHANG LINTONG, Kang Yin, Seong-Whan Lee
EVS-assisted Joint Deblurring Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling
Rui Jiang, Fangwen Tu, Yixuan Long et al.
Open-World Semantic Segmentation Including Class Similarity
Matteo Sodano, Federico Magistri, Lucas Nunes et al.
Empowering Resampling Operation for Ultra-High-Definition Image Enhancement with Model-Aware Guidance
Yu, Jie Huang, Li et al.
READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning
Takeru Oba, Matthew Walter, Norimichi Ukita
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
Rongjie Li, Songyang Zhang, Dahua Lin et al.
RAEncoder: A Label-Free Reversible Adversarial Examples Encoder for Dataset Intellectual Property Protection
Fan Xing, Zhuo Tian, Xuefeng Fan et al.
MeshPose: Unifying DensePose and 3D Body Mesh Reconstruction
Eric-Tuan Le, Antonios Kakolyris, Petros Koutras et al.
Training-free Neural Architecture Search through Variance of Knowledge of Deep Network Weights
Ondrej Tybl, Lukas Neumann
Bayesian Differentiable Physics for Cloth Digitalization
Deshan Gong, Ningtao Mao, He Wang
MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation
Xiaolong Deng, Huisi Wu, Runhao Zeng et al.
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
Zeyinzi Jiang, Chaojie Mao, Yulin Pan et al.
OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
Xinyu Zhan, Lixin Yang, Yifei Zhao et al.
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action
Jiasen Lu, Christopher Clark, Sangho Lee et al.
PTQ4SAM: Post-Training Quantization for Segment Anything
Chengtao Lv, Hong Chen, Jinyang Guo et al.
Shift the Lens: Environment-Aware Unsupervised Camouflaged Object Detection
Ji Du, Fangwei Hao, Mingyang Yu et al.
MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos
Zhengqi Li, Richard Tucker, Forrester Cole et al.
MAD: Memory-Augmented Detection of 3D Objects
Ben Agro, Sergio Casas, Patrick Wang et al.
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
Zichen Miao, Jiang Wang, Ze Wang et al.
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
Zetong Yang, Li Chen, Yanan Sun et al.
Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving
JINLONG LI, Baolu Li, Zhengzhong Tu et al.
Dynamic Pseudo Labeling via Gradient Cutting for High-Low Entropy Exploration
Jae Hyeon Park, Joo Hyeon Jeon, Jae Yun Lee et al.
Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs
Zicheng Zhang, Ziheng Jia, Haoning Wu et al.
Elite360D: Towards Efficient 360 Depth Estimation via Semantic- and Distance-Aware Bi-Projection Fusion
Hao Ai, Addison, Lin Wang
Learning Triangular Distribution in Visual World
Ping Chen, Xingpeng Zhang, Chengtao Zhou et al.
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
Nikita Starodubcev, Dmitry Baranchuk, Artem Fedorov et al.
GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds
Prashant Kumar, Kshitij Madhav Bhat, Vedang Bhupesh Shenvi Nadkarni et al.
Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering
Yuanhao Zou, Zhaozheng Yin
HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
Xin Huang, Ruizhi Shao, Qi Zhang et al.
Unbiased Estimator for Distorted Conics in Camera Calibration
Chaehyeon Song, Jaeho Shin, Myung-Hwan Jeon et al.
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Chaolei Tan, Jianhuang Lai, Wei-Shi Zheng et al.
Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain
Qunliang Xing, Mai Xu, Shengxi Li et al.
Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning
Kunyu Wang, Xueyang Fu, Xin Lu et al.
TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process
Zhiyuan Ren, Minchul Kim, Feng Liu et al.
Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video
Hoang Chuong Nguyen, Wei Mao, Jose M. Alvarez et al.
HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video
Zicong Fan, Maria Parelli, Maria Kadoglou et al.
Learning Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification
Zhenyu Cui, Jiahuan Zhou, Xun Wang et al.
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
Linfeng Yuan, Miaojing Shi, Zijie Yue et al.
PMNI: Pose-free Multi-view Normal Integration for Reflective and Textureless Surface Reconstruction
Mingzhi Pei, Xu Cao, Xiangyi Wang et al.
Video-Bench: Human-Aligned Video Generation Benchmark
Hui Han, Siyuan Li, Jiaqi Chen et al.
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
Mehmet Saygin Seyfioglu, Wisdom Ikezogwo, Fatemeh Ghezloo et al.
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Lingmin Ran, Xiaodong Cun, Jia-Wei Liu et al.
Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning
Da-Wei Zhou, Zi-Wen Cai, Han-Jia Ye et al.
MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting
Mengqiu XU, Kaixin Chen, Heng Guo et al.
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving
Mingfu Liang, Jong-Chyi Su, Samuel Schulter et al.
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
Pablo Marcos-Manchón, Roberto Alcover-Couso, Juan SanMiguel et al.
Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing
Dongyoung Kim, Jinwoo Kim, Junsang Yu et al.
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
Wenjin Hou, Shiming Chen, Shuhuang Chen et al.
COBRA: COmBinatorial Retrieval Augmentation for Few-Shot Adaptation
Arnav Mohanty Das, Gantavya Bhatt, Lilly Kumari et al.
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
Ruichen Ma, Guanchao Qiao, Yian Liu et al.
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
Lingdong Kong, Youquan Liu, Lai Xing Ng et al.
Z*: Zero-shot Style Transfer via Attention Reweighting
Yingying Deng, Xiangyu He, Fan Tang et al.
All-directional Disparity Estimation for Real-world QPD Images
Hongtao Yu, Shaohui Song, Lihu Sun et al.
CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction
Yuan Zhou, Qingshan Xu, Jiequan Cui et al.
G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
Yufei Ye, Abhinav Gupta, Kris Kitani et al.
3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation
Zidu Wang, Xiangyu Zhu, Tianshuo Zhang et al.
Spike-guided Motion Deblurring with Unknown Modal Spatiotemporal Alignment
Jiyuan Zhang, Shiyan Chen, Yajing Zheng et al.
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Yuqi Yang, Peng-Tao Jiang, Qibin Hou et al.
A Bayesian Approach to OOD Robustness in Image Classification
Prakhar Kaushik, Adam Kortylewski, Alan L. Yuille
ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language Tasks
Andrea Rosasco, Stefano Berti, Giulia Pasquale et al.
Instance-aware Contrastive Learning for Occluded Human Mesh Reconstruction
Mi-Gyeong Gwon, Gi-Mun Um, Won-Sik Cheong et al.
VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding
Chaoyu Li, Eun Woo Im, Pooyan Fazli
VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks
Jinseong Jang, Chunfei Ma, Byeongwon Lee
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Bardia Safaei, Faizan Siddiqui, Jiacong Xu et al.
Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events
Aditya Chinchure, Sahithya Ravi, Raymond Ng et al.
DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
Jaehyeok Shim, Kyungdon Joo
Semantic-guided Cross-Modal Prompt Learning for Skeleton-based Zero-shot Action Recognition
Anqi Zhu, Jingmin Zhu, James Bailey et al.
CraftsMan3D: High-fidelity Mesh Generation with 3D Native Diffusion and Interactive Geometry Refiner
Weiyu Li, Jiarui Liu, Hongyu Yan et al.
SEEN-DA: SEmantic ENtropy guided Domain-aware Attention for Domain Adaptive Object Detection
Haochen Li, Rui Zhang, Hantao Yao et al.
UniMODE: Unified Monocular 3D Object Detection
Zhuoling Li, Xiaogang Xu, Ser-Nam Lim et al.
WAVE: Weight Templates for Adaptive Initialization of Variable-sized Models
Fu Feng, Yucheng Xie, Jing Wang et al.
ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation
Ling-An Zeng, Guohong Huang, Yi-Lin Wei et al.
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
Mukul Khanna, Yongsen Mao, Hanxiao Jiang et al.
Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training
Yipeng Gao, Zeyu Wang, Wei-Shi Zheng et al.
KeyPoint Relative Position Encoding for Face Recognition
Minchul Kim, Feng Liu, Yiyang Su et al.
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
Xiang Li, Jinglu Wang, Xiaohao Xu et al.
Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks
Yu Zhou, Dian Zheng, Qijie Mo et al.
ABBSPO: Adaptive Bounding Box Scaling and Symmetric Prior based Orientation Prediction for Detecting Aerial Image Objects
Woojin Lee, Hyugjae Chang, Jaeho Moon et al.
From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
Zekun Qian, Ruize Han, Wei Feng et al.
Advancing Multiple Instance Learning with Continual Learning for Whole Slide Imaging
Xianrui Li, Yufei Cui, Jun Li et al.
Segment Anything, Even Occluded
Wei-En Tai, Yu-Lin Shih, Cheng Sun et al.
Joint2Human: High-Quality 3D Human Generation via Compact Spherical Embedding of 3D Joints
Muxin Zhang, Qiao Feng, Zhuo Su et al.
ReWind: Understanding Long Videos with Instructed Learnable Memory
Anxhelo Diko, Tinghuai Wang, Wassim Swaileh et al.
Investigating Compositional Challenges in Vision-Language Models for Visual Grounding
Yunan Zeng, Yan Huang, Jinjin Zhang et al.
SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation
Chen Sichen, Yingyi Zhang, Siming Huang et al.
Cross-View Completion Models are Zero-shot Correspondence Estimators
Honggyu An, Jin Hyeon Kim, Seonghoon Park et al.
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
Feng Liang, Bichen Wu, Jialiang Wang et al.
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving
Chen Min, Dawei Zhao, Liang Xiao et al.
EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation
Diljeet Jagpal, Xi Chen, Vinay P. Namboodiri
Accept the Modality Gap: An Exploration in the Hyperbolic Space
Sameera Ramasinghe, Violetta Shevchenko, Gil Avraham et al.
MirageRoom: 3D Scene Segmentation with 2D Pre-trained Models by Mirage Projection
Haowen Sun, Yueqi Duan, Juncheng Yan et al.
Scaling up Image Segmentation across Data and Tasks
Pei Wang, Zhaowei Cai, Hao Yang et al.
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition
Yifei Zhang, Chang Liu, Jin Wei et al.
TriTex: Learning Texture from a Single Mesh via Triplane Semantic Features
Dana Cohen-Bar, Daniel Cohen-Or, Gal Chechik et al.
Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild
Damien Teney, Liangze Jiang, Florin Gogianu et al.
ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration
Johan Edstedt, André Mateus, Alberto Jaenal
CAD: Photorealistic 3D Generation via Adversarial Distillation
Ziyu Wan, Despoina Paschalidou, Ian Huang et al.
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models
Dahun Kim, AJ Piergiovanni, Ganesh Satish Mallya et al.
Mind the Gap: Confidence Discrepancy Can Guide Federated Semi-Supervised Learning Across Pseudo-Mismatch
Yijie Liu, Xinyi Shang, Yiqun Zhang et al.
DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching
Emanuele Aiello, Umberto Michieli, Diego Valsesia et al.
Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model
Yingmao Miao, Zhanpeng Huang, Rui Han et al.
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.
Noisy-Correspondence Learning for Text-to-Image Person Re-identification
Yang Qin, Yingke Chen, Dezhong Peng et al.
Random Entangled Tokens for Adversarially Robust Vision Transformer
Huihui Gong, Minjing Dong, Siqi Ma et al.
PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos
Qi Zhao, M. Salman Asif, Zhan Ma
Chebyshev Attention Depth Permutation Texture Network with Latent Texture Attribute Loss
Ravishankar Evani, Deepu Rajan, Shangbo Mao
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tongjia Chen, Hongshan Yu, Zhengeng Yang et al.
DYSON: Dynamic Feature Space Self-Organization for Online Task-Free Class Incremental Learning
Yuhang He, YingJie Chen, Yuhan Jin et al.
Harnessing Large Language Models for Training-free Video Anomaly Detection
Luca Zanella, Willi Menapace, Massimiliano Mancini et al.
Continuous Pose for Monocular Cameras in Neural Implicit Representation
Qi Ma, Danda Paudel, Ajad Chhatkuli et al.
Learned Trajectory Embedding for Subspace Clustering
Yaroslava Lochman, Christopher Zach, Carl Olsson
Generalizable Object Keypoint Localization from Generative Priors
Dongkai Wang, Jiang Duan, Liangjian Wen et al.
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
Siyuan Liang, Mingli Zhu, Aishan Liu et al.
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Yuchao Gu, Xintao Wang, Yixiao Ge et al.
Weakly Supervised Video Individual Counting
Xinyan Liu, Guorong Li, Yuankai Qi et al.
FairRAG: Fair Human Generation via Fair Retrieval Augmentation
Robik Shrestha, Yang Zou, Qiuyu Chen et al.
MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections
mude hui, Zihao Wei, Hongru Zhu et al.
Learning Inclusion Matching for Animation Paint Bucket Colorization
Yuekun Dai, Shangchen Zhou, Blake Li et al.
Glossy Object Reconstruction with Cost-effective Polarized Acquisition
Bojian Wu, YIFAN PENG, Ruizhen Hu et al.
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Yixun Liang, Xin Yang, Jiantao Lin et al.
Preserving Fairness Generalization in Deepfake Detection
Li Lin, Xinan He, Yan Ju et al.
RepViT: Revisiting Mobile CNN From ViT Perspective
Ao Wang, Hui Chen, Zijia Lin et al.
Improved Implicit Neural Representation with Fourier Reparameterized Training
Kexuan Shi, Xingyu Zhou, Shuhang Gu
Gradient Alignment for Cross-Domain Face Anti-Spoofing
MINH BINH LE, Simon Woo
U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation
You Wu, Kean Liu, Xiaoyue Mi et al.
Towards Universal Dataset Distillation via Task-Driven Diffusion
Ding Qi, Jian Li, Junyao Gao et al.
Towards Explicit Geometry-Reflectance Collaboration for Generalized LiDAR Segmentation in Adverse Weather
Longyu Yang, Ping Hu, Shangbo Yuan et al.
PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram
Sifan Zhou, Zhihang Yuan, Dawei Yang et al.
Insights from the Use of Previously Unseen Neural Architecture Search Datasets
Rob Geada, David Towers, Matthew Forshaw et al.
Comprehensive Information Bottleneck for Unveiling Universal Attribution to Interpret Vision Transformers
Jung-Ho Hong, Ho-Joong Kim, Kyu-Sung Jeon et al.
A Pedestrian is Worth One Prompt: Towards Language Guidance Person Re-Identification
Zexian Yang, Dayan Wu, Chenming Wu et al.
Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
Qilong Zhangli, Jindong Jiang, Di Liu et al.
MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing
Cong Wang, Di Kang, Heyi Sun et al.
ORIDa: Object-centric Real-world Image Composition Dataset
Jinwoo Kim, Sangmin Han, Jinho Jeong et al.
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
Kangfu Mei, Mauricio Delbracio, Hossein Talebi et al.