Most Cited 2024 "object preservation" Papers
12,324 papers found • Page 53 of 62
Conference
MFP: Making Full Use of Probability Maps for Interactive Image Segmentation
Chaewon Lee, Seon-Ho Lee, Chang-Su Kim
Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining
Jiahao Nie, Yun Xing, Gongjie Zhang et al.
Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning
Zhengwei Fang, Rui Wang, Tao Huang et al.
An Empirical Study of Scaling Law for Scene Text Recognition
Miao Rang, Zhenni Bi, Chuanjian Liu et al.
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching
Xianqi Wang, Gangwei Xu, Hao Jia et al.
When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation
Xiaoming Li, Xinyu Hou, Chen Change Loy
Differentiable Neural Surface Refinement for Modeling Transparent Objects
Weijian Deng, Dylan Campbell, Chunyi Sun et al.
Low-power Continuous Remote Behavioral Localization with Event Cameras
Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez et al.
Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
Taeho Kang, Youngki Lee
AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One
Mike Ranzinger, Greg Heinrich, Jan Kautz et al.
Towards Co-Evaluation of Cameras HDR and Algorithms for Industrial-Grade 6DoF Pose Estimation
Agastya Kalra, Guy Stoppi, Dmitrii Marin et al.
Tune-An-Ellipse: CLIP Has Potential to Find What You Want
Jinheng Xie, Songhe Deng, Bing Li et al.
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
song yiran, Qianyu Zhou, Xiangtai Li et al.
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space
Gabriele Berton, Alex Stoken, Barbara Caputo et al.
PairDETR : Joint Detection and Association of Human Bodies and Faces
Ammar Ali, Georgii Gaikov, Denis Rybalchenko et al.
Close Imitation of Expert Retouching for Black-and-White Photography
Seunghyun Shin, Jisu Shin, Jihwan Bae et al.
OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos
Dongyoung Choi, Hyeonjoong Jang, Min H. Kim
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Kai Yang, Jian Tao, Jiafei Lyu et al.
Reconstructing Hands in 3D with Transformers
Georgios Pavlakos, Dandan Shan, Ilija Radosavovic et al.
XFeat: Accelerated Features for Lightweight Image Matching
Guilherme Potje, Felipe Cadar, André Araujo et al.
Systematic Comparison of Semi-supervised and Self-supervised Learning for Medical Image Classification
Zhe Huang, Ruijie Jiang, Shuchin Aeron et al.
GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation
WEIMING ZHANG, Yexin Liu, Xu Zheng et al.
VRP-SAM: SAM with Visual Reference Prompt
Yanpeng Sun, Jiahui Chen, Shan Zhang et al.
DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis
Jiapeng Tang, Yinyu Nie, Lev Markhasin et al.
Looking Similar Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh, Chih-Wei Wu, Iroro Orife et al.
YOLO-World: Real-Time Open-Vocabulary Object Detection
Tianheng Cheng, Lin Song, Yixiao Ge et al.
Bézier Everywhere All at Once: Learning Drivable Lanes as Bézier Graphs
Hugh Blayney, Hanlin Tian, Hamish Scott et al.
Taming Self-Training for Open-Vocabulary Object Detection
Shiyu Zhao, Samuel Schulter, Long Zhao et al.
Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning
Hao Jiang, Bingfeng Zhou, Yadong Mu
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer et al.
FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation
Zijia Lu, Ehsan Elhamifar
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang et al.
ShapeMatcher: Self-Supervised Joint Shape Canonicalization Segmentation Retrieval and Deformation
Yan Di, Chenyangguang Zhang, Chaowei Wang et al.
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
Rui Li, Tobias Fischer, Mattia Segu et al.
SVDTree: Semantic Voxel Diffusion for Single Image Tree Reconstruction
Yuan Li, Zhihao Liu, Bedrich Benes et al.
Patch2Self2: Self-supervised Denoising on Coresets via Matrix Sketching
Shreyas Fadnavis, Agniva Chowdhury, Joshua Batson et al.
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
Ganggui Ding, Canyu Zhao, Wen Wang et al.
Generative Unlearning for Any Identity
Juwon Seo, Sung-Hoon Lee, Tae-Young Lee et al.
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange
Yanhao Wu, Tong Zhang, Wei Ke et al.
Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition
Zihan Wang, Siyang Song, Cheng Luo et al.
Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation
Sixian Zhang, Xinyao Yu, Xinhang Song et al.
SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective
Yu-Bang Zheng, Xile Zhao, Junhua Zeng et al.
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection
Ting Lei, Shaofeng Yin, Yang Liu
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Tony C. W. MOK, Zi Li, Yunhao Bai et al.
PoseIRM: Enhance 3D Human Pose Estimation on Unseen Camera Settings via Invariant Risk Minimization
Yanlu Cai, Weizhong Zhang, Yuan Wu et al.
On the Estimation of Image-matching Uncertainty in Visual Place Recognition
Mubariz Zaffar, Liangliang Nan, Julian F. P. Kooij
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
shiyu xuan, Qingpei Guo, Ming Yang et al.
LoS: Local Structure-Guided Stereo Matching
Kunhong Li, Longguang Wang, Ye Zhang et al.
RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation
Oded Bialer, Yuval Haitman
OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation
Jisoo Jeong, Hong Cai, Risheek Garrepalli et al.
FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer
Dongyeong Hwang, Hyunju Kim, Sunwoo Kim et al.
Mip-Splatting: Alias-free 3D Gaussian Splatting
Zehao Yu, Anpei Chen, Binbin Huang et al.
Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation
Guangyang Wu, Xiaohong Liu, Jun Jia et al.
ProMark: Proactive Diffusion Watermarking for Causal Attribution
Vishal Asnani, John Collomosse, Tu Bui et al.
MMM: Generative Masked Motion Model
Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee et al.
Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts
Jiawen Zhu, Guansong Pang
DiffForensics: Leveraging Diffusion Prior to Image Forgery Detection and Localization
Zeqin Yu, Jiangqun Ni, Yuzhen Lin et al.
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
Syed Talal Wasim, Muzammal Naseer, Salman Khan et al.
Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling
Liwen Wu, Sai Bi, Zexiang Xu et al.
SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
Gang Zhang, Chen Junnan, Guohuan Gao et al.
Sheared Backpropagation for Fine-tuning Foundation Models
Zhiyuan Yu, Li Shen, Liang Ding et al.
On the Content Bias in Fréchet Video Distance
Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar et al.
Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?
Aritra Dutta, Srijan Das, Jacob Nielsen et al.
VINECS: Video-based Neural Character Skinning
Zhouyingcheng Liao, Vladislav Golyanik, Marc Habermann et al.
Plug and Play Active Learning for Object Detection
Chenhongyi Yang, Lichao Huang, Elliot Crowley
Plug-and-Play Diffusion Distillation
Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte et al.
CLIB-FIQA: Face Image Quality Assessment with Confidence Calibration
Fu-Zhao Ou, Chongyi Li, Shiqi Wang et al.
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada, Kanta Kaneda, Daichi Saito et al.
XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold
Guangyu Wang, Jinzhi Zhang, Fan Wang et al.
Differentiable Micro-Mesh Construction
Yishun Dou, Zhong Zheng, Qiaoqiao Jin et al.
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
Mengqi Zhang, Yang Fu, Zheng Ding et al.
CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement
Qiang Zhu, Jinhua Hao, Yukang Ding et al.
ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning
Yuxiang Zhang, Hongwen Zhang, Liangxiao Hu et al.
Learning from Synthetic Human Group Activities
Che-Jui Chang, Danrui Li, Deep Patel et al.
Can’t Make an Omelette Without Breaking Some Eggs: Plausible Action Anticipation Using Large Video-Language Models
Himangi Mittal, Nakul Agarwal, Shao-Yuan Lo et al.
Unsupervised 3D Structure Inference from Category-Specific Image Collections
Weikang Wang, Dongliang Cao, Florian Bernard
Video2Game: Real-time Interactive Realistic and Browser-Compatible Environment from a Single Video
Hongchi Xia, Chih-Hao Lin, Wei-Chiu Ma et al.
Identifying Important Group of Pixels using Interactions
Kosuke Sumiyasu, Kazuhiko Kawamoto, Hiroshi Kera
Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering
Jiawei Yao, Qi Qian, Juhua Hu
Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation
Hanyang Chi, Jian Pang, Bingfeng Zhang et al.
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
Yukang Cao, Yan-Pei Cao, Kai Han et al.
Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal
Yijun Yang, Hongtao Wu, Angelica I. Aviles-Rivero et al.
Are Conventional SNNs Really Efficient? A Perspective from Network Quantization
Guobin Shen, Dongcheng Zhao, Tenglong Li et al.
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation
Zeyuan Yang, LIU JIAGENG, Peihao Chen et al.
Sharingan: A Transformer Architecture for Multi-Person Gaze Following
Samy Tafasca, Anshul Gupta, Jean-marc Odobez
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
Bohao Peng, Xiaoyang Wu, Li Jiang et al.
Dynamic Support Information Mining for Category-Agnostic Pose Estimation
Pengfei Ren, Yuanyuan Gao, Haifeng Sun et al.
Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
Sibo Wang, Jie Zhang, Zheng Yuan et al.
MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation
Zhicheng Zhang, Pancheng Zhao, Eunil Park et al.
CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection
Jiayi Zhu, Qing Guo, Felix Juefei Xu et al.
Neural Clustering based Visual Representation Learning
Guikun Chen, Xia Li, Yi Yang et al.
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
Jeong-gi Kwak, Erqun Dong, Yuhe Jin et al.
CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training
Yuxin Guo, Siyang Sun, Shuailei Ma et al.
CapHuman: Capture Your Moments in Parallel Universes
Chao Liang, Fan Ma, Linchao Zhu et al.
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
Yicheng Xiao, Zhuoyan Luo, Yong Liu et al.
ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images
Nicolas Bourriez, Ihab Bendidi, Cohen Ethan et al.
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
Hang Li, Chengzhi Shen, Philip H.S. Torr et al.
VS: Reconstructing Clothed 3D Human from Single Image via Vertex Shift
Leyuan Liu, Yuhan Li, Yunqi Gao et al.
Towards Automatic Power Battery Detection: New Challenge Benchmark Dataset and Baseline
Xiaoqi Zhao, Youwei Pang, Zhenyu Chen et al.
Point Transformer V3: Simpler Faster Stronger
Xiaoyang Wu, Li Jiang, Peng-Shuai Wang et al.
Improving Distant 3D Object Detection Using 2D Box Supervision
Zetong Yang, Zhiding Yu, Christopher Choy et al.
Infrared Small Target Detection with Scale and Location Sensitivity
Qiankun Liu, Rui Liu, Bolun Zheng et al.
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin et al.
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha, Woo-Young Kang, Jonghwan Mun et al.
ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models
Meng-Li Shih, Wei-Chiu Ma, Lorenzo Boyice et al.
Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation
Hoang Chuong Nguyen, Tianyu Wang, Jose M. Alvarez et al.
SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers
Jonathan F. Carter, Joao Jorge, Oliver Gibson et al.
Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences
Seungwook Kim, Kejie Li, Xueqing Deng et al.
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
Haochen Han, Qinghua Zheng, Guang Dai et al.
EVS-assisted Joint Deblurring Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling
Rui Jiang, Fangwen Tu, Yixuan Long et al.
Open-World Semantic Segmentation Including Class Similarity
Matteo Sodano, Federico Magistri, Lucas Nunes et al.
Empowering Resampling Operation for Ultra-High-Definition Image Enhancement with Model-Aware Guidance
Yu, Jie Huang, Li et al.
READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning
Takeru Oba, Matthew Walter, Norimichi Ukita
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
Rongjie Li, Songyang Zhang, Dahua Lin et al.
MeshPose: Unifying DensePose and 3D Body Mesh Reconstruction
Eric-Tuan Le, Antonios Kakolyris, Petros Koutras et al.
Bayesian Differentiable Physics for Cloth Digitalization
Deshan Gong, Ningtao Mao, He Wang
MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation
Xiaolong Deng, Huisi Wu, Runhao Zeng et al.
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
Zeyinzi Jiang, Chaojie Mao, Yulin Pan et al.
OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
Xinyu Zhan, Lixin Yang, Yifei Zhao et al.
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action
Jiasen Lu, Christopher Clark, Sangho Lee et al.
PTQ4SAM: Post-Training Quantization for Segment Anything
Chengtao Lv, Hong Chen, Jinyang Guo et al.
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
Zichen Miao, Jiang Wang, Ze Wang et al.
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
Zetong Yang, Li Chen, Yanan Sun et al.
Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving
JINLONG LI, Baolu Li, Zhengzhong Tu et al.
Elite360D: Towards Efficient 360 Depth Estimation via Semantic- and Distance-Aware Bi-Projection Fusion
Hao Ai, Addison, Lin Wang
Learning Triangular Distribution in Visual World
Ping Chen, Xingpeng Zhang, Chengtao Zhou et al.
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
Nikita Starodubcev, Dmitry Baranchuk, Artem Fedorov et al.
GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds
Prashant Kumar, Kshitij Madhav Bhat, Vedang Bhupesh Shenvi Nadkarni et al.
HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
Xin Huang, Ruizhi Shao, Qi Zhang et al.
Unbiased Estimator for Distorted Conics in Camera Calibration
Chaehyeon Song, Jaeho Shin, Myung-Hwan Jeon et al.
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Chaolei Tan, Jianhuang Lai, Wei-Shi Zheng et al.
Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain
Qunliang Xing, Mai Xu, Shengxi Li et al.
TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process
Zhiyuan Ren, Minchul Kim, Feng Liu et al.
HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video
Zicong Fan, Maria Parelli, Maria Kadoglou et al.
Learning Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification
Zhenyu Cui, Jiahuan Zhou, Xun Wang et al.
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
Linfeng Yuan, Miaojing Shi, Zijie Yue et al.
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
Mehmet Saygin Seyfioglu, Wisdom Ikezogwo, Fatemeh Ghezloo et al.
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Lingmin Ran, Xiaodong Cun, Jia-Wei Liu et al.
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving
Mingfu Liang, Jong-Chyi Su, Samuel Schulter et al.
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
Pablo Marcos-Manchón, Roberto Alcover-Couso, Juan SanMiguel et al.
Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing
Dongyoung Kim, Jinwoo Kim, Junsang Yu et al.
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
Wenjin Hou, Shiming Chen, Shuhuang Chen et al.
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
Ruichen Ma, Guanchao Qiao, Yian Liu et al.
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
Lingdong Kong, Youquan Liu, Lai Xing Ng et al.
Z*: Zero-shot Style Transfer via Attention Reweighting
Yingying Deng, Xiangyu He, Fan Tang et al.
G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
Yufei Ye, Abhinav Gupta, Kris Kitani et al.
3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation
Zidu Wang, Xiangyu Zhu, Tianshuo Zhang et al.
Spike-guided Motion Deblurring with Unknown Modal Spatiotemporal Alignment
Jiyuan Zhang, Shiyan Chen, Yajing Zheng et al.
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Yuqi Yang, Peng-Tao Jiang, Qibin Hou et al.
A Bayesian Approach to OOD Robustness in Image Classification
Prakhar Kaushik, Adam Kortylewski, Alan L. Yuille
ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language Tasks
Andrea Rosasco, Stefano Berti, Giulia Pasquale et al.
Instance-aware Contrastive Learning for Occluded Human Mesh Reconstruction
Mi-Gyeong Gwon, Gi-Mun Um, Won-Sik Cheong et al.
DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
Jaehyeok Shim, Kyungdon Joo
UniMODE: Unified Monocular 3D Object Detection
Zhuoling Li, Xiaogang Xu, Ser-Nam Lim et al.
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
Mukul Khanna, Yongsen Mao, Hanxiao Jiang et al.
Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training
Yipeng Gao, Zeyu Wang, Wei-Shi Zheng et al.
KeyPoint Relative Position Encoding for Face Recognition
Minchul Kim, Feng Liu, Yiyang Su et al.
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
Xiang Li, Jinglu Wang, Xiaohao Xu et al.
From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
Zekun Qian, Ruize Han, Wei Feng et al.
Joint2Human: High-Quality 3D Human Generation via Compact Spherical Embedding of 3D Joints
Muxin Zhang, Qiao Feng, Zhuo Su et al.
Investigating Compositional Challenges in Vision-Language Models for Visual Grounding
Yunan Zeng, Yan Huang, Jinjin Zhang et al.
SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation
Chen Sichen, Yingyi Zhang, Siming Huang et al.
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
Feng Liang, Bichen Wu, Jialiang Wang et al.
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving
Chen Min, Dawei Zhao, Liang Xiao et al.
Accept the Modality Gap: An Exploration in the Hyperbolic Space
Sameera Ramasinghe, Violetta Shevchenko, Gil Avraham et al.
MirageRoom: 3D Scene Segmentation with 2D Pre-trained Models by Mirage Projection
Haowen Sun, Yueqi Duan, Juncheng Yan et al.
CAD: Photorealistic 3D Generation via Adversarial Distillation
Ziyu Wan, Despoina Paschalidou, Ian Huang et al.
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.
Noisy-Correspondence Learning for Text-to-Image Person Re-identification
Yang Qin, Yingke Chen, Dezhong Peng et al.
Random Entangled Tokens for Adversarially Robust Vision Transformer
Huihui Gong, Minjing Dong, Siqi Ma et al.
PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos
Qi Zhao, M. Salman Asif, Zhan Ma
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tongjia Chen, Hongshan Yu, Zhengeng Yang et al.
DYSON: Dynamic Feature Space Self-Organization for Online Task-Free Class Incremental Learning
Yuhang He, YingJie Chen, Yuhan Jin et al.
Harnessing Large Language Models for Training-free Video Anomaly Detection
Luca Zanella, Willi Menapace, Massimiliano Mancini et al.
Continuous Pose for Monocular Cameras in Neural Implicit Representation
Qi Ma, Danda Paudel, Ajad Chhatkuli et al.
Learned Trajectory Embedding for Subspace Clustering
Yaroslava Lochman, Christopher Zach, Carl Olsson
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
Siyuan Liang, Mingli Zhu, Aishan Liu et al.
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Yuchao Gu, Xintao Wang, Yixiao Ge et al.
Weakly Supervised Video Individual Counting
Xinyan Liu, Guorong Li, Yuankai Qi et al.
FairRAG: Fair Human Generation via Fair Retrieval Augmentation
Robik Shrestha, Yang Zou, Qiuyu Chen et al.
MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections
mude hui, Zihao Wei, Hongru Zhu et al.
Learning Inclusion Matching for Animation Paint Bucket Colorization
Yuekun Dai, Shangchen Zhou, Blake Li et al.
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Yixun Liang, Xin Yang, Jiantao Lin et al.
Preserving Fairness Generalization in Deepfake Detection
Li Lin, Xinan He, Yan Ju et al.
RepViT: Revisiting Mobile CNN From ViT Perspective
Ao Wang, Hui Chen, Zijia Lin et al.
Improved Implicit Neural Representation with Fourier Reparameterized Training
Kexuan Shi, Xingyu Zhou, Shuhang Gu
Gradient Alignment for Cross-Domain Face Anti-Spoofing
MINH BINH LE, Simon Woo
U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation
You Wu, Kean Liu, Xiaoyue Mi et al.
Insights from the Use of Previously Unseen Neural Architecture Search Datasets
Rob Geada, David Towers, Matthew Forshaw et al.
A Pedestrian is Worth One Prompt: Towards Language Guidance Person Re-Identification
Zexian Yang, Dayan Wu, Chenming Wu et al.
Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
Qilong Zhangli, Jindong Jiang, Di Liu et al.
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
Kangfu Mei, Mauricio Delbracio, Hossein Talebi et al.
From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation
Hyeokjun Kweon, Kuk-Jin Yoon
Vlogger: Make Your Dream A Vlog
Shaobin Zhuang, Kunchang Li, Xinyuan Chen et al.
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
Zehuan Huang, Hao Wen, Junting Dong et al.
IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images
Yushuang Wu, Luyue Shi, Junhao Cai et al.
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He, Hengduo Li, Young Kyun Jang et al.
SVGDreamer: Text Guided SVG Generation with Diffusion Model
XiMing Xing, Chuang Wang, Haitao Zhou et al.
Dual Prototype Attention for Unsupervised Video Object Segmentation
Suhwan Cho, Minhyeok Lee, Seunghoon Lee et al.
R-Cyclic Diffuser: Reductive and Cyclic Latent Diffusion for 3D Clothed Human Digitalization
Kennard Chan, Fayao Liu, Guosheng Lin et al.