Most Cited CVPR "synthetic data finetuning" Papers
5,589 papers found • Page 25 of 28
Revisiting Adversarial Training at Scale
Zeyu Wang, Xianhang Li, Hongru Zhu et al.
PersonaBooth: Personalized Text-to-Motion Generation
Boeun Kim, Hea In Jeong, JungHoon Sung et al.
G-FARS: Gradient-Field-based Auto-Regressive Sampling for 3D Part Grouping
Junfeng Cheng, Tania Stathaki
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
Keda Tao, Can Qin, Haoxuan You et al.
Make Pixels Dance: High-Dynamic Video Generation
Yan Zeng, Guoqiang Wei, Jiani Zheng et al.
Masked AutoDecoder is Effective Multi-Task Vision Generalist
Han Qiu, Jiaxing Huang, Peng Gao et al.
Generative Multi-modal Models are Good Class Incremental Learners
Xusheng Cao, Haori Lu, Linlan Huang et al.
Deciphering ‘What’ and ‘Where’ Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations
Xiao Zhang, David Yunis, Michael Maire
CRISP: Object Pose and Shape Estimation with Test-Time Adaptation
Jingnan Shi, Rajat Talak, Harry Zhang et al.
Yo’Chameleon: Personalized Vision and Language Generation
Thao Nguyen, Krishna Kumar Singh, Jing Shi et al.
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Bo Zou, Chao Yang, Yu Qiao et al.
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng, Zhicheng Guo, Jingwen Wu et al.
Learning Temporally Consistent Video Depth from Video Diffusion Priors
Jiahao Shao, Yuanbo Yang, Hongyu Zhou et al.
Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch
Aneeshan Sain, Subhajit Maity, Pinaki Nath Chowdhury et al.
EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights
Zhenghao Xing, Hao Chen, Binzhu Xie et al.
Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
Karren Yang, Anurag Ranjan, Jen-Hao Rick Chang et al.
From Feature to Gaze: A Generalizable Replacement of Linear Layer for Gaze Estimation
Yiwei Bao, Feng Lu
OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP
Mohamad Hassan N C, Divyam Gupta, Mainak Singha et al.
NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal Compensation
Ziyi Chen, Xiaolong Wu, Yu Zhang
Temporal Alignment-Free Video Matching for Few-shot Action Recognition
SuBeen Lee, WonJun Moon, Hyun Seok Seong et al.
Language Models as Black-Box Optimizers for Vision-Language Models
Shihong Liu, Samuel Yu, Zhiqiu Lin et al.
Transferable Structural Sparse Adversarial Attack Via Exact Group Sparsity Training
Di Ming, Peng Ren, Yunlong Wang et al.
IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing
Chun Gu, Xiaofei Wei, Zixuan Zeng et al.
Practical Solutions to the Relative Pose of Three Calibrated Cameras
Charalambos Tzamos, Viktor Kocur, Yaqing Ding et al.
Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models
Xinpeng Ding, Jianhua Han, Hang Xu et al.
MultimodalStudio: A Heterogeneous Sensor Dataset and Framework for Neural Rendering across Multiple Imaging Modalities
Federico Lincetto, Gianluca Agresti, Mattia Rossi et al.
ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
Haokai Pang, Heming Zhu, Adam Kortylewski et al.
Exploring Timeline Control for Facial Motion Generation
Yifeng Ma, Jinwei Qi, Chaonan Ji et al.
Structured 3D Latents for Scalable and Versatile 3D Generation
Jianfeng Xiang, Zelong Lv, Sicheng Xu et al.
Equivariant Plug-and-Play Image Reconstruction
Matthieu Terris, Thomas Moreau, Nelly Pustelnik et al.
DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars
Tobias Kirschstein, Simon Giebenhain, Matthias Nießner
Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts
Feng Liang, Haoyu Ma, Zecheng He et al.
Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection
Jiaming Li, Jiacheng Zhang, Jichang Li et al.
Enhancing Facial Privacy Protection via Weakening Diffusion Purification
Ali Salar, Qing Liu, Yingli Tian et al.
Addressing Background Context Bias in Few-Shot Segmentation through Iterative Modulation
Lanyun Zhu, Tianrun Chen, Jianxiong Yin et al.
Learned Representation-Guided Diffusion Models for Large-Image Generation
Alexandros Graikos, Srikar Yellapragada, Minh-Quan Le et al.
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction
Yuanbin Man, Ying Huang, Chengming Zhang et al.
OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation
Xiongwei Wu, Sicheng Yu, Ee-Peng Lim et al.
AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
Trevine Oorloff, Surya Koppisetti, Nicolo Bonettini et al.
CaKDP: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection
Haonan Zhang, Longjun Liu, Yuqi Huang et al.
Friendly Sharpness-Aware Minimization
Tao Li, Pan Zhou, Zhengbao He et al.
CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation
Xi Liu, Ying Guo, Cheng Zhen et al.
Brain Decodes Deep Nets
Huzheng Yang, James Gee, Jianbo Shi
MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading
Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got et al.
Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds
Yujia Liu, Anton Obukhov, Jan D. Wegner et al.
REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning
Jian Wang, Zhe Cao, Diogo Luvizon et al.
A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning
Yuelin Zhang, Pengyu Zheng, Wanquan Yan et al.
Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting
Haipeng Liu, Yang Wang, Biao Qian et al.
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Sili Chen, Hengkai Guo, Shengnan Zhu et al.
Misalignment-Robust Frequency Distribution Loss for Image Transformation
Zhangkai Ni, Juncheng Wu, Zian Wang et al.
Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion
Jona Ballé, Luca Versari, Emilien Dupont et al.
WildlifeMapper: Aerial Image Analysis for Multi-Species Detection and Identification
Satish Kumar, Bowen Zhang, Chandrakanth Gudavalli et al.
SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Xiaojun Hou, Jiazheng Xing, Yijie Qian et al.
Consistent Normal Orientation for 3D Point Clouds via Least Squares on Delaunay Graph
Rao Fu, Jianmin Zheng, Liang Yu
BioX-CPath: Biologically-driven Explainable Diagnostics for Multistain IHC Computational Pathology
Amaya Gallagher-Syed, Henry Senior, Omnia Alwazzan et al.
SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System
Yunfei Fan, Tianyu Zhao, Guidong Wang
MACE: Mass Concept Erasure in Diffusion Models
Shilin Lu, Zilan Wang, Leyang Li et al.
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
Tianhao Qi, Shancheng Fang, Yanze Wu et al.
Descriptor-In-Pixel: Point-Feature Tracking For Pixel Processor Arrays
Laurie Bose, Piotr Dudek, Jianing Chen
Learning Degradation-unaware Representation with Prior-based Latent Transformations for Blind Face Restoration
Lianxin Xie, Bingbing Zheng, Wen Xue et al.
360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
Qian Wang, Weiqi Li, Chong Mou et al.
Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields
Joshua Ahn, Haochen Wang, Raymond A. Yeh et al.
Countering Personalized Text-to-Image Generation with Influence Watermarks
Hanwen Liu, Zhicheng Sun, Yadong Mu
Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Multi-Scale Aggregation and Anthropic Prior Knowledge
Bo Zou, Shaofeng Wang, Hao Liu et al.
Interpreting Object-level Foundation Models via Visual Precision Search
Ruoyu Chen, Siyuan Liang, Jingzhi Li et al.
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
Tanvir Mahmud, Yapeng Tian, Diana Marculescu
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes
Xiang Xu, Lingdong Kong, Hui Shuai et al.
TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions
Wang Yu-Hang, Junkang Guo, Aolei Liu et al.
ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D Image
Marco Pesavento, Yuanlu Xu, Nikolaos Sarafianos et al.
AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data
Zengqun Zhao, Ziquan Liu, Yu Cao et al.
vid-TLDR: Training Free Token Merging for Light-weight Video Transformer
Joonmyung Choi, Sanghyeok Lee, Jaewon Chu et al.
Initialization Matters for Adversarial Transfer Learning
Andong Hua, Jindong Gu, Zhiyu Xue et al.
RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives
Chirag Parikh, Deepti Rawat, Rakshitha R. T. et al.
MindBridge: A Cross-Subject Brain Decoding Framework
Shizun Wang, Songhua Liu, Zhenxiong Tan et al.
Loopy-SLAM: Dense Neural SLAM with Loop Closures
Lorenzo Liso, Erik Sandström, Vladimir Yugay et al.
Weakly-Supervised Audio-Visual Video Parsing with Prototype-based Pseudo-Labeling
Kranthi Kumar Rachavarapu, Kalyan Ramakrishnan, A. N. Rajagopalan
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury, Sayan Nag, Joseph K J et al.
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
Chengjian Feng, Yujie Zhong, Zequn Jie et al.
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
Shiyi Zhang, Sule Bai, Guangyi Chen et al.
DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking
Cheng Huang, Shoudong Han, Mengyu He et al.
ChatPose: Chatting about 3D Human Pose
Yao Feng, Jing Lin, Sai Kumar Dwivedi et al.
Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention
Ju-Hyeon Nam, Nur Suriza Syazwany, Su Jung Kim et al.
NC-TTT: A Noise Contrastive Approach for Test-Time Training
David Osowiechi, Gustavo Vargas Hakim, Mehrdad Noori et al.
JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments
Duy Tho Le, Chenhui Gou, Stavya Datta et al.
Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
Jingyao Xu, Yuetong Lu, Yandong Li et al.
ESCAPE: Encoding Super-keypoints for Category-Agnostic Pose Estimation
Khoi D Nguyen, Chen Li, Gim Hee Lee
Task-Specific Gradient Adaptation for Few-Shot One-Class Classification
Yunlong Li, Xiabi Liu, Liyuan Pan et al.
Minimal Perspective Autocalibration
Andrea Porfiri Dal Cin, Timothy Duff, Luca Magri et al.
ReGenNet: Towards Human Action-Reaction Synthesis
Liang Xu, Yizhou Zhou, Yichao Yan et al.
RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos
Hongchi Xia, Yang Fu, Sifei Liu et al.
Aligning and Prompting Everything All at Once for Universal Visual Perception
Yunhang Shen, Chaoyou Fu, Peixian Chen et al.
ZONE: Zero-Shot Instruction-Guided Local Editing
Shanglin Li, Bohan Zeng, Yutang Feng et al.
Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption
Buzhen Huang, Chen Li, Chongyang Xu et al.
Label Propagation for Zero-shot Classification with Vision-Language Models
Vladan Stojnić, Yannis Kalantidis, Giorgos Tolias
IQ-VFI: Implicit Quadratic Motion Estimation for Video Frame Interpolation
Mengshun Hu, Kui Jiang, Zhihang Zhong et al.
Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition
Anqi Zhu, Qiuhong Ke, Mingming Gong et al.
Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous and Instruction-guided Driving
Brian Yang, Huangyuan Su, Nikolaos Gkanatsios et al.
Parametric Point Cloud Completion for Polygonal Surface Reconstruction
Zhaiyu Chen, Yuqing Wang, Liangliang Nan et al.
Structured Model Probing: Empowering Efficient Transfer Learning by Structured Regularization
Zhi-Fan Wu, Chaojie Mao, Xue Wang et al.
TinyFusion: Diffusion Transformers Learned Shallow
Gongfan Fang, Kunjun Li, Xinyin Ma et al.
Poly-Autoregressive Prediction for Modeling Interactions
Neerja Thakkar, Tara Sadjadpour, Jathushan Rajasegaran et al.
CrossSDF: 3D Reconstruction of Thin Structures From Cross-Sections
Thomas Walker, Salvatore Esposito, Daniel Rebain et al.
Decentralized Diffusion Models
David McAllister, Matthew Tancik, Jiaming Song et al.
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery
Utkarsh Mall, Cheng Perng Phoo, Mia Chiquier et al.
CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation
Lingjun Zhao, Jingyu Song, Katherine Skinner
Investigating the Role of Weight Decay in Enhancing Nonconvex SGD
Tao Sun, Yuhao Huang, Li Shen et al.
UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping
Aashish Rai, Dilin Wang, Mihir Jain et al.
Z-Magic: Zero-shot Multiple Attributes Guided Image Creator
Yingying Deng, Xiangyu He, Fan Tang et al.
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
Felix Taubner, Ruihang Zhang, Mathieu Tuli et al.
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
Jingcheng Ni, Yuxin Guo, Yichen Liu et al.
Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing
Bingyan Liu, Chengyu Wang, Tingfeng Cao et al.
ALIEN: Implicit Neural Representations for Human Motion Prediction under Arbitrary Latency
Dong Wei, Xiaoning Sun, Xizhan Gao et al.
Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation
Yu Qi, Yuanchen Ju, Tianming Wei et al.
GIF: Generative Inspiration for Face Recognition at Scale
Mohammad Saadabadi Saadabadi, Sahar Rahimi Malakshan, Ali Dabouei et al.
TULIP: Transformer for Upsampling of LiDAR Point Clouds
Bin Yang, Patrick Pfreundschuh, Roland Siegwart et al.
AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning
Kaixuan Wu, Xinde Li, Xinglin Li et al.
Science-T2I: Addressing Scientific Illusions in Image Synthesis
Jialuo Li, Wenhao Chai, Xingyu Fu et al.
Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition
Yang Chen, Jingcai Guo, Song Guo et al.
Navigating Image Restoration with VAR’s Distribution Alignment Prior
Siyang Wang, Naishan Zheng, Jie Huang et al.
Incremental Residual Concept Bottleneck Models
Chenming Shang, Shiji Zhou, Hengyuan Zhang et al.
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation
Hermann Kumbong, Xian Liu, Tsung-Yi Lin et al.
Minding Fuzzy Regions: A Data-driven Alternating Learning Paradigm for Stable Lesion Segmentation
Lexin Fang, Yunyang Xu, Xiang Ma et al.
LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting
Xiaoyan Xing, Konrad Groh, Sezer Karaoglu et al.
ProjAttacker: A Configurable Physical Adversarial Attack for Face Recognition via Projector
Yuanwei Liu, Hui Wei, Chengyu Jia et al.
Efficient Dataset Distillation via Minimax Diffusion
Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev et al.
DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
Qitao Zhao, Amy Lin, Jeff Tan et al.
DUSt3R: Geometric 3D Vision Made Easy
Shuzhe Wang, Vincent Leroy, Yohann Cabon et al.
Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
Guotao Liang, Baoquan Zhang, Zhiyuan Wen et al.
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Zixuan Ye, Huijuan Huang, Xintao Wang et al.
Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification
Gaozheng Pei, Shaojie Lyu, Gong Chen et al.
Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows
Shentong Mo, Yibing Song
DL2G: Degradation-guided Local-to-Global Restoration for Eyeglass Reflection Removal
Yizhi Lv, Xiao Lu, Hong Ding et al.
Efficient Decoupled Feature 3D Gaussian Splatting via Hierarchical Compression
Zhenqi Dai, Ting Liu, Yanning Zhang
Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning
Mi Luo, Zihui Xue, Alex Dimakis et al.
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing
Yunqi Gu, Ian Huang, Jihyeon Je et al.
NightCC: Nighttime Color Constancy via Adaptive Channel Masking
Shuwei Li, Robby T. Tan
AdMiT: Adaptive Multi-Source Tuning in Dynamic Environments
Xiangyu Chang, Fahim Faisal Niloy, Sk Miraj Ahmed et al.
Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition
Zhiyuan Chen, Keyi Li, Yifan Jia et al.
Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes
Yiming Dou, Wonseok Oh, Yuqing Luo et al.
Fortifying Federated Learning Towards Trustworthiness via Auditable Data Valuation and Verifiable Client Contribution
Naveen Kumar Kummari, Ranjeet Ranjan Jha, Krishna Mohan Chalavadi et al.
RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing
Zhipeng Huang, Wangbo Yu, Xinhua Cheng et al.
DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos
Lorenzo Mur-Labadia, Jose J. Guerrero, Ruben Martinez-Cantin
A Simple Data Augmentation for Feature Distribution Skewed Federated Learning
Yunlu Yan, Huazhu Fu, Yuexiang Li et al.
Enhancing Video Super-Resolution via Implicit Resampling-based Alignment
Kai Xu, Ziwei Yu, Xin Wang et al.
Domain Generalization in CLIP via Learning with Diverse Text Prompts
Changsong Wen, Zelin Peng, Yu Huang et al.
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
Yifan Zhou, Zeqi Xiao, Shuai Yang et al.
Weakly Supervised Contrastive Adversarial Training for Learning Robust Features from Semi-supervised Data
Lilin Zhang, Chengpei Wu, Ning Yang
Graph-Embedded Structure-Aware Perceptual Hashing for Neural Network Protection and Piracy Detection
Ruiheng Liu, Haozhe Chen, Boyao Zhao et al.
Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning
Debora Caldarola, Pietro Cagnasso, Barbara Caputo et al.
Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems
Alejandro Castañeda Garcia, Jan Warchocki, Jan van Gemert et al.
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
Feifei Li, Mi Zhang, Yiming Sun et al.
Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation
Dingcheng Zhen, Shunshun Yin, Shiyang Qin et al.
AnyMap: Learning a General Camera Model for Structure-from-Motion with Unknown Distortion in Dynamic Scenes
Andrea Porfiri Dal Cin, Georgi Dikov, Jihong Ju et al.
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
Ziheng Ouyang, Zhen Li, Qibin Hou
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li, Liansheng Zhuang, Xiao Long et al.
CarPlanner: Consistent Auto-regressive Trajectory Planning for Large-Scale Reinforcement Learning in Autonomous Driving
Dongkun Zhang, Jiaming Liang, Ke Guo et al.
FASTer: Focal token Acquiring-and-Scaling Transformer for Long-term 3D Object Detection
Chenxu Dang, Pei An, Xinmin Zhang et al.
UCM-VeID V2: A Richer Dataset and A Pre-training Method for UAV Cross-Modality Vehicle Re-Identification
Xingyue Liu, Jiahao Qi, Chen Chen et al.
Unboxed: Geometrically and Temporally Consistent Video Outpainting
Zhongrui Yu, Martina Megaro-Boldini, Robert Sumner et al.
Less is More: Efficient Model Merging with Binary Task Switch
Biqing Qi, Fangyuan Li, Zhen Wang et al.
Adversarial Text to Continuous Image Generation
Kilichbek Haydarov, Aashiq Muhamed, Xiaoqian Shen et al.
Visual Lexicon: Rich Image Features in Language Space
XuDong Wang, Xingyi Zhou, Alireza Fathi et al.
Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes
Hyeonggon Ryu, Seongyu Kim, Joon Chung et al.
Continual SFT Matches Multimodal RLHF with Negative Supervision
Ke Zhu, Yu Wang, Yanpeng Sun et al.
Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation
Junjie Chen, Weilong Chen, Yifan Zuo et al.
Cross-Modal 3D Representation with Multi-View Images and Point Clouds
Ziyang Zhou, Pinghui Wang, Zi Liang et al.
Heterogeneous Skeleton-Based Action Representation Learning
Xiaoyan Ma, Jidong Kuang, Hongsong Wang et al.
DeformCL: Learning Deformable Centerline Representation for Vessel Extraction in 3D Medical Image
Ziwei Zhao, Zhixing Zhang, Yuhang Liu et al.
Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple Vision-Language Model Variants
Chong Yu, Tao Chen, Zhongxue Gan
Seeing is Not Believing: Adversarial Natural Object Optimization for Hard-Label 3D Scene Attacks
Daizong Liu, Wei Hu
HomoGen: Enhanced Video Inpainting via Homography Propagation and Diffusion
Ding Ding, Yueming Pan, Ruoyu Feng et al.
Towards Continual Universal Segmentation
Zihan Lin, Zilei Wang, Xu Wang
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
Hongwei Zheng, Han Li, Wenrui Dai et al.
Decoupled Motion Expression Video Segmentation
Hao Fang, Runmin Cong, Xiankai Lu et al.
Exploring Contextual Attribute Density in Referring Expression Counting
Zhicheng Wang, Zhiyu Pan, Zhan Peng et al.
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li, Boyang Li
Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses
Yongfan Liu, Hyoukjun Kwon
FSHNet: Fully Sparse Hybrid Network for 3D Object Detection
Shuai Liu, Mingyue Cui, Boyang Li et al.
Mixture of Submodules for Domain Adaptive Person Search
Minsu Kim, Seungryong Kim, Kwanghoon Sohn
Unsupervised Discovery of Facial Landmarks and Head Pose
Satyajit Tourani, Siddharth Tourani, Arif Mahmood et al.
InceptionNeXt: When Inception Meets ConvNeXt
Weihao Yu, Pan Zhou, Shuicheng Yan et al.
Dynamic Integration of Task-Specific Adapters for Class Incremental Learning
Jiashuo Li, Shaokun Wang, Bo Qian et al.
DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows
Mashrur M. Morshed, Vishnu Naresh Boddeti
Test-time Augmentation Improves Efficiency in Conformal Prediction
Divya M Shanmugam, Helen Lu, Swami Sankaranarayanan et al.
GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
Yawen Shao, Wei Zhai, Yuhang Yang et al.
Robotic Visual Instruction
Yanbang Li, ZiYang Gong, Haoyang Li et al.
Learned Binocular-Encoding Optics for RGBD Imaging Using Joint Stereo and Focus Cues
Yuhui Liu, Liangxun Ou, Qiang Fu et al.
Dual Diffusion for Unified Image Generation and Understanding
Zijie Li, Henry Li, Yichun Shi et al.
Learning Compatible Multi-Prize Subnetworks for Asymmetric Retrieval
Yushuai Sun, Zikun Zhou, Dongmei Jiang et al.
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning
Huabin Liu, Filip Ilievski, Cees G. M. Snoek
Opportunistic Single-Photon Time of Flight
Sotiris Nousias, Mian Wei, Howard Xiao et al.
Enduring, Efficient and Robust Trajectory Prediction Attack in Autonomous Driving via Optimization-Driven Multi-Frame Perturbation Framework
Yi Yu, Weizhen Han, Libing Wu et al.
Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction
Teng Hu, Jiangning Zhang, Ran Yi et al.
UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning
Long Zhou, Fereshteh Shakeri, Aymen Sadraoui et al.
Query Efficient Black-Box Visual Prompting with Subspace Learning
Haozhen Zhang, Zhaogeng Liu, Hualin Zhang et al.
Volumetric Surfaces: Representing Fuzzy Geometries with Layered Meshes
Stefano Esposito, Anpei Chen, Christian Reiser et al.
Fingerprinting Denoising Diffusion Probabilistic Models
Huan Teng, Yuhui Quan, Chengyu Wang et al.
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
Hanwen Jiang, Zexiang Xu, Desai Xie et al.
Flash-Split: 2D Reflection Removal with Flash Cues and Latent Diffusion Separation
Tianfu Wang, Mingyang Xie, Haoming Cai et al.
AdaDARE-gamma: Balancing Stability and Plasticity in Multi-modal LLMs through Efficient Adaptation
Jingyi Xie, Jintao Yang, Zhunchen Luo et al.