Most Cited CVPR "self-driving applications" Papers
5,589 papers found • Page 12 of 28
Conference
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
Tiehan Fan, Kepan Nan, Rui Xie et al.
LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition
Zhonglin Sun, Chen Feng, Ioannis Patras et al.
Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance
Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros Potamias et al.
Unifying Automatic and Interactive Matting with Pretrained ViTs
Zixuan Ye, Wenze Liu, He Guo et al.
DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations
Ziqiao Peng, Yanbo Fan, Haoyu Wu et al.
X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition
Shuofeng Sun, Yongming Rao, Jiwen Lu et al.
Prototype-Based Image Prompting for Weakly Supervised Histopathological Image Segmentation
Qingchen Tang, Lei Fan, Maurice Pagnucco et al.
Bootstrapping Autonomous Driving Radars with Self-Supervised Learning
Yiduo Hao, Sohrab Madani, Junfeng Guan et al.
ChatGarment: Garment Estimation, Generation and Editing via Large Language Models
Siyuan Bian, Chenghao Xu, Yuliang Xiu et al.
MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
Zeren Jiang, Chen Guo, Manuel Kaufmann et al.
L2B: Learning to Bootstrap Robust Models for Combating Label Noise
Yuyin Zhou, Xianhang li, Fengze Liu et al.
In2SET: Intra-Inter Similarity Exploiting Transformer for Dual-Camera Compressive Hyperspectral Imaging
Xin Wang, Lizhi Wang, Xiangtian Ma et al.
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
Yifan Zhou, Zeqi Xiao, Shuai Yang et al.
Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
David Yifan Yao, Albert J. Zhai, Shenlong Wang
From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing
Jingxuan Wei, Cheng Tan, Qi Chen et al.
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
Shitian Zhao, Zhuowan Li, YadongLu et al.
Unseen Visual Anomaly Generation
HAN SUN, Yunkang Cao, Hao Dong et al.
Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving
Mozhgan Pourkeshavarz, Mohammad Sabokrou, Amir Rasouli
DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction
Junwen Xiong, Peng Zhang, Tao You et al.
MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
Shuwei Shi, Biao Gong, Xi Chen et al.
DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters
Mingze Sun, Junting Dong, Junhao Chen et al.
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Jielin Qiu, Jiacheng Zhu, William Han et al.
Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains
Bang-Dang Pham, Phong Tran, Anh Tran et al.
PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar
Tzofi Klinghoffer, Xiaoyu Xiang, Siddharth Somasundaram et al.
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation
Kun Liu, Qi Liu, Xinchen Liu et al.
Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation
Taeyoung Yun, Dinghuai Zhang, Jinkyoo Park et al.
Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis
M. Hamza Mughal, Rishabh Dabral, Merel CJ Scholman et al.
USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
Xiaoqi Wang, Wenbin He, Xiwei Xuan et al.
Memory-based Adapters for Online 3D Scene Perception
Xiuwei Xu, Chong Xia, Ziwei Wang et al.
Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning
Rui Zhao, Bin Shi, Jianfei Ruan et al.
Unified Entropy Optimization for Open-Set Test-Time Adaptation
Zhengqing Gao, Xu-Yao Zhang, Cheng-Lin Liu
Hyperbolic Learning with Synthetic Captions for Open-World Detection
Fanjie Kong, Yanbei Chen, Jiarui Cai et al.
HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions
Hao Xu, Li Haipeng, Yinqiao Wang et al.
Scaling Inference Time Compute for Diffusion Models
Nanye Ma, Shangyuan Tong, Haolin Jia et al.
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes
Xiang Xu, Lingdong Kong, hui shuai et al.
3D Neural Edge Reconstruction
Lei Li, Songyou Peng, Zehao Yu et al.
ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting
Guo Junfu, Yu Xin, Gaoyi Liu et al.
GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
Yawen Shao, Wei Zhai, Yuhang Yang et al.
Few-shot Learner Parameterization by Diffusion Time-steps
Zhongqi Yue, Pan Zhou, Richang Hong et al.
OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation
Jisoo Jeong, Hong Cai, Risheek Garrepalli et al.
RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation
Oded Bialer, Yuval Haitman
Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption
Buzhen Huang, Chen Li, Chongyang Xu et al.
Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling
Liwen Wu, Sai Bi, Zexiang Xu et al.
Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis
Feng Zhou, Ruiyang Liu, chen liu et al.
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang, Dacheng Yin, Yizhou Zhou et al.
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu, Runyu He, Gangshan Wu et al.
UniHuman: A Unified Model For Editing Human Images in the Wild
Nannan Li, Qing Liu, Krishna Kumar Singh et al.
SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation
Aleksei Bokhovkin, Quan Meng, Shubham Tulsiani et al.
Region-Based Representations Revisited
Michal Shlapentokh-Rothman, Ansel Blume, Yao Xiao et al.
Learning Structure-from-Motion with Graph Attention Networks
Lucas Brynte, José Pedro Iglesias, Carl Olsson et al.
SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective
Yu-Bang Zheng, Xile Zhao, Junhua Zeng et al.
Open-Canopy: Towards Very High Resolution Forest Monitoring
Fajwel Fogel, Yohann PERRON, Nikola Besic et al.
PH-Net: Semi-Supervised Breast Lesion Segmentation via Patch-wise Hardness
Siyao Jiang, Huisi Wu, Junyang Chen et al.
MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections
mude hui, Zihao Wei, Hongru Zhu et al.
Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks
Peng Xie, Yequan Bie, Jianda Mao et al.
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tongjia Chen, Hongshan Yu, Zhengeng Yang et al.
Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective
Duowang Zhu, Xiaohu Huang, Haiyan Huang et al.
SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation
Jiaben Chen, Huaizu Jiang
Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D
Mukund Varma T, Peihao Wang, Zhiwen Fan et al.
Pathways on the Image Manifold: Image Editing via Video Generation
Noam Rotstein, Gal Yona, Daniel Silver et al.
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue, Jie Cheng, Longteng Guo et al.
CAD: Photorealistic 3D Generation via Adversarial Distillation
Ziyu Wan, Despoina Paschalidou, Ian Huang et al.
SmartEraser: Remove Anything from Images using Masked-Region Guidance
Longtao Jiang, Zhendong Wang, Jianmin Bao et al.
AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting
Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen et al.
RORem: Training a Robust Object Remover with Human-in-the-Loop
Ruibin Li, Tao Yang, Song Guo et al.
Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion
Eunji Kim, Siwon Kim, Minjun Park et al.
TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
Sherry X. Chen, Yaron Vaxman, Elad Ben Baruch et al.
F3Loc: Fusion and Filtering for Floorplan Localization
Changan Chen, Rui Wang, Christoph Vogel et al.
Object Recognition as Next Token Prediction
Kaiyu Yue, Bor-Chun Chen, Jonas Geiping et al.
Brain Decodes Deep Nets
Huzheng Yang, James Gee, Jianbo Shi
Consistent and Controllable Image Animation with Motion Diffusion Models
Xin Ma, Yaohui Wang, Gengyun Jia et al.
Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning
Tim Lenz, Peter Neidlinger, Marta Ligero et al.
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang et al.
Event-based Video Super-Resolution via State Space Models
Zeyu Xiao, Xinchao Wang
Single-View Scene Point Cloud Human Grasp Generation
Yan-Kang Wang, Chengyi Xing, Yi-Lin Wei et al.
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Jianzong Wu, Chao Tang, Jingbo Wang et al.
Generating Enhanced Negatives for Training Language-Based Object Detectors
Shiyu Zhao, Long Zhao, Vijay Kumar BG et al.
Real-World Mobile Image Denoising Dataset with Efficient Baselines
Roman Flepp, Andrey Ignatov, Radu Timofte et al.
MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments
Ege Özsoy, Chantal Pellegrini, Tobias Czempiel et al.
Image Generation Diversity Issues and How to Tame Them
Mischa Dombrowski, Weitong Zhang, Hadrien Reynaud et al.
Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
Yu Yuan, Xijun Wang, Yichen Sheng et al.
CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers
Shahaf Arica, Or Rubin, Sapir Gershov et al.
Segment Any Motion in Videos
Nan Huang, Wenzhao Zheng, Chenfeng Xu et al.
Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution
Qingping Zheng, Ling Zheng, Yuanfan Guo et al.
Move Anything with Layered Scene Diffusion
Jiawei Ren, Mengmeng Xu, Jui-Chieh Wu et al.
Hearing Anything Anywhere
Mason Wang, Ryosuke Sawata, Samuel Clarke et al.
Zero-Shot Monocular Scene Flow Estimation in the Wild
Yiqing Liang, Abhishek Badki, Hang Su et al.
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models
Ozgur Kara, Krishna Kumar Singh, Feng Liu et al.
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu, Yuheng Ding, Bingxuan Li et al.
Temporally Consistent Object-Centric Learning by Contrasting Slots
Anna Manasyan, Maximilian Seitzer, Filip Radovic et al.
Causal Composition Diffusion Model for Closed-loop Traffic Generation
Haohong Lin, Xin Huang, Tung Phan-Minh et al.
MotionPro: A Precise Motion Controller for Image-to-Video Generation
Zhongwei Zhang, Fuchen Long, Zhaofan Qiu et al.
UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection
Shun Wei, Jielin Jiang, Xiaolong Xu
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
Yufan Chen, Jiaming Zhang, Kunyu Peng et al.
3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions
Weijia Li, Haote Yang, Zhenghao Hu et al.
Instant Adversarial Purification with Adversarial Consistency Distillation
Chun Tong Lei, Hon Ming Yam, Zhongliang Guo et al.
Privacy-Preserving Optics for Enhancing Protection in Face De-Identification
Jhon Lopez, Carlos Hinojosa, Henry Arguello et al.
Towards More Unified In-context Visual Understanding
Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.
Model Inversion Robustness: Can Transfer Learning Help?
Sy-Tuyen Ho, Koh Jun Hao, Keshigeyan Chandrasegaran et al.
METASCENES: Towards Automated Replica Creation for Real-world 3D Scans
Huangyue Yu, Baoxiong Jia, Yixin Chen et al.
Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability Composability and Decomposability from Anatomy via Self Supervision
Mohammad Reza Hosseinzadeh Taher, Michael Gotway, Jianming Liang
DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution
Xingyuan Li, Zirui Wang, Yang Zou et al.
Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content
Rohit Kundu, Hao Xiong, Vishal Mohanty et al.
LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation
Nisarg Shah, Vibashan VS, Vishal M. Patel
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
Bin Wang, Fan Wu, Linke Ouyang et al.
Audio-Visual Instance Segmentation
Ruohao Guo, Xianghua Ying, Yaru Chen et al.
Effective Video Mirror Detection with Inconsistent Motion Cues
Alex Warren, Ke Xu, Jiaying Lin et al.
From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective
Chen Zhao, Zhizhou Chen, Yunzhe Xu et al.
Progressive Focused Transformer for Single Image Super-Resolution
Wei Long, Xingyu Zhou, Leheng Zhang et al.
SeMoLi: What Moves Together Belongs Together
Jenny Seidenschwarz, Aljoša Ošep, Francesco Ferroni et al.
3D Multi-frame Fusion for Video Stabilization
Zhan Peng, Xinyi Ye, Weiyue Zhao et al.
Federated Online Adaptation for Deep Stereo
Matteo Poggi, Fabio Tosi
PoNQ: a Neural QEM-based Mesh Representation
Nissim Maruani, Maks Ovsjanikov, Pierre Alliez et al.
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad, Vibhav Vineet, Yogesh S. Rawat
MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting
Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong et al.
Boosting Adversarial Training via Fisher-Rao Norm-based Regularization
Xiangyu Yin, Wenjie Ruan
Completion as Enhancement: A Degradation-Aware Selective Image Guided Network for Depth Completion
Zhiqiang Yan, Zhengxue Wang, Kun Wang et al.
DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering
Yexing Xu, Longguang Wang, Minglin Chen et al.
TANGO: Training-free Embodied AI Agents for Open-world Tasks
Filippo Ziliotto, Tommaso Campari, Luciano Serafini et al.
ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics
Junchao Zhu, Ruining Deng, Tianyuan Yao et al.
GDA: Generalized Diffusion for Robust Test-time Adaptation
Yun-Yun Tsai, Fu-Chen Chen, Albert Chen et al.
Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball
Simon Weber, Barış Zöngür, Nikita Araslanov et al.
ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention
Jiawei Wang, Changjian Li
MAFA: Managing False Negatives for Vision-Language Pre-training
Jaeseok Byun, Dohoon Kim, Taesup Moon
Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering
Zhaohe Liao, Jiangtong Li, Li Niu et al.
StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation
Shangjin Zhai, Zhichao Ye, Jialin Liu et al.
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
Xiaozheng Zheng, Chao Wen, Zhuo Su et al.
Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation
Qi Lv, Hao Li, Xiang Deng et al.
FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
LIn Zhao, Tianchen Zhao, Zinan Lin et al.
MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices
Jianwen Jiang, Gaojie Lin, Zhengkun Rong et al.
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
Lanyun Zhu, Tianrun Chen, Qianxiong Xu et al.
CoLLM: A Large Language Model for Composed Image Retrieval
Chuong Huynh, Jinyu Yang, Ashish Tawari et al.
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
Luo Jiayun, Siddhesh Khandelwal, Leonid Sigal et al.
CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs
Yingji Zhong, Lanqing Hong, Zhenguo Li et al.
HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution
Yuxuan Jiang, Ho Man Kwan, jasmine peng et al.
ObjectMover: Generative Object Movement with Video Prior
Xin Yu, Tianyu Wang, Soo Ye Kim et al.
AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification
Huy Nguyen, Kien Nguyen Thanh, Akila Pemasiri et al.
Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing
Ruiyi Wang, Yushuo Zheng, Zicheng Zhang et al.
An Empirical Study of the Generalization Ability of Lidar 3D Object Detectors to Unseen Domains
George Eskandar
DefMamba: Deformable Visual State Space Model
Leiye Liu, Miao Zhang, Jihao Yin et al.
ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning
Zhenyang Liu, Yikai Wang, Sixiao Zheng et al.
Enhancing Visual Continual Learning with Language-Guided Supervision
Bolin Ni, Hongbo Zhao, Chenghao Zhang et al.
ACL: Activating Capability of Linear Attention for Image Restoration
Yubin Gu, Yuan Meng, Jiayi Ji et al.
Discover and Mitigate Multiple Biased Subgroups in Image Classifiers
Zeliang Zhang, Mingqian Feng, Zhiheng Li et al.
Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World
Wen Yin, Jian Lou, Pan Zhou et al.
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
Rui Li, Tobias Fischer, Mattia Segu et al.
Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers
Haoran You, Connelly Barnes, Yuqian Zhou et al.
Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation
Byung Hyun Lee, Sungjin Lim, Se Young Chun
Towards Training-free Anomaly Detection with Vision and Language Foundation Models
Jinjin Zhang, Guodong Wang, yizhou jin et al.
Instance Tracking in 3D Scenes from Egocentric Videos
Yunhan Zhao, Haoyu Ma, Shu Kong et al.
BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis
Weiguang Zhao, Rui Zhang, Qiufeng Wang et al.
Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning
Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto
Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains
Eunsu Baek, Keondo Park, Ji-yoon Kim et al.
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.
Neuro-3D: Towards 3D Visual Decoding from EEG Signals
Zhanqiang Guo, Jiamin Wu, Yonghao Song et al.
PointInfinity: Resolution-Invariant Point Diffusion Models
Zixuan Huang, Justin Johnson, Shoubhik Debnath et al.
Improving Bird's Eye View Semantic Segmentation by Task Decomposition
Tianhao Zhao, Yongcan Chen, Yu Wu et al.
Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM
Linyu Tang, Lei Zhang
ScribbleLight: Single Image Indoor Relighting with Scribbles
Jun Myeong Choi, Annie N. Wang, Pieter Peers et al.
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
Hongbin Lin, Zilu Guo, Yifan Zhang et al.
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision
Yiming Zhao, Taein Kwon, Paul Streli et al.
OmniStyle: Filtering High Quality Style Transfer Data at Scale
Ye Wang, Ruiqi Liu, Jiang Lin et al.
Multi-Attribute Interactions Matter for 3D Visual Grounding
Can Xu, Yuehui Han, Rui Xu et al.
Online Video Understanding: OVBench and VideoChat-Online
Zhenpeng Huang, Xinhao Li, Jiaqi Li et al.
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen, Lin Li, Yongqi Yang et al.
CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning
Hyuck Lee, Heeyoung Kim
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers
Hui Zhang, Tingwei Gao, Jie Shao et al.
Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization
Yujia Liu, Chenxi Yang, Dingquan Li et al.
RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation
Mingfei Han, Liang Ma, Kamila Zhumakhanova et al.
NoT: Federated Unlearning via Weight Negation
Yasser Khalil, Leo Maxime Brunswic, Soufiane Lamghari et al.
Correcting Diffusion Generation through Resampling
Yujian Liu, Yang Zhang, Tommi Jaakkola et al.
Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes
Lihan Jiang, Kerui Ren, Mulin Yu et al.
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong et al.
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang et al.
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
Zhiyu Zhao, Bingkun Huang, Sen Xing et al.
Low-power Continuous Remote Behavioral Localization with Event Cameras
Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez et al.
Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning
Rashindrie Perera, Saman Halgamuge
Weakly Supervised Monocular 3D Detection with a Single-View Image
Xueying Jiang, Sheng Jin, Lewei Lu et al.
NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting
Yulong Zheng, Zicheng Jiang, Shengfeng He et al.
DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Dar-Yen Chen, Ayan Kumar Bhunia, Subhadeep Koley et al.
MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities
Bizhu Wu, Jinheng Xie, Keming Shen et al.
Seeing the Unseen: Visual Common Sense for Semantic Placement
Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra et al.
Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing
Dongyoung Kim, Jinwoo Kim, Junsang Yu et al.
Universal Novelty Detection Through Adaptive Contrastive Learning
Hossein Mirzaei, Mojtaba Nafez, Mohammad Jafari et al.
D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection
Dinh Phat Do, Taehoon Kim, JAEMIN NA et al.
LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models
Jian Liang, Wenke Huang, Guancheng Wan et al.
DocVLM: Make Your VLM an Efficient Reader
Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz et al.
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Henghui Du, Guangyao Li, Chang Zhou et al.
EgoLM: Multi-Modal Language Model of Egocentric Motions
Fangzhou Hong, Vladimir Guzov, Hyo Jin Kim et al.
One-for-More: Continual Diffusion Model for Anomaly Detection
Xiaofan Li, Xin Tan, Zhuo Chen et al.
Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction
Xiaoyang Lyu, Chirui Chang, Peng Dai et al.
Image Neural Field Diffusion Models
Yinbo Chen, Oliver Wang, Richard Zhang et al.
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis, Zhaoyan Liu, Satya Krishna Gorti et al.
Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
Zihua Zhao, Mengxi Chen, Tianjie Dai et al.
Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning
Leonardo Iurada, Marco Ciccone, Tatiana Tommasi
RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion
Xiaomeng Chu, Jiajun Deng, Guoliang You et al.
Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting
Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen
Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection
Jikang Cheng, Zhiyuan Yan, Ying Zhang et al.
FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video
Yue Gao, Hong-Xing Yu, Bo Zhu et al.
FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders
Soumen Basu, Mayuna Gupta, Chetan Madan et al.