Most Cited ICCV "ucb-driven elimination" Papers
2,701 papers found
GenHaze: Pioneering Controllable One-Step Realistic Haze Generation for Real-World Dehazing
Sixiang Chen, Tian Ye, Yunlong Lin et al.
3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation
Jianzhe Gao, Rui Liu, Wenguan Wang
GECO: Geometrically Consistent Embedding with Lightspeed Inference
Regine Hartwig, Dominik Muhle, Riccardo Marin et al.
Closed-Loop Transfer for Weakly-supervised Affordance Grounding
Jiajin Tang, Zhengxuan Wei, Ge Zheng et al.
DyGS-SLAM: Real-Time Accurate Localization and Gaussian Reconstruction for Dynamic Scenes
Xinggang Hu, Chenyangguang Zhang, Mingyuan Zhao et al.
Training-Free Personalization via Retrieval and Reasoning on Fingerprints
Deepayan Das, Davide Talon, Yiming Wang et al.
PASD: A Pixel-Adaptive Swarm Dynamics Approach for Unsupervised Low-Light Image Enhancement
Shuai Jin, Yuhua Qian, Feijiang Li et al.
Proactive Scene Decomposition and Reconstruction
Baicheng Li, Zike Yan, Dong Wu et al.
RoboAnnotatorX: A Comprehensive and Universal Annotation Framework for Accurate Understanding of Long-horizon Robot Demonstration
Longxin Kou, Fei Ni, Jianye HAO et al.
Expressive Talking Human from Single-Image with Imperfect Priors
Jun Xiang, Yudong Guo, Leipeng Hu et al.
InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians
Kefan Chen, Sergiu Oprea, Justin Theiss et al.
FaceXFormer: A Unified Transformer for Facial Analysis
Kartik Narayan, Vibashan VS, Rama Chellappa et al.
Continuous-Time Human Motion Field from Event Cameras
Ziyun Wang, Ruijun Zhang, Zi-Yan Liu et al.
LDIP: Long Distance Information Propagation for Video Super-Resolution
Michael Bernasconi, Abdelaziz Djelouah, Yang Zhang et al.
NAPPure: Adversarial Purification for Robust Image Classification under Non-Additive Perturbations
Junjie Nan, Jianing Li, Wei Chen et al.
Neuromanifold-Regularized KANs for Shape-fair Feature Representations
Mazlum Arslan, Weihong Guo, Shuo Li
GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar
SeungJun Moon, Hah Min Lew, Seungeun Lee et al.
Image Intrinsic Scale Assessment: Bridging the Gap Between Quality and Resolution
Vlad Hosu, Lorenzo Agnolucci, Daisuke Iso et al.
Less Static, More Private: Towards Transferable Privacy-Preserving Action Recognition by Generative Decoupled Learning
Zhi-Wei Xia, Kun-Yu Lin, Yuan-Ming Li et al.
Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video
Xiao Li, Qi Chen, Xiulian Peng et al.
Attention to Trajectory: Trajectory-Aware Open-Vocabulary Tracking
Yunhao Li, Yifan Jiao, Dan Meng et al.
MistSense: Versatile Online Detection of Procedural and Execution Mistakes
Constantin Patsch, Yuankai Wu, Marsil Zakour et al.
Penalizing Boundary Activation for Object Completeness in Diffusion Models
Haoyang Xu, Tianhao Zhao, Sibei Yang et al.
LUSD: Localized Update Score Distillation for Text-Guided Image Editing
Worameth Chinchuthakun, Tossaporn Saengja, Nontawat Tritrong et al.
PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask
Jeongho Kim, Hoiyeong Jin, Sunghyun Park et al.
Reusing Computation in Text-to-Image Diffusion for Efficient Generation of Image Sets
Dale Decatur, Thibault Groueix, Wang Yifan et al.
Stroke2Sketch: Harnessing Stroke Attributes for Training-Free Sketch Generation
Rui Yang, Huining Li, Yiyi Long et al.
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He, Qihang Yu, Qihao Liu et al.
LACONIC: A 3D Layout Adapter for Controllable Image Creation
Léopold Maillard, Tom Durand, Adrien RAMANANA RAHARY et al.
Towards Robust Defense against Customization via Protective Perturbation Resistant to Diffusion-based Purification
Wenkui Yang, Jie Cao, Junxian Duan et al.
Attention to Neural Plagiarism: Diffusion Models Can Plagiarize Your Copyrighted Images!
Zihang Zou, Boqing Gong, Liqiang Wang
Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions
Yiting Qu, Ziqing Yang, Yihan Ma et al.
On the Provable Importance of Gradients for Autonomous Language-Assisted Image Clustering
Bo Peng, Jie Lu, Guangquan Zhang et al.
HiERO: Understanding the Hierarchy of Human Behavior Enhances Reasoning on Egocentric Videos
Simone Alberto Peirone, Francesca Pistilli, Giuseppe Averta
CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts
Olaf Dünkel, Artur Jesslen, Jiahao Xie et al.
ESCNet: Edge-Semantic Collaborative Network for Camouflaged Object Detection
Sheng Ye, Xin Chen, Yan Zhang et al.
Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching
Yuhan Liu, Jingwen Fu, Yang Wu et al.
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
Junqi Ge, Ziyi Chen, Jintao Lin et al.
Enhancing Zero-shot Object Counting via Text-guided Local Ranking and Number-evoked Global Attention
Shiwei Zhang, Qi Zhou, Wei Ke
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
Chandan Yeshwanth, David Rozenberszki, Angela Dai
DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs
Jiahe Zhao, Rongkun Zheng, Yi Wang et al.
Controllable Latent Space Augmentation for Digital Pathology
Sofiène Boutaj, Marin Scalbert, Pierre Marza et al.
Interpretable point cloud classification using multiple instance learning
Matt De Vries, Reed Naidoo, Olga Fourkioti et al.
Learning Beyond Still Frames: Scaling Vision-Language Models with Video
Yiyuan Zhang, Handong Li, Jing Liu et al.
Borrowing Eyes for the Blind Spot: Overcoming Data Scarcity in Malicious Video Detection via Cross-Domain Retrieval Augmentation
Rongpei Hong, Jian Lang, Ting Zhong et al.
Intermediate Connectors and Geometric Priors for Language-Guided Affordance Segmentation on Unseen Object Categories
Yicong Li, Yiyang Chen, Zhenyuan Ma et al.
Similarity Memory Prior is All You Need for Medical Image Segmentation
Hao Tang, Zhiqing Guo, Liejun Wang et al.
CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model
Yuxuan Luo, Jiaqi Tang, Chenyi Huang et al.
Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning
Lizhen Xu, Xiuxiu Bai, Xiaojun Jia et al.
DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation
Jihun Kim, Hoyong Kwon, Hyeokjun Kweon et al.
VISO: Accelerating In-orbit Object Detection with Language-Guided Mask Learning and Sparse Inference
Meiqi Wang, Han Qiu
Towards Robustness of Person Search against Corruptions
Woojung Son, Yoonki Cho, Guoyuan An et al.
Flow-MIL: Constructing Highly-expressive Latent Feature Space For Whole Slide Image Classification Using Normalizing Flow
Yingfan MA, Bohan An, Ao Shen et al.
Vision-Language Neural Graph Featurization for Extracting Retinal Lesions
Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.
Token-Efficient VLM: High-Resolution Image Understanding via Dynamic Region Proposal
Yitong Jiang, Jinwei Gu, Tianfan Xue et al.
VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning
Jinglei Zhang, Yuanfan Guo, Rolandos Alexandros Potamias et al.
Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training
Wooseong Jeong, Jegyeong Cho, Youngho Yoon et al.
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos, Cordelia Schmid, Josef Sivic
Unbiased Missing-modality Multimodal Learning
Ruiting Dai, Chenxi Li, Yandong Yan et al.
DM-EFS: Dynamically Multiplexed Expanded Features Set Form for Robust and Efficient Small Object Detection
Aashish Sharma
Inverse Image-Based Rendering for Light Field Generation from Single Images
Hyunjun Jung, Hae-Gon Jeon
Bolt3D: Generating 3D Scenes in Seconds
Stanislaw Szymanowicz, Jason Y. Zhang, Pratul Srinivasan et al.
Group Inertial Poser: Multi-Person Pose and Global Translation from Sparse Inertial Sensors and Ultra-Wideband Ranging
Ying Xue, Jiaxi Jiang, Rayan Armani et al.
FastPoint: Accelerating 3D Point Cloud Model Inference via Sample Point Distance Prediction
Donghyun Lee, Dawoon Jeong, Jae W. Lee et al.
GSRecon: Efficient Generalizable Gaussian Splatting for Surface Reconstruction from Sparse Views
Hang Yang, Le Hui, Jianjun Qian et al.
InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation
Zhuoran Yang, Xi Guo, Chenjing Ding et al.
NormalLoc: Visual Localization on Textureless 3D Models using Surface Normals
Jiro Abe, Gaku Nakano, Kazumine Ogura
Lifting the Structural Morphing for Wide-Angle Images Rectification: Unified Content and Boundary Modeling
Wenting Luan, Siqi Lu, Yongbin Zheng et al.
RIOcc: Efficient Cross-Modal Fusion Transformer with Collaborative Feature Refinement for 3D Semantic Occupancy Prediction
Baojie Fan, Xiaotian Li, Yuhan Zhou et al.
TARS: Traffic-Aware Radar Scene Flow Estimation
Jialong Wu, Marco Braun, Dominic Spata et al.
LightCity: An Urban Dataset for Outdoor Inverse Rendering and Reconstruction under Multi-illumination Conditions
Jingjing Wang, Qirui Hu, Chong Bao et al.
Feature Extraction and Representation of Pre-training Point Cloud Based on Diffusion Models
Chang Qiu, Feipeng Da, Zilei Zhang
S²M²: Scalable Stereo Matching Model for Reliable Depth Estimation
Junhong Min, Youngpil Jeon, Jimin Kim et al.
MiDSummer: Multi-Guidance Diffusion for Controllable Zero-Shot Immersive Gaussian Splatting Scene Generation
Anjun Hu, Richard Tomsett, Valentin Gourmet et al.
Spatio-Spectral Pattern Illumination for Direct and Indirect Separation from a Single Hyperspectral Image
Shin Ishihara, Imari Sato
UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction
Jin Cao, Hongrui Wu, Ziyong Feng et al.
ExploreGS: Explorable 3D Scene Reconstruction with Virtual Camera Samplings and Diffusion Priors
Minsu Kim, Subin Jeon, In Cho et al.
ArgMatch: Adaptive Refinement Gathering for Efficient Dense Matching
Yuxin Deng, Kaining Zhang, Linfeng Tang et al.
Thermal Polarimetric Multi-view Stereo
Takahiro Kushida, Kenichiro Tanaka
SynCity: Training-Free Generation of 3D Cities
Paul Engstler, Aleksandar Shtedritski, Iro Laina et al.
Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs
Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin et al.
Teeth Reconstruction and Performance Capture Using a Phone Camera
Weixi Zheng, Jingwang Ling, Zhibo Wang et al.
Sibai: A Few-Shot Meta-Classifier for Poisoning Detection in Federated Learning
Melanie Götz, Torsten Krauß, Alexandra Dmitrienko
Learning to See in the Extremely Dark
Hai Jiang, Binhao Guan, Zhen Liu et al.
BATCLIP: Bimodal Online Test-Time Adaptation for CLIP
Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky et al.
Pretend Benign: A Stealthy Adversarial Attack by Exploiting Vulnerabilities in Cooperative Perception
Hongwei Lin, Dongyu Pan, Qiming Xia et al.
Hypergraph Clustering Network with Partial Attribute Imputation
Qianqian Wang, Bowen Zhao, Zhengming Ding et al.
SAMPLE: Semantic Alignment through Temporal-Adaptive Multimodal Prompt Learning for Event-Based Open-Vocabulary Action Recognition
Jing Wang, Rui Zhao, Ruiqin Xiong et al.
LIRA: Reasoning Reconstruction via Multimodal Large Language Models
Zhen Zhou, Tong Wang, Yunkai Ma et al.
Backdoor Attacks on Neural Networks via One-Bit Flip
Xiang Li, Lannan Luo, Qiang Zeng
Learning an Implicit Physics Model for Image-based Fluid Simulation
Emily Jia, Jiageng Mao, Zhiyuan Gao et al.
ArchiSet: Benchmarking Editable and Consistent Single-View 3D Reconstruction of Buildings with Specific Window-to-Wall Ratios
Jun Yin, Pengyu Zeng, Licheng Shen et al.
Splat-based 3D Scene Reconstruction with Extreme Motion-blur
Hyeonjoong Jang, Dongyoung Choi, Donggun Kim et al.
RAGD: Regional-Aware Diffusion Model for Text-to-Image Generation
Chen Zhennan, Yajie Li, Haofan Wang et al.
OVG-HQ: Online Video Grounding with Hybrid-modal Queries
Runhao Zeng, Jiaqi Mao, Minghao Lai et al.
HERO: Human Reaction Generation from Videos
Chengjun Yu, Wei Zhai, Yuhang Yang et al.
Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method
Enming Zhang, Yuzhe Li, Yuliang Liu et al.
A Unified Interpretation of Training-Time Out-of-Distribution Detection
Xu Cheng, Xin Jiang, Zechao Li
Removing Out-of-Focus Reflective Flares via Color Alignment
Fengbo Lan, Chang Wen Chen
M2EIT: Multi-Domain Mixture of Experts for Robust Neural Inertial Tracking
Yan Li, Yang Xu, Changhao Chen et al.
JailbreakDiffBench: A Comprehensive Benchmark for Jailbreaking Diffusion Models
Xiaolong Jin, Zixuan Weng, Hanxi Guo et al.
Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints
Jiahao Xia, Yike Wu, Wenjian Huang et al.
NETracer: A Topology-Aware Iterative Tracing Approach for Tubular Structure Extraction
Chao Liu, Yangbo Jiang, Nenggan Zheng
UIPro: Unleashing Superior Interaction Capability For GUI Agents
Hongxin Li, Jingran Su, Jingfan CHEN et al.
AcZeroTS: Active Learning for Zero-shot Tissue Segmentation in Pathology Images
Jiao Tang, Junjie Zhou, Bo Qian et al.
One Polyp Identifies All: One-Shot Polyp Segmentation with SAM via Cascaded Priors and Iterative Prompt Evolution
Xinyu Mao, Xiaohan Xing, Fei MENG et al.
Structure-Guided Diffusion Models for High-Fidelity Portrait Shadow Removal
Wanchang Yu, Qing Zhang, Rongjia Zheng et al.
FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment
Hang Xu, Jie Huang, Linjiang Huang et al.
Top2Pano: Learning to Generate Indoor Panoramas from Top-Down View
Zitong Zhang, Suranjan Gautam, Rui Yu
MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction
Yaopeng Lou, Liao Shen, Tianqi Liu et al.
Region-Level Data Attribution for Text-to-Image Generative Models
Trong Bang Nguyen, Phi Le Nguyen, Simon Lucey et al.
Fine-structure Preserved Real-world Image Super-resolution via Transfer VAE Training
Qiaosi Yi, Shuai Li, Rongyuan Wu et al.
Benefit From Seen: Enhancing Open-Vocabulary Object Detection by Bridging Visual and Textual Co-Occurrence Knowledge
Yanqi Li, Jianwei Niu, Tao Ren
Bridging the Gap Between Ideal and Real-world Evaluation: Benchmarking AI-Generated Image Detection in Challenging Scenarios
Chunxiao Li, Xiaoxiao Wang, Meiling Li et al.
Neural Solver of Dichromatic Reflection Model for Specular Highlight Removal
Gang Fu
Wavelet Policy: Lifting Scheme for Policy Learning in Long-Horizon Tasks
Hao Huang, Shuaihang Yuan, Geeta Chandra Raju Bethala et al.
OcRFDet: Object-Centric Radiance Fields for Multi-View 3D Object Detection in Autonomous Driving
Mingqian Ji, Jian Yang, Shanshan Zhang
CARL: Causality-guided Architecture Representation Learning for an Interpretable Performance Predictor
Han Ji, Yuqi Feng, Jiahao Fan et al.
TCFG: Truncated Classifier-Free Guidance for Efficient and Scalable Text-to-Image Acceleration
Xiaomeng Fu, Jia Li
Knowledge-Guided Part Segmentation
Xuejian Gou, Fang Liu, Licheng Jiao et al.
DADet: Safeguarding Image Conditional Diffusion Models against Adversarial and Backdoor Attacks via Diffusion Anomaly Detection
Hongwei Yu, Xinlong Ding, Jiawei Li et al.
Rethinking Layered Graphic Design Generation with a Top-Down Approach
Jingye Chen, Zhaowen Wang, Nanxuan Zhao et al.
monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation
Ren-Jie Lu, Yu Zhou, Hao Cheng et al.
More Reliable Pseudo-labels, Better Performance: A Generalized Approach to Single Positive Multi-label Learning
Luong Tran, Thieu Vo, Anh Nguyen et al.
Loss Functions for Predictor-based Neural Architecture Search
Han Ji, Yuqi Feng, Jiahao Fan et al.
Transformer-based Tooth Alignment Prediction with Occlusion and Collision Constraints
DongZhenXing DongZhenXing, Jiazhou Chen
Hierarchy UGP: Hierarchy Unified Gaussian Primitive for Large-Scale Dynamic Scene Reconstruction
Hongyang Sun, Qinglin Yang, Jiawei Wang et al.
Democratizing High-Fidelity Co-Speech Gesture Video Generation
Xu Yang, Shaoli Huang, Shenbo Xie et al.
LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models
Yu Cheng, Fajie Yuan
Allowing Oscillation Quantization: Overcoming Solution Space Limitation in Low Bit-Width Quantization
Weiying Xie, Zihan Meng, Jitao Ma et al.
MagicCity: Geometry-Aware 3D City Generation from Satellite Imagery with Multi-View Consistency
Xingbo YAO, Xuanmin Wang, Hao WU et al.
Zero-Shot Vision Encoder Grafting via LLM Surrogates
Kaiyu Yue, Vasu Singla, Menglin Jia et al.
ConceptSplit: Decoupled Multi-Concept Personalization of Diffusion Models via Token-wise Adaptation and Attention Disentanglement
Habin Lim, Youngseob Won, Juwon Seo et al.
Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration
Baoyou Chen, Ce Liu, Weihao Yuan et al.
Hierarchical 3D Scene Graphs Construction Outdoors
Jon Nyffeler, Federico Tombari, Daniel Barath
Task-Decoupled Bézier Surface Constraint for Uneven Low-Light Image Enhancement
Xingxiang Zhou, Xiangdong Su, Haoran Zhang et al.
Unlearning the Noisy Correspondence Makes CLIP More Robust
Haochen Han, Alex Jinpeng Wang, Peijun Ye et al.
Text-to-Any-Skeleton Motion Generation Without Retargeting
Qingyuan Liu, Ke Lv, Kun Dong et al.
Simulating Dual-Pixel Images From Ray Tracing For Depth Estimation
Fengchen He, Dayang Zhao, Hao Xu et al.
Conditional Visual Autoregressive Modeling for Pathological Image Restoration
Ziyi Liu, Zhe Xu, Jiabo MA et al.
Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification
Tuo Xiang, Xuemiao Xu, Bangzhen Liu et al.
CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers
Jiaqi Han, Haotian Ye, Puheng Li et al.
RayGaussX: Accelerating Gaussian-Based Ray Marching for Real-Time and High-Quality Novel View Synthesis
Hugo Blanc, Jean-Emmanuel Deschaud, Alexis Paljic
Scaling Omni-modal Pretraining with Multimodal Context: Advancing Universal Representation Learning Across Modalities
Yiyuan Zhang, Handong Li, Jing Liu et al.
Kaleidoscopic Background Attack: Disrupting Pose Estimation with Multi-Fold Radial Symmetry Textures
Xinlong Ding, Hongwei Yu, Jiawei Li et al.
End-to-End Entity-Predicate Association Reasoning for Dynamic Scene Graph Generation
LiWei Wang, YanDuo Zhang, Tao Lu et al.
Evidential Knowledge Distillation
Liangyu Xiang, Junyu Gao, Changsheng Xu
CO2-Net: A Physics-Informed Spatio-Temporal Model for Global Surface CO2 Reconstruction
Hao Zheng, Yuting Zheng, Hanbo Huang et al.
Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers
Yunshan Zhong, Yuyao Zhou, Yuxin Zhang et al.
Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment
Ying Ba, Tianyu Zhang, Yalong Bai et al.
RayPose: Ray Bundling Diffusion for Template Views in Unseen 6D Object Pose Estimation
Junwen Huang, Shishir Reddy Vutukur, Peter Yu et al.
Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu, Siyuan Meng, Yanting Gao et al.
Backdooring Self-Supervised Contrastive Learning by Noisy Alignment
Tuo Chen, Jie Gui, Minjing Dong et al.
Robust Dataset Condensation using Supervised Contrastive Learning
Nicole Kim, Hwanjun Song
Event-boosted Deformable 3D Gaussians for Dynamic Scene Reconstruction
Wenhao Xu, Wenming Weng, Yueyi Zhang et al.
MoFRR: Mixture of Diffusion Models for Face Retouching Restoration
Jiaxin Liu, Qichao Ying, Zhenxing Qian et al.
Adversarial Reconstruction Feedback for Robust Fine-grained Generalization
Shijie Wang, Jian Shi, Haojie Li
Uncover Treasures in DCT: Advancing JPEG Quality Enhancement by Exploiting Latent Correlations
Jing Yang, Qunliang Xing, Mai Xu et al.
OURO: A Self-Bootstrapped Framework for Enhancing Multimodal Scene Understanding
Tianrun Xu, Guanyu Chen, Ye Li et al.
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation
Aoxiong Yin, Kai Shen, Yichong Leng et al.
PROL: Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning
Muhammad Anwar Ma'sum, Mahardhika Pratama, Savitha Ramasamy et al.
SUV: Suppressing Undesired Video Content via Semantic Modulation Based on Text Embeddings
Xiang Lv, Mingwen Shao, Lingzhuang Meng et al.
DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy
Ming Dai, Wenxuan Cheng, Jiang-Jiang Liu et al.
LLM Thought Divergence and Convergence for Dialogue-Based Image Generation Control
Hui Li
DynFaceRestore: Balancing Fidelity and Quality in Diffusion-Guided Blind Face Restoration with Dynamic Blur-Level Mapping and Guidance
Huu Phu Do, Yu-Wei Chen, Yi-Cheng Liao et al.
Gradient-Reweighted Adversarial Camouflage for Physical Object Detection Evasion
Jiawei Liang, Siyuan Liang, Tianrui Lou et al.
AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts
Yufan Liu, Wanqian Zhang, Huashan Chen et al.
D2ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition
Wenjie Pei, Qizhong Tan, Guangming Lu et al.
Seam360GS: Seamless 360° Gaussian Splatting from Real-World Omnidirectional Images
Changha Shin, Woong Oh Cho, Seon Joo Kim
CLOT: Closed Loop Optimal Transport for Unsupervised Action Segmentation
Elena Bueno-Benito, Mariella Dimiccoli
Dual-Temporal Exemplar Representation Network for Video Semantic Segmentation
Xiaolong Xu, Lei Zhang, Jiayi Li et al.
LHM: Large Animatable Human Reconstruction Model for Single Image to 3D in Seconds
Lingteng Qiu, Xiaodong Gu, Peihao Li et al.
MinCD-PnP: Learning 2D-3D Correspondences with Approximate Blind PnP
Pei An, Jiaqi Yang, Muyao Peng et al.
How To Make Your Cell Tracker Say "I dunno!"
Richard D Paul, Johannes Seiffarth, David Rügamer et al.
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
Junli Liu, Qizhi Chen, Zhigang Wang et al.
Towards a 3D Transfer-based Black-box Attack via Critical Feature Guidance
Shuchao Pang, Zhenghan Chen, Shen Zhang et al.
Enhanced Pansharpening via Quaternion Spatial-Spectral Interactions
Dong Li, Chunhui Luo, Yuanfei Bao et al.
ScanEdit: Hierarchically-Guided Functional 3D Scan Editing
Mohamed El Amine Boudjoghra, Ivan Laptev, Angela Dai
Zero-Shot Composed Image Retrieval via Dual-Stream Instruction-Aware Distillation
Wenliang Zhong, Rob Barton, Weizhi An et al.
Medical World Model
Yijun Yang, Zhao-Yang Wang, Qiuping Liu et al.
MaskHand: Generative Masked Modeling for Robust Hand Mesh Reconstruction in the Wild
Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Mayur Patel et al.
Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention
Jiawei Gu, Ziyue Qiao, Zechao Li
Token Activation Map to Visually Explain Multimodal LLMs
Yi Li, Hualiang Wang, Xinpeng Ding et al.
Diffusion-Based Imaginative Coordination for Bimanual Manipulation
Huilin Xu, Jian Ding, Jiakun Xu et al.
Learning Neural Scene Representation from iToF Imaging
Wenjie Chang, Hanzhi Chang, Yueyi Zhang et al.
Multi-Modal Multi-Task Unified Embedding Model (M3T-UEM): A Task-Adaptive Representation Learning Framework
Rohan Sharma, Changyou Chen, Feng-Ju Chang et al.
InvRGB+L: Inverse Rendering of Complex Scenes with Unified Color and LiDAR Reflectance Modeling
Xiaoxue Chen, Bhargav Chandaka, Chih-Hao Lin et al.
SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models
Stathis Galanakis, Alexandros Lattas, Stylianos Moschoglou et al.
Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning
Junjie Shan, Ziqi Zhao, Jialin Lu et al.
ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation
Daniel Winter, Asaf Shul, Matan Cohen et al.
Active Learning Meets Foundation Models: Fast Remote Sensing Data Annotation for Object Detection
Marvin Burges, Philipe Dias, Dalton Lunga et al.
MVTrajecter: Multi-View Pedestrian Tracking with Trajectory Motion Cost and Trajectory Appearance Cost
Taiga Yamane, Ryo Masumura, Satoshi Suzuki et al.
Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding
Nuoye Xiong, Anqi Dong, Ning Wang et al.
Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning
Wooseong Jeong, Kuk-Jin Yoon
DisCoPatch: Taming Adversarially-driven Batch Statistics for Improved Out-of-Distribution Detection
Francisco Caetano, Christiaan Viviers, Luis Zavala-Mondragón et al.
Scaling and Taming Adversarial Training with Synthetic Data
Juntao Wu, Xianting Huang, Yu Chen et al.
DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization
Zihan Ding, Chi Jin, Difan Liu et al.
Music Grounding by Short Video
Zijie Xin, Minquan Wang, Jingyu Liu et al.
Fewer Denoising Steps or Cheaper Per-Step Inference: Towards Compute-Optimal Diffusion Model Deployment
Zhenbang Du, Yonggan Fu, Lifu Wang et al.
Your Text Encoder Can Be An Object-Level Watermarking Controller
Naresh Kumar Devulapally, Mingzhen Huang, Vishal Asnani et al.