Most Cited CVPR "text-based image stylization" Papers
Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes
Takashi Otonari, Satoshi Ikehata, Kiyoharu Aizawa
Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free Unsupervised Domain Adaptation
Peihua Deng, Jiehua Zhang, Xichun Sheng et al.
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
Hanhui Wang, Yihua Zhang, Ruizheng Bai et al.
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
Haoran Hao, Jiaming Han, Changsheng Li et al.
ACE: Anti-Editing Concept Erasure in Text-to-Image Models
Zihao Wang, Yuxiang Wei, Fan Li et al.
Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning
Xialei Liu, Jiang-Tian Zhai, Andrew Bagdanov et al.
GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration
Sudarshan Rajagopalan, Nithin Gopalakrishnan Nair, Jay Paranjape et al.
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding
Thanh-Dat Truong, Utsav Prabhu, Bhiksha Raj et al.
AMO Sampler: Enhancing Text Rendering with Overshooting
Xixi Hu, Keyang Xu, Bo Liu et al.
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan et al.
An N-Point Linear Solver for Line and Motion Estimation with Event Cameras
Ling Gao, Daniel Gehrig, Hang Su et al.
Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
Hyeongjun Kwon, Jinhyun Jang, Jin Kim et al.
Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model
Yingying Fan, Quanwei Yang, Kaisiyuan Wang et al.
Diffusion-FOF: Single-View Clothed Human Reconstruction via Diffusion-Based Fourier Occupancy Field
Yuanzhen Li, Fei Luo, Chunxia Xiao
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Yangyang Guo, Guangzhi Wang, Mohan Kankanhalli
SapiensID: Foundation for Human Recognition
Minchul Kim, Dingqiang Ye, Yiyang Su et al.
Focusing on Tracks for Online Multi-Object Tracking
Kyujin Shim, Kangwook Ko, YuJin Yang et al.
MedBN: Robust Test-Time Adaptation against Malicious Test Samples
Hyejin Park, Jeongyeon Hwang, Sunung Mun et al.
Synthetic Prior for Few-Shot Drivable Head Avatar Inversion
Wojciech Zielonka, Stephan J. Garbin, Alexandros Lattas et al.
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
Zhikai Li, Xuewen Liu, Dongrong Joe Fu et al.
General Point Model Pretraining with Autoencoding and Autoregressive
Zhe Li, Zhangyang Gao, Cheng Tan et al.
Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects
Weimin Qiu, Jieke Wang, Meng Tang
Time-Efficient Light-Field Acquisition Using Coded Aperture and Events
Shuji Habuchi, Keita Takahashi, Chihiro Tsutake et al.
DiffFNO: Diffusion Fourier Neural Operator
Xiaoyi Liu, Hao Tang
T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting
Yifei Qian, Zhongliang Guo, Bowen Deng et al.
Event-based Structure-from-Orbit
Ethan Elms, Yasir Latif, Tae Ha Park et al.
StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation
Yining Shi, Kun Jiang, Ke Wang et al.
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation
Zihao Zhang, Haoran Chen, Haoyu Zhao et al.
Cross Initialization for Face Personalization of Text-to-Image Models
Lianyu Pang, Jian Yin, Haoran Xie et al.
Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration
Zilong Huang, Jun He, Junyan Ye et al.
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
Joshua Fixelle
Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
Zheda Mai, Ping Zhang, Cheng-Hao Tu et al.
ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models
Heng Yin, Yuqiang Ren, Ke Yan et al.
Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement
Hesong Li, Ziqi Wu, Ruiwen Shao et al.
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
Jinhong Deng, Yuhang Yang, Wen Li et al.
Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach
Jingwei Zhang, Mohammad Jalali, Cheuk Ting Li et al.
LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty
Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves et al.
REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning
Jihyun Lee, Weipeng Xu, Alexander Richard et al.
DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions
Yunxiao Shi, Manish Singh, Hong Cai et al.
Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation
Yongkang Li, Tianheng Cheng, Bin Feng et al.
RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training
Raktim Gautam Goswami, Prashanth Krishnamurthy, Yann LeCun et al.
Learning Discriminative Dynamics with Label Corruption for Noisy Label Detection
Suyeon Kim, Dongha Lee, SeongKu Kang et al.
Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics
Shibo Zhao, Sifan Zhou, Raphael Blanchard et al.
Relational Matching for Weakly Semi-Supervised Oriented Object Detection
Wenhao Wu, Hau San Wong, Si Wu et al.
In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing
Yiran Xu, Zhixin Shu, Cameron Smith et al.
Task-Aware Encoder Control for Deep Video Compression
Xingtong Ge, Jixiang Luo, Xinjie Zhang et al.
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
Yusheng Dai, Hang Chen, Jun Du et al.
EdgeTAM: On-Device Track Anything Model
Chong Zhou, Chenchen Zhu, Yunyang Xiong et al.
AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer
Jin Lyu, Tianyi Zhu, Yi Gu et al.
Diversity-aware Channel Pruning for StyleGAN Compression
Jiwoo Chung, Sangeek Hyun, Sang-Heon Shim et al.
Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation
Ziheng Zhang, Jianyang Gu, Arpita Chowdhury et al.
Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time
Jon Donnelly, Zhicheng Guo, Alina Jade Barnett et al.
MammAlps: A Multi-view Video Behavior Monitoring Dataset of Wild Mammals in the Swiss Alps
Valentin Gabeff, Haozhe Qi, Brendan Flaherty et al.
Show and Segment: Universal Medical Image Segmentation via In-Context Learning
Yunhe Gao, Di Liu, Zhuowei Li et al.
SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning
Seokju Yun, Seunghye Chae, Dongheon Lee et al.
Clustering for Protein Representation Learning
Ruijie Quan, Wenguan Wang, Fan Ma et al.
Mimic In-Context Learning for Multimodal Tasks
Yuchu Jiang, Jiale Fu, Chenduo Hao et al.
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
Mingcheng Li, Xiaolu Hou, Ziyang Liu et al.
Automatic Controllable Colorization via Imagination
Xiaoyan Cong, Yue Wu, Qifeng Chen et al.
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab, M. Maruf, Arka Daw et al.
AFL: A Single-Round Analytic Approach for Federated Learning with Pre-trained Models
Run He, Kai Tong, Di Fang et al.
BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting
Yiren Lu, Yunlai Zhou, Disheng Liu et al.
Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models
Reza Shirkavand, Peiran Yu, Shangqian Gao et al.
Instruction-based Image Manipulation by Watching How Things Move
Mingdeng Cao, Xuaner Zhang, Yinqiang Zheng et al.
RAD: Region-Aware Diffusion Models for Image Inpainting
Sora Kim, Sungho Suh, Minsik Lee
Neural Lineage
Runpeng Yu, Xinchao Wang
Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts
Dominik Scheuble, Chenyang Lei, Mario Bijelic et al.
Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation
Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
Yuying Ge, Yizhuo Li, Yixiao Ge et al.
UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation
Lunhao Duan, Shanshan Zhao, Wenjun Yan et al.
DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation
Chun-Hung Wu, Shih-Hong Chen, Chih Yao Hu et al.
The Computer Vision Foundation
Yancheng Cai, Fei Yin, Dounia Hammou et al.
AVF-MAE++: Scaling Affective Video Facial Masked Autoencoders via Efficient Audio-Visual Self-Supervised Learning
Xuecheng Wu, Heli Sun, Yifan Wang et al.
Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement
Qiyuan Dai, Hanzhuo Huang, Yu Wu et al.
SEAL: Semantic Attention Learning for Long Video Representation
Lan Wang, Yujia Chen, Wen-Sheng Chu et al.
MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism
Zhixiong Nan, Xianghong Li, Tao Xiang et al.
Towards Generalizable Scene Change Detection
Jae-Woo Kim, Ue-Hwan Kim
Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory
Wenliang Zhong, Haoyu Tang, Qinghai Zheng et al.
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Zhaoqing Zhu, Chuwei Luo, Zirui Shao et al.
Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels
Pierre Vuillecard, Jean-Marc Odobez
Instance-based Max-margin for Practical Few-shot Recognition
Minghao Fu, Ke Zhu
STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks
Tianqing Zhang, Kairong Yu, Xian Zhong et al.
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
Xinghui Li, Qichao Sun, Pengze Zhang et al.
CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-view Images
Guanlin Shen, Jingwei Huang, Zhihua Hu et al.
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
Yuze He, Yanning Zhou, Wang Zhao et al.
TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation
Yabiao Wang, Shuo Wang, Jiangning Zhang et al.
Contextual AD Narration with Interleaved Multimodal Sequence
Hanlin Wang, Zhan Tong, Kecheng Zheng et al.
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
Xinyuan Chang, Maixuan Xue, Xinran Liu et al.
Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion
Vitor Guizilini, Muhammad Zubair Irshad, Dian Chen et al.
Unsupervised Deep Unrolling Networks for Phase Unwrapping
Zhile Chen, Yuhui Quan, Hui Ji
POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality
Joey Wilson, Marcelino M. de Almeida, Sachit Mahajan et al.
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes
Yuji Wang, Haoran Xu, Yong Liu et al.
KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation
Fengyuan Yang, Kerui Gu, Angela Yao
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living
Dominick Reilly, Rajatsubhra Chakraborty, Arkaprava Sinha et al.
Multiscale Vision Transformers Meet Bipartite Matching for Efficient Single-stage Action Localization
Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos
PBR-NeRF: Inverse Rendering with Physics-Based Neural Fields
Sean Wu, Shamik Basu, Tim Broedermann et al.
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
Zexin He, Tengfei Wang, Xin Huang et al.
CrossOver: 3D Scene Cross-Modal Alignment
Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys et al.
Fun with Flags: Robust Principal Directions via Flag Manifolds
Tolga Birdal, Nathan Mankovich
Extreme Point Supervised Instance Segmentation
Hyeonjun Lee, Sehyun Hwang, Suha Kwak
GauSTAR: Gaussian Surface Tracking and Reconstruction
Chengwei Zheng, Lixin Xue, Juan Jose Zarate et al.
EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events
Shuoyan Wei, Feng Li, Shengeng Tang et al.
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Jinnan Chen, Lingting Zhu, Zeyu Hu et al.
3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations
Yating Wang, Xuan Wang, Ran Yi et al.
FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification
Zhengrui Guo, Conghao Xiong, Jiabo Ma et al.
Language Model Guided Interpretable Video Action Reasoning
Ning Wang, Guangming Zhu, Hongsheng Li et al.
BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image
Minje Kim, Tae-Kyun Kim
Noise Modeling in One Hour: Minimizing Preparation Efforts for Self-supervised Low-Light RAW Image Denoising
Feiran Li, Haiyang Jiang, Daisuke Iso
Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
Takami Sato, Justin Yue, Nanze Chen et al.
Exploring Historical Information for RGBE Visual Tracking with Mamba
Chuanyu Sun, Jiqing Zhang, Yang Wang et al.
Steganographic Passport: An Owner and User Verifiable Credential for Deep Model IP Protection Without Retraining
Qi Cui, Ruohan Meng, Chaohui Xu et al.
ROICtrl: Boosting Instance Control for Visual Generation
Yuchao Gu, Yipin Zhou, Yunfan Ye et al.
Facial Identity Anonymization via Intrinsic and Extrinsic Attention Distraction
Zhenzhong Kuang, Xiaochen Yang, Yingjie Shen et al.
Generating Illustrated Instructions
Sachit Menon, Ishan Misra, Rohit Girdhar
ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling
Zikang Zhou, Hengjian Zhou, Haibo Hu et al.
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing
Zijin Yin, Kongming Liang, Bing Li et al.
Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs
Youyi Zhan, Tianjia Shao, Yin Yang et al.
LookCloser: Frequency-aware Radiance Field for Tiny-Detail Scene
Xiaoyu Zhang, Weihong Pan, Chong Bao et al.
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
Lirui Zhao, Yue Yang, Kaipeng Zhang et al.
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization
Zefeng Zhang, Hengzhu Tang, Jiawei Sheng et al.
Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model
Shengjun Zhang, Jinzhao Li, Xin Fei et al.
A Generative Approach for Wikipedia-Scale Visual Entity Recognition
Mathilde Caron, Ahmet Iscen, Alireza Fathi et al.
G3DR: Generative 3D Reconstruction in ImageNet
Pradyumna Reddy, Ismail Elezi, Jiankang Deng
Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment
Ziteng Cui, Xuangeng Chu, Tatsuya Harada
CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment
Hyeongmin Lee, Kyoungkook Kang, Jungseul Ok et al.
ARM: Appearance Reconstruction Model for Relightable 3D Generation
Xiang Feng, Chang Yu, Zoubin Bi et al.
Linear Attention Modeling for Learned Image Compression
Donghui Feng, Zhengxue Cheng, Shen Wang et al.
Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models
Kartik Thakral, Tamar Glaser, Tal Hassner et al.
Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation
Takeshi Noda, Chao Chen, Junsheng Zhou et al.
Unified Uncertainty-Aware Diffusion for Multi-Agent Trajectory Modeling
Guillem Capellera, Antonio Rubio, Luis Ferraz et al.
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Zhaoyi Liu, Huan Zhang
Monocular and Generalizable Gaussian Talking Head Animation
Shengjie Gong, Haojie Li, Jiapeng Tang et al.
Monocular Identity-Conditioned Facial Reflectance Reconstruction
Xingyu Ren, Jiankang Deng, Yuhao Cheng et al.
Panorama Generation From NFoV Image Done Right
Dian Zheng, Cheng Zhang, Xiao-Ming Wu et al.
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang, Bin Chen, Yulin Li et al.
PFStorer: Personalized Face Restoration and Super-Resolution
Tuomas Varanka, Tapani Toivonen, Soumya Tripathy et al.
Do Visual Imaginations Improve Vision-and-Language Navigation Agents?
Akhil Perincherry, Jacob Krantz, Stefan Lee
When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach
Tao Ma, Bing Bai, Haozhe Lin et al.
Cross-modal Causal Relation Alignment for Video Question Grounding
Weixing Chen, Yang Liu, Binglin Chen et al.
LEAD: Exploring Logit Space Evolution for Model Selection
Zixuan Hu, Xiaotong Li, Shixiang Tang et al.
Hyperbolic Category Discovery
Yuanpei Liu, Zhenqi He, Kai Han
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
Chi-Pin Huang, Yen-Siang Wu, Hung-Kai Chung et al.
Dynamic Updates for Language Adaptation in Visual-Language Tracking
Xiaohai Li, Bineng Zhong, Qihua Liang et al.
BodyMAP - Jointly Predicting Body Mesh and 3D Applied Pressure Map for People in Bed
Abhishek Tandon, Anujraaj Goyal, Henry M. Clever et al.
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation
Trong-Thuan Nguyen, Pha Nguyen, Jackson Cothren et al.
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Zitang Zhou, Ke Mei, Yu Lu et al.
InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing
Jinlu Zhang, Yixin Chen, Zan Wang et al.
DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models
Radu Alexandru Rosu, Keyu Wu, Yao Feng et al.
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement
Ian Huang, Yanan Bao, Karen Truong et al.
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
Hongrui Jia, Chaoya Jiang, Haiyang Xu et al.
Progress-Aware Video Frame Captioning
Zihui Xue, Joungbin An, Xitong Yang et al.
Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment
Alvi Md Ishmam, Chris Thomas
SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance
Peishan Cong, Ziyi Wang, Yuexin Ma et al.
Artist-Friendly Relightable and Animatable Neural Heads
Yingyan Xu, Prashanth Chandran, Sebastian Weiss et al.
Robust Multimodal Survival Prediction with Conditional Latent Differentiation Variational AutoEncoder
Junjie Zhou, Jiao Tang, Yingli Zuo et al.
InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation
Sirui Xu, Dongting Li, Yucheng Zhang et al.
DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery
Jiadong Tang, Yu Gao, Dianyi Yang et al.
Single View Refractive Index Tomography with Neural Fields
Brandon Zhao, Aviad Levis, Liam Connor et al.
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li, Cristiano Saltori, Fabio Poiesi et al.
Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation
Dong Zhao, Shuang Wang, Qi Zang et al.
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs
Zeyi Huang, Yuyang Ji, Xiaofang Wang et al.
DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry
Jing Li, Yihang Fu, Falai Chen
ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer
Jiayi Gao, Zijin Yin, Changcheng Hua et al.
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding
Aaryan Garg, Akash Kumar, Yogesh S. Rawat
Detours for Navigating Instructional Videos
Kumar Ashutosh, Zihui Xue, Tushar Nagarajan et al.
Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng, Haoyu Zhang, Meng Liu et al.
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim, Rui Xiao, Iuliana Georgescu et al.
L-MAGIC: Language Model Assisted Generation of Images with Coherence
Zhipeng Cai, Matthias Mueller, Reiner Birkl et al.
Geometric Knowledge-Guided Localized Global Distribution Alignment for Federated Learning
Yanbiao Ma, Wei Dai, Wenke Huang et al.
Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation
Shuling Zhao, Fa-Ting Hong, Xiaoshui Huang et al.
ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping
Youxin Pang, Ruizhi Shao, Jiajun Zhang et al.
BrainWash: A Poisoning Attack to Forget in Continual Learning
Ali Abbasi, Parsa Nooralinejad, Hamed Pirsiavash et al.
Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation
Bolin Lai, Felix Juefei-Xu, Miao Liu et al.
Semantic Human Mesh Reconstruction with Textures
Xiaoyu Zhan, Jianxin Yang, Yuanqi Li et al.
M3amba: Memory Mamba is All You Need for Whole Slide Image Classification
Tingting Zheng, Kui Jiang, Yi Xiao et al.
Towards Backward-Compatible Continual Learning of Image Compression
Zhihao Duan, Ming Lu, Justin Yang et al.
WaveMo: Learning Wavefront Modulations to See Through Scattering
Mingyang Xie, Haiyun Guo, Brandon Y. Feng et al.
A General Adaptive Dual-level Weighting Mechanism for Remote Sensing Pansharpening
Jie Huang, Haorui Chen, Jiaxuan Ren et al.
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
Lee Chae-Yeon, Oh Hyun-Bin, Han EunGi et al.
ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting
Chengyou Jia, Changliang Xia, Zhuohang Dang et al.
Efficient Stitchable Task Adaptation
Haoyu He, Zizheng Pan, Jing Liu et al.
Physics-Aware Hand-Object Interaction Denoising
Haowen Luo, Yunze Liu, Li Yi
PixelRNN: In-pixel Recurrent Neural Networks for End-to-end–optimized Perception with Neural Sensors
Haley So, Laurie Bose, Piotr Dudek et al.
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation
Bastian Wittmann, Yannick Wattenberg, Tamaz Amiranashvili et al.
LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
Jialin Li, Qiang Nie, Weifu Fu et al.
SuperPrimitive: Scene Reconstruction at a Primitive Level
Kirill Mazur, Gwangbin Bae, Andrew J. Davison
Scene Map-based Prompt Tuning for Navigation Instruction Generation
Sheng Fan, Rui Liu, Wenguan Wang et al.
DiC: Rethinking Conv3x3 Designs in Diffusion Models
Yuchuan Tian, Jing Han, Chengcheng Wang et al.
Gradient-Guided Annealing for Domain Generalization
Aristotelis Ballas, Christos Diou
Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation
Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos et al.
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam, Soowon Son, Zhan Xu et al.
MATCHA: Towards Matching Anything
Fei Xue, Sven Elflein, Laura Leal-Taixe et al.
Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions
He Zhu, Quyu Kong, Kechun Xu et al.
Guiding Human-Object Interactions with Rich Geometry and Relations
Mengqing Xue, Yifei Liu, Ling Guo et al.
EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering
Toshiya Yura, Ashkan Mirzaei, Igor Gilitschenski
PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
Mingju Gao, Yike Pan, Huan-ang Gao et al.
Keyframe-Guided Creative Video Inpainting
Yuwei Guo, Ceyuan Yang, Anyi Rao et al.
Towards Realistic Example-based Modeling via 3D Gaussian Stitching
Xinyu Gao, Ziyi Yang, Bingchen Gong et al.
AlphaPre: Amplitude-Phase Disentanglement Model for Precipitation Nowcasting
Kenghong Lin, Baoquan Zhang, Demin Yu et al.
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
Tomas Soucek, Prajwal Gatti, Michael Wray et al.