Most Cited 2025 "style-content separation" Papers
22,274 papers found
CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation
Zixin Zhu, Kevin Duarte, Mamshad Nayeem Rizve et al.
PLA: Prompt Learning Attack against Text-to-Image Generative Models
XINQI LYU, Yihao LIU, Yanjie Li et al.
Holistic Tokenizer for Autoregressive Image Generation
Anlin Zheng, Haochen Wang, Yucheng Zhao et al.
DanceEditor: Towards Iterative Editable Music-driven Dance Generation with Open-Vocabulary Descriptions
Hengyuan Zhang, Zhe Li, Xingqun Qi et al.
Toward Better Out-painting: Improving the Image Composition with Initialization Policy Model
Xuan Han, Yihao Zhao, Yanhao Ge et al.
Versatile Transition Generation with Image-to-Video Diffusion
Zuhao Yang, Jiahui Zhang, Yingchen Yu et al.
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Shengbang Tong, David Fan, Jiachen Zhu et al.
DiffIP: Representation Fingerprints for Robust IP Protection of Diffusion Models
Zhuoling Li, Haoxuan Qu, Jason Kuen et al.
Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
Ryan Ramos, Vladan Stojnić, Giorgos Kordopatis-Zilos et al.
AM-Adapter: Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis in-the-Wild
Siyoon Jin, Jisu Nam, Jiyoung Kim et al.
Diffusion Epistemic Uncertainty with Asymmetric Learning for Diffusion-Generated Image Detection
Yingsong Huang, Hui Guo, Jing Huang et al.
Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
Hyungjin Kim, Seokho Ahn, Young-Duk Seo
V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models
Jisoo Kim, Wooseok Seo, Junwan Kim et al.
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting
Zeyi Sun, Ziyang Chu, Pan Zhang et al.
AnyI2V: Animating Any Conditional Image with Motion Control
Ziye Li, Xincheng Shuai, Hao Luo et al.
EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing
Zexuan Yan, Yue Ma, Chang Zou et al.
RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation
Yuhan Li, Xianfeng Tan, Wenxiang Shang et al.
Instruction-based Image Editing with Planning, Reasoning, and Generation
Liya Ji, Chenyang Qi, Qifeng Chen
HDR Image Generation via Gain Map Decomposed Diffusion
Yuanshen Guan, Ruikang Xu, Yinuo Liao et al.
ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning
Jongseo Lee, Kyungho Bae, Kyle Min et al.
Accelerating Diffusion Transformer via Gradient-Optimized Cache
Junxiang Qiu, Lin Liu, Shuo Wang et al.
The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation
Ruoyu Wang, Huayang Huang, Ye Zhu et al.
Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces
Aniruddha Mahapatra, Long Mai, David Bourgin et al.
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu, Linchao Zhu, Yi Yang
HyTIP: Hybrid Temporal Information Propagation for Masked Conditional Residual Video Coding
Yi-Hsin Chen, Yi-Chen Yao, Kuan-Wei Ho et al.
DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images
Kazuma Nagata, Naoshi Kaneko
Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models
Haoming Cai, Tsung-Wei Huang, Shiv Gehlot et al.
UniversalBooth: Model-Agnostic Personalized Text-to-Image Generation
Songhua Liu, Ruonan Yu, Xinchao Wang
CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching
Zizhuo Li, Yifan Lu, Linfeng Tang et al.
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
Achint Soni, Meet Soni, Sirisha Rambhatla
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo, Yawei Li, Taolin Zhang et al.
Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation
Gang Dai, Yifan Zhang, Yutao Qin et al.
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
Ruotong Wang, Mingli Zhu, Jiarong Ou et al.
Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection
Yichen Lu, Siwei Nie, Minlong Lu et al.
PixTalk: Controlling Photorealistic Image Processing and Editing with Language
Marcos Conde, Zihao Lu, Radu Timofte
A Unified Framework for Industrial Cel-Animation Colorization with Temporal-Structural Awareness
Xiaoyi Feng, Tao Huang, Peng Wang et al.
T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation
Chieh-Yun Chen, Min Shi, Gong Zhang et al.
LayerLock: Non-collapsing Representation Learning with Progressive Freezing
Goker Erdogan, Nikhil Parthasarathy, Catalin Ionescu et al.
Function-centric Bayesian Network for Zero-Shot Object Goal Navigation
Sixian Zhang, Xinyao Yu, Xinhang Song et al.
Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive Segmentation
You Huang, Lichao Chen, Jiayi Ji et al.
CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
Kuniaki Saito, Donghyun Kim, Kwanyong Park et al.
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
Tianming Liang, Kun-Yu Lin, Chaolei Tan et al.
Test-time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates
Kecheng Chen, Xinyu Luo, Tiexin Qin et al.
Learn2Synth: Learning Optimal Data Synthesis Using Hypergradients for Brain Image Segmentation
Xiaoling Hu, Xiangrui Zeng, Oula Puonti et al.
Representation Shift: Unifying Token Compression with FlashAttention
Joonmyung Choi, Sanghyeok Lee, Byungoh Ko et al.
ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity
Yefei He, Feng Chen, Jing Liu et al.
FastJSMA: Accelerating Jacobian-based Saliency Map Attacks through Gradient Decoupling
Zhenghao Gao, Shengjie Xu, Zijing Li et al.
Federated Continuous Category Discovery and Learning
Lixu Wang, Chenxi Liu, Junfeng Guo et al.
ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts
Xiaoqi Wang, Clint Sebastian, Wenbin He et al.
Zero-Shot Compositional Video Learning with Coding Rate Reduction
Heeseok Jung, Jun-Hyeon Bak, Yujin Jeong et al.
Fuzzy Contrastive Decoding to Alleviate Object Hallucination in Large Vision-Language Models
Jieun Kim, Jinmyeong Kim, Yoonji Kim et al.
Superpowering Open-Vocabulary Object Detectors for X-ray Vision
Pablo Garcia-Fernandez, Lorenzo Vaquero, Mingxuan Liu et al.
RhythmGaussian: Repurposing Generalizable Gaussian Model For Remote Physiological Measurement
Hao LU, Yuting Zhang, Jiaqi Tang et al.
CABLD: Contrast-Agnostic Brain Landmark Detection with Consistency-Based Regularization
Soorena Salari, Arash Harirpoush, Hassan Rivaz et al.
Robustifying Zero-Shot Vision Language Models by Subspaces Alignment
Junhao Dong, Piotr Koniusz, Liaoyuan Feng et al.
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Weiming Ren, Wentao Ma, Huan Yang et al.
FE-CLIP: Frequency Enhanced CLIP Model for Zero-Shot Anomaly Detection and Segmentation
Tao Gong, Qi Chu, Bin Liu et al.
Bias-Resilient Weakly Supervised Semantic Segmentation Using Normalizing Flows
Xianglin Qiu, Xiaoyang Wang, Zhen Zhang et al.
Cracking Instance Jigsaw Puzzles: A Superior Alternative to Multiple Instance Learning for Whole Slide Image Analysis
Xiwen Chen, Peijie Qiu, Wenhui Zhu et al.
DecAD: Decoupling Anomalies in Latent Space for Multi-Class Unsupervised Anomaly Detection
Xiaolei Wang, Xiaoyang Wang, Huihui Bai et al.
Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding
Minghang Zheng, Yuxin Peng, Benyuan Sun et al.
RA-BUSSeg: Relation-aware Semi-supervised Breast Ultrasound Image Segmentation via Adjacent Propagation and Cross-layer Alignment
Wanting ZHANG, Zhenhui Ding, Guilian Chen et al.
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval
WonJun Moon, Cheol-Ho Cho, Woojin Jun et al.
Auto-Controlled Image Perception in MLLMs via Visual Perception Tokens
Runpeng Yu, Xinyin Ma, Xinchao Wang
SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting
Zihui Gao, Jia-Wang Bian, Guosheng Lin et al.
Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection
Ji Du, Xin WANG, Fangwei Hao et al.
Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment
Shi-Chen Zhang, Yunheng Li, Yu-Huan Wu et al.
Pseudo-SD: Pseudo Controlled Stable Diffusion for Semi-Supervised and Cross-Domain Semantic Segmentation
Dong Zhao, Qi Zang, Shuang Wang et al.
Is CLIP ideal? No. Can we fix it? Yes!
Raphaela Kang, Yue Song, Georgia Gkioxari et al.
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models
ZHIXIANG WEI, Guangting Wang, Xiaoxiao Ma et al.
Dynamic Dictionary Learning for Remote Sensing Image Segmentation
Xuechao Zou, Yue Li, Shun Zhang et al.
Temporal-aware Query Routing for Real-time Video Instance Segmentation
Zesen Cheng, Kehan Li, Yian Zhao et al.
Learnable Retrieval Enhanced Visual-Text Alignment and Fusion for Radiology Report Generation
Qin Zhou, Guoyan Liang, Xindi Li et al.
Anomaly Detection of Integrated Circuits Package Substrates Using the Large Vision Model SAIC: Dataset Construction, Methodology, and Application
Ruiyun Yu, Bingyang Guo, Haoyuan Li
Memory-Efficient 4-bit Preconditioned Stochastic Optimization
Jingyang Li, Kuangyu Ding, Kim-chuan Toh et al.
No More Sibling Rivalry: Debiasing Human-Object Interaction Detection
Bin Yang, Yulin Zhang, Hong-Yu Zhou et al.
DASH: Detection and Assessment of Systematic Hallucinations of VLMs
Maximilian Augustin, Yannic Neuhaus, Matthias Hein
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen et al.
Debiasing Trace Guidance: Top-down Trace Distillation and Bottom-up Velocity Alignment for Unsupervised Anomaly Detection
Xingjian Wang, Li Chai, Jiming Chen
ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches
Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif et al.
FIND: Few-Shot Anomaly Inspection with Normal-Only Multi-Modal Data
YITING LI, Fayao Liu, Jingyi Liao et al.
Unsupervised Histopathological Image Semantic Segmentation with Overlapping Patches Consistency Constraint
Wentian Cai, Weizhao Weng, Zihao Huang et al.
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?
Yujian Lee, Peng Gao, Yongqi Xu et al.
UINavBench: A Framework for Comprehensive Evaluation of Interactive Digital Agents
Harsh Agrawal, Eldon Schoop, Xinlei Pan et al.
LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation
Xinyu Yan, Meijun Sun, Ge-Peng Ji et al.
Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration
Ting Lei, Shaofeng Yin, Qingchao Chen et al.
Teaching AI the Anatomy Behind the Scan: Addressing Anatomical Flaws in Medical Image Segmentation with Learnable Prior
Young Seok Jeon, Hongfei Yang, Huazhu Fu et al.
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick, Effrosyni Mavroudi, Yale Song et al.
Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning
Zeyu Xi, Haoying Sun, Yaofei Wu et al.
Conditional Latent Diffusion Models for Zero-Shot Instance Segmentation
Maximilian Ulmer, Wout Boerdijk, Rudolph Triebel et al.
Breaking Grid Constraints: Dynamic Graph Reconstruction Network for Multi-organ Segmentation
Junhao Xiao, Yang Wei, Jingyu Wang et al.
Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval
Bangxiang Lan, Ruobing Xie, Ruixiang Zhao et al.
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
Mixture-of-Scores: Robust Image-Text Data Valuation via Three Lines of Code
WU Sitong, Haoru Tan, Yukang Chen et al.
Axis-level Symmetry Detection with Group-Equivariant Representation
Wongyun Yu, Ahyun Seo, Minsu Cho
U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration
Xiaofan Li, Zhihao Xu, Chenming Wu et al.
Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images
Qi Xun Yeo, Yanyan Li, Gim Hee Lee
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging
Chongjie Ye, Yushuang Wu, Ziteng Lu et al.
Dual-S3D: Hierarchical Dual-Path Selective SSM-CNN for High-Fidelity Implicit Reconstruction
Luoxi Zhang, Pragyan Shrestha, Yu Zhou et al.
MMGeo: Multimodal Compositional Geo-Localization for UAVs
Yuxiang Ji, Boyong He, Zhuoyue Tan et al.
Large Scene Generation with Cube-Absorb Discrete Diffusion
Qianjiang Hu, Wei Hu
SynAD: Enhancing Real-World End-to-End Autonomous Driving Models through Synthetic Data Integration
Jongsuk Kim, Jae Young Lee, Gyojin Han et al.
Benchmarking Egocentric Visual-Inertial SLAM at City Scale
Anusha Krishnan, Shaohui Liu, Paul-Edouard Sarlin et al.
DAA*: Deep Angular A Star for Image-based Path Planning
Zhiwei Xu
RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion
Geonho Bang, Minjae Seong, Jisong Kim et al.
EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device
Gunjan Chhablani, Xiaomeng Ye, Muhammad Zubair Irshad et al.
NGD: Neural Gradient Based Deformation for Monocular Garment Reconstruction
Soham Dasgupta, Shanthika Naik, Preet Savalia et al.
RayletDF: Raylet Distance Fields for Generalizable 3D Surface Reconstruction from Point Clouds or Gaussians
Shenxing Wei, Jinxi Li, Yafei YANG et al.
Semantic-guided Camera Ray Regression for Visual Localization
Yesheng Zhang, Xu Zhao
Polarimetric Neural Field via Unified Complex-Valued Wave Representation
Chu Zhou, Yixin Yang, Junda Liao et al.
From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos
Chenjian Gao, Lihe Ding, Rui Han et al.
Street Gaussians without 3D Object Tracker
Ruida Zhang, Chengxi Li, Chenyangguang Zhang et al.
HiNeuS: High-fidelity Neural Surface Mitigating Low-texture and Reflective Ambiguity
Yida Wang, Xueyang Zhang, Kun Zhan et al.
I2-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting
Zhimin Liao, Ping Wei, Ruijie Zhang et al.
MetaScope: Optics-Driven Neural Network for Ultra-Micro Metalens Endoscopy
Wuyang Li, Wentao Pan, Xiaoyuan Liu et al.
Free-running vs Synchronous: Single-Photon Lidar for High-flux 3D Imaging
Ruangrawee Kitichotkul, Shashwath Bharadwaj, Joshua Rapp et al.
Leaps and Bounds: An Improved Point Cloud Winding Number Formulation for Fast Normal Estimation and Surface Reconstruction
Chamin Hewa Koneputugodage, Dylan Campbell, Stephen Gould
Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning
Yiyang Chen, Shanshan Zhao, Lunhao Duan et al.
OD-RASE: Ontology-Driven Risk Assessment and Safety Enhancement for Autonomous Driving
Kota Shimomura, Masaki Nambata, Atsuya Ishikawa et al.
UniGS: Modeling Unitary 3D Gaussians for Novel View Synthesis from Sparse-view Images
Jiamin WU, Kenkun Liu, Xiaoke Jiang et al.
TOTP: Transferable Online Pedestrian Trajectory Prediction with Temporal-Adaptive Mamba Latent Diffusion
Ziyang Ren, Ping Wei, Shangqi Deng et al.
UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields
Fabian Perez, Sara Rojas Martinez, Carlos Hinojosa et al.
MaterialMVP: Illumination-Invariant Material Generation via Multi-view PBR Diffusion
Zebin He, Mx Yang, Shuhui Yang et al.
Visual Surface Wave Elastography: Revealing Subsurface Physical Properties via Visible Surface Waves
Alexander Ogren, Berthy Feng, Jihoon Ahn et al.
LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation
WEI-JER Chang, Masayoshi Tomizuka, Wei Zhan et al.
Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation
Ziliang Miao, Runjian Chen, Yixi Cai et al.
GeoFormer: Geometry Point Encoder for 3D Object Detection with Graph-based Transformer
Xin Jin, Haisheng Su, Cong Ma et al.
AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion
Liuyue Xie, Jiancong Guo, Ozan Cakmakci et al.
Tile-wise vs. Image-wise: Random-Tile Loss and Training Paradigm for Gaussian Splatting
Xiaoyu Zhang, Weihong Pan, Xiaojun Xiang et al.
RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation
Yuwen Du, Anning Hu, Zichen Chao et al.
LaneDiffusion: Improving Centerline Graph Learning via Prior Injected BEV Feature Generation
Zijie Wang, Weiming Zhang, Wei Zhang et al.
Planar Affine Rectification from Local Change of Scale and Orientation
Yuval Nissan, Marc Pollefeys, Daniel Barath
ERNet: Efficient Non-Rigid Registration Network for Point Sequences
Guangzhao He, Yuxi Xiao, Zhen Xu et al.
Doppler-Aware LiDAR-RADAR Fusion for Weather-Robust 3D Detection
Yujeong Chae, Heejun Park, Hyeonseong Kim et al.
Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance
Mingfang Zhang, Ryo Yonetani, Yifei Huang et al.
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
Yifan Lu, Xuanchi Ren, Jiawei Yang et al.
GenFlow3D: Generative Scene Flow Estimation and Prediction on Point Cloud Sequences
Hanlin Li, Wenming Weng, Yueyi Zhang et al.
Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping
Emanuele Giacomini, Luca Di Giammarino, Lorenzo De Rebotti et al.
AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering
Michael Steiner, Thomas Köhler, Lukas Radl et al.
SAFT: Shape and Appearance of Fabrics from Template via Differentiable Physical Simulations from Monocular Video
David Stotko, Reinhard Klein
BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment
Tongfan Guan, Jiaxin Guo, Chen Wang et al.
Decoupled Diffusion Sparks Adaptive Scene Generation
Yunsong Zhou, Naisheng Ye, William Ljungbergh et al.
Recover Biological Structure from Sparse-View Diffraction Images with Neural Volumetric Prior
Renzhi He, Haowen Zhou, Yubei Chen et al.
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
Xin Zhou, DINGKANG LIANG, Sifan Tu et al.
Instant GaussianImage: A Generalizable and Self-Adaptive Image Representation via 2D Gaussian Splatting
Zhaojie Zeng, Yuesong Wang, Chao Yang et al.
NeuraLeaf: Neural Parametric Leaf Models with Shape and Deformation Disentanglement
Yang Yang, Dongni Mao, Hiroaki Santo et al.
Stochastic Gradient Estimation for Higher-Order Differentiable Rendering
Zican Wang, Michael Fischer, Tobias Ritschel
Uncertainty-Aware Diffusion-Guided Refinement of 3D Scenes
Sarosij Bose, Arindam Dutta, Sayak Nag et al.
HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Models
YIWEN CHEN, Hieu Nguyen, Vikram Voleti et al.
Hi-Gaussian: Hierarchical Gaussians under Normalized Spherical Projection for Single-View 3D Reconstruction
Binjian Xie, Pengju Zhang, Hao Wei et al.
Exploiting Vision Language Model for Training-Free 3D Point Cloud OOD Detection via Graph Score Propagation
Tiankai Chen, Yushu Li, Adam Goodge et al.
Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving
Junhao Ge, Zuhong Liu, Longteng Fan et al.
Lidar Waveforms are Worth 40x128x33 Words
Dominik Scheuble, Hanno Holzhüter, Steven Peters et al.
Wide2Long: Learning Lens Compression and Perspective Adjustment for Wide-Angle to Telephoto Translation
Soumyadipta Banerjee, Jiaul Paik, Debashis Sen
SparseLaneSTP: Leveraging Spatio-Temporal Priors with Sparse Transformers for 3D Lane Detection
Maximilian Pittner, Joel Janai, Mario Faigle et al.
Relative Illumination Fields: Learning Medium and Light Independent Underwater Scenes
Mengkun She, Felix Seegräber, David Nakath et al.
HVPUNet: Hybrid-Voxel Point-cloud Upsampling Network
Juhyung Ha, Vibhas Vats, Alimoor Reza et al.
Stealthy Backdoor Attack in Federated Learning via Adaptive Layer-wise Gradient Alignment
Qingqian Yang, Peishen Yan, Xiaoyu Wu et al.
RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
Huiyang Hu, Peijin Wang, Hanbo Bi et al.
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
Xianglong He, Zi-Xin Zou, Chia Hao Chen et al.
Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution
Peng Du, Hui Li, Han Xu et al.
Spatially-Varying Autofocus
Yingsi Qin, Aswin Sankaranarayanan, Matthew O'Toole
M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization
Ju-Hyeon Nam, Dong-Hyun Moon, Sang-Chul Lee
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description
Anna-Maria Halacheva, Yang Miao, Jan-Nico Zaech et al.
SU-RGS: Relightable 3D Gaussian Splatting from Sparse Views under Unconstrained Illuminations
Qi Zhang, Chi Huang, Qian Zhang et al.
Gradient Extrapolation for Debiased Representation Learning
Ihab Asaad, Maha Shadaydeh, Joachim Denzler
World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model
Yupeng Zheng, Pengxuan Yang, Zebin Xing et al.
Customizing Domain Adapters for Domain Generalization
Yuyang Ji, Zeyi Huang, Haohan Wang et al.
Soft Separation and Distillation: Toward Global Uniformity in Federated Unsupervised Learning
Hung-Chieh Fang, Hsuan-Tien Lin, Irwin King et al.
Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image
Jerred Chen, Ronald Clark
Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts
Zixuan Hu, Dongxiao Li, Xinzhu Ma et al.
Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective
Hoang Phan, Tung Lam Tran, Quyen Tran et al.
Learning Null Geodesics for Gravitational Lensing Rendering in General Relativity
Mingyuan Sun, Zheng Fang, Jiaxu Wang et al.
Object-centric Video Question Answering with Visual Grounding and Referring
Haochen Wang, Qirui Chen, Cilin Yan et al.
Exploiting Frequency Dynamics for Enhanced Multimodal Event-based Action Recognition
Meiqi Cao, Xiangbo Shu, Xin Jiang et al.
How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach
Chirui CHANG, Jiahui Liu, Zhengzhe Liu et al.
WIPES: Wavelet-based Visual Primitives
Wenhao Zhang, Hao Zhu, Delong Wu et al.
CoSMIC: Continual Self-supervised Learning for Multi-Domain Medical Imaging via Conditional Mutual Information Maximization
Yihang Liu, Ying Wen, Longzhen Yang et al.
Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion
Yijun Liang, Shweta Bhardwaj, Tianyi Zhou
Advancing Textual Prompt Learning with Anchored Attributes
Zheng Li, Yibing Song, Ming-Ming Cheng et al.
Dual-Rate Dynamic Teacher for Source-Free Domain Adaptive Object Detection
Qi He, Xiao Wu, Jun-Yan He et al.
OV3D-CG: Open-vocabulary 3D Instance Segmentation with Contextual Guidance
Mingquan Zhou, Chen He, Ruiping Wang et al.
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong, Chengyao Wang, Yuqi Liu et al.
Enhancing Mamba Decoder with Bidirectional Interaction in Multi-Task Dense Prediction
Mang Cao, Sanping Zhou, Yizhe Li et al.
SITE: towards Spatial Intelligence Thorough Evaluation
Wenqi Wang, Reuben Tan, Pengyue Zhu et al.
SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models
Sudong Wang, Yunjian Zhang, Yao Zhu et al.
Text2VDM: Text to Vector Displacement Maps for Expressive and Interactive 3D Sculpting
Hengyu Meng, Duotun Wang, Zhijing Shao et al.
Mamba-3VL: Taming State Space Model for 3D Vision Language Learning
Yuan Wang, Yuxin Chen, Zhongang Qi et al.
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
Min Yang, Zihan Jia, Zhilin Dai et al.
MATE: Motion-Augmented Temporal Consistency for Event-based Point Tracking
Han Han, Wei Zhai, Yang Cao et al.
Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset
Ruofei WANG, Peiqi Duan, Boxin Shi et al.
Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision
Yuting He, Shuo Li
Efficient Fine-Tuning of Large Models via Nested Low-Rank Adaptation
Lujun Li, Cheng Lin, Dezhi Li et al.
Dual-level Prototype Learning for Composite Degraded Image Restoration
Zhongze Wang, Haitao Zhao, Lujian Yao et al.
Efficient Input-level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation
Shengfang ZHAI, Jiajun Li, Yue Liu et al.
GReg: Geometry-Aware Region Refinement for Sign Language Video Generation
Tongkai Shi, Lianyu Hu, Fanhua Shang et al.
FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing
Bizhu Wu, Jinheng Xie, Meidan Ding et al.
Knowledge Transfer from Interaction Learning
Yilin Gao, Kangyi Chen, Zhongxing Peng et al.
WIR3D: Visually-Informed and Geometry-Aware 3D Shape Abstraction
Richard Liu, Daniel Fu, Noah Tan et al.
Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection
Jinglun Li, Kaixun Jiang, Zhaoyu Chen et al.