Most Cited CVPR "active vision" Papers
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models
Kwan Yun, Seokhyeon Hong, Chaelin Kim et al.
SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
Antoine Guédon, Vincent Lepetit
DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion
Tom Van Wouwe, Seunghwan Lee, Antoine Falisse et al.
HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion
Jingbo Zhang, Xiaoyu Li, Qi Zhang et al.
CurveCloudNet: Processing Point Clouds with 1D Structure
Colton Stearns, Alex Fu, Jiateng Liu et al.
Explaining in Diffusion: Explaining a Classifier with Diffusion Semantics
Tahira Kazimi, Ritika Allada, Pinar Yanardag
Harnessing Meta-Learning for Improving Full-Frame Video Stabilization
Muhammad Kashif Ali, Eun Woo Im, Dongjin Kim et al.
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
Junhao Zheng, Chenhao Lin, Jiahao Sun et al.
SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects
Abhinav Kumar, Yuliang Guo, Xinyu Huang et al.
Blurred LiDAR for Sharper 3D: Robust Handheld 3D Scanning with Diffuse LiDAR and RGB
Nikhil Behari, Aaron Young, Siddharth Somasundaram et al.
MoML: Online Meta Adaptation for 3D Human Motion Prediction
Xiaoning Sun, Huaijiang Sun, Bin Li et al.
Let's Verify and Reinforce Image Generation Step by Step
Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao et al.
Learning with Structural Labels for Learning with Noisy Labels
Noo-ri Kim, Jin-Seop Lee, Jee-Hyong Lee
EVPGS: Enhanced View Prior Guidance for Splatting-based Extrapolated View Synthesis
Jiahe Li, Feiyu Wang, Xiaochao Qu et al.
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
Letian Zhang, Xiaotong Zhai, Zhongkai Zhao et al.
Optical-Flow Guided Prompt Optimization for Coherent Video Generation
Hyelin Nam, Jaemin Kim, Dohun Lee et al.
Incremental Nuclei Segmentation from Histopathological Images via Future-class Awareness and Compatibility-inspired Distillation
Huyong Wang, Huisi Wu, Jing Qin
Model Inversion Robustness: Can Transfer Learning Help?
Sy-Tuyen Ho, Koh Jun Hao, Keshigeyan Chandrasegaran et al.
Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection
Xiaowei Zhao, Xianglong Liu, Duorui Wang et al.
Detecting Open World Objects via Partial Attribute Assignment
Muli Yang, Gabriel James Goenawan, Huaiyuan Qin et al.
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
Jiun Tian Hoe, Xudong Jiang, Chee Seng Chan et al.
OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection
Max Gutbrod, David Rauber, Danilo Weber Nunes et al.
MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection
Boyang Peng, Sanqing Qu, Yong Wu et al.
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
Xin Zhou, Dingkang Liang, Wei Xu et al.
Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual
Chong Wang, Lanqing Guo, Zixuan Fu et al.
EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation
Md Mostafijur Rahman, Mustafa Munir, Radu Marculescu
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Rajiv Didolkar, Andrii Zadaianchuk, Rabiul Awal et al.
On Exact Inversion of DPM-Solvers
Seongmin Hong, Kyeonghyun Lee, Suh Yoon Jeon et al.
Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models
Bin Fu, Fanghua Yu, Anran Liu et al.
A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals
Jiangnan Tang, Jingya Wang, Kaiyang Ji et al.
LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset
Haolin Liu, Chongjie Ye, Yinyu Nie et al.
Subnet-Aware Dynamic Supernet Training for Neural Architecture Search
Jeimin Jeon, Youngmin Oh, Junghyup Lee et al.
MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning
Mohamed Abdelfattah, Mariam Hassan, Alex Alahi
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation
Sankalp Sinha, Mohammad Sadil Khan, Muhammad Usama et al.
D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection
Dinh Phat Do, Taehoon Kim, Jaemin Na et al.
MAGICK: A Large-scale Captioned Dataset from Matting Generated Images using Chroma Keying
Ryan Burgert, Brian Price, Jason Kuen et al.
Intrinsic Image Diffusion for Indoor Single-view Material Estimation
Peter Kocsis, Vincent Sitzmann, Matthias Nießner
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Yuechen Zhang, Shengju Qian, Bohao Peng et al.
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
Ruicheng Wang, Sicheng Xu, Cassie Lee Dai et al.
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
Qiyao Xue, Xiangyu Yin, Boyuan Yang et al.
Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?
Zhengyue Zhao, Jinhao Duan, Kaidi Xu et al.
NetTrack: Tracking Highly Dynamic Objects with a Net
Guangze Zheng, Shijie Lin, Haobo Zuo et al.
Scaling Up Video Summarization Pretraining with Large Language Models
Dawit Argaw Argaw, Seunghyun Yoon, Fabian Caba Heilbron et al.
Online Task-Free Continual Generative and Discriminative Learning via Dynamic Cluster Memory
Fei Ye, Adrian Bors
FADES: Fair Disentanglement with Sensitive Relevance
Taeuk Jang, Xiaoqian Wang
Versatile Navigation Under Partial Observability via Value-guided Diffusion Policy
Gengyu Zhang, Hao Tang, Yan Yan
Improving Depth Completion via Depth Feature Upsampling
Yufei Wang, Ge Zhang, Shaoqian Wang et al.
AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing
Niu Lian, Jun Li, Jinpeng Wang et al.
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
Haotian Wang, Yuzhe Weng, Yueyan Li et al.
Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption
Nobuhiko Wakai, Satoshi Sato, Yasunori Ishii et al.
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
Vladan Stojnić, Yannis Kalantidis, Jiri Matas et al.
MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM
Vladimir Yugay, Theo Gevers, Martin R. Oswald
MRFS: Mutually Reinforcing Image Fusion and Segmentation
Hao Zhang, Xuhui Zuo, Jie Jiang et al.
Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning
Jaewoo Jeong, Daehee Park, Kuk-Jin Yoon
Arbitrary-steps Image Super-resolution via Diffusion Inversion
Zongsheng Yue, Kang Liao, Chen Change Loy
OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning
Noor Ahmed, Anna Kukleva, Bernt Schiele
3D-LFM: Lifting Foundation Model
Mosam Dabhi, László A. Jeni, Simon Lucey
DL2G: Degradation-guided Local-to-Global Restoration for Eyeglass Reflection Removal
Yizhilv, Xiao Lu, Hong Ding et al.
AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning
Yuheng Xu, Shijie Yang, Xin Liu et al.
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Henghui Du, Guangyao Li, Chang Zhou et al.
LASIL: Learner-Aware Supervised Imitation Learning For Long-term Microscopic Traffic Simulation
Ke Guo, Zhenwei Miao, Wei Jing et al.
EgoLife: Towards Egocentric Life Assistant
Jingkang Yang, Shuai Liu, Hongming Guo et al.
HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces
Haithem Turki, Vasu Agrawal, Samuel Rota Bulò et al.
Cross-Modal 3D Representation with Multi-View Images and Point Clouds
Ziyang Zhou, Pinghui Wang, Zi Liang et al.
IIRP-Net: Iterative Inference Residual Pyramid Network for Enhanced Image Registration
Tai Ma, zhangsuwei, Jiafeng Li et al.
SEED-Bench: Benchmarking Multimodal Large Language Models
Bohao Li, Yuying Ge, Yixiao Ge et al.
Style Aligned Image Generation via Shared Attention
Amir Hertz, Andrey Voynov, Shlomi Fruchter et al.
NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows
Zhenggang Tang, Jason Ren, Xiaoming Zhao et al.
Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World
Bangyan Liao, Zhenjun Zhao, Haoang Li et al.
StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
Mingkun Lei, Xue Song, Beier Zhu et al.
Invisible Backdoor Attack against Self-supervised Learning
Hanrong Zhang, Zhenting Wang, Boheng Li et al.
SketchFusion: Learning Universal Sketch Features through Fusing Foundation Models
Subhadeep Koley, Tapas Kumar Dutta, Aneeshan Sain et al.
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
Fengyuan Shi, Jiaxi Gu, Hang Xu et al.
CoMapGS: Covisibility Map-based Gaussian Splatting for Sparse Novel View Synthesis
Youngkyoon Jang, Eduardo Pérez-Pellitero
Active Domain Adaptation with False Negative Prediction for Object Detection
Yuzuru Nakamura, Yasunori Ishii, Takayoshi Yamashita
LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
Qihao Zhao, Yalun Dai, Hao Li et al.
How to Train Neural Field Representations: A Comprehensive Study and Benchmark
Samuele Papa, Riccardo Valperga, David Knigge et al.
Interactive Medical Image Analysis with Concept-based Similarity Reasoning
Ta Duc Huy, Sen Kim Tran, Phan Nguyen et al.
Steepest Descent Density Control for Compact 3D Gaussian Splatting
Peihao Wang, Yuehao Wang, Dilin Wang et al.
Efficient Decoupled Feature 3D Gaussian Splatting via Hierarchical Compression
Zhenqi Dai, Ting Liu, Yanning Zhang
Image Quality Assessment: From Human to Machine Preference
Chunyi Li, Yuan Tian, Xiaoyue Ling et al.
Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
Guotao Liang, Baoquan Zhang, Zhiyuan Wen et al.
InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions
Sirui Xu, Hung Yu Ling, Yu-Xiong Wang et al.
EdgeMovingNet: Edge-preserving Point Cloud Reconstruction via Joint Geometry Features
Xinran Yang, Donghao Ji, Yuanqi Li et al.
Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring
Chengxu Liu, Xuan Wang, Xiangyu Xu et al.
SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation
Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu et al.
MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors
Fanqi Pu, Yifan Wang, Jiru Deng et al.
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
Weijie Zhou, Manli Tao, Chaoyang Zhao et al.
Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector
Yifu Ding, Weilun Feng, Chuyan Chen et al.
Preserving Clusters in Prompt Learning for Unsupervised Domain Adaptation
Long Tung Vuong, Hoang Phan, Vy Vo et al.
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing
Yunqi Gu, Ian Huang, Jihyeon Je et al.
FREE: Faster and Better Data-Free Meta-Learning
Yongxian Wei, Zixuan Hu, Zhenyi Wang et al.
Controllable Human Image Generation with Personalized Multi-Garments
Yisol Choi, Sangkyung Kwak, Sihyun Yu et al.
Open Vocabulary Semantic Scene Sketch Understanding
Ahmed Bourouis, Judith Fan, Yulia Gryaditskaya
Reconstructing Animals and the Wild
Peter Kulits, Michael J. Black, Silvia Zuffi
You Only Need Less Attention at Each Stage in Vision Transformers
Shuoxi Zhang, Hanpeng Liu, Stephen Lin et al.
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin et al.
Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection
Chuangchuang Tan, Huan Liu, Yao Zhao et al.
Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception
Haoming Chen, Zhizhong Zhang, Yanyun Qu et al.
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Sai Kumar Dwivedi, Dimitrije Antić, Shashank Tripathi et al.
BoQ: A Place is Worth a Bag of Learnable Queries
Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère
DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos
Lorenzo Mur-Labadia, Jose J. Guerrero, Ruben Martinez-Cantin
Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach
Chen-Chen Zong, Sheng-Jun Huang
UFC-Net: Unrolling Fixed-point Continuous Network for Deep Compressive Sensing
Xiaoyang Wang, Hongping Gan
Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
Haoyi Jiang, Tianheng Cheng, Naiyu Gao et al.
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
Wenbo Hu, Xiangjun Gao, Xiaoyu Li et al.
SeaLion: Semantic Part-Aware Latent Point Diffusion Models for 3D Generation
Dekai Zhu, Yan Di, Stefan Gavranovic et al.
CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
Sajid Javed, Arif Mahmood, Iyyakutti Iyappan Ganapathi et al.
Investigating the Role of Weight Decay in Enhancing Nonconvex SGD
Tao Sun, Yuhao Huang, Li Shen et al.
ActiveGAMER: Active GAussian Mapping through Efficient Rendering
Liyan Chen, Huangying Zhan, Kevin Chen et al.
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval
Qiang Zou, Shuli Cheng, Jiayi Chen
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
Enguang Wang, Zhimao Peng, Zhengyuan Xie et al.
MaskPLAN: Masked Generative Layout Planning from Partial Input
Hang Zhang, Anton Savov, Benjamin Dillenburger
Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers
Jinyang Liu, Wondmgezahu Teshome, Sandesh Ghimire et al.
Exploration-Driven Generative Interactive Environments
Nedko Savov, Naser Kazemi, Mohammad Mahdi et al.
Towards Memorization-Free Diffusion Models
Chen Chen, Daochang Liu, Chang Xu
Extreme Rotation Estimation in the Wild
Hana Bezalel, Dotan Ankri, Ruojin Cai et al.
AV-RIR: Audio-Visual Room Impulse Response Estimation
Anton Ratnarajah, Sreyan Ghosh, Sonal Kumar et al.
Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields
Zhiyuan Min, Yawei Luo, Wei Yang et al.
A-Teacher: Asymmetric Network for 3D Semi-Supervised Object Detection
Hanshi Wang, Zhipeng Zhang, Jin Gao et al.
HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
Supreeth Narasimhaswamy, Uttaran Bhattacharya, Xiang Chen et al.
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao, Haiping Wu, Weijian Xu et al.
Classifier-Free Guidance Inside the Attraction Basin May Cause Memorization
Anubhav Jain, Yuya Kobayashi, Takashi Shibuya et al.
DMR: Decomposed Multi-Modality Representations for Frames and Events Fusion in Visual Reinforcement Learning
Haoran Xu, Peixi Peng, Guang Tan et al.
3D Feature Tracking via Event Camera
Siqi Li, Zhou Zhikuan, Zhou Xue et al.
Frequency-aware Event-based Video Deblurring for Real-World Motion Blur
Taewoo Kim, Hoonhee Cho, Kuk-Jin Yoon
FedHCA2: Towards Hetero-Client Federated Multi-Task Learning
Yuxiang Lu, Suizhi Huang, Yuwen Yang et al.
Improving Unsupervised Hierarchical Representation with Reinforcement Learning
Ruyi An, Yewen Li, Xu He et al.
Condensing Action Segmentation Datasets via Generative Network Inversion
Guodong Ding, Rongyu Chen, Angela Yao
Flash3D: Super-scaling Point Transformers through Joint Hardware-Geometry Locality
Liyan Chen, Gregory P. Meyer, Zaiwei Zhang et al.
Global Latent Neural Rendering
Thomas Tanay, Matteo Maggioni
Data Poisoning based Backdoor Attacks to Contrastive Learning
Jinghuai Zhang, Hongbin Liu, Jinyuan Jia et al.
ProHOC: Probabilistic Hierarchical Out-of-Distribution Classification via Multi-Depth Networks
Erik Wallin, Fredrik Kahl, Lars Hammarstrand
RoHM: Robust Human Motion Reconstruction via Diffusion
Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu et al.
Motion Prompting: Controlling Video Generation with Motion Trajectories
Daniel Geng, Charles Herrmann, Junhwa Hur et al.
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu, Runyu He, Gangshan Wu et al.
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
Jiequan Cui, Beier Zhu, Xin Wen et al.
EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting
Dong In Lee, Hyeongcheol Park, Jiyoung Seo et al.
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu et al.
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
Chenshuang Zhang, Fei Pan, Junmo Kim et al.
Volumetrically Consistent 3D Gaussian Rasterization
Chinmay Talegaonkar, Yash Belhe, Ravi Ramamoorthi et al.
BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition
Yuxuan Zhou, Xudong Yan, Zhi-Qi Cheng et al.
Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors
Yu Zhang, Songpengcheng Xia, Lei Chu et al.
Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi
Kangwei Yan, Fei Wang, Bo Qian et al.
From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective
Chen Zhao, Zhizhou Chen, Yunzhe Xu et al.
ERMVP: Communication-Efficient and Collaboration-Robust Multi-Vehicle Perception in Challenging Environments
Jingyu Zhang, Kun Yang, Yilei Wang et al.
GRAM: Global Reasoning for Multi-Page VQA
Itshak Blau, Sharon Fogel, Roi Ronen et al.
FreeCloth: Free-form Generation Enhances Challenging Clothed Human Modeling
Hang Ye, Xiaoxuan Ma, Hai Ci et al.
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Qifan Yu, Juncheng Li, Longhui Wei et al.
Tri-Perspective View Decomposition for Geometry-Aware Depth Completion
Zhiqiang Yan, Yuankai Lin, Kun Wang et al.
Progressive Focused Transformer for Single Image Super-Resolution
Wei Long, Xingyu Zhou, Leheng Zhang et al.
KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception
Yunpeng Qu, Kun Yuan, Qizhi Xie et al.
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Haokun Lin, Haoli Bai, Zhili Liu et al.
Efficient Personalization of Quantized Diffusion Model without Backpropagation
Hoigi Seo, Wongi Jeong, Kyungryeol Lee et al.
From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models
German Barquero, Nadine Bertsch, Manojkumar Marramreddy et al.
DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach
Dayi Tan, Hansheng Chen, Wei Tian et al.
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Yunseok Jang, Yeda Song, Sungryull Sohn et al.
Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior
Haitao Wu, Qing Li, Changqing Zhang et al.
Tumor Micro-environment Interactions Guided Graph Learning for Survival Analysis of Human Cancers from Whole-slide Pathological Images
Wei Shao, YangYang Shi, Daoqiang Zhang et al.
Perception-Oriented Video Frame Interpolation via Asymmetric Blending
Guangyang Wu, Xin Tao, Changlin Li et al.
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Qi Yang, Xing Nie, Tong Li et al.
Exact Fusion via Feature Distribution Matching for Few-shot Image Generation
Yingbo Zhou, Yutong Ye, Pengyu Zhang et al.
Fooling Polarization-Based Vision using Locally Controllable Polarizing Projection
Zhuoxiao Li, Zhihang Zhong, Shohei Nobuhara et al.
Affine Equivariant Networks Based on Differential Invariants
Yikang Li, Yeqing Qiu, Yuxuan Chen et al.
Diffusion-based Blind Text Image Super-Resolution
Yuzhe Zhang, Jiawei Zhang, Hao Li et al.
Improving Generalized Zero-Shot Learning by Exploring the Diverse Semantics from External Class Names
Yapeng Li, Yong Luo, Zengmao Wang et al.
Continual Learning for Motion Prediction Model via Meta-Representation Learning and Optimal Memory Buffer Retention Strategy
Dae Jun Kang, Dongsuk Kum, Sanmin Kim
FlowDiffuser: Advancing Optical Flow Estimation with Diffusion Models
Ao Luo, Xin Li, Fan Yang et al.
3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting
Zhiyin Qian, Shaofei Wang, Marko Mihajlovic et al.
Free Lunch Enhancements for Multi-modal Crowd Counting
Haoliang Meng, Xiaopeng Hong, Zhengqin Lai et al.
Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
Haofeng Liu, Chenshu Xu, Yifei Yang et al.
AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring
Xintian Mao, Xiwen Gao, Yan Wang
Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network
Sizhe Zheng, Pan Gao, Peng Zhou et al.
SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement
Tao Wang, Lei Jin, Zheng Wang et al.
Building Vision-Language Models on Solid Foundations with Masked Distillation
Sepehr Sameni, Kushal Kafle, Hao Tan et al.
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
Zining Wang, Tongkun Guan, Pei Fu et al.
MS-DETR: Efficient DETR Training with Mixed Supervision
Chuyang Zhao, Yifan Sun, Wenhao Wang et al.
PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting
Alex Hanson, Allen Tu, Vasu Singla et al.
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh et al.
Test-Time Backdoor Detection for Object Detection Models
Hangtao Zhang, Yichen Wang, Shihui Yan et al.
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Pengchong Qiao, Lei Shang, Chang Liu et al.
Towards High-fidelity Artistic Image Vectorization via Texture-Encapsulated Shape Parameterization
Ye Chen, Bingbing Ni, Jinfan Liu et al.
OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees
Hakyeong Kim, Andreas Meuleman, Hyeonjoong Jang et al.
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Chenyu Yang, Xuan Dong, Xizhou Zhu et al.
Deformable One-shot Face Stylization via DINO Semantic Guidance
Yang Zhou, Zichong Chen, Hui Huang
Density-Guided Semi-Supervised 3D Semantic Segmentation with Dual-Space Hardness Sampling
Jianan Li, Qiulei Dong
Continual SFT Matches Multimodal RLHF with Negative Supervision
Ke Zhu, Yu Wang, Yanpeng Sun et al.
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework
Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi et al.
MultiMorph: On-demand Atlas Construction
Mazdak Abulnaga, Andrew Hoopes, Neel Dey et al.
Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking
Phuc Nguyen, Minh Luu, Anh Tran et al.
DarkIR: Robust Low-Light Image Restoration
Daniel Feijoo, Juan C. Benito, Alvaro Garcia et al.
LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP
Yunshi Huang, Fereshteh Shakeri, Jose Dolz et al.
ImagineFSL: Self-Supervised Pretraining Matters on Imagined Base Set for VLM-based Few-shot Learning
Haoyuan Yang, Xiaoou Li, Jiaming Lv et al.
1-Lipschitz Layers Compared: Memory, Speed, and Certifiable Robustness
Bernd Prach, Fabio Brau, Giorgio Buttazzo et al.
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization
Liang Pan, Zeshi Yang, Zhiyang Dou et al.
DepthSplat: Connecting Gaussian Splatting and Depth
Haofei Xu, Songyou Peng, Fangjinhua Wang et al.
Towards Optimizing Large-Scale Multi-Graph Matching in Bioimaging
Max Kahl, Sebastian Stricker, Lisa Hutschenreiter et al.
PoNQ: a Neural QEM-based Mesh Representation
Nissim Maruani, Maks Ovsjanikov, Pierre Alliez et al.
M3-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection
Bin Pu, Liwen Wang, Jiewen Yang et al.